2011/9/25 Mathieu Blondel <[email protected]>:
> Can you elaborate what you have in mind?
>
> You could require the user to implement the appropriate "score"
> function and use it to monitor convergence.

What I'm trying to do is PU learning: fitting a binary classifier on a
set of positive samples and a set of unlabeled samples; *no
negatives*. Liu et al. [1, 2, 3] have a method for this called I-EM
that first assumes all unlabeled examples are negative to fit an
initial classifier, then iteratively executes roughly the following
loop body (where unlabeled is a vector of indices and 1 denotes
positive):

    # E-step
    y_pred = clf.predict(X)
    y_posneg[unlabeled] = (y_pred == 1)[unlabeled]

    # M-step
    clf = clone(self.clf)
    clf.fit(X, y_posneg)

    if parameters_changed() < self.tol:
        break

My problem is the parameters_changed() implementation. I could
inspect clf's parameters directly if it's linear, but that would (a) make
the algorithm less general and (b) require some extra work if it also
has to inspect the intercept.
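
For concreteness, the linear-model-only check I'd like to avoid would
look something like this (a sketch; the helper name and the max-abs
criterion are just placeholders, and it assumes a fitted estimator
exposing coef_ and intercept_):

```python
import numpy as np
from types import SimpleNamespace

def parameters_changed(old_clf, new_clf):
    """Max absolute change across coefficients and intercept.

    Only works for linear models exposing coef_ and intercept_,
    which is exactly the generality problem described above.
    """
    old = np.concatenate([np.ravel(old_clf.coef_), np.ravel(old_clf.intercept_)])
    new = np.concatenate([np.ravel(new_clf.coef_), np.ravel(new_clf.intercept_)])
    return np.max(np.abs(old - new))

# Tiny example with stand-in "fitted" models:
old = SimpleNamespace(coef_=np.array([[1.0, 2.0]]), intercept_=np.array([0.5]))
new = SimpleNamespace(coef_=np.array([[1.0, 2.5]]), intercept_=np.array([0.5]))
delta = parameters_changed(old, new)  # 0.5
```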

My solution so far has been to test for convergence by looking for
changes in the predict_proba output, but I don't know whether that's valid.
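
Here is a minimal runnable sketch of what I mean, with
LogisticRegression and toy Gaussian blobs standing in for the real clf
and data (the tolerance, the data, and the max-abs criterion on
predict_proba are just placeholders):

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

# Toy data: two well-separated blobs; the true labels are hidden.
X = np.vstack([rng.randn(50, 2) + 2, rng.randn(50, 2) - 2])

# PU setting: only some positives are labeled, the rest is unlabeled.
positive = np.arange(40)            # known positives (first blob)
unlabeled = np.arange(40, 100)      # 10 hidden positives + 50 negatives

base_clf = LogisticRegression()
tol = 1e-4

# I-EM initialization: treat all unlabeled samples as negative.
y_posneg = np.zeros(100, dtype=int)
y_posneg[positive] = 1
clf = clone(base_clf).fit(X, y_posneg)
prev_proba = clf.predict_proba(X)[:, 1]

for it in range(100):
    # E-step: relabel the unlabeled samples with the current predictions.
    y_pred = clf.predict(X)
    y_posneg[unlabeled] = y_pred[unlabeled]

    # M-step: refit a fresh clone on the updated labels.
    clf = clone(base_clf).fit(X, y_posneg)

    # Convergence check on predict_proba instead of the parameters.
    proba = clf.predict_proba(X)[:, 1]
    if np.max(np.abs(proba - prev_proba)) < tol:
        break
    prev_proba = proba
```

Once the relabeling stabilizes, refitting a clone on identical labels
yields identical probabilities, so the loop terminates; whether a small
predict_proba delta is an acceptable proxy for small parameter change
in general is precisely my question.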

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
