2011/9/25 Mathieu Blondel <[email protected]>:
> On Sun, Sep 25, 2011 at 7:05 PM, Lars Buitinck <[email protected]> wrote:
>
> That seems very similar to Kamal Nigam's semi-supervised Naive-Bayes.

That's right. The first difference is the initialization: Nigam starts
from a labeled set containing all classes, while Liu initially treats
the whole unlabeled set as negative examples. The second difference is
convergence; see below.
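To make the initialization difference concrete, a rough sketch — the data, class labels, and variable names are my own illustrative choices, not taken from either paper:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

rng = np.random.RandomState(0)
X_pos = rng.randint(0, 5, size=(15, 8)).astype(float)   # labeled positive docs
X_unl = rng.randint(0, 5, size=(60, 8)).astype(float)   # unlabeled pool

# Nigam-style init: a labeled seed set covering all classes
# (here we pretend the first 15 unlabeled docs came with negative labels).
X_seed = np.vstack([X_pos, X_unl[:15]])
y_seed = np.array([1] * 15 + [0] * 15)
nigam_init = MultinomialNB().fit(X_seed, y_seed)

# Liu-style (PU) init: treat the entire unlabeled set as negative.
X_init = np.vstack([X_pos, X_unl])
y_init = np.array([1] * len(X_pos) + [0] * len(X_unl))
liu_init = MultinomialNB().fit(X_init, y_init)
```

Either initial model would then be handed to the same EM loop.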

> In theory, I think that EM guarantees convergence in likelihood but
> not in parameters or probabilities. In practice, I don't know
> (monitoring likelihood is slow though...). Here's a related post on
> the LingPipe blog (especially the comments at the end):
>
> http://lingpipe-blog.com/2011/01/04/monitoring-convergence-of-em-for-map-estimates-with-priors/

This is news to me. However, Liu (and I believe Nigam, in an earlier
paper) checks convergence based on the parameters, so apparently that
is good enough. And [thinking out loud] the prediction probabilities
of NB would converge iff the parameters converge, right?
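For concreteness, a rough sketch of such a parameter-based convergence check for a semi-supervised MultinomialNB — the tolerance, the soft-label refit via sample_weight, and the random data are all my own illustrative choices, not from Liu's or Nigam's papers:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

rng = np.random.RandomState(0)
X_l = rng.randint(0, 5, size=(20, 10)).astype(float)   # labeled docs
y_l = rng.randint(0, 2, size=20)
X_u = rng.randint(0, 5, size=(100, 10)).astype(float)  # unlabeled docs
n_classes = 2

clf = MultinomialNB()
clf.fit(X_l, y_l)                       # initialize on the labeled set
old = clf.feature_log_prob_.copy()

for it in range(100):
    # E-step: soft class memberships for the unlabeled documents
    resp = clf.predict_proba(X_u)
    # M-step: refit on the labeled data plus fractionally weighted
    # copies of each unlabeled document, one copy per class
    X_all = np.vstack([X_l] + [X_u] * n_classes)
    y_all = np.concatenate([y_l] + [np.full(len(X_u), c) for c in range(n_classes)])
    w_all = np.concatenate([np.ones(len(y_l))] + [resp[:, c] for c in range(n_classes)])
    clf.fit(X_all, y_all, sample_weight=w_all)
    # convergence test on the parameters, not the likelihood
    if np.abs(clf.feature_log_prob_ - old).max() < 1e-4:
        break
    old = clf.feature_log_prob_.copy()
```

This avoids computing the (slow) likelihood entirely; whether a parameter tolerance is theoretically sufficient is exactly the open question above.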

I could of course restrict the algo to linear classifiers, if need be.

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
