2011/9/25 Mathieu Blondel <[email protected]>:
> On Sun, Sep 25, 2011 at 7:05 PM, Lars Buitinck <[email protected]> wrote:
>> That seems very similar to Kamal Nigam's semi-supervised Naive-Bayes.
That's right. The first difference is the initialization: Nigam starts from a labeled set containing all classes, while Liu initially assumes the unlabeled set contains the negative examples. The second difference is convergence, see below.

> In theory, I think that EM guarantees convergence in likelihood but
> not in parameters or probabilities. In practice, I don't know
> (monitoring likelihood is slow though...). Here's a related post on
> the LingPipe blog (especially the comments at the end):
>
> http://lingpipe-blog.com/2011/01/04/monitoring-convergence-of-em-for-map-estimates-with-priors/

This is news to me. However, Liu (and I believe Nigam, in an earlier paper) checks convergence based on the parameters, so apparently this is good enough. And [thinking out loud] the prediction probabilities of NB would converge iff the parameters converge, right? I could of course restrict the algo to linear classifiers, if need be.

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
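[Editor's note] The parameter-based convergence test discussed above can be sketched in a few lines. This is a minimal NumPy-only illustration of EM for two-class semi-supervised multinomial Naive Bayes in the style of Nigam et al., checking convergence on the parameter vector rather than the likelihood; the function name, smoothing, and tolerance are illustrative assumptions, not any library's actual API.

```python
# Hypothetical sketch: EM for semi-supervised multinomial Naive Bayes,
# with convergence monitored on the parameters (as Liu/Nigam reportedly do)
# instead of the likelihood. Rows of X are word-count vectors.
import numpy as np

def em_nb(X_lab, y_lab, X_unlab, n_iter=100, tol=1e-6, alpha=1.0):
    n_classes = 2
    # Start the unlabeled responsibilities uniform (other initializations,
    # e.g. Liu's "unlabeled = negative" assumption, plug in here instead).
    R = np.full((X_unlab.shape[0], n_classes), 1.0 / n_classes)
    Y = np.eye(n_classes)[y_lab]          # one-hot labels for labeled rows
    prev = None
    for _ in range(n_iter):
        # M-step: re-estimate parameters from labeled + soft-labeled counts.
        class_count = Y.sum(0) + R.sum(0)
        prior = np.log(class_count / class_count.sum())
        feat_count = Y.T @ X_lab + R.T @ X_unlab + alpha   # Laplace smoothing
        cond = np.log(feat_count / feat_count.sum(1, keepdims=True))
        # Convergence test on the (log-)parameters, not the likelihood.
        params = np.concatenate([prior, cond.ravel()])
        if prev is not None and np.max(np.abs(params - prev)) < tol:
            break
        prev = params
        # E-step: posterior class probabilities for the unlabeled rows.
        joint = X_unlab @ cond.T + prior
        joint -= joint.max(1, keepdims=True)               # numerical safety
        R = np.exp(joint)
        R /= R.sum(1, keepdims=True)
    return prior, cond, R
```

Since the NB posterior is a continuous function of the parameters, a converged parameter vector does imply converged prediction probabilities, which supports the point made above.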
