Posting to the dev list. Great paper, thanks! It looks like L-LDA could be used to create some interesting examples. The paper shows L-LDA can be used to build a word-tag model for accurate tag prediction given a document of words. I will finish reading and report back.
How much work is needed to transform or build on top of the current LDA implementation to get L-LDA? Any thoughts?

Robin

On Thu, Oct 8, 2009 at 11:50 PM, David Hall <d...@cs.berkeley.edu> wrote:
> The short answer is that it probably won't help all that much. Naive
> Bayes is unreasonably good when you have enough data.
>
> The long answer is, I have a paper with Dan Ramage and Ramesh
> Nallapati that talks about how to do it:
>
> www.aclweb.org/anthology-new/D/D09/D09-1026.pdf
>
> In some sense, "Labeled-LDA" is a kind of Naive Bayes where you can
> have more than one class per document. If you have exactly one class
> per document, then LDA reduces to Naive Bayes (or the unsupervised
> variant of Naive Bayes, which is basically k-means in multinomial
> space). If instead you wanted to project W words to K topics, with
> K << numWords, then there is something to do.
>
> That something is something like:
>
> 1) Get p(topic|word,document) for each word in each document (which is
> output by LDAInference). These are your expected counts for each
> topic.
>
> 2) For each class, do something like:
> p(topic|class) \propto \sum_{document with that class, word} p(topic|word,document)
>
> Then just apply Bayes' rule to do classification:
>
> p(class|topics,document) \propto p(class) \prod p(topic|class,document)
>
> -- David
>
> On Thu, Oct 8, 2009 at 11:07 AM, Robin Anil <robin.a...@gmail.com> wrote:
> > Thanks. Didn't see that; fixed it! I have a query: how is the LDA
> > topic model used to improve a classifier, say Naive Bayes? If it's
> > possible, then I would like to integrate it into Mahout.
> >
> > Given m classes and the associated documents, one can build m topic
> > models, right? (That is, a set of topics (words) under each label and
> > the associated probability distribution of words.) How can I use that
> > info to weight the most relevant topic of a class?
> >
> >> LDA has two meanings: linear discriminant analysis and latent
> >> dirichlet allocation.
> >> My code is the latter. The former is a kind of classification. You
> >> say linear discriminant analysis in the outline.
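David's two-step recipe plus the Bayes-rule classification can be sketched as a toy script. Everything here is made up for illustration: the documents, labels, topic probabilities, and uniform prior are invented, and the p(topic|word,document) values would in practice come from an inference step such as Mahout's LDAInference rather than being hard-coded. This is a sketch of the idea, not Mahout code.

```python
import math
from collections import defaultdict

K = 2  # number of topics (tiny for illustration)

# Step 1 (assumed given): p(topic | word, document) per word occurrence,
# here hard-coded; normally this is the output of LDA inference.
p_topic_given_word_doc = {
    ("d1", "goal"):   [0.9, 0.1],
    ("d1", "match"):  [0.8, 0.2],
    ("d2", "ballot"): [0.2, 0.8],
    ("d2", "vote"):   [0.1, 0.9],
}
doc_labels = {"d1": "sports", "d2": "politics"}

# Step 2: sum expected topic counts over all (document, word) pairs that
# carry each class, then normalize to get p(topic | class).
class_topic_counts = defaultdict(lambda: [0.0] * K)
for (doc, word), topic_probs in p_topic_given_word_doc.items():
    label = doc_labels[doc]
    for k, p in enumerate(topic_probs):
        class_topic_counts[label][k] += p

p_topic_given_class = {}
for label, counts in class_topic_counts.items():
    total = sum(counts)
    p_topic_given_class[label] = [c / total for c in counts]

# Bayes' rule: score each class by
# log p(class) + sum_k count_k * log p(topic_k | class),
# where count_k is the document's expected count of topic k.
def classify(doc_topic_counts, prior):
    scores = {}
    for label, topic_dist in p_topic_given_class.items():
        log_score = math.log(prior[label])
        for k, count in enumerate(doc_topic_counts):
            log_score += count * math.log(topic_dist[k])
        scores[label] = log_score
    return max(scores, key=scores.get)

prior = {"sports": 0.5, "politics": 0.5}
new_doc = [1.7, 0.3]  # expected topic counts for a new document, mostly topic 0
print(classify(new_doc, prior))  # → sports
```

With one class per document this collapses to the Naive Bayes training count, which is the reduction David describes above.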