Posting to the dev list. Great paper, thanks! It looks like L-LDA could be used to create some interesting examples. The paper shows L-LDA can be used to build a word-tag model for accurate tag prediction given a document of words. I will finish reading and report back.
How much work is needed to transform or build on top of the current LDA implementation to get L-LDA? Any thoughts?

Robin

On Thu, Oct 8, 2009 at 11:50 PM, David Hall <d...@cs.berkeley.edu> wrote:
> The short answer is that it probably won't help all that much. Naive
> Bayes is unreasonably good when you have enough data.
>
> The long answer is, I have a paper with Dan Ramage and Ramesh
> Nallapati that talks about how to do it:
>
> www.aclweb.org/anthology-new/D/D09/D09-1026.pdf
>
> In some sense, "Labeled-LDA" is a kind of Naive Bayes where you can
> have more than one class per document. If you have exactly one class
> per document, then LDA reduces to Naive Bayes (or the unsupervised
> variant of Naive Bayes, which is basically k-means in multinomial
> space). If instead you wanted to project W words to K topics, with
> K << numWords, then there is something to do.
>
> That something is something like:
>
> 1) Get p(topic|word,document) for each word in each document (which is
> output by LDAInference). These are your expected counts for each
> topic.
>
> 2) For each class, do something like:
> p(topic|class) \propto \sum_{document with that class, word} p(topic|word,document)
>
> Then just apply Bayes' rule to do classification:
>
> p(class|topics,document) \propto p(class) \prod p(topic|class,document)
>
> -- David
>
> On Thu, Oct 8, 2009 at 11:07 AM, Robin Anil <robin.a...@gmail.com> wrote:
> > Thanks. Didn't see that; fixed it! I have a query: how is the LDA
> > topic model used to improve a classifier, say Naive Bayes? If it's
> > possible, then I would like to integrate it into Mahout.
> >
> > Given m classes and the associated documents, one can build m topic
> > models, right? (That is, a set of topics (words) under each label and
> > the associated probability distribution of words.) How can I use that
> > info to weight the most relevant topic of a class?
> >
> >> LDA has two meanings: linear discriminant analysis and latent
> >> dirichlet allocation.
> >> My code is the latter. The former is a kind of classification. You
> >> say linear discriminant analysis in the outline.
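David's two-step recipe plus the Bayes-rule classification can be sketched as a toy script. Everything here is made up for illustration: the documents, labels, topic probabilities, and uniform prior are invented, and the p(topic|word,document) values would in practice come from an inference step such as Mahout's LDAInference rather than being hard-coded. This is a sketch of the idea, not Mahout code.

```python
import math
from collections import defaultdict

K = 2  # number of topics (tiny for illustration)

# Step 1 (assumed given): p(topic | word, document) per word occurrence,
# here hard-coded; normally this is the output of LDA inference.
p_topic_given_word_doc = {
    ("d1", "goal"):   [0.9, 0.1],
    ("d1", "match"):  [0.8, 0.2],
    ("d2", "ballot"): [0.2, 0.8],
    ("d2", "vote"):   [0.1, 0.9],
}
doc_labels = {"d1": "sports", "d2": "politics"}

# Step 2: sum expected topic counts over all (document, word) pairs that
# carry each class, then normalize to get p(topic | class).
class_topic_counts = defaultdict(lambda: [0.0] * K)
for (doc, word), topic_probs in p_topic_given_word_doc.items():
    label = doc_labels[doc]
    for k, p in enumerate(topic_probs):
        class_topic_counts[label][k] += p

p_topic_given_class = {}
for label, counts in class_topic_counts.items():
    total = sum(counts)
    p_topic_given_class[label] = [c / total for c in counts]

# Bayes' rule: score each class by
# log p(class) + sum_k count_k * log p(topic_k | class),
# where count_k is the document's expected count of topic k.
def classify(doc_topic_counts, prior):
    scores = {}
    for label, topic_dist in p_topic_given_class.items():
        log_score = math.log(prior[label])
        for k, count in enumerate(doc_topic_counts):
            log_score += count * math.log(topic_dist[k])
        scores[label] = log_score
    return max(scores, key=scores.get)

prior = {"sports": 0.5, "politics": 0.5}
new_doc = [1.7, 0.3]  # expected topic counts for a new document, mostly topic 0
print(classify(new_doc, prior))  # → sports
```

With one class per document this collapses to the Naive Bayes training count, which is the reduction David describes above.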