I have seen an implementation of L-LDA in Java, the Stanford Topic Modeling Toolbox <http://nlp.stanford.edu/software/tmt/>. Does anyone know whether they provide the source code?
Thanks,
Maxim

On Fri, Oct 16, 2009 at 12:39 PM, David Hall <d...@cs.berkeley.edu> wrote:
> Sorry, this slipped out of my inbox and I just found it!
>
> On Thu, Oct 8, 2009 at 12:05 PM, Robin Anil <robin.a...@gmail.com> wrote:
> > Posting to the dev list.
> > Great paper, thanks! Looks like L-LDA could be used to create some
> > interesting examples.
>
> Thanks!
>
> > The paper shows L-LDA could be used to create a word-tag model for
> > accurate tag prediction given a document of words. I will finish reading
> > it and report how much work is needed to build L-LDA on top of the
> > current LDA implementation. Any thoughts?
>
> Umm, cool! In the paper we used Gibbs sampling to do the inference,
> and the implementation in Mahout uses variational inference (because
> it distributes better). I don't see any obvious problems in terms of
> the math, so the rest is just fitting it into the system.
>
> I think a small amount of refactoring would be in order to make things
> more generic, and then it shouldn't be too hard to plug in. I'll add
> it to my list, but I'm swamped for quite some time.
>
> -- David
>
> > Robin
> >
> > On Thu, Oct 8, 2009 at 11:50 PM, David Hall <d...@cs.berkeley.edu> wrote:
> >>
> >> The short answer is that it probably won't help all that much. Naive
> >> Bayes is unreasonably good when you have enough data.
> >>
> >> The long answer is, I have a paper with Dan Ramage and Ramesh
> >> Nallapati that talks about how to do it:
> >>
> >> www.aclweb.org/anthology-new/D/D09/D09-1026.pdf
> >>
> >> In some sense, "Labeled-LDA" is a kind of Naive Bayes where you can
> >> have more than one class per document. If you have exactly one class
> >> per document, then LDA reduces to Naive Bayes (or to the unsupervised
> >> variant of Naive Bayes, which is basically k-means in multinomial
> >> space). If instead you wanted to project W words onto K topics, with
> >> K < numWords, then there is something to do...
> >>
> >> That something is something like:
> >>
> >> 1) Get p(topic|word,document) for each word in each document (which is
> >> output by LDAInference). Those are your expected counts for each
> >> topic.
> >>
> >> 2) For each class, do something like:
> >>
> >>   p(topic|class) \propto \sum_{document with that class, word}
> >>       p(topic|word,document)
> >>
> >> Then just apply Bayes' rule to do classification:
> >>
> >>   p(class|topics,document) \propto p(class) \prod p(topic|class,document)
> >>
> >> -- David
> >>
> >> On Thu, Oct 8, 2009 at 11:07 AM, Robin Anil <robin.a...@gmail.com> wrote:
> >> > Thanks. Didn't see that; fixed it!
> >> >
> >> > I have a query: how is the LDA topic model used to improve a
> >> > classifier, say Naive Bayes? If it's possible, then I would like to
> >> > integrate it into Mahout.
> >> >
> >> > Given m classes and their associated documents, one can build m topic
> >> > models, right? (A set of topics (words) under each label, and the
> >> > associated probability distribution of words.)
> >> >
> >> > How can I use that info to weight the most relevant topic of a class?
> >> >
> >> >> LDA has two meanings: linear discriminant analysis and latent
> >> >> Dirichlet allocation. My code is the latter. The former is a kind of
> >> >> classification. You say linear discriminant analysis in the outline.
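
P.S. For anyone who wants to experiment with the recipe David sketches above, here is a rough, untested Java illustration. All of the names are made up (this is not Mahout's API); it assumes you already have the p(topic|word,document) vectors from LDA inference for each word of each document, and it reads the product in the final Bayes-rule formula as an expected log-likelihood under each word's topic posterior, which is my own guess at the details.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only -- these class/method names are invented,
// not part of Mahout. It assumes LDA inference has already produced
// p(topic|word,document) for every word of every document.
public class TopicNaiveBayesSketch {

  private final int numTopics;
  // p(topic|class), one vector per class label
  private final Map<String, double[]> topicGivenClass = new HashMap<>();
  // p(class), estimated from label frequencies
  private final Map<String, Double> classPrior = new HashMap<>();

  public TopicNaiveBayesSketch(int numTopics) {
    this.numTopics = numTopics;
  }

  // Steps 1 and 2: accumulate each word's p(topic|word,document) as
  // expected topic counts for the document's class, then normalize.
  public void train(List<LabeledDoc> docs) {
    Map<String, Integer> classCounts = new HashMap<>();
    for (LabeledDoc doc : docs) {
      classCounts.merge(doc.label, 1, Integer::sum);
      double[] counts =
          topicGivenClass.computeIfAbsent(doc.label, k -> new double[numTopics]);
      for (double[] wordPosterior : doc.wordTopicPosteriors) {
        for (int t = 0; t < numTopics; t++) {
          counts[t] += wordPosterior[t];  // expected count of topic t
        }
      }
    }
    for (Map.Entry<String, double[]> e : topicGivenClass.entrySet()) {
      double[] counts = e.getValue();
      double total = 0.0;
      for (double c : counts) {
        total += c;
      }
      for (int t = 0; t < numTopics; t++) {
        // small additive smoothing so no topic gets probability zero
        counts[t] = (counts[t] + 1e-9) / (total + numTopics * 1e-9);
      }
      classPrior.put(e.getKey(), (double) classCounts.get(e.getKey()) / docs.size());
    }
  }

  // Bayes' rule in log space: score(class) = log p(class) plus, for each
  // word, the expectation of log p(topic|class) under p(topic|word,document).
  public String classify(double[][] wordTopicPosteriors) {
    String best = null;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (Map.Entry<String, double[]> e : topicGivenClass.entrySet()) {
      double score = Math.log(classPrior.get(e.getKey()));
      for (double[] wordPosterior : wordTopicPosteriors) {
        for (int t = 0; t < numTopics; t++) {
          score += wordPosterior[t] * Math.log(e.getValue()[t]);
        }
      }
      if (score > bestScore) {
        bestScore = score;
        best = e.getKey();
      }
    }
    return best;
  }

  // A training document: its label plus p(topic|word,document) per word.
  public static class LabeledDoc {
    final String label;
    final double[][] wordTopicPosteriors;  // [word][topic]

    public LabeledDoc(String label, double[][] wordTopicPosteriors) {
      this.label = label;
      this.wordTopicPosteriors = wordTopicPosteriors;
    }
  }
}

The smoothing constant and the log-space scoring are just the usual tricks to avoid zero probabilities and underflow; plug in whatever LDAInference actually outputs for the per-word posteriors.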
--
Zhen-Dong Zhao (Maxim)
Department of Computer Science, School of Computing
National University of Singapore
Homepage: http://zhaozhendong.googlepages.com
Mail: zhaozhend...@gmail.com