On Fri, Oct 16, 2009 at 4:08 AM, zhao zhendong <zhaozhend...@gmail.com> wrote:
> I have seen an implementation of L-LDA using Java, the
> Stanford Topic Modeling Toolbox <http://nlp.stanford.edu/software/tmt/>.
> Does anyone know whether they provide the source code or not?
I'm pretty sure it's Scala, no? It's definitely open source. Like I
said, however, this implementation is almost certainly Gibbs sampling
based, which has consequences for parallelization (or rather, the
Rao-Blackwellization does).

-- David

> Thanks,
> Maxim
>
> On Fri, Oct 16, 2009 at 12:39 PM, David Hall <d...@cs.berkeley.edu> wrote:
>
>> Sorry, this slipped out of my inbox and I just found it!
>>
>> On Thu, Oct 8, 2009 at 12:05 PM, Robin Anil <robin.a...@gmail.com> wrote:
>> > Posting to the dev list.
>> > Great paper, thanks! Looks like L-LDA could be used to create some
>> > interesting examples.
>>
>> Thanks!
>>
>> > The paper shows L-LDA could be used to create a word-tag model for
>> > accurate tag(s) prediction given a document of words. I will finish
>> > reading and report how much work is needed to transform/build on top
>> > of the current LDA implementation to get L-LDA. Any thoughts?
>>
>> Umm, cool! In the paper we used Gibbs sampling to do the inference,
>> and the implementation in Mahout uses variational inference (because
>> it distributes better). I don't see any obvious problems in terms of
>> the math, and so the rest is just fitting it into the system.
>>
>> I think a small amount of refactoring would be in order to make things
>> more generic, and then it shouldn't be too hard to plug in. I'll add
>> it to my list, but I'm swamped for quite some time.
>>
>> -- David
>>
>> > Robin
>> >
>> > On Thu, Oct 8, 2009 at 11:50 PM, David Hall <d...@cs.berkeley.edu> wrote:
>> >>
>> >> The short answer is that it probably won't help all that much. Naive
>> >> Bayes is unreasonably good when you have enough data.
>> >>
>> >> The long answer is, I have a paper with Dan Ramage and Ramesh
>> >> Nallapati that talks about how to do it:
>> >>
>> >> www.aclweb.org/anthology-new/D/D09/D09-1026.pdf
>> >>
>> >> In some sense, "Labeled-LDA" is a kind of Naive Bayes where you can
>> >> have more than one class per document.
>> >> If you have exactly one class per document, then LDA reduces to
>> >> Naive Bayes (or the unsupervised variant of Naive Bayes, which is
>> >> basically k-means in multinomial space). If instead you wanted to
>> >> project W words to K topics, with K > numWords, then there is
>> >> something to do...
>> >>
>> >> That something is something like:
>> >>
>> >> 1) Get p(topic|word,document) for each word in each document (this
>> >> is output by LDAInference). Those are your expected counts for each
>> >> topic.
>> >>
>> >> 2) For each class, do something like:
>> >>
>> >>     p(topic|class) \propto \sum_{documents with that class, words} p(topic|word,document)
>> >>
>> >> Then just apply Bayes' rule to do classification:
>> >>
>> >>     p(class|topics,document) \propto p(class) \prod_{topics in document} p(topic|class)
>> >>
>> >> -- David
>> >>
>> >> On Thu, Oct 8, 2009 at 11:07 AM, Robin Anil <robin.a...@gmail.com> wrote:
>> >> > Thanks. Didn't see that, fixed it!
>> >> >
>> >> > I have a query: how is the LDA topic model used to improve a
>> >> > classifier, say Naive Bayes? If it's possible, then I would like
>> >> > to integrate it into Mahout.
>> >> >
>> >> > Given m classes and the associated documents, one can build m
>> >> > topic models, right? (A set of topics (words) under each label
>> >> > and the associated probability distribution of words.) How can I
>> >> > use that info to weight the most relevant topic of a class?
>> >> >
>> >> >> LDA has two meanings: linear discriminant analysis and latent
>> >> >> Dirichlet allocation. My code is the latter. The former is a
>> >> >> kind of classification. You say linear discriminant analysis in
>> >> >> the outline.
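[Editor's sketch, not part of the original thread.] David's two-step recipe plus the Bayes-rule classification could look roughly like the Python below. All names here are made up for illustration; in particular, `gamma[word][topic]` stands in for the p(topic|word,document) values that step 1 says come out of LDA inference, in whatever format your inference code actually produces.

```python
import math
from collections import defaultdict

def train(labeled_docs):
    """Step 2: estimate p(topic|class) from expected topic counts.

    labeled_docs: list of (label, gamma) pairs, where
    gamma[word][topic] approximates p(topic | word, document) --
    the per-word posteriors from step 1 (format is hypothetical).
    """
    counts = defaultdict(lambda: defaultdict(float))
    for label, gamma in labeled_docs:
        for topic_probs in gamma.values():
            for topic, p in topic_probs.items():
                counts[label][topic] += p  # soft counts, not hard assignments
    # normalize each class's topic counts into a distribution
    return {
        label: {t: c / sum(tc.values()) for t, c in tc.items()}
        for label, tc in counts.items()
    }

def classify(gamma, p_topic_given_class, p_class):
    """Bayes rule in log space:
    score(class) = log p(class)
                 + sum over words of E_topic[log p(topic|class)],
    where the expectation uses the new document's p(topic|word,document).
    """
    scores = {}
    for label, p_tc in p_topic_given_class.items():
        score = math.log(p_class[label])
        for topic_probs in gamma.values():
            for topic, p in topic_probs.items():
                # small floor avoids log(0) for topics unseen in a class
                score += p * math.log(p_tc.get(topic, 1e-12))
        scores[label] = score
    return max(scores, key=scores.get)
```

Because the topic posteriors are soft, this degenerates to plain multinomial Naive Bayes when each word puts all its mass on a single topic, which matches David's point that one-class-per-document LDA reduces to Naive Bayes.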
>
> --
> Zhen-Dong Zhao (Maxim)
>
> Department of Computer Science
> School of Computing
> National University of Singapore
>
> Homepage: http://zhaozhendong.googlepages.com
> Mail: zhaozhend...@gmail.com