OK, thanks. Maybe one more question: is it possible to train an already trained model on just a few new documents with the provided algorithms, or do you have to train through the whole corpus again? In other words, can an LDA model be trained incrementally or not?

What I would actually like to do is train a topical classifier on top of an LDA model. Do you have any experience with that? My concern is that by changing the LDA model, the inputs for the classifier would also change. Do I have to train the classifier from scratch again, or can I reuse the classifier trained on top of the older LDA model and just adjust it slightly?
2012/3/27 Dirk Weissenborn <[email protected]>

> no problem. I'll post it
>
> 2012/3/27 Jake Mannix <[email protected]>
>
>> Hey Dirk,
>>
>> Do you mind continuing this discussion on the mailing list? Lots of our users may ask this kind of question in the future...
>>
>> On Mon, Mar 26, 2012 at 3:36 PM, Dirk Weissenborn <[email protected]> wrote:
>>
>>> Ok thanks,
>>>
>>> maybe one question. Is it possible to train an already trained model on just a few new documents with the provided algorithms, or do you have to train through the whole corpus again? What I mean is whether you can train an LDA model incrementally or not.
>>>
>>> What I would actually like to do is train a topical classifier on top of an LDA model. Do you have any experience with that? I mean, by changing the LDA model, the inputs for the classifier would also change. Do I have to train a classifier from scratch again, or can I reuse the classifier trained on top of the older LDA model and just adjust that one?
>>>
>>> 2012/3/26 Jake Mannix <[email protected]>
>>>
>>>> On Mon, Mar 26, 2012 at 12:58 PM, Dirk Weissenborn <[email protected]> wrote:
>>>>
>>>>> Thank you for the quick response! It is possible that I will need it in the not too distant future; maybe I'll implement it on top of what already exists, which should not be that hard, as you mentioned. I'll provide a patch when the time comes.
>>>>
>>>> Feel free to email any questions about using the InMemoryCollapsedVariationalBayes0 class - it's mainly been used for testing so far, but if you want to take that class and clean it up and look into fixing the online learning aspect of it, that'd be excellent. Let me know if you make any progress, because I'll probably be looking to work on this at some point as well, but I won't if you're already working on it. :)
>>>>
>>>>> 2012/3/26 Jake Mannix <[email protected]>
>>>>>
>>>>>> Hi Dirk,
>>>>>>
>>>>>> This has not been implemented in Mahout, but the version of map-reduce (batch)-learned LDA which is done via (approximate + collapsed) variational Bayes [1] is reasonably easily modifiable to the methods in this paper, as the LDA learner we currently do via iterative MR passes is essentially an ensemble learner: each subset of the data partially trains a full LDA model starting from the aggregate (summed) counts of all of the data from previous iterations (see essentially the method named "approximately distributed LDA" / AD-LDA in Ref-[2]).
>>>>>>
>>>>>> The method in the paper you refer to turns traditional VB (the slower, uncollapsed kind, with the nasty digamma functions all over the place) into a streaming learner, by accreting the word-counts of each document onto the model you're using for inference on the next documents. The same exact idea can be done on the CVB0 inference technique, almost without change - as VB differs from CVB0 only in the E-step, not the M-step.
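A rough sketch of that count-accretion idea, in plain Java rather than Mahout's actual classes (the class, all names, and the eta/alpha prior values below are placeholders of mine): each incoming document gets a few CVB0-style fixed-point passes against the counts accumulated so far, and its expected counts are then folded into the model before the next document is processed.

import java.util.Arrays;

/**
 * Illustrative sketch only -- not Mahout's API. It shows the "accrete each
 * document's expected counts onto the model" idea behind a streaming
 * CVB0-style learner: each new document is inferred against the counts
 * accumulated from all previously seen documents, then folded in.
 */
public class StreamingTopicCounts {

  private final int numTopics;
  private final int numTerms;
  private final double[][] topicTermCounts; // expected counts n(topic, term)
  private final double[] topicTotals;       // n(topic), summed over terms
  private final double eta = 0.1;   // assumed topic-term smoothing prior
  private final double alpha = 0.1; // assumed document-topic smoothing prior

  public StreamingTopicCounts(int numTopics, int numTerms) {
    this.numTopics = numTopics;
    this.numTerms = numTerms;
    this.topicTermCounts = new double[numTopics][numTerms];
    this.topicTotals = new double[numTopics];
  }

  /**
   * One streaming step: a few CVB0-like fixed-point passes estimate
   * p(topic | doc, term), then the document's expected counts are added
   * to the global model before the next document arrives.
   */
  public void update(int[] termIds, int[] termFreqs, int passes) {
    double[] docTopic = new double[numTopics];
    Arrays.fill(docTopic, 1.0 / numTopics);
    double[][] gamma = new double[termIds.length][numTopics];

    for (int pass = 0; pass < passes; pass++) {
      double[] newDocTopic = new double[numTopics];
      for (int i = 0; i < termIds.length; i++) {
        int w = termIds[i];
        double norm = 0.0;
        for (int k = 0; k < numTopics; k++) {
          // responsibility of topic k for term w in this document
          gamma[i][k] = (topicTermCounts[k][w] + eta)
              / (topicTotals[k] + eta * numTerms)
              * (docTopic[k] + alpha);
          norm += gamma[i][k];
        }
        for (int k = 0; k < numTopics; k++) {
          gamma[i][k] /= norm;
          newDocTopic[k] += gamma[i][k] * termFreqs[i];
        }
      }
      double total = 0.0;
      for (double v : newDocTopic) {
        total += v;
      }
      for (int k = 0; k < numTopics; k++) {
        docTopic[k] = newDocTopic[k] / total;
      }
    }

    // "M-step" accretion: fold this document's expected counts into the model.
    for (int i = 0; i < termIds.length; i++) {
      for (int k = 0; k < numTopics; k++) {
        double c = gamma[i][k] * termFreqs[i];
        topicTermCounts[k][termIds[i]] += c;
        topicTotals[k] += c;
      }
    }
  }
}

In a distributed setting each worker would accumulate its own counts this way, which is exactly where the merge problem described next comes in.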
>>>>>> The problem which comes up when I've considered doing this kind of thing in the past is that if you do this in a distributed fashion, each member of the ensemble starts learning different topics simultaneously, and then the merge gets trickier. You can avoid this by doing some of the techniques mentioned in [2] for HDP, where you swap topic-ids on merge to make sure they match up, but I haven't investigated that very thoroughly. The other way to avoid this problem is to use the parameter denoted \rho_t in Hoffman et al - this parameter is telling us how much to weight the model as it was up until now, against the updates from the latest document (alternatively, how much to "decay" previous documents). If you don't let the topics drift *too much* during parallel learning, you could probably make sure that they match up just fine on each merge, while still speeding up the process faster than fully batch learning.
>>>>>>
>>>>>> So yeah, this is a great idea, but getting it to work in a distributed fashion is tricky. In a non-distributed form, this idea is almost completely implemented in the class InMemoryCollapsedVariationalBayes0. I say "almost" because it's technically in there already, as a parameter choice (initialModelCorpusFraction != 0), but I don't think it's working properly yet. If you're interested in the problem, playing with this class would be a great place to start!
>>>>>>
>>>>>> References:
>>>>>> 1) http://eprints.pascal-network.org/archive/00006729/01/AsuWelSmy2009a.pdf
>>>>>> 2) http://www.csee.ogi.edu/~zak/cs506-pslc/dist_lda.pdf
>>>>>>
>>>>>> On Mon, Mar 26, 2012 at 11:54 AM, Dirk Weissenborn <[email protected]> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I wanted to ask whether there is already an online learning algorithm implementation for LDA or not?
>>>>>>>
>>>>>>> http://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010b.pdf
>>>>>>>
>>>>>>> cheers,
>>>>>>> Dirk
>>>>>>
>>>>>> --
>>>>>>
>>>>>> -jake
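The \rho_t weighting described above amounts to interpolating between the model accumulated so far and the estimate computed from the latest document or mini-batch. Here is a minimal sketch, assuming the (tau0, kappa) step-size schedule from Hoffman et al. and a dense topic-term matrix; the class and method names are illustrative, not anything in Mahout:

/**
 * Sketch of the \rho_t step-size interpolation from Hoffman et al.'s online VB,
 * applied to a topic-term parameter matrix. Names and the (tau0, kappa)
 * schedule are illustrative, not Mahout code.
 */
public final class RhoInterpolation {

  private RhoInterpolation() {}

  /** rho_t = (tau0 + t)^(-kappa); kappa in (0.5, 1] gives the usual convergence guarantees. */
  public static double rho(long t, double tau0, double kappa) {
    return Math.pow(tau0 + t, -kappa);
  }

  /**
   * model <- (1 - rho) * model + rho * batchEstimate: a small rho mostly keeps
   * the model as it was up until now, a large rho lets the latest batch dominate
   * (i.e. decays previous documents faster).
   */
  public static void interpolate(double[][] model, double[][] batchEstimate, double rho) {
    for (int k = 0; k < model.length; k++) {
      for (int w = 0; w < model[k].length; w++) {
        model[k][w] = (1.0 - rho) * model[k][w] + rho * batchEstimate[k][w];
      }
    }
  }
}

Keeping \rho_t modest once the model has stabilized is what would keep each ensemble member's topics from drifting too far apart between merges, as suggested above.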
