Thank you for the quick response! It is possible that I will need this in the not-too-distant future; if so, I will probably implement it on top of what already exists, which should not be that hard, as you mentioned. I'll provide a patch when the time comes.
2012/3/26 Jake Mannix <[email protected]>

> Hi Dirk,
>
> This has not been implemented in Mahout, but the version of map-reduce (batch)-learned LDA which is done via (approximate + collapsed) variational Bayes [1] is reasonably easily modifiable to the methods in this paper, as the LDA learner we currently do via iterative MR passes is essentially an ensemble learner: each subset of the data partially trains a full LDA model starting from the aggregate (summed) counts of all of the data from previous iterations (see essentially the method named "approximately distributed LDA" / AD-LDA in Ref-[2]).
>
> The method in the paper you refer to turns traditional VB (the slower, uncollapsed kind, with the nasty digamma functions all over the place) into a streaming learner, by accreting the word-counts of each document onto the model you're using for inference on the next documents. The same exact idea can be done on the CVB0 inference technique, almost without change - as VB differs from CVB0 only in the E-step, not the M-step.
>
> The problem which comes up when I've considered doing this kind of thing in the past is that if you do this in a distributed fashion, each member of the ensemble starts learning different topics simultaneously, and then the merge gets trickier. You can avoid this by doing some of the techniques mentioned in [2] for HDP, where you swap topic-ids on merge to make sure they match up, but I haven't investigated that very thoroughly. The other way to avoid this problem is to use the parameter denoted \rho_t in Hoffman et al - this parameter is telling us how much to weight the model as it was up until now, against the updates from the latest document (alternatively, how much to "decay" previous documents). If you don't let the topics drift *too much* during parallel learning, you could probably make sure that they match up just fine on each merge, while still speeding up the process faster than fully batch learning.
>
> So yeah, this is a great idea, but getting it to work in a distributed fashion is tricky. In a non-distributed form, this idea is almost completely implemented in the class InMemoryCollapsedVariationalBayes0. I say "almost" because it's technically in there already, as a parameter choice (initialModelCorpusFraction != 0), but I don't think it's working properly yet. If you're interested in the problem, playing with this class would be a great place to start!
>
> References:
> 1) http://eprints.pascal-network.org/archive/00006729/01/AsuWelSmy2009a.pdf
> 2) http://www.csee.ogi.edu/~zak/cs506-pslc/dist_lda.pdf
>
> On Mon, Mar 26, 2012 at 11:54 AM, Dirk Weissenborn <[email protected]> wrote:
>
> > Hello,
> >
> > I wanted to ask whether there is already an online learning algorithm implementation for LDA or not?
> >
> > http://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010b.pdf
> >
> > cheers,
> > Dirk
>
> --
> -jake
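
To make the \rho_t idea above concrete, here is a minimal sketch of the Hoffman-et-al-style step size applied to a running topic-word count matrix, i.e. the M-step blend that VB and CVB0 share. This is not Mahout code; the class and parameter names (OnlineTopicModelSketch, tau0, kappa) are made up for illustration, and a real implementation would work on Mahout's matrix types rather than raw arrays.

// Hypothetical sketch (not Mahout API): blend per-mini-batch sufficient
// statistics into a running topic-word count model, weighted by rho_t,
// in the style of Hoffman et al.'s online VB. The same blend applies to
// CVB0 counts, since VB and CVB0 differ in the E-step, not the M-step.
public final class OnlineTopicModelSketch {

  private final double[][] topicWordCounts; // numTopics x numTerms
  private final double tau0;   // delays over-aggressive early updates
  private final double kappa;  // forgetting rate, typically in (0.5, 1]
  private long t = 0;          // number of mini-batches seen so far

  public OnlineTopicModelSketch(int numTopics, int numTerms,
                                double tau0, double kappa) {
    this.topicWordCounts = new double[numTopics][numTerms];
    this.tau0 = tau0;
    this.kappa = kappa;
  }

  /** rho_t = (tau0 + t)^(-kappa), the step size from Hoffman et al. */
  private double rho() {
    return Math.pow(tau0 + t, -kappa);
  }

  /**
   * Merge the expected topic-word counts gathered from one mini-batch
   * (scaled up to corpus size by the caller) into the running model.
   */
  public void update(double[][] batchExpectedCounts) {
    double rho = rho();
    for (int k = 0; k < topicWordCounts.length; k++) {
      for (int w = 0; w < topicWordCounts[k].length; w++) {
        topicWordCounts[k][w] =
            (1.0 - rho) * topicWordCounts[k][w]
            + rho * batchExpectedCounts[k][w];
      }
    }
    t++;
  }
}

With kappa in (0.5, 1] the step sizes satisfy the usual Robbins-Monro conditions, which is what lets the streaming updates converge; kappa close to 1 forgets old mini-batches fastest, which is the "decay previous documents" reading described above.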

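Similarly, for the swap-topic-ids-on-merge technique Jake mentions from ref [2], here is a rough sketch of the simplest possible alignment: greedily pair each topic in a reference model with its most similar unused topic (by cosine over topic-word counts) in another ensemble member's model before summing counts. Again, the names are hypothetical and this is not what the HDP merge in [2] actually does (that uses a proper matching), but it shows the shape of the idea.

// Hypothetical sketch (not Mahout code): greedily align topic ids from
// one ensemble member's model to a reference model before summing, so
// that topics which drifted apart during parallel learning are merged
// with their closest counterparts rather than averaged across unrelated
// topics. A real merge would use a bipartite matching instead of greedy.
public final class TopicAlignmentSketch {

  /** Cosine similarity between two topic-word count rows. */
  private static double cosine(double[] a, double[] b) {
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      na += a[i] * a[i];
      nb += b[i] * b[i];
    }
    return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-12);
  }

  /**
   * Returns a permutation where permutation[k] is the index of the topic
   * in 'other' that best matches reference topic k.
   */
  public static int[] align(double[][] reference, double[][] other) {
    int numTopics = reference.length;
    int[] permutation = new int[numTopics];
    boolean[] used = new boolean[numTopics];
    for (int k = 0; k < numTopics; k++) {
      int best = -1;
      double bestSim = Double.NEGATIVE_INFINITY;
      for (int j = 0; j < numTopics; j++) {
        if (!used[j]) {
          double sim = cosine(reference[k], other[j]);
          if (sim > bestSim) {
            bestSim = sim;
            best = j;
          }
        }
      }
      permutation[k] = best;
      used[best] = true;
    }
    return permutation;
  }
}

Applying the returned permutation to the other member's counts before summing is the merge step; combined with a small enough \rho_t (so topics do not drift too far between merges), the greedy pairing stays unambiguous.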