Hi Dirk,

  This has not been implemented in Mahout, but the version of batch, map-reduce-learned LDA that we do via (approximate, collapsed) variational Bayes [1] is fairly easy to modify toward the method in that paper, because the LDA learner we currently run via iterative MR passes is essentially an ensemble learner: each subset of the data partially trains a full LDA model, starting from the aggregate (summed) counts of all of the data from previous iterations (this is essentially the method named "approximately distributed LDA" / AD-LDA in [2]).
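
  To make the ensemble structure concrete, here is a rough sketch of what the merge amounts to (plain Java, not actual Mahout code; the class and variable names are made up for illustration). Each partition trains against the same summed counts from the last pass, and the merge just adds up the per-partition count deltas:

    // Hypothetical sketch of the AD-LDA-style merge: each data partition
    // trains a local copy of the topic-term count matrix, and merging is
    // just summing the count deltas back into the global aggregate.
    import java.util.List;

    public class AdLdaMergeSketch {
      /** All arguments are numTopics x numTerms matrices of expected counts. */
      static double[][] merge(double[][] globalCounts, List<double[][]> localDeltas) {
        for (double[][] delta : localDeltas) {
          for (int k = 0; k < globalCounts.length; k++) {
            for (int w = 0; w < globalCounts[k].length; w++) {
              globalCounts[k][w] += delta[k][w]; // counts are additive across partitions
            }
          }
        }
        return globalCounts; // becomes the starting model for the next pass
      }
    }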

  The method in the paper you refer to turns traditional VB (the slower, uncollapsed kind, with the nasty digamma functions all over the place) into a streaming learner by accreting the word counts of each document onto the model you use for inference on subsequent documents.  The exact same idea can be applied to the CVB0 inference technique almost without change, since VB differs from CVB0 only in the E-step, not the M-step.
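
  To see what that would look like, here's a rough sketch of streaming CVB0 (again not Mahout code; the names are mine, and a full CVB0 implementation would also subtract each token's own contribution from the counts in its update, which I've skipped for brevity). The E-step is the usual CVB0 responsibility update; the "streaming" part is just that each document's expected counts get folded into the global topic-term counts before the next document is processed:

    // Sketch of streaming CVB0: run local E-step sweeps over one document,
    // then immediately accrete its expected counts onto the global model.
    public class StreamingCvb0Sketch {
      private final double[][] topicTermCounts; // numTopics x numTerms
      private final double[] topicTotals;       // row sums of topicTermCounts
      private final double alpha;               // doc-topic smoothing
      private final double eta;                 // topic-term smoothing
      private final int numTerms;

      StreamingCvb0Sketch(int numTopics, int numTerms, double alpha, double eta) {
        this.topicTermCounts = new double[numTopics][numTerms];
        this.topicTotals = new double[numTopics];
        this.alpha = alpha;
        this.eta = eta;
        this.numTerms = numTerms;
      }

      void accrete(int[] termIds, int[] termCounts) {
        int numTopics = topicTotals.length;
        double[] docTopicCounts = new double[numTopics];
        double[][] gamma = new double[termIds.length][numTopics];
        for (int sweep = 0; sweep < 20; sweep++) { // a few local E-step sweeps
          for (int i = 0; i < termIds.length; i++) {
            double norm = 0;
            for (int k = 0; k < numTopics; k++) {
              // CVB0 E-step: plain expected counts, no digamma functions anywhere
              gamma[i][k] = (topicTermCounts[k][termIds[i]] + eta)
                  * (docTopicCounts[k] + alpha)
                  / (topicTotals[k] + numTerms * eta);
              norm += gamma[i][k];
            }
            for (int k = 0; k < numTopics; k++) {
              gamma[i][k] /= norm;
            }
          }
          // recompute doc-topic counts from the current responsibilities
          java.util.Arrays.fill(docTopicCounts, 0);
          for (int i = 0; i < termIds.length; i++) {
            for (int k = 0; k < numTopics; k++) {
              docTopicCounts[k] += termCounts[i] * gamma[i][k];
            }
          }
        }
        // The streaming accretion: this document's expected counts become
        // part of the model used for inference on all subsequent documents.
        for (int i = 0; i < termIds.length; i++) {
          for (int k = 0; k < numTopics; k++) {
            double c = termCounts[i] * gamma[i][k];
            topicTermCounts[k][termIds[i]] += c;
            topicTotals[k] += c;
          }
        }
      }
    }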

  The problem that has come up when I've considered doing this kind of thing in the past is that if you do it in a distributed fashion, each member of the ensemble starts learning different topics simultaneously, and then the merge gets trickier.  You can avoid this with some of the techniques mentioned in [2] for HDP, where you swap topic ids on merge to make sure they match up, but I haven't investigated that very thoroughly.  The other way to avoid the problem is to use the parameter denoted \rho_t in Hoffman et al.: it tells us how much to weight the model as it stood up until now, against the updates from the latest document (alternatively, how much to "decay" previous documents).  If you don't let the topics drift *too much* during parallel learning, you could probably make sure they match up just fine on each merge, while still being substantially faster than fully batch learning.
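
  In code, that blending is just a convex combination, with \rho_t on a decaying schedule; Hoffman et al. use \rho_t = (\tau_0 + t)^{-\kappa} with \kappa in (0.5, 1].  A minimal sketch (names are mine):

    // Sketch of the Hoffman et al. step-size schedule and model blend.
    public class StepSizeSketch {
      // rho_t = (tau0 + t)^(-kappa); kappa in (0.5, 1] for convergence
      static double rho(long t, double tau0, double kappa) {
        return Math.pow(tau0 + t, -kappa);
      }

      /** Blend the existing model toward the estimate from the latest batch. */
      static void blend(double[][] model, double[][] batchEstimate, double rhoT) {
        for (int k = 0; k < model.length; k++) {
          for (int w = 0; w < model[k].length; w++) {
            model[k][w] = (1 - rhoT) * model[k][w] + rhoT * batchEstimate[k][w];
          }
        }
      }
    }

  A small \rho_t (large \tau_0, or many documents already seen) keeps each parallel learner close to the shared model, which is exactly what would keep the topics from drifting apart between merges.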

  So yeah, this is a great idea, but getting it to work in a distributed fashion is tricky.  In a non-distributed form, this idea is almost completely implemented in the class InMemoryCollapsedVariationalBayes0.  I say "almost" because it's technically in there already, as a parameter choice (initialModelCorpusFraction != 0), but I don't think it's working properly yet.  If you're interested in the problem, playing with this class would be a great place to start!

References:
[1] http://eprints.pascal-network.org/archive/00006729/01/AsuWelSmy2009a.pdf
[2] http://www.csee.ogi.edu/~zak/cs506-pslc/dist_lda.pdf

On Mon, Mar 26, 2012 at 11:54 AM, Dirk Weissenborn <
[email protected]> wrote:

> Hello,
>
> I wanted to ask whether there is already an online learning algorithm
> implementation for LDA or not?
>
> http://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010b.pdf
>
> cheers,
> Dirk
>



-- 

  -jake
