On Mon, Mar 26, 2012 at 12:58 PM, Dirk Weissenborn <
[email protected]> wrote:

> Thank you for the quick response! It is possible that I will need it in
> the not-too-distant future; maybe I'll implement it on top of what already
> exists, which should not be that hard, as you mentioned. I'll provide a
> patch when the time comes.
>

Feel free to email any questions about using the
InMemoryCollapsedVariationalBayes0 class - it's mainly been used for
testing so far, but if you want to take that class and clean it up and
look into fixing the online learning aspect of it, that'd be excellent.
Let me know if you make any progress, because I'll probably be looking to
work on this at some point as well, but I won't if you're already working
on it. :)


>
> 2012/3/26 Jake Mannix <[email protected]>
>
> > Hi Dirk,
> >
> >  This has not been implemented in Mahout, but the version of map-reduce
> > (batch)-learned LDA which is done via (approximate + collapsed) variational
> > Bayes [1] is reasonably easily modifiable to the methods in this paper, as
> > the LDA learner we currently do via iterative MR passes is essentially an
> > ensemble learner: each subset of the data partially trains a full LDA
> > model starting from the aggregate (summed) counts of all of the data from
> > previous iterations (see essentially the method named "approximately
> > distributed LDA" / AD-LDA in Ref-[2]).
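> >
> > To make that concrete, here is a rough, illustrative-only sketch of that
> > "every shard sweeps from the summed counts, then re-merge" loop in plain
> > Java (this is not the actual Mahout driver/mapper code; the class and
> > method names are made up for the example):
> >
> > import java.util.Arrays;
> > import java.util.List;
> >
> > public class AdLdaEnsembleSketch {
> >
> >   static double[][] iterate(double[][] globalTopicTermCounts,
> >                             List<int[][]> shards) {
> >     int numTopics = globalTopicTermCounts.length;
> >     int numTerms = globalTopicTermCounts[0].length;
> >     double[][] next = new double[numTopics][numTerms];
> >     for (int[][] shard : shards) {
> >       // every shard starts its sweep from the same aggregate (summed) counts
> >       double[][] delta = sweepShard(deepCopy(globalTopicTermCounts), shard);
> >       // and contributes only the counts accumulated on its own documents
> >       for (int k = 0; k < numTopics; k++) {
> >         for (int w = 0; w < numTerms; w++) {
> >           next[k][w] += delta[k][w];
> >         }
> >       }
> >     }
> >     return next; // this becomes the starting model for the next pass
> >   }
> >
> >   // Placeholder for the per-shard CVB0 sweep over shardDocs: it should
> >   // return only the counts gathered on this shard's documents, not the
> >   // prior model it started from.
> >   static double[][] sweepShard(double[][] startingModel, int[][] shardDocs) {
> >     return new double[startingModel.length][startingModel[0].length];
> >   }
> >
> >   static double[][] deepCopy(double[][] m) {
> >     double[][] copy = new double[m.length][];
> >     for (int i = 0; i < m.length; i++) {
> >       copy[i] = Arrays.copyOf(m[i], m[i].length);
> >     }
> >     return copy;
> >   }
> > }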
> >
> >  The method in the paper you refer to turns traditional VB (the slower,
> > uncollapsed kind, with the nasty digamma functions all over the place)
> > into a streaming learner, by accreting the word-counts of each document
> > onto the model you're using for inference on the next documents.  The
> > exact same idea can be done on the CVB0 inference technique, almost
> > without change - as VB differs from CVB0 only in the E-step, not the
> > M-step.
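> >
> > (For reference, and roughly in the notation of [1], the per-token E-step
> > updates differ only in whether the counts pass through digammas:
> >
> >   VB:    gamma_{ijk} \propto exp(\Psi(n_{jk} + \alpha)) * exp(\Psi(n_{w_ij,k} + \eta))
> >                              / exp(\Psi(n_k + W*\eta))
> >   CVB0:  gamma_{ijk} \propto (n_{jk} + \alpha) * (n_{w_ij,k} + \eta) / (n_k + W*\eta)
> >
> > where CVB0 additionally excludes the current token's own contribution
> > from the counts.  The streaming trick is then just: after inferring the
> > gammas for document j, fold its fractional counts into n_{wk} and n_k
> > before moving on to document j+1.)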
> >
> >  The problem which comes up when I've considered doing this kind of
> > thing in the past is that if you do it in a distributed fashion, each
> > member of the ensemble starts learning different topics simultaneously,
> > and then the merge gets trickier.  You can avoid this by using some of
> > the techniques mentioned in [2] for HDP, where you swap topic-ids on
> > merge to make sure they match up, but I haven't investigated that very
> > thoroughly.  The other way to avoid this problem is to use the parameter
> > denoted \rho_t in Hoffman et al - this parameter tells us how much to
> > weight the model as it was up until now, against the updates from the
> > latest document (alternatively, how much to "decay" previous documents).
> > If you don't let the topics drift *too much* during parallel learning,
> > you could probably make sure they match up just fine on each merge,
> > while still speeding up the process relative to fully batch learning.
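> >
> > (For reference, in Hoffman et al the step size is
> > \rho_t = (\tau_0 + t)^{-\kappa} with \kappa in (0.5, 1], and the update
> > is a convex blend, roughly
> > \lambda <- (1 - \rho_t) * \lambda + \rho_t * \hat{\lambda}_t, where
> > \hat{\lambda}_t is the estimate you'd get if the whole corpus looked
> > like minibatch t; \tau_0 and \kappa control how quickly earlier
> > documents get decayed.)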
> >
> >  So yeah, this is a great idea, but getting it to work in a distributed
> > fashion is tricky.  In a non-distributed form, this idea is almost
> > completely implemented in the class InMemoryCollapsedVariationalBayes0.
> > I say "almost" because it's technically in there already, as a parameter
> > choice (initialModelCorpusFraction != 0), but I don't think it's working
> > properly yet.  If you're interested in the problem, playing with this
> > class would be a great place to start!
> >
> > References:
> > 1)
> > http://eprints.pascal-network.org/archive/00006729/01/AsuWelSmy2009a.pdf
> > 2) http://www.csee.ogi.edu/~zak/cs506-pslc/dist_lda.pdf
> >
> > On Mon, Mar 26, 2012 at 11:54 AM, Dirk Weissenborn <
> > [email protected]> wrote:
> >
> > > Hello,
> > >
> > > I wanted to ask whether there is already an online learning algorithm
> > > implementation for lda or not?
> > >
> > > http://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010b.pdf
> > >
> > > cheers,
> > > Dirk
> > >
> >
> >
> >
> > --
> >
> >  -jake
> >
>



-- 

  -jake
