Thank you for the quick response! It is possible that I will need this in the not-too-distant future; if so, I will probably implement it on top of what already exists, which should not be that hard, as you mentioned. I'll provide a patch when the time comes.
2012/3/26 Jake Mannix <[email protected]>

> Hi Dirk,
>
> This has not been implemented in Mahout, but the version of map-reduce (batch)-learned LDA which is done via (approximate + collapsed) variational Bayes [1] is reasonably easily modifiable to the methods in this paper, as the LDA learner we currently do via iterative MR passes is essentially an ensemble learner: each subset of the data partially trains a full LDA model starting from the aggregate (summed) counts of all of the data from previous iterations (see essentially the method named "approximately distributed LDA" / AD-LDA in Ref-[2]).
>
> The method in the paper you refer to turns traditional VB (the slower, uncollapsed kind, with the nasty digamma functions all over the place) into a streaming learner, by accreting the word-counts of each document onto the model you're using for inference on the next documents. The same exact idea can be done on the CVB0 inference technique, almost without change - as VB differs from CVB0 only in the E-step, not the M-step.
>
> The problem which comes up when I've considered doing this kind of thing in the past is that if you do this in a distributed fashion, each member of the ensemble starts learning different topics simultaneously, and then the merge gets trickier. You can avoid this by doing some of the techniques mentioned in [2] for HDP, where you swap topic-ids on merge to make sure they match up, but I haven't investigated that very thoroughly. The other way to avoid this problem is to use the parameter denoted \rho_t in Hoffman et al - this parameter is telling us how much to weight the model as it was up until now, against the updates from the latest document (alternatively, how much to "decay" previous documents). If you don't let the topics drift *too much* during parallel learning, you could probably make sure that they match up just fine on each merge, while still speeding up the process faster than fully batch learning.
>
> So yeah, this is a great idea, but getting it to work in a distributed fashion is tricky. In a non-distributed form, this idea is almost completely implemented in the class InMemoryCollapsedVariationalBayes0. I say "almost" because it's technically in there already, as a parameter choice (initialModelCorpusFraction != 0), but I don't think it's working properly yet. If you're interested in the problem, playing with this class would be a great place to start!
>
> References:
> 1) http://eprints.pascal-network.org/archive/00006729/01/AsuWelSmy2009a.pdf
> 2) http://www.csee.ogi.edu/~zak/cs506-pslc/dist_lda.pdf
>
> On Mon, Mar 26, 2012 at 11:54 AM, Dirk Weissenborn <[email protected]> wrote:
>
> > Hello,
> >
> > I wanted to ask whether there is already an online learning algorithm implementation for LDA or not?
> >
> > http://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010b.pdf
> >
> > cheers,
> > Dirk
>
> --
> -jake
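
To make the \rho_t idea above concrete, here is a minimal sketch of the Hoffman-et-al-style step size applied to a running topic-word count matrix, i.e. the M-step blend that VB and CVB0 share. This is not Mahout code; the class and parameter names (OnlineTopicModelSketch, tau0, kappa) are made up for illustration, and a real implementation would work on Mahout's matrix types rather than raw arrays.

// Hypothetical sketch (not Mahout API): blend per-mini-batch sufficient
// statistics into a running topic-word count model, weighted by rho_t,
// in the style of Hoffman et al.'s online VB. The same blend applies to
// CVB0 counts, since VB and CVB0 differ in the E-step, not the M-step.
public final class OnlineTopicModelSketch {

  private final double[][] topicWordCounts; // numTopics x numTerms
  private final double tau0;   // delays over-aggressive early updates
  private final double kappa;  // forgetting rate, typically in (0.5, 1]
  private long t = 0;          // number of mini-batches seen so far

  public OnlineTopicModelSketch(int numTopics, int numTerms,
                                double tau0, double kappa) {
    this.topicWordCounts = new double[numTopics][numTerms];
    this.tau0 = tau0;
    this.kappa = kappa;
  }

  /** rho_t = (tau0 + t)^(-kappa), the step size from Hoffman et al. */
  private double rho() {
    return Math.pow(tau0 + t, -kappa);
  }

  /**
   * Merge the expected topic-word counts gathered from one mini-batch
   * (scaled up to corpus size by the caller) into the running model.
   */
  public void update(double[][] batchExpectedCounts) {
    double rho = rho();
    for (int k = 0; k < topicWordCounts.length; k++) {
      for (int w = 0; w < topicWordCounts[k].length; w++) {
        topicWordCounts[k][w] =
            (1.0 - rho) * topicWordCounts[k][w]
            + rho * batchExpectedCounts[k][w];
      }
    }
    t++;
  }
}

With kappa in (0.5, 1] the step sizes satisfy the usual Robbins-Monro conditions, which is what lets the streaming updates converge; kappa close to 1 forgets old mini-batches fastest, which is the "decay previous documents" reading described above.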

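Similarly, for the swap-topic-ids-on-merge technique Jake mentions from ref [2], here is a rough sketch of the simplest possible alignment: greedily pair each topic in a reference model with its most similar unused topic (by cosine over topic-word counts) in another ensemble member's model before summing counts. Again, the names are hypothetical and this is not what the HDP merge in [2] actually does (that uses a proper matching), but it shows the shape of the idea.

// Hypothetical sketch (not Mahout code): greedily align topic ids from
// one ensemble member's model to a reference model before summing, so
// that topics which drifted apart during parallel learning are merged
// with their closest counterparts rather than averaged across unrelated
// topics. A real merge would use a bipartite matching instead of greedy.
public final class TopicAlignmentSketch {

  /** Cosine similarity between two topic-word count rows. */
  private static double cosine(double[] a, double[] b) {
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      na += a[i] * a[i];
      nb += b[i] * b[i];
    }
    return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-12);
  }

  /**
   * Returns a permutation where permutation[k] is the index of the topic
   * in 'other' that best matches reference topic k.
   */
  public static int[] align(double[][] reference, double[][] other) {
    int numTopics = reference.length;
    int[] permutation = new int[numTopics];
    boolean[] used = new boolean[numTopics];
    for (int k = 0; k < numTopics; k++) {
      int best = -1;
      double bestSim = Double.NEGATIVE_INFINITY;
      for (int j = 0; j < numTopics; j++) {
        if (!used[j]) {
          double sim = cosine(reference[k], other[j]);
          if (sim > bestSim) {
            bestSim = sim;
            best = j;
          }
        }
      }
      permutation[k] = best;
      used[best] = true;
    }
    return permutation;
  }
}

Applying the returned permutation to the other member's counts before summing is the merge step; combined with a small enough \rho_t (so topics do not drift too far between merges), the greedy pairing stays unambiguous.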