Hi Dirk,

This hasn't been implemented in Mahout yet, but the map-reduce (batch) LDA learner we do have, which uses (approximate, collapsed) variational Bayes [1], is reasonably easy to adapt to the method in that paper. The LDA learner we currently run via iterative MR passes is essentially an ensemble learner: each subset of the data partially trains a full LDA model, starting from the aggregate (summed) counts of all of the data from previous iterations (this is essentially the method named "approximately distributed LDA", or AD-LDA, in [2]).
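To make that concrete, here is a rough, untested sketch (plain Java, not the actual Mahout classes) of what one AD-LDA-style iteration boils down to: every worker starts from the same global topic-term counts, trains on its own shard of documents, and the per-shard count deltas get summed back into the global model that seeds the next iteration.

import java.util.List;

// Hypothetical sketch of one AD-LDA-style iteration (not Mahout's actual API):
// each worker copies the current global topic-term counts, trains on its own
// shard, and the reducer sums the per-shard deltas back into the global model.
public final class AdLdaIterationSketch {

  static double[][] runIteration(double[][] globalCounts, List<int[][]> shards) {
    int numTopics = globalCounts.length;
    int numTerms = globalCounts[0].length;
    double[][] next = deepCopy(globalCounts);

    for (int[][] shard : shards) {                // one mapper per shard in the MR version
      double[][] local = deepCopy(globalCounts);  // every worker sees the same starting model
      trainOnShard(local, shard);                 // CVB0 passes over this shard only
      for (int k = 0; k < numTopics; k++) {
        for (int w = 0; w < numTerms; w++) {
          next[k][w] += local[k][w] - globalCounts[k][w];  // reducer: sum per-shard deltas
        }
      }
    }
    return next;  // aggregate counts seed the next iteration
  }

  static double[][] deepCopy(double[][] m) {
    double[][] copy = new double[m.length][];
    for (int i = 0; i < m.length; i++) {
      copy[i] = m[i].clone();
    }
    return copy;
  }

  // Placeholder for the per-shard learner; the real inference work happens here.
  static void trainOnShard(double[][] topicTermCounts, int[][] shardDocs) {
    // update topicTermCounts in place from the documents in this shard
  }

  private AdLdaIterationSketch() {}
}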
The method in the paper you refer to turns traditional VB (the slower, uncollapsed kind, with the nasty digamma functions all over the place) into a streaming learner by accreting the word counts of each document onto the model you're using for inference on the next documents. The exact same idea can be applied to the CVB0 inference technique almost without change, as VB differs from CVB0 only in the E-step, not the M-step.

The problem that comes up when I've considered doing this kind of thing in the past is that if you do it in a distributed fashion, each member of the ensemble starts learning different topics simultaneously, and then the merge gets trickier. You can avoid this by using some of the techniques mentioned in [2] for HDP, where you swap topic ids on merge to make sure they match up, but I haven't investigated that very thoroughly. The other way to avoid this problem is to use the parameter denoted \rho_t in Hoffman et al: it tells us how much to weight the model as it was up until now against the updates from the latest document (alternatively, how much to "decay" previous documents). A rough sketch of this update is in the P.S. at the end of this mail. If you don't let the topics drift *too much* during parallel learning, you could probably make sure they match up just fine on each merge, while still making the whole process faster than fully batch learning.

So yeah, this is a great idea, but getting it to work in a distributed fashion is tricky. In a non-distributed form, this idea is almost completely implemented in the class InMemoryCollapsedVariationalBayes0. I say "almost" because it's technically in there already, as a parameter choice (initialModelCorpusFraction != 0), but I don't think it's working properly yet. If you're interested in the problem, playing with this class would be a great place to start!

References:
1) http://eprints.pascal-network.org/archive/00006729/01/AsuWelSmy2009a.pdf
2) http://www.csee.ogi.edu/~zak/cs506-pslc/dist_lda.pdf

On Mon, Mar 26, 2012 at 11:54 AM, Dirk Weissenborn <[email protected]> wrote:

> Hello,
>
> I wanted to ask whether there is already an online learning algorithm
> implementation for lda or not?
>
> http://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010b.pdf
>
> cheers,
> Dirk

--
-jake
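P.S. In case it's useful, here is a rough, untested sketch (plain Java, not Mahout's actual API) of the \rho_t-weighted update from Hoffman et al: the existing topic-term parameters are blended with the statistics estimated from the latest mini-batch, with \rho_t = (\tau_0 + t)^{-\kappa} shrinking over time so that later documents perturb the model less and less. The same blending should apply whether the per-batch statistics come from VB or CVB0 inference, since only the E-step differs.

final class OnlineLdaUpdateSketch {

  // Rough sketch (not Mahout code) of the rho_t-weighted online update in the
  // style of Hoffman et al.: keep (1 - rho_t) of the current topic-term
  // parameters lambda and mix in rho_t of the estimate from the latest
  // mini-batch, where rho_t = (tau0 + t)^(-kappa) decays as more batches are seen.
  static void onlineUpdate(double[][] lambda, double[][] batchStats,
                           int t, double tau0, double kappa, double eta,
                           long corpusSize, int batchSize) {
    double rho = Math.pow(tau0 + t, -kappa);         // step size, decays with t
    double scale = (double) corpusSize / batchSize;  // treat the batch as a sample of the corpus
    for (int k = 0; k < lambda.length; k++) {
      for (int w = 0; w < lambda[k].length; w++) {
        // blend the old model with the prior (eta) plus the scaled batch statistics
        lambda[k][w] = (1.0 - rho) * lambda[k][w]
            + rho * (eta + scale * batchStats[k][w]);
      }
    }
  }
}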
