I found this paper on supervised LDA.
http://www.cs.princeton.edu/~blei/papers/BleiMcAuliffe2007.pdf

In your opinion, could supervised LDA be implemented easily
given the already existing implementation?

2012/3/27 Dirk Weissenborn <[email protected]>

> The fact that the dictionary stays fixed could be a problem for what I
> want to do. Would it be hard to change it in the code?
>
> I know of one kind of supervised LDA, called DiscLDA... do you know of others?
>
> If the model is slightly changed (training on just a few new inputs compared
> to the overall number of documents the model was already trained on), I
> think retraining a classifier should converge fast, taking the old model
> parameters as starting parameters, so I guess this probably would not be
> too much of a drawback.
> The reason I want to do this is that LDA works really well on sparse
> training datasets, and it has been shown that a composition of LDA and
> a classifier is much faster at the classification task, with results
> similar to those of tf-idf-based classifiers.
>
> 2012/3/27 Jake Mannix <[email protected]>
>
>> On Mon, Mar 26, 2012 at 5:13 PM, Dirk Weissenborn <
>> [email protected]> wrote:
>>
>> > Ok thanks,
>> >
>> > maybe one question. Is it possible to train an already trained model on
>> > just a few new documents with the provided algorithms, or do you have to
>> > train through the whole corpus again? What I mean is: can you train an
>> > LDA model incrementally or not?
>> >
>>
>> So the answer to *this* question is yes: you can incrementally train
>> your LDA model with new documents, and in fact you can do this with
>> the current codebase, although it's not really documented how you would
>> do it. We currently persist a full-fledged model in between passes over
>> the corpus (because we're doing MapReduce iteratively, this happens
>> naturally), and this model doesn't "know" which documents were used to
>> train it (it's just a big matrix of topic <-> feature counts), so you
>> could take a fully trained model and then run the current LDA over it,
>> using new documents as input, and it will start learning from them.
>> Note, however, that it won't be able to learn *new vocabulary* without
>> changing the code a little: the dictionary stays fixed in the current
>> codebase.
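In effect, the incremental pass described above is: run inference for each new document against the fixed topic <-> feature count matrix, then add the resulting expected counts back into the matrix. Here is a rough sketch of that idea in plain NumPy (hypothetical code, not Mahout's actual API; `alpha` and `eta` are assumed Dirichlet smoothing parameters):

```python
import numpy as np

def incremental_update(topic_term_counts, new_docs, n_iters=20, alpha=0.1, eta=0.01):
    """Fold new documents into an existing topic-term count matrix.

    topic_term_counts: (K, V) array of counts from prior training.
    new_docs: list of (V,) term-count vectors over the SAME fixed
              vocabulary (the dictionary cannot grow, as noted above).
    """
    K, V = topic_term_counts.shape
    model = topic_term_counts.astype(float).copy()
    for doc in new_docs:
        # p(term | topic) under the current model, smoothed by eta
        phi = (model + eta) / (model + eta).sum(axis=1, keepdims=True)
        gamma = np.ones(K) / K  # p(topic | doc), uniform to start
        for _ in range(n_iters):
            # E-step: responsibility of each topic for each term
            resp = phi * gamma[:, None]
            resp /= resp.sum(axis=0, keepdims=True) + 1e-12
            gamma = (resp * doc).sum(axis=1) + alpha
            gamma /= gamma.sum()
        # M-step: accrete the expected counts onto the model
        model += resp * doc
    return model
```

Since each document only contributes its own expected counts, the model's total mass grows by exactly the new document's length per fold-in.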
>>
>>
>> > What I would actually like to do is train a topical classifier on top
>> > of an LDA model. Do you have any experience with that? I mean, by
>> > changing the LDA model, the inputs for the classifier would also
>> > change. Do I have to train a classifier from scratch again, or can I
>> > reuse the classifier trained on top of the older LDA model and just
>> > adjust it slightly?
>> >
>>
>> This is actually a rather different question, it seems.  Let me make sure
>> I'm understanding what you're asking:
>>
>> You are using the p(topic | doc) values for each doc as *input features*
>> for another classifier, right? Trying to update your model (which itself
>> can be thought of as a fuzzy classifier, consisting of weights
>> p(feature | topic) plus an inference algorithm that produces p(topic | doc)
>> when fed a document as a weighted feature vector) while keeping the
>> p(topic | doc) vectors fixed will definitely be wrong: if the model
>> changes, these weights need to be updated, and the only way to do that
>> is to run these documents against the model in some way.
>>
>> But you don't really want to do that; you want to update your secondary
>> classifier after your input topics have drifted a bit upon being updated.
>> I think this problem still remains, however: your secondary classifier
>> was trained on input features which have changed, so it most likely
>> needs to be retrained as well.
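One cheap way to soften that retraining cost is to warm-start the secondary classifier from its old weights once the p(topic | doc) features have been recomputed under the updated LDA model: if the topics have drifted only a little, convergence from the old weights is much faster than from scratch. A sketch with a bare-bones logistic regression (hypothetical helper, not part of Mahout; names and parameters are illustrative only):

```python
import numpy as np

def train_logreg(X, y, w=None, lr=0.1, n_iters=200):
    """Binary logistic regression on topic-proportion features.

    Passing the old weights `w` warm-starts training after the LDA
    model (and hence the features X) has drifted; with small drift
    this needs far fewer iterations than starting from zero.
    """
    n, d = X.shape
    w = np.zeros(d) if w is None else w.copy()
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # predicted probabilities
        w -= lr * (X.T @ (p - y)) / n           # gradient step
    return w
```

Usage would be: recompute the p(topic | doc) feature matrix against the updated LDA model, then call `train_logreg(X_new, y, w=old_w, n_iters=...)` with a small iteration budget.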
>>
>> If, on the other hand, you are training a joint classifier (like
>> Supervised LDA, or Labeled LDA), and you ran *this* in an online
>> mode, you could probably update your classifier continually as you
>> got new labeled training data to train on.  But I'm speculating at this
>> point. :)
>>
>>
>>
>> >
>> > 2012/3/27 Dirk Weissenborn <[email protected]>
>> >
>> > > No problem, I'll post it.
>> > >
>> > >
>> > > 2012/3/27 Jake Mannix <[email protected]>
>> > >
>> > >> Hey Dirk,
>> > >>
>> > >>   Do you mind continuing this discussion on the mailing list?
>> > >> Lots of our users may ask this kind of question in the future...
>> > >>
>> > >> On Mon, Mar 26, 2012 at 3:36 PM, Dirk Weissenborn <
>> > >> [email protected]> wrote:
>> > >>
>> > >>> Ok thanks,
>> > >>>
>> > >>> maybe one question. Is it possible to train an already trained
>> > >>> model on just a few new documents with the provided algorithms, or
>> > >>> do you have to train through the whole corpus again? What I mean is
>> > >>> whether you can train an LDA model incrementally or not?
>> > >>> What I would actually like to do is train a topical classifier on
>> > >>> top of an LDA model. Do you have any experience with that? I mean,
>> > >>> by changing the LDA model, the inputs for the classifier would also
>> > >>> change. Do I have to train a classifier from scratch again, or can
>> > >>> I reuse the classifier trained on top of the older LDA model and
>> > >>> just adjust that one?
>> > >>>
>> > >>>
>> > >>> 2012/3/26 Jake Mannix <[email protected]>
>> > >>>
>> > >>>> On Mon, Mar 26, 2012 at 12:58 PM, Dirk Weissenborn <
>> > >>>> [email protected]> wrote:
>> > >>>>
>> > >>>> > Thank you for the quick response! It is possible that I'll need
>> > >>>> > it in the not-too-distant future; maybe I'll implement it on top
>> > >>>> > of what already exists, which should not be that hard, as you
>> > >>>> > mentioned. I'll provide a patch when the time comes.
>> > >>>> >
>> > >>>>
>> > >>>> Feel free to email any questions about using the
>> > >>>> InMemoryCollapsedVariationalBayes0 class - it's mainly been used
>> > >>>> for testing so far, but if you want to take that class, clean it
>> > >>>> up, and look into fixing the online learning aspect of it, that'd
>> > >>>> be excellent.  Let me know if you make any progress, because I'll
>> > >>>> probably be looking to work on this at some point as well, but I
>> > >>>> won't if you're already working on it. :)
>> > >>>>
>> > >>>>
>> > >>>> >
>> > >>>> > 2012/3/26 Jake Mannix <[email protected]>
>> > >>>> >
>> > >>>> > > Hi Dirk,
>> > >>>> > >
>> > >>>> > >  This has not been implemented in Mahout, but the version of
>> > >>>> > > map-reduce (batch)-learned LDA which is done via
>> > >>>> > > (approximate + collapsed) variational Bayes [1] is reasonably
>> > >>>> > > easily modifiable to the methods in this paper, as the LDA
>> > >>>> > > learner we currently do via iterative MR passes is essentially
>> > >>>> > > an ensemble learner: each subset of the data partially trains
>> > >>>> > > a full LDA model starting from the aggregate (summed) counts
>> > >>>> > > of all of the data from previous iterations (see essentially
>> > >>>> > > the method named "approximately distributed LDA" / AD-LDA in
>> > >>>> > > Ref-[2]).
>> > >>>> > >
>> > >>>> > >  The method in the paper you refer to turns traditional VB
>> > >>>> > > (the slower, uncollapsed kind, with the nasty digamma
>> > >>>> > > functions all over the place) into a streaming learner, by
>> > >>>> > > accreting the word counts of each document onto the model
>> > >>>> > > you're using for inference on the next documents.  The exact
>> > >>>> > > same idea can be applied to the CVB0 inference technique
>> > >>>> > > almost without change, as VB differs from CVB0 only in the
>> > >>>> > > E-step, not the M-step.
>> > >>>> > >
>> > >>>> > >  The problem which comes up when I've considered doing this
>> > >>>> > > kind of thing in the past is that if you do it in a
>> > >>>> > > distributed fashion, each member of the ensemble starts
>> > >>>> > > learning different topics simultaneously, and then the merge
>> > >>>> > > gets trickier.  You can avoid this with some of the techniques
>> > >>>> > > mentioned in [2] for HDP, where you swap topic-ids on merge to
>> > >>>> > > make sure they match up, but I haven't investigated that very
>> > >>>> > > thoroughly.  The other way to avoid this problem is to use the
>> > >>>> > > parameter denoted \rho_t in Hoffman et al - this parameter
>> > >>>> > > tells us how much to weight the model as it was up until now
>> > >>>> > > against the updates from the latest document (alternatively,
>> > >>>> > > how much to "decay" previous documents).  If you don't let the
>> > >>>> > > topics drift *too much* during parallel learning, you could
>> > >>>> > > probably make sure that they match up just fine on each merge,
>> > >>>> > > while still speeding up the process compared to fully batch
>> > >>>> > > learning.
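The \rho_t blending described above can be sketched in a few lines. Here rho_t = (tau0 + t)^(-kappa) as in Hoffman et al's online VB, and the minibatch counts are scaled up to corpus size before blending; all names and parameter defaults are hypothetical, for illustration only:

```python
import numpy as np

def merge_with_decay(model, minibatch_counts, t, tau0=1.0, kappa=0.7,
                     corpus_size=1000, batch_size=10):
    """Blend a partial model into the running model, online-VB style.

    rho_t controls how much weight the new minibatch gets relative to
    everything seen so far; rho_t -> 0 as t grows, so later updates
    perturb the topics less and parallel replicas drift less on merge.
    """
    rho = (tau0 + t) ** (-kappa)
    # scale the minibatch's counts up to corpus scale, as in online VB
    lam_hat = (corpus_size / batch_size) * minibatch_counts
    return (1.0 - rho) * model + rho * lam_hat
```

With kappa in (0.5, 1] the updates satisfy the usual stochastic-approximation conditions, which is what lets the sequence of blended models converge rather than oscillate.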
>> > >>>> > >
>> > >>>> > >  So yeah, this is a great idea, but getting it to work in a
>> > >>>> > > distributed fashion is tricky.  In a non-distributed form,
>> > >>>> > > this idea is almost completely implemented in the class
>> > >>>> > > InMemoryCollapsedVariationalBayes0.  I say "almost" because
>> > >>>> > > it's technically in there already, as a parameter choice
>> > >>>> > > (initialModelCorpusFraction != 0), but I don't think it's
>> > >>>> > > working properly yet.  If you're interested in the problem,
>> > >>>> > > playing with this class would be a great place to start!
>> > >>>> > >
>> > >>>> > > References:
>> > >>>> > > 1) http://eprints.pascal-network.org/archive/00006729/01/AsuWelSmy2009a.pdf
>> > >>>> > > 2) http://www.csee.ogi.edu/~zak/cs506-pslc/dist_lda.pdf
>> > >>>> > >
>> > >>>> > > On Mon, Mar 26, 2012 at 11:54 AM, Dirk Weissenborn <
>> > >>>> > > [email protected]> wrote:
>> > >>>> > >
>> > >>>> > > > Hello,
>> > >>>> > > >
>> > >>>> > > > I wanted to ask whether there is already an online-learning
>> > >>>> > > > algorithm implementation for LDA or not?
>> > >>>> > > >
>> > >>>> > > >
>> > http://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010b.pdf
>> > >>>> > > >
>> > >>>> > > > cheers,
>> > >>>> > > > Dirk
>> > >>>> > > >
>> > >>>> > >
>> > >>>> > >
>> > >>>> > >
>> > >>>> > > --
>> > >>>> > >
>> > >>>> > >  -jake
>> > >>>> > >
>> > >>>> >
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> --
>> > >>>>
>> > >>>>  -jake
>> > >>>>
>> > >>>
>> > >>>
>> > >>
>> > >>
>> > >> --
>> > >>
>> > >>   -jake
>> > >>
>> > >>
>> > >
>> >
>>
>>
>>
>> --
>>
>>  -jake
>>
>
>
