Have you read the lock-free Hogwild paper? It is just SGD with multiple
threads, and you don't worry about memory stomps. It runs faster than batch.
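
To make the idea concrete, here is a minimal sketch of lock-free multithreaded SGD for matrix factorization. It is an illustration under stated assumptions, not the paper's or Mahout's implementation: the names make_ratings, rmse, worker and hogwild_mf, the synthetic data, and all hyperparameters are made up for this example.

```python
import random
import threading

def make_ratings(n_users, n_items, rank, n_obs, seed=0):
    """Synthetic ratings from a hidden low-rank model (illustrative data only)."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(n_items)]
    obs = []
    for _ in range(n_obs):
        i, j = rng.randrange(n_users), rng.randrange(n_items)
        obs.append((i, j, sum(a * b for a, b in zip(U[i], V[j]))))
    return obs

def rmse(obs, W, H):
    """Root mean squared error of the factorization on the observed entries."""
    se = sum((r - sum(a * b for a, b in zip(W[i], H[j]))) ** 2 for i, j, r in obs)
    return (se / len(obs)) ** 0.5

def worker(shard, W, H, lr, reg, epochs, seed):
    """Plain SGD over one shard; W and H are shared and updated without locks."""
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(shard)
        for i, j, r in shard:
            wi, hj = W[i], H[j]
            err = r - sum(a * b for a, b in zip(wi, hj))
            for f in range(len(wi)):
                wf, hf = wi[f], hj[f]
                # Another thread may overwrite these cells at any time;
                # the sketch deliberately does nothing about it.
                wi[f] = wf + lr * (err * hf - reg * wf)
                hj[f] = hf + lr * (err * wf - reg * hf)

def hogwild_mf(obs, n_users, n_items, rank=4, n_threads=4,
               lr=0.03, reg=0.01, epochs=60, seed=1):
    """Factorize the observed entries with n_threads unsynchronized SGD workers."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    H = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    shards = [obs[t::n_threads] for t in range(n_threads)]
    threads = [
        threading.Thread(target=worker,
                         args=(shards[t], W, H, lr, reg, epochs, seed + t))
        for t in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return W, H

if __name__ == "__main__":
    data = make_ratings(30, 20, 3, 600)
    W, H = hogwild_mf(data, 30, 20)
    print("training RMSE:", rmse(data, W, H))
```

Note that in CPython the global interpreter lock serializes most of this work, so the sketch shows the structure of the approach rather than its speedup. The argument for skipping locks also leans on each update touching only a small part of the model, which holds here because one rating touches one row of W and one row of H.
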
On Mar 15, 2012 2:32 PM, "Dmitriy Lyubimov" <[email protected]> wrote:

> We already discussed the paper before. In fact, I had exactly the same
> idea for partitioning the factorization task (something the authors
> call "stratified" SGD) with stochastic learners before I ever saw
> this paper.
>
> I personally lost interest in this approach even before I read the
> paper because, the way I understood it at that time, it would have
> required at least as many MR restarts with data exchange as the
> degree of parallelism, and consequently just as many data passes.
> Within the Mahout framework it is also difficult because Mahout doesn't
> support blocking out of the box for its DRM format, so an additional
> job may be required to pre-block the data the way they want to process
> it --or-- we have to run over 100% of it during each restart, instead
> of a fraction of it.
>
> All in all, my speculation was that there was little chance that this
> approach would provide a win over the ALS techniques with restarts that
> we already have, at a mid to high degree of parallelization
> (say, 50-way parallelization and up).
>
> But honestly I would be happy to be wrong because I did not understand
> some of the work or did not see some of the optimizations suggested. I
> would be especially happy if it could beat our current ALS-WR by a
> meaningful margin on bigger data.
>
> -d
>
> On Sat, Jan 14, 2012 at 9:45 AM, Zeno Gantner <[email protected]>
> wrote:
> > Hi list,
> >
> > I was talking to Isabel Drost in December, and we talked about a nice
> > paper from last year's KDD conference that suggests a neat trick that
> > allows doing SGD for matrix factorization in parallel.
> >
> > She said this would be interesting for some of you here.
> >
> > Here is the paper:
> > http://www.mpi-inf.mpg.de/~rgemulla/publications/gemulla11dsgd.pdf
> >
> > Note that the authors themselves implemented it already in Hadoop.
> >
> > Maybe someone would like to pick this up.
> >
> > I am still trying to find my way around the Mahout/Taste source code,
> > so do not expect anything from me too soon ;-)
> >
> > Best regards,
> >  Zeno
>
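
The restart cost raised in the quoted discussion can be sketched as follows. This is a simplified, single-process illustration of a stratified schedule, assuming d row blocks and d column blocks with stratum s pairing row block b with column block (b + s) mod d. The function names (strata, block_of, preblock, sgd_block, dsgd_epoch) and the hyperparameters are hypothetical and are not taken from the paper's Hadoop code or from Mahout.

```python
def strata(d):
    """Return d strata; stratum s pairs row block b with column block (b + s) % d."""
    return [[(b, (b + s) % d) for b in range(d)] for s in range(d)]

def block_of(index, n, d):
    """Map a row or column index in [0, n) to one of d contiguous blocks."""
    return index * d // n

def preblock(obs, n_rows, n_cols, d):
    """Group (i, j, r) triples by block; this stands in for the extra pre-blocking job."""
    blocks = {}
    for i, j, r in obs:
        key = (block_of(i, n_rows, d), block_of(j, n_cols, d))
        blocks.setdefault(key, []).append((i, j, r))
    return blocks

def sgd_block(entries, W, H, lr, reg):
    """Sequential SGD over the ratings of a single block."""
    for i, j, r in entries:
        wi, hj = W[i], H[j]
        err = r - sum(a * b for a, b in zip(wi, hj))
        for f in range(len(wi)):
            wf, hf = wi[f], hj[f]
            wi[f] = wf + lr * (err * hf - reg * wf)
            hj[f] = hf + lr * (err * wf - reg * hf)

def dsgd_epoch(blocks, W, H, d, lr=0.03, reg=0.01):
    """One pass over the data; returns the number of synchronization points."""
    syncs = 0
    for stratum in strata(d):
        # The d blocks of a stratum share no rows of W and no rows of H,
        # so d workers could process them at once. Here they run serially.
        for key in stratum:
            sgd_block(blocks.get(key, []), W, H, lr, reg)
        # Barrier: updated factors must be exchanged before the next stratum.
        syncs += 1
    return syncs

if __name__ == "__main__":
    print(strata(3))
```

With d-way parallelism, one pass over the data needs d barriers, which is the "as many restarts as the degree of parallelism" concern above. Without the preblock step, each of those d rounds would have to scan all of the data to find its own blocks.
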
