Have you read the lock-free Hogwild paper? Just run SGD with multiple threads and no locking, and don't be afraid of memory stomps. It works faster than batch.

On Mar 15, 2012 2:32 PM, "Dmitriy Lyubimov" <[email protected]> wrote:
> We already discussed the paper before. In fact, I had exactly the same
> idea for partitioning the factorization task (something the authors
> call "stratified" SGD) with stochastic learners before I ever saw
> this paper.
>
> I personally lost interest in this approach even before I read the
> paper because, the way I understood it at that time, it would have
> required at least as many MR restarts with data exchange as there are
> degrees of parallelism, and consequently just as many data passes. In
> the framework of Mahout it is also difficult because Mahout doesn't
> support blocking out of the box for its DRM format, so an additional
> job may be required to pre-block the data the way they want to process
> it --or-- we have to run over 100% of it during each restart, instead
> of a fraction of it.
>
> All in all, my speculation was that there was little chance that this
> approach would provide a win over the ALS techniques with restarts that
> we currently already have at a mid to high degree of parallelization
> (say 50-way parallelization and up).
>
> But honestly I would be happy to be wrong because I did not understand
> some of the work or did not see some of the optimizations suggested. I
> would be especially happy if it could beat our current ALS WR by a
> meaningful margin on bigger data.
>
> -d
>
> On Sat, Jan 14, 2012 at 9:45 AM, Zeno Gantner <[email protected]>
> wrote:
> > Hi list,
> >
> > I was talking to Isabel Drost in December, and we talked about a nice
> > paper from last year's KDD conference that suggests a neat trick that
> > allows doing SGD for matrix factorization in parallel.
> >
> > She said this would be interesting for some of you here.
> >
> > Here is the paper:
> > http://www.mpi-inf.mpg.de/~rgemulla/publications/gemulla11dsgd.pdf
> >
> > Note that the authors themselves implemented it already in Hadoop.
> >
> > Maybe someone would like to pick this up.
> >
> > I am still trying to find my way around the Mahout/Taste source code,
> > so do not expect anything from me too soon ;-)
> >
> > Best regards,
> > Zeno
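
For reference, the restart count mentioned in the quoted message follows from how stratified SGD schedules its blocks. Below is a minimal Python sketch of that idea as I understand it, not the paper's or Mahout's implementation. It assumes the rating matrix is cut into a d x d grid of blocks and uses a simple cyclic-shift schedule; the function name stratum_schedule is made up for illustration, and the paper may pick its strata differently (e.g. at random).

```python
def stratum_schedule(d):
    """Build d strata for a d x d block grid (hypothetical helper).

    Each stratum holds d (row_block, col_block) pairs. Blocks within
    one stratum share no row block and no column block, so d workers
    can run SGD on them at the same time without touching the same
    rows of either factor matrix.
    """
    return [[(i, (i + s) % d) for i in range(d)] for s in range(d)]


if __name__ == "__main__":
    d = 3
    for s, stratum in enumerate(stratum_schedule(d)):
        # One stratum corresponds to one synchronized step
        # (one restart with data exchange in the MR setting).
        print("stratum", s, stratum)
```

With d-way parallelism this needs d strata to touch every block once, which is where "as many restarts as the degree of parallelism" per pass over the data comes from.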
