Is "Hog Wild" expected to be faster on 2 processors are on 20? If it is intended for many-processor machines, that may be a useful addition. These days 8 cores is at the knee of the price-performance curve for low-end servers.
Are there gradient descent algorithms suitable for OpenCL GPU coding? GPU support seems like a hole in the Mahout suite, and would be very sexy for Summer of Code projects.

On Fri, Mar 16, 2012 at 11:21 AM, Dmitriy Lyubimov <[email protected]> wrote:
> I meant specifically the MR stuff, which is what SSGD seems to be aimed at.
> On a single CPU, restarts or even simple CAS updates are not a problem
> as in the paper you've mentioned. There's no extra cost associated
> with them. I think Mahout's single-node online SGD is already
> SMP-parallelized (albeit it does so to figure out the best fit for the
> regularization rate on a validation subset), as far as I remember. That's
> different from the parallelization suggested in the wild hog algo, but as
> long as we believe the work needs to be done and loads all CPUs, there's
> probably not much to win by using one or the other approach for SMP
> programming, since they essentially produce the same quality result
> without a meaningful improvement margin.
>
> On Thu, Mar 15, 2012 at 7:54 PM, Hector Yee <[email protected]> wrote:
>> Have you read the lock-free Hogwild paper? It is just SGD with multiple
>> threads, and don't be afraid of memory stomps. It works faster than batch.
>> On Mar 15, 2012 2:32 PM, "Dmitriy Lyubimov" <[email protected]> wrote:
>>
>>> We already discussed the paper before. In fact, I had exactly the same
>>> idea for partitioning the factorization task (something the authors
>>> call "stratified" SGD) with stochastic learners before I ever saw
>>> this paper.
>>>
>>> I personally lost interest in this approach even before I read the
>>> paper because, the way I understood it at the time, it would have
>>> required at least as many MR restarts with data exchange as the degree
>>> of parallelism, and consequently just as many data passes. In the
>>> framework of Mahout it is also difficult because Mahout doesn't
>>> support blocking out of the box for its DRM format, so an additional
>>> job may be required to pre-block the data the way they want to process
>>> it -- or we have to run over 100% of it during each restart, instead
>>> of a fraction of it.
>>>
>>> All in all, my speculation was that there was little chance this
>>> approach would provide a win over the ALS techniques with restarts that
>>> we currently already have at a mid to high degree of parallelization
>>> (say 50-way parallelization and up).
>>>
>>> But honestly I would be happy to be wrong, because I did not understand
>>> some of the work or did not see some of the optimizations suggested. I
>>> would be especially happy if it could beat our current ALS-WR by a
>>> meaningful margin on bigger data.
>>>
>>> -d
>>>
>>> On Sat, Jan 14, 2012 at 9:45 AM, Zeno Gantner <[email protected]>
>>> wrote:
>>> > Hi list,
>>> >
>>> > I was talking to Isabel Drost in December, and we talked about a nice
>>> > paper from last year's KDD conference that suggests a neat trick that
>>> > allows doing SGD for matrix factorization in parallel.
>>> >
>>> > She said this would be interesting for some of you here.
>>> >
>>> > Here is the paper:
>>> > http://www.mpi-inf.mpg.de/~rgemulla/publications/gemulla11dsgd.pdf
>>> >
>>> > Note that the authors themselves implemented it already in Hadoop.
>>> >
>>> > Maybe someone would like to pick this up.
>>> >
>>> > I am still trying to find my way around the Mahout/Taste source code,
>>> > so do not expect anything from me too soon ;-)
>>> >
>>> > Best regards,
>>> > Zeno

--
Lance Norskog
[email protected]
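For readers who haven't seen the Gemulla et al. paper linked above, the "stratified" trick is essentially a schedule over blocks of the rating matrix: blocks processed at the same time share no rows or columns, so independent SGD workers never update the same user or item factors. A hypothetical Java sketch of just that schedule (illustrative only; not the paper's Hadoop implementation and not Mahout code):

public class DsgdStrataSketch {
  public static void main(String[] args) {
    int d = 4;  // degree of parallelism: d workers, d row blocks, d column blocks
    for (int s = 0; s < d; s++) {          // d sub-epochs make one full data pass
      System.out.println("Stratum " + s + ":");
      for (int worker = 0; worker < d; worker++) {
        int rowBlock = worker;
        int colBlock = (worker + s) % d;   // no two workers share a row or column block
        // In a real job, each worker would run plain SGD on the ratings falling
        // in (rowBlock, colBlock), updating only those user/item factor vectors.
        System.out.println("  worker " + worker
            + " -> row block " + rowBlock + ", column block " + colBlock);
      }
    }
  }
}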
