Is "Hog Wild" expected to be faster on 2 processors are on 20? If it is intended for many-processor machines, that may be a useful addition. These days 8 cores is at the knee of the price-performance curve for low-end servers.
Are there gradient descent algorithms suitable for OpenCL GPU coding? GPU support seems like a hole in the Mahout suite, and would be very sexy for Summer of Code projects.

On Fri, Mar 16, 2012 at 11:21 AM, Dmitriy Lyubimov <[email protected]> wrote:
> I meant specifically the MR stuff, which is what SSGD seems to be aimed at.
> On a single CPU, restarts or even simple CAS updates are not a problem
> as in the paper you've mentioned. There's no extra cost associated
> with them. I think Mahout's single-node online SGD is already
> SMP-parallelized (albeit it does so to figure out the best fit for the
> regularization rate on a validation subset), as far as I remember. That's
> different from the parallelization suggested in the wild hog algo, but as
> long as we believe the work needs to be done and loads all CPUs, there's
> probably not much to win by using one or the other approach for SMP
> programming, since they essentially produce the same quality result
> without a meaningful improvement margin.
>
> On Thu, Mar 15, 2012 at 7:54 PM, Hector Yee <[email protected]> wrote:
>> Have you read the lock-free Hogwild paper? It is just SGD with multiple
>> threads, and don't be afraid of memory stomps. It works faster than batch.
>> On Mar 15, 2012 2:32 PM, "Dmitriy Lyubimov" <[email protected]> wrote:
>>
>>> We already discussed the paper before. In fact, I had exactly the same
>>> idea for partitioning the factorization task (something the authors
>>> call "stratified" SGD) with stochastic learners before I ever saw
>>> this paper.
>>>
>>> I personally lost interest in this approach even before I read the
>>> paper because, the way I understood it at the time, it would have
>>> required at least as many MR restarts with data exchange as the degree
>>> of parallelism, and consequently just as many data passes. In the
>>> framework of Mahout it is also difficult because Mahout doesn't
>>> support blocking out of the box for its DRM format, so an additional
>>> job may be required to pre-block the data the way they want to process
>>> it -- or we have to run over 100% of it during each restart, instead
>>> of a fraction of it.
>>>
>>> All in all, my speculation was that there was little chance this
>>> approach would provide a win over the ALS techniques with restarts that
>>> we currently already have at a mid to high degree of parallelization
>>> (say 50-way parallelization and up).
>>>
>>> But honestly I would be happy to be wrong, because I did not understand
>>> some of the work or did not see some of the optimizations suggested. I
>>> would be especially happy if it could beat our current ALS-WR by a
>>> meaningful margin on bigger data.
>>>
>>> -d
>>>
>>> On Sat, Jan 14, 2012 at 9:45 AM, Zeno Gantner <[email protected]>
>>> wrote:
>>> > Hi list,
>>> >
>>> > I was talking to Isabel Drost in December, and we talked about a nice
>>> > paper from last year's KDD conference that suggests a neat trick that
>>> > allows doing SGD for matrix factorization in parallel.
>>> >
>>> > She said this would be interesting for some of you here.
>>> >
>>> > Here is the paper:
>>> > http://www.mpi-inf.mpg.de/~rgemulla/publications/gemulla11dsgd.pdf
>>> >
>>> > Note that the authors themselves implemented it already in Hadoop.
>>> >
>>> > Maybe someone would like to pick this up.
>>> >
>>> > I am still trying to find my way around the Mahout/Taste source code,
>>> > so do not expect anything from me too soon ;-)
>>> >
>>> > Best regards,
>>> > Zeno

--
Lance Norskog
[email protected]
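For readers who haven't seen the Gemulla et al. paper linked above, the "stratified" trick is essentially a schedule over blocks of the rating matrix: blocks processed at the same time share no rows or columns, so independent SGD workers never update the same user or item factors. A hypothetical Java sketch of just that schedule (illustrative only; not the paper's Hadoop implementation and not Mahout code):

public class DsgdStrataSketch {
  public static void main(String[] args) {
    int d = 4;  // degree of parallelism: d workers, d row blocks, d column blocks
    for (int s = 0; s < d; s++) {          // d sub-epochs make one full data pass
      System.out.println("Stratum " + s + ":");
      for (int worker = 0; worker < d; worker++) {
        int rowBlock = worker;
        int colBlock = (worker + s) % d;   // no two workers share a row or column block
        // In a real job, each worker would run plain SGD on the ratings falling
        // in (rowBlock, colBlock), updating only those user/item factor vectors.
        System.out.println("  worker " + worker
            + " -> row block " + rowBlock + ", column block " + colBlock);
      }
    }
  }
}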
