[ 
https://issues.apache.org/jira/browse/MAHOUT-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701233#comment-13701233
 ] 

Peng Cheng edited comment on MAHOUT-1272 at 7/6/13 2:43 PM:
------------------------------------------------------------

Hey, I have finished the class and test for a parallel SGD factorizer for the 
matrix-completion-based recommender (not MapReduce, just single-machine 
multithreaded). It is loosely based on vanilla SGD and Hogwild!. I have only 
tested it on toy and synthetic data (2000 users * 1000 items), but it is pretty 
fast: 3-5x faster than vanilla SGD with 8 cores. (The speedup never exceeds 6x; 
apparently the executor induces a high allocation overhead.) It is also 
definitely faster than single-machine ALSWR. 
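
For readers unfamiliar with the approach, here is a minimal sketch of a Hogwild!-style update loop. It is not the attached ParallelSGDFactorizer.java; the class name HogwildSgdSketch, its methods, and the hyperparameters are all hypothetical, and it simplifies Hogwild! by giving each thread a fixed stride of the ratings instead of random sampling. The idea it shows is that the threads update the shared factor arrays without any locking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative Hogwild!-style SGD factorizer; every name here is hypothetical. */
final class HogwildSgdSketch {

  /** One observed rating. */
  static final class Rating {
    final int user;
    final int item;
    final double value;
    Rating(int user, int item, double value) {
      this.user = user; this.item = item; this.value = value;
    }
  }

  final double[][] userFactors;
  final double[][] itemFactors;

  HogwildSgdSketch(int numUsers, int numItems, int rank, long seed) {
    Random rnd = new Random(seed);
    userFactors = randomMatrix(numUsers, rank, 0.1, rnd);
    itemFactors = randomMatrix(numItems, rank, 0.1, rnd);
  }

  static double[][] randomMatrix(int rows, int cols, double scale, Random rnd) {
    double[][] m = new double[rows][cols];
    for (double[] row : m) {
      for (int k = 0; k < cols; k++) row[k] = scale * rnd.nextGaussian();
    }
    return m;
  }

  /** Fully observed synthetic low-rank ratings, for demonstration only. */
  static List<Rating> syntheticRatings(int numUsers, int numItems, int rank, long seed) {
    Random rnd = new Random(seed);
    double[][] u = randomMatrix(numUsers, rank, 1.0, rnd);
    double[][] v = randomMatrix(numItems, rank, 1.0, rnd);
    List<Rating> ratings = new ArrayList<>();
    for (int i = 0; i < numUsers; i++) {
      for (int j = 0; j < numItems; j++) {
        double dot = 0;
        for (int k = 0; k < rank; k++) dot += u[i][k] * v[j][k];
        ratings.add(new Rating(i, j, dot));
      }
    }
    return ratings;
  }

  double predict(int user, int item) {
    double dot = 0;
    for (int k = 0; k < userFactors[user].length; k++) {
      dot += userFactors[user][k] * itemFactors[item][k];
    }
    return dot;
  }

  /** One SGD step on shared state, deliberately unsynchronized. */
  void update(Rating r, double learningRate, double lambda) {
    double[] p = userFactors[r.user];
    double[] q = itemFactors[r.item];
    double err = r.value - predict(r.user, r.item);
    for (int k = 0; k < p.length; k++) {
      double pk = p[k];
      double qk = q[k];
      p[k] = pk + learningRate * (err * qk - lambda * pk);
      q[k] = qk + learningRate * (err * pk - lambda * qk);
    }
  }

  /** Each epoch, every thread walks its own stride of the ratings. */
  void train(List<Rating> ratings, int epochs, int threads, double learningRate, double lambda) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      for (int e = 0; e < epochs; e++) {
        List<Future<?>> pending = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
          final int offset = t;
          pending.add(pool.submit(() -> {
            for (int i = offset; i < ratings.size(); i += threads) {
              update(ratings.get(i), learningRate, lambda);
            }
          }));
        }
        for (Future<?> f : pending) f.get(); // wait for the epoch to finish
      }
    } catch (InterruptedException | ExecutionException ex) {
      throw new IllegalStateException(ex);
    } finally {
      pool.shutdown();
    }
  }

  double rmse(List<Rating> ratings) {
    double sum = 0;
    for (Rating r : ratings) {
      double err = r.value - predict(r.user, r.item);
      sum += err * err;
    }
    return Math.sqrt(sum / ratings.size());
  }

  public static void main(String[] args) {
    List<Rating> data = syntheticRatings(50, 40, 3, 42L);
    HogwildSgdSketch model = new HogwildSgdSketch(50, 40, 3, 7L);
    System.out.println("RMSE before: " + model.rmse(data));
    model.train(data, 100, 4, 0.02, 0.01);
    System.out.println("RMSE after:  " + model.rmse(data));
  }
}
```

Because the updates race, results differ slightly from run to run. The sketch also submits one task per thread per epoch, so executor overhead is paid once per epoch and not once per rating.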

I'm submitting my java files and patch for review.
                
> Parallel SGD matrix factorizer for SVDrecommender
> -------------------------------------------------
>
>                 Key: MAHOUT-1272
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1272
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>            Reporter: Peng Cheng
>            Assignee: Sean Owen
>              Labels: features, patch, test
>         Attachments: mahout.patch, ParallelSGDFactorizer.java, 
> ParallelSGDFactorizerTest.java
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> A parallel factorizer based on MAHOUT-1089 may achieve better performance on 
> multicore processors.
> The existing code is single-threaded and may still be outperformed by the 
> default ALS-WR.
> In addition, its hardcoded online-to-batch conversion prevents it from being 
> used by an online recommender. An online SGD implementation may help build a 
> high-performance online recommender as a replacement for the outdated 
> slope-one.
> The new factorizer can implement either DSGD 
> (http://www.mpi-inf.mpg.de/~rgemulla/publications/gemulla11dsgd.pdf) or 
> Hogwild! (www.cs.wisc.edu/~brecht/papers/hogwildTR.pdf).
> Related discussion has been going on for a while but remains inconclusive:
> http://web.archiveorange.com/archive/v/z6zxQUSahofuPKEzZkzl

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
