[ https://issues.apache.org/jira/browse/MAHOUT-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150471#comment-13150471 ]
Sebastian Schelter commented on MAHOUT-542:
-------------------------------------------

It's an interesting idea. Please open a new JIRA issue for it, as this one is already closed and was only meant to cover the initial implementation.

> MapReduce implementation of ALS-WR
> ----------------------------------
>
>                 Key: MAHOUT-542
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-542
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>   Affects Versions: 0.5
>            Reporter: Sebastian Schelter
>            Assignee: Sebastian Schelter
>             Fix For: 0.5
>
>         Attachments: MAHOUT-452.patch, MAHOUT-542-2.patch,
> MAHOUT-542-3.patch, MAHOUT-542-4.patch, MAHOUT-542-5.patch,
> MAHOUT-542-6.patch, logs.zip
>
>
> As Mahout is currently lacking a distributed collaborative filtering
> algorithm that uses matrix factorization, I spent some time reading through a
> couple of the Netflix papers and stumbled upon the paper "Large-scale Parallel
> Collaborative Filtering for the Netflix Prize", available at
> http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf
> It describes a parallel algorithm that uses "Alternating-Least-Squares with
> Weighted-λ-Regularization" to factorize the preference matrix and gives some
> insight into how the authors distributed the computation using Matlab.
> It seemed to me that this approach could also easily be parallelized using
> Map/Reduce, so I sat down and created a prototype version. I'm not really
> sure I got the mathematical details correct (they need some optimization
> anyway), but I want to put up my prototype implementation here per Yonik's law
> of patches.
> Maybe someone has the time and motivation to work on this a little with me.
> It would be great if someone could validate the approach taken (I'm willing
> to help, as the code might not be intuitive to read) and could try to
> factorize some test data and give feedback.
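For readers unfamiliar with the "Alternating-Least-Squares with Weighted-λ-Regularization" scheme named in the issue description, here is a minimal single-machine sketch in plain Python. It is not the code from the attached patches and involves no MapReduce. All function names (`solve`, `update_side`, `als_wr`, `regularized_loss`) are hypothetical, and the update rule is an assumption based on the cited paper: with one side held fixed, each row's feature vector is obtained by solving a small regularized least-squares system whose λ is scaled by that row's number of ratings. Every user or item is assumed to have at least one rating.

```python
import random

def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def transpose(ratings):
    """Turn {user: {item: rating}} into {item: {user: rating}}."""
    t = {}
    for u, observed in ratings.items():
        for i, r in observed.items():
            t.setdefault(i, {})[u] = r
    return t

def update_side(ratings, fixed, k, lam):
    """Recompute one side's feature vectors while the other side is fixed.

    ratings: {row_id: {col_id: rating}}, fixed: {col_id: feature vector}.
    Each row is independent of the others, so rows could be processed in
    parallel.
    """
    updated = {}
    for row_id, observed in ratings.items():
        n = len(observed)  # weighted-lambda: scale lambda by rating count
        a = [[lam * n if p == q else 0.0 for q in range(k)] for p in range(k)]
        b = [0.0] * k
        for col_id, r in observed.items():
            f = fixed[col_id]
            for p in range(k):
                b[p] += r * f[p]
                for q in range(k):
                    a[p][q] += f[p] * f[q]
        updated[row_id] = solve(a, b)
    return updated

def als_wr(ratings, k=2, lam=0.05, iterations=10, seed=0):
    """Alternate between user and item updates for a fixed number of rounds."""
    rng = random.Random(seed)
    by_item = transpose(ratings)
    items = {i: [rng.random() for _ in range(k)] for i in by_item}
    users = {}
    for _ in range(iterations):
        users = update_side(ratings, items, k, lam)
        items = update_side(by_item, users, k, lam)
    return users, items

def regularized_loss(ratings, users, items, lam):
    """Squared error on known ratings plus the count-weighted penalty."""
    by_item = transpose(ratings)
    err = sum((r - dot(users[u], items[i])) ** 2
              for u, observed in ratings.items() for i, r in observed.items())
    reg = sum(len(ratings[u]) * dot(v, v) for u, v in users.items())
    reg += sum(len(by_item[i]) * dot(v, v) for i, v in items.items())
    return err + lam * reg
```

A call such as `als_wr({"u1": {"a": 5.0, "b": 3.0}, "u2": {"a": 4.0}})` returns one `k`-dimensional vector per user and per item; a missing rating is then estimated as the dot product of the matching user and item vectors. Because each row's update depends only on that row's ratings and the fixed side's vectors, the inner loop of `update_side` is the part that lends itself to distribution.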