[ https://issues.apache.org/jira/browse/MAHOUT-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150419#comment-13150419 ]
Raphael Cendrillon commented on MAHOUT-542:
-------------------------------------------

I'd like to get more involved in contributing to Mahout. If there's any area where you need support, whether on ALS-WR or on other topics, I'd be very happy to lend a hand.

I was particularly interested in your comments on automatically finding a good setting for lambda. I'm wondering whether something more sophisticated than exhaustive search could be done. For example, if the loss function evaluated on the hold-out dataset is a convex function of lambda, then gradient descent (or a quasi-Newton method) could be used.

> MapReduce implementation of ALS-WR
> ----------------------------------
>
>                 Key: MAHOUT-542
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-542
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>    Affects Versions: 0.5
>            Reporter: Sebastian Schelter
>            Assignee: Sebastian Schelter
>             Fix For: 0.5
>
>         Attachments: MAHOUT-452.patch, MAHOUT-542-2.patch,
> MAHOUT-542-3.patch, MAHOUT-542-4.patch, MAHOUT-542-5.patch,
> MAHOUT-542-6.patch, logs.zip
>
>
> As Mahout is currently lacking a distributed collaborative filtering
> algorithm that uses matrix factorization, I spent some time reading through a
> couple of the Netflix papers and stumbled upon the "Large-scale Parallel
> Collaborative Filtering for the Netflix Prize" available at
> http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf.
> It describes a parallel algorithm that uses "Alternating-Least-Squares with
> Weighted-λ-Regularization" to factorize the preference-matrix and gives some
> insights on how the authors distributed the computation using Matlab.
> It seemed to me that this approach could also easily be parallelized using
> Map/Reduce, so I sat down and created a prototype version.
> I'm not really sure I got the mathematical details correct (they need some
> optimization anyway), but I wanna put up my prototype implementation here per
> Yonik's law of patches.
> Maybe someone has the time and motivation to work a little on this with me.
> It would be great if someone could validate the approach taken (I'm willing
> to help as the code might not be intuitive to read) and could try to
> factorize some test data and give feedback then.
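
The comment's idea of replacing exhaustive search over lambda with a one-dimensional optimizer can be sketched roughly as follows. This is only an illustration under stated assumptions, not Mahout code: a toy scalar ridge regression stands in for ALS-WR, the names `fit_weight`, `holdout_loss` and `golden_section_minimize` are hypothetical, and golden-section search is swapped in for the gradient descent or quasi-Newton methods mentioned above because it needs no derivative and only assumes the hold-out loss has a single minimum on the search interval.

```python
import math

# Toy data (assumption): a scalar ridge regression stands in for ALS-WR.
TRAIN = [(1.0, 2.2), (2.0, 4.1), (3.0, 6.3)]
HOLDOUT = [(1.0, 2.0), (2.0, 4.0)]


def fit_weight(lam):
    """Closed-form ridge solution for y ~ w * x on the training set."""
    sxy = sum(x * y for x, y in TRAIN)
    sxx = sum(x * x for x, _ in TRAIN)
    return sxy / (sxx + lam)


def holdout_loss(lam):
    """Mean squared error on the hold-out set for the model fitted with lam."""
    w = fit_weight(lam)
    return sum((y - w * x) ** 2 for x, y in HOLDOUT) / len(HOLDOUT)


def golden_section_minimize(f, lo, hi, tol=1e-6):
    """Minimize f on [lo, hi], assuming f has a single minimum there."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


best_lambda = golden_section_minimize(holdout_loss, 0.0, 10.0)
print(f"best lambda: {best_lambda:.4f}")  # close to 0.65 for this toy data
```

Each loss evaluation here corresponds to one full model fit, so the appeal over a fixed grid is fewer fits for the same precision. Whether the hold-out loss of ALS-WR is actually convex, or even unimodal, in lambda is the open question the comment raises, and this sketch does not settle it.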