[ https://issues.apache.org/jira/browse/MAHOUT-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107453#comment-13107453 ]
Fabian Alenius commented on MAHOUT-542:
---------------------------------------

Hi,

I was thinking of rewriting the itemRatings and userRatings jobs as a single job using MultipleOutputs. As far as I understand, release 0.20.2* supports MultipleOutputs, although only through the deprecated API. Would such a patch be accepted, or are there issues that would prevent such a change? What is the current target version of Hadoop?

> MapReduce implementation of ALS-WR
> ----------------------------------
>
>                 Key: MAHOUT-542
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-542
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>    Affects Versions: 0.5
>            Reporter: Sebastian Schelter
>            Assignee: Sebastian Schelter
>             Fix For: 0.5
>
>         Attachments: MAHOUT-452.patch, MAHOUT-542-2.patch,
> MAHOUT-542-3.patch, MAHOUT-542-4.patch, MAHOUT-542-5.patch,
> MAHOUT-542-6.patch, logs.zip
>
>
> As Mahout is currently lacking a distributed collaborative filtering
> algorithm that uses matrix factorization, I spent some time reading through a
> couple of the Netflix papers and stumbled upon the "Large-scale Parallel
> Collaborative Filtering for the Netflix Prize" available at
> http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf.
> It describes a parallel algorithm that uses "Alternating-Least-Squares with
> Weighted-λ-Regularization" to factorize the preference-matrix and gives some
> insights on how the authors distributed the computation using Matlab.
> It seemed to me that this approach could also easily be parallelized using
> Map/Reduce, so I sat down and created a prototype version. I'm not really
> sure I got the mathematical details correct (they need some optimization
> anyway), but I wanna put up my prototype implementation here per Yonik's law
> of patches.
> Maybe someone has the time and motivation to work a little on this with me.
> It would be great if someone could validate the approach taken (I'm willing
> to help as the code might not be intuitive to read) and could try to
> factorize some test data and give feedback then.
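A rough illustration of the single-job idea proposed above, under hypothetical assumptions: the input is text lines of the form `userID,itemID,rating`, and both existing jobs build one sparse rating vector per row (per user and per item respectively). Each rating is read once and emitted twice, once per tag, and grouping by tag stands in for the two named outputs. This is plain Java with no Hadoop dependency; `CombinedRatingsSketch`, `run` and `emit` are made-up names, not Mahout code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory sketch (no Hadoop, not Mahout code) of merging the
 * userRatings and itemRatings passes into one job: every rating is read once
 * and routed to two "named outputs" instead of being read by two jobs.
 */
public class CombinedRatingsSketch {

    // Tags playing the role of MultipleOutputs named outputs.
    static final String USER_RATINGS = "userRatings";
    static final String ITEM_RATINGS = "itemRatings";

    /**
     * Runs the simulated job over lines assumed to look like "userID,itemID,rating".
     * Result: namedOutput -> rowID -> (columnID -> rating).
     */
    static Map<String, Map<Integer, Map<Integer, Float>>> run(List<String> lines) {
        Map<String, Map<Integer, Map<Integer, Float>>> outputs = new LinkedHashMap<>();
        outputs.put(USER_RATINGS, new TreeMap<>());
        outputs.put(ITEM_RATINGS, new TreeMap<>());
        for (String line : lines) {
            String[] fields = line.split(",");
            int user = Integer.parseInt(fields[0].trim());
            int item = Integer.parseInt(fields[1].trim());
            float rating = Float.parseFloat(fields[2].trim());
            // One input record, two emissions: keyed by user and keyed by item.
            emit(outputs, USER_RATINGS, user, item, rating);
            emit(outputs, ITEM_RATINGS, item, user, rating);
        }
        return outputs;
    }

    /** Groups by (namedOutput, row), standing in for shuffle plus reducer. */
    private static void emit(Map<String, Map<Integer, Map<Integer, Float>>> outputs,
                             String namedOutput, int row, int column, float rating) {
        outputs.get(namedOutput)
               .computeIfAbsent(row, k -> new TreeMap<>())
               .put(column, rating);
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, Map<Integer, Float>>> out =
                run(List.of("1,10,4.0", "1,11,3.5", "2,10,5.0"));
        System.out.println(out.get(USER_RATINGS)); // {1={10=4.0, 11=3.5}, 2={10=5.0}}
        System.out.println(out.get(ITEM_RATINGS)); // {10={1=4.0, 2=5.0}, 11={1=3.5}}
    }
}
```

In a Hadoop 0.20.2 job, the two tags would correspond to outputs registered with the old-API `org.apache.hadoop.mapred.lib.MultipleOutputs.addNamedOutput(...)` and written in the reducer through `getCollector(name, reporter)`.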