[ https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838044#action_12838044 ]
Tamas Jambor commented on MAHOUT-305:
-------------------------------------

I usually pick a random N% of the data for each user, as Ankur suggested. This ensures that the recommender is not biased, and it doesn't really matter that non-relevant items end up in this subset, since they should be ranked lower anyway (a rough sketch of such a split is at the end of this message). I think the way Sean implemented it is also pretty good, taking the top-n relevant items and evaluating on that data, but you have to build a new model for each user, which makes it impossible to use on a big data set, especially with SVD.

I agree that the other issue is how to deal with non-rated items. I personally rank only items that have known ratings, so that the relevance judgement is always known, but I've been thinking of changing it to count unrated items as non-relevant. I think there are pros and cons either way.

> Combine both cooccurrence-based CF M/R jobs
> -------------------------------------------
>
>                 Key: MAHOUT-305
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-305
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.2
>            Reporter: Sean Owen
>            Assignee: Ankur
>            Priority: Minor
>
> We have two different but essentially identical MapReduce jobs to make
> recommendations based on item co-occurrence:
> org.apache.mahout.cf.taste.hadoop.{item,cooccurrence}. They ought to be
> merged. Not sure exactly how to approach that but noting this in JIRA, per
> Ankur.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
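
As a rough illustration of the per-user random N% holdout mentioned in the comment above, here is a minimal sketch in plain Java. It assumes a simple Map-based representation of each user's ratings rather than Mahout's own DataModel classes, and the class and method names are only illustrative:

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/**
 * Sketch only: hold out a random fraction of each user's rated items as that
 * user's test set, leaving the rest as training data. Uses plain Maps keyed by
 * user ID and item ID; this is an assumed representation, not Mahout's API.
 */
public class PerUserHoldoutSplit {

  public static void split(Map<Long, Map<Long, Float>> ratingsByUser,
                           double holdoutFraction,
                           long seed,
                           Map<Long, Map<Long, Float>> train,
                           Map<Long, Map<Long, Float>> test) {
    Random random = new Random(seed);
    for (Map.Entry<Long, Map<Long, Float>> entry : ratingsByUser.entrySet()) {
      long userID = entry.getKey();
      List<Long> itemIDs = new ArrayList<Long>(entry.getValue().keySet());
      Collections.shuffle(itemIDs, random);
      int numHeldOut = (int) Math.round(holdoutFraction * itemIDs.size());
      Map<Long, Float> trainPrefs = new HashMap<Long, Float>();
      Map<Long, Float> testPrefs = new HashMap<Long, Float>();
      for (int i = 0; i < itemIDs.size(); i++) {
        long itemID = itemIDs.get(i);
        float rating = entry.getValue().get(itemID);
        if (i < numHeldOut) {
          testPrefs.put(itemID, rating);   // held out: ranked at evaluation time
        } else {
          trainPrefs.put(itemID, rating);  // kept for building the model
        }
      }
      train.put(userID, trainPrefs);
      test.put(userID, testPrefs);
    }
  }
}
{code}

With a split like this the model only has to be built once per evaluation run, rather than once per user as in the top-n approach discussed above.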