[ https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836543#action_12836543 ]
Ankur commented on MAHOUT-305:
------------------------------

Sean,
    Thanks for filing the JIRA. Noting points from our discussion here.

1. We need to decide on the dataset to run both implementations on. I have
   the Netflix dataset in mind, but a strange thing I observed during my tests
   with it is that there were 2 - 3 users who rated more than 10,000 movies!
   This seemed a little odd to me. Can you or someone else who has experience
   with the dataset validate my observation?

2. Both implementations need to run on the same dataset in an identical
   environment to gauge performance and accuracy. For accuracy I believe we
   need to do a precision-recall test. My understanding of it is:
   a) Do an 80-20 split of the data (80% train and 20% test), with the split
      happening on a timeline.
   b) Feed the training data to the algorithm and generate recommendations
      for a subset of users from the training data.
   c) Compare those recommendations with the items actually present in each
      user's history in the test data.
   d) Calculate precision = tp / (tp + fp)
      = (recommendations actually present in user's history) / (total items recommended)
   e) Calculate recall = tp / (tp + fn)
      = (recommendations actually present in user's history) / (total items in user's history)
   f) Finally, take a simple average of both across all the users to get
      approximate global precision/recall.

Please feel free to correct any of the steps above if I misunderstood anything.

> Combine both cooccurrence-based CF M/R jobs
> -------------------------------------------
>
>                 Key: MAHOUT-305
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-305
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.2
>            Reporter: Sean Owen
>            Priority: Minor
>
> We have two different but essentially identical MapReduce jobs to make
> recommendations based on item co-occurrence:
> org.apache.mahout.cf.taste.hadoop.{item,cooccurrence}. They ought to be
> merged. Not sure exactly how to approach that but noting this in JIRA, per
> Ankur.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
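
For reference, steps (a) through (f) of the precision-recall test in the comment could be sketched roughly as below. This is only an illustration in Python, not code from either Mahout job: the (user, item, timestamp) tuple layout and the function names time_split, history_by_user, precision_recall and average_precision_recall are assumptions made up for this example, and the recommendations are assumed to come from whichever implementation is under test.

```python
from collections import defaultdict


def time_split(ratings, train_fraction=0.8):
    """Step (a): split (user, item, timestamp) tuples on the timeline.

    The earliest train_fraction of events become training data and the
    rest become test data. The tuple layout is an assumption.
    """
    ordered = sorted(ratings, key=lambda r: r[2])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def history_by_user(ratings):
    """Group the items seen in a set of ratings by user."""
    items = defaultdict(set)
    for user, item, _ in ratings:
        items[user].add(item)
    return items


def precision_recall(recommended, actual):
    """Steps (c)-(e): compare one user's recommendations with test history."""
    recs = set(recommended)
    tp = len(recs & actual)  # recommendations actually present in history
    precision = tp / len(recs) if recs else 0.0   # tp / (tp + fp)
    recall = tp / len(actual) if actual else 0.0  # tp / (tp + fn)
    return precision, recall


def average_precision_recall(recommendations, test_history):
    """Step (f): simple average over users that have test history."""
    scores = [
        precision_recall(recs, test_history[user])
        for user, recs in recommendations.items()
        if test_history.get(user)  # skip users with nothing to compare against
    ]
    if not scores:
        return 0.0, 0.0
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n
```

One choice made here that the comment leaves open: users with no items in the test split are skipped rather than counted as zero recall, which changes the averages in step (f).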