[ https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836725#action_12836725 ]
Ankur commented on MAHOUT-305:
------------------------------

Typically, when doing a train-test data split, we divide the data along a timeline. As a simple example, if we have 10 days of data, we would keep the last 2 days as test data and the remaining days as training data. If we remove all 5-star ratings the crude way, we may not be able to ensure this condition. It is not a hard requirement, but it is still a best practice AFAIK.

Also, I am not sure whether 5-star ratings would make up 20% or even 10% of the total data.

The crude way you mentioned is OK for a start, but I am not sure whether it is a fair evaluation. Also, with this we would effectively be calculating precision and recall as

precision = (5-star recommendations actually present in user's history) / (total 5-star recommendations)
recall = (5-star recommendations actually present in user's history) / (total 5-star items in user's history)

Is that what you mean?

> Combine both cooccurrence-based CF M/R jobs
> -------------------------------------------
>
>                 Key: MAHOUT-305
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-305
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.2
>            Reporter: Sean Owen
>            Assignee: Ankur
>            Priority: Minor
>
> We have two different but essentially identical MapReduce jobs to make
> recommendations based on item co-occurrence:
> org.apache.mahout.cf.taste.hadoop.{item,cooccurrence}. They ought to be
> merged. Not sure exactly how to approach that but noting this in JIRA, per
> Ankur.
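
For illustration, the timeline-based split and the precision/recall definitions from the comment could be sketched as follows. This is a minimal sketch on toy data, not Mahout code: the function names `split_by_time` and `precision_recall`, the tuple layout, the dates, and the `recommended` list (a stand-in for recommender output) are all assumptions made for the example.

```python
from datetime import date


def split_by_time(ratings, cutoff):
    """Ratings dated before `cutoff` go to training; the rest go to test."""
    train = [r for r in ratings if r[3] < cutoff]
    test = [r for r in ratings if r[3] >= cutoff]
    return train, test


def precision_recall(recommended, relevant):
    """Precision and recall of recommended items against relevant items."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy data: (user, item, stars, day of rating)
ratings = [
    ("u1", "a", 5, date(2010, 2, 1)),
    ("u1", "b", 3, date(2010, 2, 4)),
    ("u1", "c", 5, date(2010, 2, 9)),
    ("u1", "d", 5, date(2010, 2, 10)),
    ("u1", "e", 2, date(2010, 2, 10)),
]

# Hold out the last 2 days (Feb 9 and Feb 10) as test data.
train, test = split_by_time(ratings, cutoff=date(2010, 2, 9))

# Relevant items: 5-star items in the user's held-out history.
relevant = [item for (_, item, stars, _) in test if stars == 5]

# Stand-in for what a recommender trained on `train` might return.
recommended = ["c", "x"]

precision, recall = precision_recall(recommended, relevant)
```

Splitting by date rather than by rating value keeps the test period strictly after the training period, which is the condition the comment says the crude 5-star removal may not preserve.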