[ https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861171#action_12861171 ]
Ted Dunning commented on MAHOUT-305:
------------------------------------

{quote}
Ted says he ... doesn't like throwing out the low-count co-occurrences. I agree, in the sense that low-count doesn't mean unimportant. It's something that LLR figures out: whether it's meaningless or contains a lot of info.
{quote}

Close. But I would go further and say that, on average, individual data records with high counts are less useful than those with low counts, and they are quadratically more expensive to deal with. That combination of much higher expense and considerably lower value makes it a good idea to nuke (aka downsample) those records rather than lose the low-count stuff.

Dropping low-count items in the combiner is even worse, since there might have been quite a number of them scattered around that could have added up to interesting levels.

> Combine both cooccurrence-based CF M/R jobs
> -------------------------------------------
>
>                 Key: MAHOUT-305
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-305
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.2
>            Reporter: Sean Owen
>            Assignee: Ankur
>            Priority: Minor
>
> We have two different but essentially identical MapReduce jobs to make
> recommendations based on item co-occurrence:
> org.apache.mahout.cf.taste.hadoop.{item,cooccurrence}. They ought to be
> merged. Not sure exactly how to approach that but noting this in JIRA, per
> Ankur.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
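
To make the cost argument in the comment above concrete: a record with n distinct items emits n(n-1)/2 co-occurrence pairs, so capping the size of very large records removes most of the work while leaving every small record, and every low-count pair it contributes, untouched. The following is a minimal Python sketch of that idea, not the Mahout implementation. The names `downsample`, `cooccurrence_counts`, `pair_cost`, and the `max_items` parameter are hypothetical, and uniform random sampling is only one assumed way to downsample.

```python
import random
from collections import Counter
from itertools import combinations


def pair_cost(n):
    """Number of unordered item pairs emitted by a record with n distinct items."""
    return n * (n - 1) // 2


def downsample(items, max_items, rng):
    """Keep at most max_items items from one record, chosen uniformly at random.

    Uniform sampling is an assumption made for this sketch.
    """
    items = list(items)
    if len(items) <= max_items:
        return items
    return rng.sample(items, max_items)


def cooccurrence_counts(user_items, max_items=None, seed=0):
    """Count item co-occurrences across users.

    If max_items is set, oversized records are downsampled before pairs are
    generated. No pair is ever dropped for having a low count.
    """
    rng = random.Random(seed)
    counts = Counter()
    for items in user_items.values():
        unique = sorted(set(items))
        if max_items is not None:
            unique = sorted(downsample(unique, max_items, rng))
        counts.update(combinations(unique, 2))
    return counts


if __name__ == "__main__":
    data = {
        "heavy": [f"i{k}" for k in range(50)],  # one very large record
        "a": ["x", "y"],
        "b": ["x", "y"],
    }
    full = cooccurrence_counts(data)
    capped = cooccurrence_counts(data, max_items=10)
    print(sum(full.values()), sum(capped.values()))
    print(capped[("x", "y")])
```

In this toy data the large record alone accounts for `pair_cost(50)` pairs, while the capped run emits only `pair_cost(10)` pairs for it and still counts the `("x", "y")` pair from both small records.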