[ https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861144#action_12861144 ]
Sean Owen commented on MAHOUT-305:
----------------------------------

Ted says he likes LLR (log-likelihood ratio), and doesn't like throwing out the low-count co-occurrences. I agree, in the sense that low-count doesn't mean unimportant. It's LLR that figures out whether a co-occurrence is meaningless or contains a lot of info.

I think the sentiment reduces to this: it would be a better system if LLRs were used instead of simple co-occurrence counts as weights, which is right. It would involve the whole step of computing all item-item LLRs, right? That can be done.

My vision is to start with this simple system and work towards generalizing, so I can stick in a different means of generating the weights matrix, and different strategies for pruning.

So if generalizing to create a second, LLR-based system comes next, does it make sense to leave in the dumb co-occurrence-based system as well? Meh, probably, for now.

So what's the appropriately dumb pruning method for co-occurrence counts? Since pruning a co-occurrence means setting its count to 0, it made sense to me that the error from pruning is minimized by pruning those with the lowest counts (already closest to 0).

(By the way, I meant 'running well' in the sense of running quickly; I haven't run much evaluation of the output yet.)

> Combine both cooccurrence-based CF M/R jobs
> -------------------------------------------
>
>                 Key: MAHOUT-305
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-305
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.2
>            Reporter: Sean Owen
>            Assignee: Ankur
>            Priority: Minor
>
> We have two different but essentially identical MapReduce jobs to make
> recommendations based on item co-occurrence:
> org.apache.mahout.cf.taste.hadoop.{item,cooccurrence}. They ought to be
> merged. Not sure exactly how to approach that but noting this in JIRA, per
> Ankur.
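
To make the "low-count doesn't mean unimportant" point from the comment concrete, here is a minimal sketch of the log-likelihood ratio (Dunning's G^2 statistic) for one item pair, computed from a 2x2 table of user counts. This is an illustration only, not the code in either M/R job; the class name LlrSketch, the method names, and the example counts are all made up for this sketch.

```java
// Hypothetical illustration; not taken from the Mahout code base.
class LlrSketch {

    // x * ln(x), using the convention 0 * ln(0) = 0.
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy of a set of counts: N ln N - sum(k ln k).
    static double entropy(long... counts) {
        long sum = 0;
        double sumXLogX = 0.0;
        for (long c : counts) {
            sum += c;
            sumXLogX += xLogX(c);
        }
        return xLogX(sum) - sumXLogX;
    }

    // k11: users with both items      k12: users with item A only
    // k21: users with item B only     k22: users with neither item
    static double llr(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double colEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point rounding.
        return Math.max(0.0, 2.0 * (rowEntropy + colEntropy - matrixEntropy));
    }

    public static void main(String[] args) {
        // Made-up counts over 10,000 users.
        // Rare pair: each item has 5 users, and they are the same 5 users.
        double rare = llr(5, 0, 0, 9995);
        // Popular pair: 100 co-occurrences, but each item has 1,000 users,
        // so 100 is exactly what independence would predict.
        double popular = llr(100, 900, 900, 8100);
        System.out.println("rare pair LLR    = " + rare);
        System.out.println("popular pair LLR = " + popular);
    }
}
```

With these made-up counts, pruning by lowest co-occurrence count would drop the rare pair (count 5) and keep the popular one (count 100), while the LLR ranks them the other way around: the popular pair scores essentially zero because its count is what chance alone predicts.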