[ https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861144#action_12861144 ]

Sean Owen commented on MAHOUT-305:
----------------------------------

Ted says he likes LLR, and doesn't like throwing out the low-count 
co-occurrences.

I agree, in the sense that a low count doesn't mean unimportant. It's LLR that
figures out whether a co-occurrence is meaningless or carries a lot of
information.

I think the sentiment reduces to this: the system would be better if it used
LLRs instead of simple co-occurrence counts as weights, which is right. That
would involve the whole extra step of computing all item-item LLRs, right?
That can be done.
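A minimal sketch of what "computing all item-item LLRs" could mean per item
pair, assuming Dunning's G^2 statistic on a 2x2 contingency table built from
user counts. The function names and the table layout are illustrative
assumptions, not Mahout's API.

```python
import math


def x_log_x(x):
    # x * ln(x), using the convention 0 * ln(0) = 0
    return x * math.log(x) if x > 0 else 0.0


def entropy(*counts):
    # Unnormalized entropy: N ln N - sum(k ln k)
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 for a 2x2 table of user counts (hypothetical helper):
    k11 = both A and B, k12 = B without A,
    k21 = A without B,  k22 = neither."""
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    mat_entropy = entropy(k11, k12, k21, k22)
    # Guard against tiny negative values caused by floating-point round-off
    if row_entropy + col_entropy < mat_entropy:
        return 0.0
    return 2.0 * (row_entropy + col_entropy - mat_entropy)


def item_item_llr(cooccur, count_a, count_b, num_users):
    """Build the 2x2 table for items A and B from their co-occurrence count,
    each item's own user count, and the total number of users."""
    k11 = cooccur
    k12 = count_b - cooccur          # users with B but not A
    k21 = count_a - cooccur          # users with A but not B
    k22 = num_users - count_a - count_b + cooccur  # users with neither
    return log_likelihood_ratio(k11, k12, k21, k22)
```

For example, two rare items (each seen by 2 of 1000 users) that always appear
together co-occur only twice, yet `item_item_llr(2, 2, 2, 1000)` is large
(roughly 28.9), while two popular items (100 users each) co-occurring 10 times,
exactly what independence predicts, score about 0. That is the sense in which a
low count is not necessarily unimportant.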

My vision is to start with this simple system and work towards generalizing
it, so I can plug in a different means of generating the weights matrix, and
different strategies for pruning.
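One way the generalization could look, sketched under the assumption that the
weighting and pruning steps are simply passed in as functions. Every name here
(`build_weights`, `count_weight`, `threshold_prune`, `no_prune`) is
hypothetical and only illustrates the shape of the idea.

```python
from typing import Callable, Dict, Tuple

Pair = Tuple[str, str]
# Weighting strategy: (item pair, co-occurrence count) -> weight
WeightFn = Callable[[Pair, int], float]
# Pruning strategy: full weights matrix -> pruned weights matrix
PruneFn = Callable[[Dict[Pair, float]], Dict[Pair, float]]


def build_weights(cooccurrences: Dict[Pair, int],
                  weight: WeightFn,
                  prune: PruneFn) -> Dict[Pair, float]:
    # Apply the pluggable weighting strategy to every pair, then prune
    weights = {pair: weight(pair, count) for pair, count in cooccurrences.items()}
    return prune(weights)


def count_weight(pair: Pair, count: int) -> float:
    # The "dumb" strategy: the weight is just the raw co-occurrence count
    return float(count)


def threshold_prune(min_weight: float) -> PruneFn:
    # Returns a pruning strategy that drops pairs whose weight is below min_weight
    def prune(weights: Dict[Pair, float]) -> Dict[Pair, float]:
        return {p: w for p, w in weights.items() if w >= min_weight}
    return prune


def no_prune(weights: Dict[Pair, float]) -> Dict[Pair, float]:
    # Keeps every pair unchanged
    return dict(weights)
```

An LLR-based system would then be a different `weight` function plugged into
the same pipeline, rather than a second copy of the job.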

So if generalizing to create a second, LLR-based system comes next, does it
make sense to leave in the dumb co-occurrence-based system as well? Meh,
probably, for now. So what's the appropriately dumb pruning method for
co-occurrence counts?

Since pruning a co-occurrence means setting its count to 0, it made sense to me 
that the error from pruning is minimized by pruning those with lowest counts 
(already closest to 0).
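A minimal sketch of that pruning rule, assuming the matrix is held as a dict of
rows and that the goal is to keep at most `max_per_item` entries per item. The
function name and data layout are hypothetical. Since zeroing an entry loses
exactly its count, keeping each row's largest counts minimizes the total count
mass set to 0 for that row.

```python
import heapq


def prune_lowest_counts(cooccurrences, max_per_item):
    """Keep, for each item, only its max_per_item highest co-occurrence counts.

    cooccurrences: dict mapping item -> dict of (other item -> count)
    Returns (pruned matrix, total count mass that was set to zero).
    """
    pruned = {}
    dropped_mass = 0
    for item, row in cooccurrences.items():
        # The largest counts are kept; everything else is pruned (set to 0)
        kept = heapq.nlargest(max_per_item, row.items(), key=lambda kv: kv[1])
        pruned[item] = dict(kept)
        # Error introduced by pruning = sum of the counts that were dropped
        dropped_mass += sum(row.values()) - sum(count for _, count in kept)
    return pruned, dropped_mass
```

Note that pruning row by row can leave the result asymmetric: A may keep B
while B drops A.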

(By the way I meant 'running well' in the sense of quickly; I haven't run much 
evaluation of the output yet.)

> Combine both cooccurrence-based CF M/R jobs
> -------------------------------------------
>
>                 Key: MAHOUT-305
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-305
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.2
>            Reporter: Sean Owen
>            Assignee: Ankur
>            Priority: Minor
>
> We have two different but essentially identical MapReduce jobs to make 
> recommendations based on item co-occurrence: 
> org.apache.mahout.cf.taste.hadoop.{item,cooccurrence}. They ought to be 
> merged. Not sure exactly how to approach that but noting this in JIRA, per 
> Ankur.

-- 
This message is automatically generated by JIRA.