[ 
https://issues.apache.org/jira/browse/MAHOUT-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158657#comment-13158657
 ] 

Sean Owen commented on MAHOUT-898:
----------------------------------

Yes, I could imagine this improves metrics in some cases. For example, I ran a 
little test and actually saw a small RMSE decrease relative to the existing 
implementation. I truly don't know whether it's going to help or hurt overall.

I would actually phrase your suggestion differently: instead of construing a 
negative weight as a vote against a value in the weighted average, the patch 
construes it as a *positive* vote for the *opposite* value. Here opposite 
means the negative of the rating. And that's the only bit I have a problem 
with, conceptually. If the opposite of 4 on a scale of 5 were 2, instead of -4, 
it would seem complete. (Really, it should be as far below the user's mean 
rating as 4 is above it -- and it happens to do that automatically if the mean 
is already 0, yes. It won't be 0 in general.)
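
To make the "opposite value" idea concrete, here is a minimal sketch in Java. 
The class and method names are hypothetical and are not part of Mahout; the 
only assumption is that the opposite of a rating is its reflection about some 
center (0, the scale midpoint, or the user's mean rating).

```java
public final class OppositeRating {

    /**
     * Reflects a rating about a center: the result lies as far below the
     * center as the rating lies above it. (Hypothetical helper, not Mahout API.)
     */
    static double reflect(double rating, double center) {
        return 2.0 * center - rating;
    }

    public static void main(String[] args) {
        // Center 0: the opposite is simply the negated rating.
        System.out.println(reflect(4.0, 0.0)); // -4.0
        // Center 3 (midpoint of a 1..5 scale): the opposite of 4 is 2.
        System.out.println(reflect(4.0, 3.0)); // 2.0
        // Center 3.5 (an example user mean): the opposite of 4 is 3.
        System.out.println(reflect(4.0, 3.5)); // 3.0
    }
}
```

Negating the rating is the special case center = 0, which is why the two 
coincide only when ratings are already mean-centered.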

I think that's a perfectly coherent strategy, one I hadn't thought of before. 
It is different from what's in the literature and what's been in the code. I 
still hesitate to change the simple weighted average here. At the same time I 
think it would be fine to incorporate this other strategy.

We could make this pluggable, with a default implementation that does what the 
algorithm does today. It adds yet another hook and pluggable module to worry 
about, but I don't think it's so bad.
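
A rough sketch of what such a hook might look like, in Java. Everything here 
is an assumption made for illustration: the interface PreferenceEstimator, its 
estimate method, and both implementations are hypothetical and do not exist in 
Mahout. The default keeps the signed weighted average; the alternative treats 
a negative similarity as a positive vote for the rating reflected about a 
chosen center.

```java
public final class EstimationStrategies {

    /** Hypothetical hook: combines neighbor similarities and ratings into one estimate. */
    interface PreferenceEstimator {
        double estimate(double[] similarities, double[] ratings);
    }

    /** Default: signed weighted average, normalized by the plain sum of similarities. */
    static final PreferenceEstimator SIGNED_AVERAGE = (sims, ratings) -> {
        double num = 0.0;
        double den = 0.0;
        for (int i = 0; i < sims.length; i++) {
            num += sims[i] * ratings[i];
            den += sims[i];
        }
        // No usable estimate when the weights cancel exactly.
        return den == 0.0 ? Double.NaN : num / den;
    };

    /**
     * Alternative: a negative similarity counts as a positive vote, with weight
     * |similarity|, for the rating reflected about the given center.
     * With center = 0 this reduces to normalizing by the sum of |similarity|.
     */
    static PreferenceEstimator reflectedVote(double center) {
        return (sims, ratings) -> {
            double num = 0.0;
            double den = 0.0;
            for (int i = 0; i < sims.length; i++) {
                double weight = Math.abs(sims[i]);
                double vote = sims[i] < 0.0 ? 2.0 * center - ratings[i] : ratings[i];
                num += weight * vote;
                den += weight;
            }
            return den == 0.0 ? Double.NaN : num / den;
        };
    }

    public static void main(String[] args) {
        double[] sims = {0.9, -0.8};
        double[] ratings = {4.0, 2.0};
        System.out.println(SIGNED_AVERAGE.estimate(sims, ratings));     // roughly 20, before any capping
        System.out.println(reflectedVote(0.0).estimate(sims, ratings)); // roughly 1.18
        System.out.println(reflectedVote(3.0).estimate(sims, ratings)); // roughly 4.0
    }
}
```

A recommender could then accept a PreferenceEstimator and fall back to 
SIGNED_AVERAGE when none is supplied, so existing behavior stays the default.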

Am I missing anything easier? I'm looking for a way to balance the many issues 
in this thread as best we can.

                
> Error in formula for preference estimation in GenericItemBasedRecommender
> -------------------------------------------------------------------------
>
>                 Key: MAHOUT-898
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-898
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering
>         Environment: mahout-core
>            Reporter: Paulo Villegas
>            Assignee: Sean Owen
>            Priority: Minor
>              Labels: patch
>             Fix For: 0.6
>
>         Attachments: GenericItemBasedRecommender.diff
>
>
> The formula to estimate the preference for an item in the Taste item-based 
> recommender normalizes by the sum of similarities for items used in 
> estimation. But the terms in the sum taken to normalize should be in absolute 
> value, since they can be negative (e.g. when using Pearson correlation, 
> similarity is in [-1,1]). Currently they are not, and as a result negative 
> and positive values cancel out, giving a small denominator and incorrectly 
> boosting the preference for the item (symptom: it is easy for a predicted 
> preference to take the maximum value, since the quotient becomes large and 
> is capped afterwards).
> The patch is rather trivial (a one-liner, actually) for 
> src/main/java/org/apache/mahout/cf/taste/impl/recommender/GenericItemBasedRecommender.java
> Note: the same error and suggested fix apply to GenericUserBasedRecommender.
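
The two normalizations described in the issue above can be written side by 
side. The notation is an assumption introduced here for illustration: s_{ij} 
is the similarity between the target item i and an item j the user has rated, 
and r_{u,j} is the user's rating of j. The numbers in the worked example are 
made up.

```latex
% Existing estimate: normalize by the signed sum of similarities
\hat{r}_{u,i} = \frac{\sum_{j} s_{ij}\, r_{u,j}}{\sum_{j} s_{ij}}

% Proposed estimate: normalize by the sum of absolute similarities
\hat{r}_{u,i} = \frac{\sum_{j} s_{ij}\, r_{u,j}}{\sum_{j} \lvert s_{ij} \rvert}

% Worked example with s = (0.9, -0.8) and r = (4, 2):
%   signed:    (0.9 \cdot 4 - 0.8 \cdot 2) / (0.9 - 0.8) = 2.0 / 0.1 = 20
%              (then capped at the maximum rating)
%   absolute:  (0.9 \cdot 4 - 0.8 \cdot 2) / (0.9 + 0.8) = 2.0 / 1.7 \approx 1.18
```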

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
