[ https://issues.apache.org/jira/browse/MAHOUT-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158657#comment-13158657 ]
Sean Owen commented on MAHOUT-898:
----------------------------------

Yes, I could imagine this improves metrics in some cases. I ran a little test and actually saw a small RMSE decrease compared with the existing implementation, for example. I truly don't know whether it's going to help or hurt things overall.

I would actually phrase your suggestion differently: instead of construing a negative weight as a vote against a value in the weighted average, it's construing it as a *positive* vote for the *opposite* value. Here "opposite" means the negative of the rating. And that's the only bit I have a problem with, conceptually. If the opposite of 4 on a scale of 5 were 2, instead of -4, it would seem complete. (Really, it should be as far below the user's mean rating as 4 is above it -- and it happens to do that automatically if the mean is already 0, yes. It won't be 0 in general.)

I think that's a perfectly coherent strategy, one I hadn't thought of before. It is different from what's in the literature and what's been in the code.

I still hesitate to change the simple weighted average here. At the same time, I think it would be fine to incorporate this other strategy. We could make this pluggable, with a default implementation that does what the algorithm does today. It adds yet another hook and pluggable module to worry about, but I don't think it's so bad.

Am I missing anything easier? I'm looking for a way to balance the many issues in this thread as best we can.

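To make the "positive vote for the opposite value" reading concrete, here is a minimal Java sketch. It is not Mahout code: the class `OppositeVoteSketch`, its methods, and the `mean` parameter (standing in for the user's mean rating) are hypothetical names chosen for illustration. A negative similarity is treated as a positive weight on the rating reflected about the mean.

```java
// Hypothetical illustration only; none of these names exist in Mahout.
final class OppositeVoteSketch {

  /** The value as far below the mean as pref is above it (or vice versa). */
  public static double reflect(double pref, double mean) {
    return 2.0 * mean - pref;
  }

  /**
   * Weighted average in which a negative similarity counts as a positive
   * vote (weight |similarity|) for the reflected rating.
   */
  public static double estimate(double[] sims, double[] prefs, double mean) {
    double numerator = 0.0;
    double denominator = 0.0;
    for (int i = 0; i < sims.length; i++) {
      double weight = Math.abs(sims[i]);
      double vote = sims[i] < 0.0 ? reflect(prefs[i], mean) : prefs[i];
      numerator += weight * vote;
      denominator += weight;
    }
    // No usable neighbours: no estimate can be made in this sketch.
    return denominator == 0.0 ? Double.NaN : numerator / denominator;
  }

  public static void main(String[] args) {
    double[] sims = {0.75, -0.5};
    double[] prefs = {4.0, 4.0};
    // With a mean of 3, the opposite of 4 is 2.
    System.out.println(reflect(4.0, 3.0));
    // Mean 3: votes are 4 and 2.
    System.out.println(estimate(sims, prefs, 3.0));
    // Mean 0: votes are 4 and -4, the same as sum(s * r) / sum(|s|).
    System.out.println(estimate(sims, prefs, 0.0));
  }
}
```

With a mean of 0 the reflected rating is simply the negated rating, which is the case where the two readings coincide.
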
> Error in formula for preference estimation in GenericItemBasedRecommender
> -------------------------------------------------------------------------
>
>          Key: MAHOUT-898
>          URL: https://issues.apache.org/jira/browse/MAHOUT-898
>      Project: Mahout
>   Issue Type: Bug
>   Components: Collaborative Filtering
>  Environment: mahout-core
>     Reporter: Paulo Villegas
>     Assignee: Sean Owen
>     Priority: Minor
>       Labels: patch
>      Fix For: 0.6
>
>  Attachments: GenericItemBasedRecommender.diff
>
>
> The formula to estimate the preference for an item in the Taste item-based
> recommender normalizes by the sum of similarities for items used in
> estimation. But the terms in the sum taken to normalize should be in absolute
> value, since they can be negative (e.g. when using Pearson correlation,
> similarity is in [-1,1]). Now they are not, and as a result when there are
> negative and positive values they cancel out, giving a small denominator and
> incorrectly boosting the preference for the item (symptom: it is easy for a
> predicted preference to take the maximum value, since the quotient becomes
> large and it is capped afterwards).
> The patch is rather trivial (a one-liner, actually) for
> src/main/java/org/apache/mahout/cf/taste/impl/recommender/GenericItemBasedRecommender.java
> Note: the same error & suggested fix happens in GenericUserBasedRecommender
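
The cancellation described in the issue text above can be reproduced with a small standalone Java sketch. This is not the code in GenericItemBasedRecommender and not the attached patch: the class `DenominatorSketch`, its method names, and the 1 to 5 rating range used for capping are assumptions made for illustration.

```java
// Hypothetical illustration only; not the Mahout implementation or the patch.
final class DenominatorSketch {

  /** Normalizes by the plain sum of similarities, as the report describes. */
  public static double signedDenominator(double[] sims, double[] prefs) {
    double numerator = 0.0;
    double denominator = 0.0;
    for (int i = 0; i < sims.length; i++) {
      numerator += sims[i] * prefs[i];
      denominator += sims[i]; // positive and negative terms can cancel
    }
    return numerator / denominator;
  }

  /** Normalizes by the sum of absolute similarities, as the report suggests. */
  public static double absoluteDenominator(double[] sims, double[] prefs) {
    double numerator = 0.0;
    double denominator = 0.0;
    for (int i = 0; i < sims.length; i++) {
      numerator += sims[i] * prefs[i];
      denominator += Math.abs(sims[i]);
    }
    return numerator / denominator;
  }

  /** Clamps an estimate to the rating range (assumed here to be 1..5). */
  public static double cap(double value, double min, double max) {
    return Math.max(min, Math.min(max, value));
  }

  public static void main(String[] args) {
    double[] sims = {0.75, -0.5};
    double[] prefs = {4.0, 2.0};
    // Numerator is 0.75*4 - 0.5*2 = 2.0 in both cases.
    // Signed denominator is 0.25, so the quotient is 8.0 and gets capped to 5.0.
    System.out.println(cap(signedDenominator(sims, prefs), 1.0, 5.0));
    // Absolute denominator is 1.25, so the quotient stays inside the range.
    System.out.println(cap(absoluteDenominator(sims, prefs), 1.0, 5.0));
  }
}
```

In this example two moderately similar items with ratings of 4 and 2 produce a capped estimate at the top of the scale under the signed normalization, which is the symptom the report mentions.
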