[
https://issues.apache.org/jira/browse/MAHOUT-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158657#comment-13158657
]
Sean Owen commented on MAHOUT-898:
----------------------------------
Yes, I could imagine this improves metrics in some cases. For example, I ran a
small test and saw a slight RMSE decrease relative to the existing
implementation. I truly don't know whether it will help or hurt overall.
I would phrase your suggestion differently: instead of construing a negative
weight as a vote against a value in the weighted average, it construes it as a
*positive* vote for the *opposite* value. Here "opposite" means the negative of
the rating, and that's the only bit I have a problem with, conceptually. If the
opposite of 4 on a scale of 5 were 2, instead of -4, it would seem complete.
(Really, it should be as far below the user's mean rating as 4 is above it.
That happens automatically if the mean is already 0, but it won't be 0 in
general.)
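To make the two readings of "opposite" concrete, here is a minimal sketch. The class and method names are hypothetical and are not taken from the Mahout code; ratings and similarities are passed as plain arrays purely for illustration.

```java
// Hypothetical sketch; none of these names come from Mahout.
final class OppositeVoteSketch {

    // Reading 1: a negative similarity is a positive vote for the negated rating.
    // sum(s * r) / sum(|s|) is the same as sum(|s| * sign(s) * r) / sum(|s|).
    static double estimateWithNegatedRating(double[] sims, double[] ratings) {
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < sims.length; i++) {
            numerator += sims[i] * ratings[i];
            denominator += Math.abs(sims[i]);
        }
        return numerator / denominator; // no guard for all-zero similarities
    }

    // Reading 2: a negative similarity is a positive vote for the rating
    // reflected about the user's mean, i.e. as far below it as r is above it.
    static double estimateWithReflectedRating(double[] sims, double[] ratings,
                                              double userMean) {
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < sims.length; i++) {
            double weight = Math.abs(sims[i]);
            double vote = sims[i] >= 0.0 ? ratings[i] : 2.0 * userMean - ratings[i];
            numerator += weight * vote;
            denominator += weight;
        }
        return numerator / denominator; // no guard for all-zero similarities
    }

    public static void main(String[] args) {
        double[] sims = {0.5, -0.5};
        double[] ratings = {4.0, 4.0};
        // The negated reading pairs 4 with -4.
        System.out.println(estimateWithNegatedRating(sims, ratings));        // 0.0
        // The reflected reading with an assumed user mean of 3 pairs 4 with 2.
        System.out.println(estimateWithReflectedRating(sims, ratings, 3.0)); // 3.0
    }
}
```

With a user mean of 0 the two methods return the same value, which is the special case mentioned above.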
I think that's a perfectly coherent strategy, one I hadn't thought of before.
It is different from what's in the literature and what's been in the code. I
still hesitate to change the simple weighted average here. At the same time I
think it would be fine to incorporate this other strategy.
We could make this pluggable, with a default implementation that does what the
algorithm does today. It adds yet another hook and pluggable module to worry
about, but I don't think it's so bad.
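Roughly, the pluggable version might look like the sketch below. Every name here (EstimationStrategy, SimpleWeightedAverage, AbsoluteWeightAverage, PluggableEstimator) is made up for illustration and none of it is existing Mahout API; the real hook would have to fit the recommender's actual data structures.

```java
// Hypothetical names throughout; this is not Mahout's API.
interface EstimationStrategy {
    double estimate(double[] similarities, double[] preferences);
}

// Default: the plain weighted average, normalized by the signed sum of weights.
final class SimpleWeightedAverage implements EstimationStrategy {
    @Override
    public double estimate(double[] similarities, double[] preferences) {
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            numerator += similarities[i] * preferences[i];
            denominator += similarities[i];
        }
        return denominator == 0.0 ? Double.NaN : numerator / denominator;
    }
}

// Alternative: normalize by the sum of absolute weights, as the patch proposes.
final class AbsoluteWeightAverage implements EstimationStrategy {
    @Override
    public double estimate(double[] similarities, double[] preferences) {
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            numerator += similarities[i] * preferences[i];
            denominator += Math.abs(similarities[i]);
        }
        return denominator == 0.0 ? Double.NaN : numerator / denominator;
    }
}

// Stand-in for a recommender that delegates estimation to the strategy.
final class PluggableEstimator {
    private final EstimationStrategy strategy;

    PluggableEstimator() {
        this(new SimpleWeightedAverage()); // default keeps today's behavior
    }

    PluggableEstimator(EstimationStrategy strategy) {
        this.strategy = strategy;
    }

    double estimatePreference(double[] similarities, double[] preferences) {
        return strategy.estimate(similarities, preferences);
    }
}
```

Callers who do nothing get the existing behavior through the no-argument constructor; anyone who wants the other strategy passes new AbsoluteWeightAverage() explicitly.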
Am I missing anything easier? Looking for a way to balance the many issues in
this thread as best we can.
> Error in formula for preference estimation in GenericItemBasedRecommender
> -------------------------------------------------------------------------
>
> Key: MAHOUT-898
> URL: https://issues.apache.org/jira/browse/MAHOUT-898
> Project: Mahout
> Issue Type: Bug
> Components: Collaborative Filtering
> Environment: mahout-core
> Reporter: Paulo Villegas
> Assignee: Sean Owen
> Priority: Minor
> Labels: patch
> Fix For: 0.6
>
> Attachments: GenericItemBasedRecommender.diff
>
>
> The formula to estimate the preference for an item in the Taste item-based
> recommender normalizes by the sum of similarities for items used in
> estimation. But the terms in the sum taken to normalize should be in absolute
> value, since they can be negative (e.g. when using Pearson correlation,
> similarity is in [-1,1]). Currently they are not, so negative and positive
> values cancel out, giving a small denominator and incorrectly boosting the
> preference for the item (symptom: a predicted preference easily reaches the
> maximum value, since the quotient becomes large and is capped afterwards).
> The patch is rather trivial (a one-liner, actually) for
> src/main/java/org/apache/mahout/cf/taste/impl/recommender/GenericItemBasedRecommender.java
> Note: the same error and suggested fix apply to GenericUserBasedRecommender
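
For reference, the two normalizations described in the issue can be written out as follows. The symbols are assumptions introduced here, not notation from the issue: s_{ij} is the similarity between the target item i and an item j the user has rated, and p_{u,j} is user u's preference for j. The numbers in the worked example are invented, and a 1 to 5 rating scale is assumed for the capping step.

```latex
% Current estimate: normalized by the signed sum of similarities
\hat{p}^{\,\text{current}}_{u,i}
  = \frac{\sum_{j} s_{ij}\, p_{u,j}}{\sum_{j} s_{ij}}

% Proposed estimate: normalized by the sum of absolute similarities
\hat{p}^{\,\text{patched}}_{u,i}
  = \frac{\sum_{j} s_{ij}\, p_{u,j}}{\sum_{j} \lvert s_{ij} \rvert}

% Worked example with s = (0.9, -0.8) and p = (5, 1):
\hat{p}^{\,\text{current}}
  = \frac{0.9 \cdot 5 + (-0.8) \cdot 1}{0.9 + (-0.8)}
  = \frac{3.7}{0.1} = 37
  \quad \text{(capped to 5 on the assumed scale)}

\hat{p}^{\,\text{patched}}
  = \frac{0.9 \cdot 5 + (-0.8) \cdot 1}{\lvert 0.9 \rvert + \lvert -0.8 \rvert}
  = \frac{3.7}{1.7} \approx 2.18
```

The near-cancellation in the first denominator is what pushes the uncapped estimate far outside the rating range.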