[ https://issues.apache.org/jira/browse/MAHOUT-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619589#comment-13619589 ]
Cunlu Zou commented on MAHOUT-1185:
-----------------------------------

Please check the code carefully. Two variables are calculated in the processOneUser function. The average diffs (the variable *average* in the code) are calculated correctly, as you said. But there is also another variable that holds the average preference value for each *individual item* (the variable *itemAverage* in the code). The two are totally different.

The itemAverage value is used when no diff values are available to predict the preference. For example, suppose we have the following user-pref matrix (a-c are users, A-C are items):

     A  B  C
  a  1  -  3
  b  2  -  4
  c  -  2  -

For user c, we want to predict the preference value for item C. We only know user c's preference value for item B, and there is no diff value available between B and C. In this case, Mahout uses the average value for item C, which is (3+4)/2 = 3.5, as the predicted value for item C. The same applies when predicting user c's preference value for item A, which gives (1+2)/2 = 1.5. Comparing the predicted values, we recommend item C rather than item A to user c.

However, the code has a mistake in calculating this average value (*NOT* the DIFF value), as I stated in my previous comments. I hope this makes it clear.

> MemoryDiffStorage.class has a bug for slope one algorithm which could cause
> incorrect recommendation results
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1185
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1185
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering
>    Affects Versions: 0.7
>         Environment: Ubuntu
>            Reporter: Cunlu Zou
>            Assignee: Sean Owen
>              Labels: patch
>         Attachments: MemoryDiffStorage.patch
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> The function processOneUser(long averageCount, long userID) in the
> MemoryDiffStorage.class file contains a bug in calculating the itemAverage.
> Since the function calculates the average difference among items (in a
> nested loop) and also the average individual item preference value in the
> same outer loop, which only runs from 0 to length-2 (*for (int i = 0; i < length - 1;
> i++)*), the itemAverage variable does not count the last item's preference
> value for any user, which could lead to incorrect recommendation results.
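
The effect of the loop bound can be illustrated with a minimal Java sketch. This is not the actual MemoryDiffStorage code: the class ItemAverageSketch, the arrays ITEMS and VALUES, and the methods accumulate and average are all hypothetical names introduced for illustration, and plain arrays stand in for Mahout's preference arrays. The data is the user-pref matrix from the comment (users a, b, c and items A, B, C).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ItemAverageSketch {

    // Hypothetical stand-in for the per-user preferences in the example matrix:
    // user a rated A=1, C=3; user b rated A=2, C=4; user c rated B=2.
    static final String[][] ITEMS = { {"A", "C"}, {"A", "C"}, {"B"} };
    static final double[][] VALUES = { {1, 3}, {2, 4}, {2} };

    /**
     * Accumulates per-item {sum, count} totals over all users.
     * With includeLastItem == false the loop stops at length - 1, as in the
     * reported bug, so each user's last item never reaches the item average.
     */
    static Map<String, double[]> accumulate(boolean includeLastItem) {
        Map<String, double[]> totals = new LinkedHashMap<>();
        for (int u = 0; u < ITEMS.length; u++) {
            int length = ITEMS[u].length;
            int bound = includeLastItem ? length : length - 1;
            for (int i = 0; i < bound; i++) {
                // The pairwise diffs against items i + 1 .. length - 1 would be
                // accumulated here; they are omitted because they are not affected.
                double[] t = totals.computeIfAbsent(ITEMS[u][i], k -> new double[2]);
                t[0] += VALUES[u][i];
                t[1] += 1;
            }
        }
        return totals;
    }

    /** Returns the average preference for an item, or null if nothing was recorded. */
    static Double average(Map<String, double[]> totals, String item) {
        double[] t = totals.get(item);
        return t == null ? null : t[0] / t[1];
    }

    public static void main(String[] args) {
        Map<String, double[]> buggy = accumulate(false);
        Map<String, double[]> fixed = accumulate(true);
        for (String item : new String[] {"A", "B", "C"}) {
            System.out.println(item + ": buggy=" + average(buggy, item)
                    + " fixed=" + average(fixed, item));
        }
        // Prints:
        // A: buggy=1.5 fixed=1.5
        // B: buggy=null fixed=2.0
        // C: buggy=null fixed=3.5
    }
}
```

In this sketch the shortened loop records no average at all for items B and C, because each is the last item of every user who rated it, while the full loop gives the 3.5 for item C worked out in the comment.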