[ 
https://issues.apache.org/jira/browse/MAHOUT-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619589#comment-13619589
 ] 

Cunlu Zou edited comment on MAHOUT-1185 at 4/2/13 7:30 AM:
-----------------------------------------------------------

Please check the code carefully. Two values are calculated in the 
processOneUser function. The average diffs (the variable *average* in the code) 
are calculated correctly, as you said, but a second variable holds the average 
preference value for each *individual item* (the variable *itemAverage* in the 
code). The two are entirely different. The itemAverage value is used to predict 
a preference when no diff values are available. For example, suppose we have 
the following user-pref matrix (a-c are users, A-C are items):
        A    B    C
    a   1    -    3
    b   2    -    4
    c   -    2    -
For user c, we want to predict the preference value for item C. User c has a 
preference value only for item B, and no diff value is available between B and 
C because no user has rated both. In this case, Mahout tries to use the average 
value for item C, which is (3+4)/2=3.5, as the predicted value. The same applies 
when predicting user c's preference for item A, which gives (1+2)/2=1.5. 
Comparing the predicted values, we then recommend item C rather than item A to 
user c.

However, the code has a mistake in calculating this average value (*NOT* the 
DIFF value), as I stated in the previous comments. I hope this makes it clear.
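
The fallback described above can be sketched in a few lines of Java. This is 
only an illustration of the example matrix, not Mahout code: the class 
ItemAverageFallback and the methods hasDiff and itemAverage are hypothetical 
names, and the matrix is hard-coded.

```java
// Illustrative sketch only; class and method names are hypothetical, not Mahout's API.
class ItemAverageFallback {

    // Rows are users a-c, columns are items A-C; null marks a missing preference.
    static final Double[][] PREFS = {
        {1.0, null, 3.0},   // user a
        {2.0, null, 4.0},   // user b
        {null, 2.0, null},  // user c
    };

    // True if at least one user rated both items, so a diff between them exists.
    static boolean hasDiff(Double[][] prefs, int itemX, int itemY) {
        for (Double[] row : prefs) {
            if (row[itemX] != null && row[itemY] != null) {
                return true;
            }
        }
        return false;
    }

    // Mean of all known preferences for one item, or NaN if nobody rated it.
    static double itemAverage(Double[][] prefs, int item) {
        double sum = 0.0;
        int count = 0;
        for (Double[] row : prefs) {
            if (row[item] != null) {
                sum += row[item];
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        int itemA = 0, itemB = 1, itemC = 2;
        // User c has rated only item B, and B shares no rater with A or C.
        System.out.println("diff B-A exists: " + hasDiff(PREFS, itemB, itemA));
        System.out.println("diff B-C exists: " + hasDiff(PREFS, itemB, itemC));
        // With no diffs available, the item averages serve as the predictions.
        System.out.println("predicted A for user c: " + itemAverage(PREFS, itemA));
        System.out.println("predicted C for user c: " + itemAverage(PREFS, itemC));
    }
}
```

With a correct item average, item C (3.5) ranks above item A (1.5) for user c, 
which matches the reasoning in the example.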

                
> MemoryDiffStorage.class has a bug for slope one algorithm which could cause 
> incorrect recommendation results
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1185
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1185
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering
>    Affects Versions: 0.7
>         Environment: Ubuntu
>            Reporter: Cunlu Zou
>            Assignee: Sean Owen
>              Labels: patch
>         Attachments: MemoryDiffStorage.patch
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> The function processOneUser(long averageCount, long userID) in the 
> MemoryDiffStorage.class file contains a bug in calculating itemAverage. 
> The function calculates the average difference between items (in a nested 
> loop) and the average preference value of each individual item in the same 
> outer loop, which runs only from 0 to length-2 (*for (int i = 0; i < length - 
> 1; i++)*). As a result, itemAverage never counts the last item's preference 
> value for any user, which could lead to incorrect recommendation results.
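
The effect of that loop bound can be reproduced with a small stand-alone 
sketch. It is not the Mahout source: the class ItemAverageLoopSketch, its 
arrays and its methods are hypothetical, and the only detail taken from the 
report is that the item totals are updated inside a loop bounded by 
i < length - 1. The "fixed" variant is one possible correction, not 
necessarily the one in the attached patch.

```java
import java.util.Arrays;

// Illustrative sketch only; this is not the Mahout source, and all names are hypothetical.
class ItemAverageLoopSketch {

    // Per-user preferences: ITEMS[u][k] is an item index (0=A, 1=B, 2=C), VALUES[u][k] its rating.
    static final int[][] ITEMS = { {0, 2}, {0, 2}, {1} };
    static final double[][] VALUES = { {1.0, 3.0}, {2.0, 4.0}, {2.0} };

    // Buggy variant: item totals are updated inside the pair loop, which stops at length - 2.
    static double[] buggyAverages(int numItems) {
        double[] sum = new double[numItems];
        int[] count = new int[numItems];
        for (int u = 0; u < ITEMS.length; u++) {
            int length = ITEMS[u].length;
            for (int i = 0; i < length - 1; i++) {
                // (the nested loop over j > i that updates the diffs would run here)
                sum[ITEMS[u][i]] += VALUES[u][i];
                count[ITEMS[u][i]]++;
            }
        }
        return divide(sum, count);
    }

    // One possible fix: give the item totals their own loop over every preference.
    static double[] fixedAverages(int numItems) {
        double[] sum = new double[numItems];
        int[] count = new int[numItems];
        for (int u = 0; u < ITEMS.length; u++) {
            for (int i = 0; i < ITEMS[u].length; i++) {
                sum[ITEMS[u][i]] += VALUES[u][i];
                count[ITEMS[u][i]]++;
            }
        }
        return divide(sum, count);
    }

    // Turns totals into averages; NaN marks an item that was never counted.
    static double[] divide(double[] sum, int[] count) {
        double[] avg = new double[sum.length];
        for (int k = 0; k < sum.length; k++) {
            avg[k] = count[k] == 0 ? Double.NaN : sum[k] / count[k];
        }
        return avg;
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + Arrays.toString(buggyAverages(3)));
        System.out.println("fixed: " + Arrays.toString(fixedAverages(3)));
    }
}
```

On the example matrix from the comment, the buggy variant never counts item C 
(the last item of users a and b) or item B (the only item of user c), so only 
item A gets an average. The separate loop yields 1.5, 2.0 and 3.5 for A, B 
and C.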
