[ 
https://issues.apache.org/jira/browse/MAHOUT-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966968#action_12966968
 ] 

Sebastian Schelter commented on MAHOUT-558:
-------------------------------------------

I think we misunderstand each other. When we compute similar items to A, B, C 
with the current trunk code, any X that is not similar to *each* of the 
input items will be excluded. Say X is similar to A and B but not to C: then the 
last summand will be NaN and thus the estimate for X will be NaN, which means 
"not similar", and X will therefore be excluded.

With the patched version only the similarities of X to A and B would be 
considered. I agree that this favors obscure items. I initially proposed to 
interpret a "missing" similarity as 0 in this case, so that the average result 
would be lowered for items that have a smaller number of similarities. Should I 
include this in the patch?
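
To make the three behaviors concrete, here is a minimal sketch in plain Java. 
It is not the trunk code or the patch: the class ExclusionModes, the Mode enum 
and the estimate(...) method are hypothetical names made up for this 
illustration, and Double.NaN stands for "no similarity known".

```java
public class ExclusionModes {

    /** Hypothetical modes, named only for this illustration. */
    enum Mode { STRICT, SKIP_MISSING, MISSING_AS_ZERO }

    /**
     * Averages the similarities of a candidate item X to each input item.
     * A NaN entry means that no similarity between X and that input item is known.
     */
    static double estimate(double[] similarities, Mode mode) {
        double sum = 0.0;
        int count = 0;
        for (double s : similarities) {
            if (Double.isNaN(s)) {
                if (mode == Mode.STRICT) {
                    // One missing similarity makes the whole estimate NaN,
                    // so X is treated as "not similar" and dropped.
                    return Double.NaN;
                }
                if (mode == Mode.MISSING_AS_ZERO) {
                    // Count the missing value as a 0, which lowers the average.
                    count++;
                }
                // SKIP_MISSING: ignore the missing value entirely.
                continue;
            }
            sum += s;
            count++;
        }
        // No usable similarity at all: still "not similar".
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        // X is similar to A and B, but has no known similarity to C.
        double[] xToAbc = {0.75, 0.25, Double.NaN};
        for (Mode mode : Mode.values()) {
            System.out.println(mode + ": " + estimate(xToAbc, mode));
        }
    }
}
```

With these made-up numbers, STRICT yields NaN (X is excluded), SKIP_MISSING 
averages over A and B only, and MISSING_AS_ZERO averages over all three input 
items, so an X with fewer known similarities is ranked lower.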

The practical use for me would be in this scenario: If you only work with a 
small number of precomputed similarities per item in memory and use the current 
version without the patch, mostSimilarItems(...) will give you empty results 
for a collection of input items in most cases (in my experience). I've seen 
this behavior very often with larger shopping carts, for example, and that's why 
I proposed to relax the conditions for items to be included in the result.

I don't see how this change would affect MostSimilarEstimator.

> Extend ItembasedRecommender to offer different "exclusion modes" when 
> computing most similar items to a collection of input items
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-558
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-558
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>    Affects Versions: 0.5
>            Reporter: Sebastian Schelter
>         Attachments: MAHOUT-558.patch
>
>
> GenericItemBasedRecommender currently excludes all items that are not 
> similar to each of the input items when computing the most similar items 
> to a collection of items. We should introduce a way to have the user decide 
> whether he/she wants this behavior or he/she wants to have all items included 
> that are similar to at least one of the input items, which is more useful in 
> practice in my experience.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.