[
https://issues.apache.org/jira/browse/MAHOUT-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966968#action_12966968
]
Sebastian Schelter commented on MAHOUT-558:
-------------------------------------------
I think we misunderstand each other. When we compute similar items to A, B, C
with the current trunk code, any X that is not similar to *each* of the
input items will be excluded. Say X is similar to A and B but not to C: then the
last summand will be NaN and thus the estimate for X will be NaN, which means
not similar, and X will therefore be excluded.
With the patched version only the similarities of X to A and B would be
considered. I agree that this favors obscure items. I initially proposed to
interpret a "missing" similarity as 0 in this case, so that the average result
would be lowered for items that have fewer similarities. Should I
include this in the patch?
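
To make the three behaviors concrete, here is a minimal sketch. It is not the
code from trunk or from the patch: the class name, the Mode enum and the
estimate() method are hypothetical, and it assumes the estimate is a plain
average of the similarities of X to the input items, with NaN standing for a
missing similarity.

```java
public class ExclusionModeSketch {

    /** Hypothetical mode names, chosen for this illustration only. */
    public enum Mode {
        STRICT,          // trunk behavior as described: one NaN makes the estimate NaN
        SKIP_MISSING,    // patched behavior as described: average only the known similarities
        MISSING_AS_ZERO  // proposed variant: count a missing similarity as 0
    }

    /**
     * Averages the similarities of one candidate item X to the input items.
     * A missing similarity is passed in as Double.NaN.
     */
    public static double estimate(double[] similarities, Mode mode) {
        double sum = 0.0;
        int count = 0;
        for (double s : similarities) {
            if (Double.isNaN(s)) {
                if (mode == Mode.STRICT) {
                    return Double.NaN; // X is treated as not similar and is excluded
                }
                if (mode == Mode.MISSING_AS_ZERO) {
                    count++; // contributes 0 to the sum but still lowers the average
                }
                continue; // SKIP_MISSING ignores the entry entirely
            }
            sum += s;
            count++;
        }
        // No usable similarity at all: report "not similar".
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        // X is similar to A (1.0) and B (0.5) but has no similarity to C.
        double[] xToInputs = {1.0, 0.5, Double.NaN};
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " -> " + estimate(xToInputs, mode));
        }
    }
}
```

For X with similarities 1.0 to A, 0.5 to B and none to C, STRICT yields NaN (X
is excluded), SKIP_MISSING yields 0.75, and MISSING_AS_ZERO yields 0.5, so the
last variant still includes X but ranks it below items that are similar to all
three inputs.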
The practical use for me would be in this scenario: If you only work with a
small number of precomputed similarities per item in memory and use the current
version without the patch, mostSimilarItems(...) will give you empty results
for a collection of input items in most cases (in my experience). I've seen
this behavior very often with larger shopping carts, for example, and that's why
I proposed to relax the conditions for items to be included in the result.
I don't see how this change would affect MostSimilarEstimator.
> Extend ItembasedRecommender to offer different "exclusion modes" when
> computing most similar items to a collection of input items
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: MAHOUT-558
> URL: https://issues.apache.org/jira/browse/MAHOUT-558
> Project: Mahout
> Issue Type: New Feature
> Components: Collaborative Filtering
> Affects Versions: 0.5
> Reporter: Sebastian Schelter
> Attachments: MAHOUT-558.patch
>
>
> GenericItemBasedRecommender currently excludes every item that is not
> similar to all of the input items when computing the most similar items
> to a collection of items. We should introduce a way to let the user decide
> whether he/she wants this behavior or wants to include all items
> that are similar to at least one of the input items, which is more useful in
> practice in my experience.