[
https://issues.apache.org/jira/browse/MAHOUT-407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879415#action_12879415
]
Sean Owen commented on MAHOUT-407:
----------------------------------
Looks fine in principle, but could I ask you to bring it up to date with head?
It's not your fault; I should have reviewed and submitted it earlier, but it
now conflicts with other recent changes. I think you're in the best position to
bring it up to date.
> Limit the number of similar items per item in the ItemSimilarityJob
> -------------------------------------------------------------------
>
> Key: MAHOUT-407
> URL: https://issues.apache.org/jira/browse/MAHOUT-407
> Project: Mahout
> Issue Type: New Feature
> Components: Collaborative Filtering
> Reporter: Sebastian Schelter
> Attachments: MAHOUT-407.patch
>
>
> In order to keep the item-similarity matrix sparse, it would be a useful
> improvement to add an option like "maxSimilaritiesPerItem" to
> o.a.m.cf.taste.hadoop.similarity.item.ItemSimilarityJob, which would make it
> try to cap the number of similar items per item.
> However, as we store each similarity pair only once, a single item could still
> end up with more than "maxSimilaritiesPerItem" similar items: we can't drop
> some of its pairs, because the other item in such a pair might otherwise be
> left with too few similarities.
> A default value of 100 co-occurrences (similarities) will be used because
> this is already the default in the distributed recommender.
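A minimal Java sketch of why the cap can be exceeded, assuming (hypothetically) that each similarity pair is stored once as a `Pair(itemA, itemB, similarity)` and that a pair survives if it ranks in the top "maxSimilaritiesPerItem" for either of its two items. `CapSimilarities`, `Pair` and `capPerItem` are illustrative names, not the `ItemSimilarityJob` API, and the attached patch may select pairs differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CapSimilarities {

  // Hypothetical representation: each similarity pair is stored only once.
  record Pair(int itemA, int itemB, double similarity) {}

  // Keeps a pair if it is among the top maxSimilaritiesPerItem pairs of
  // EITHER of its items. Because a kept pair counts for both items, an item
  // can still end up with more than maxSimilaritiesPerItem similar items.
  static List<Pair> capPerItem(List<Pair> pairs, int maxSimilaritiesPerItem) {
    // Index every pair under both of the items it touches.
    Map<Integer, List<Pair>> byItem = new HashMap<>();
    for (Pair p : pairs) {
      byItem.computeIfAbsent(p.itemA(), k -> new ArrayList<>()).add(p);
      byItem.computeIfAbsent(p.itemB(), k -> new ArrayList<>()).add(p);
    }
    // Union of each item's top-N pairs; the set removes duplicates.
    Set<Pair> kept = new LinkedHashSet<>();
    for (List<Pair> candidates : byItem.values()) {
      // Sort this item's pairs by descending similarity.
      candidates.sort((x, y) -> Double.compare(y.similarity(), x.similarity()));
      int n = Math.min(maxSimilaritiesPerItem, candidates.size());
      kept.addAll(candidates.subList(0, n));
    }
    return new ArrayList<>(kept);
  }
}
```

With the pairs (1,2,0.9), (1,3,0.8), (1,4,0.7), (2,3,0.6), (2,4,0.5), (3,4,0.4) and a cap of 1, items 2, 3 and 4 each keep their single best pair, and all three of those pairs involve item 1. Dropping any of them would leave item 2, 3 or 4 with no similar item at all, so item 1 ends up with three similar items despite the cap.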