[ https://issues.apache.org/jira/browse/MAHOUT-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898727#action_12898727 ]
Sebastian Schelter commented on MAHOUT-460:
-------------------------------------------
Patch attached, which fixes a big misunderstanding in the existing code. I had
created MaybePruneRowsMapper from Sean's old UserVectorToCooccurrenceMapper.
Its main use should have been to limit the number of cooccurrences per item in
the RecommenderJob. Unfortunately, it was applied to the item-user matrix (the
item vectors) instead of the user-item matrix (the user vectors); this is now
corrected.
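Why the pruning belongs on the user vectors: every pair of items that appears together in one user's row contributes a cooccurrence, so the user-item rows are where cooccurrences are generated. The following is a minimal, hypothetical Java sketch (the class `CooccurrenceSketch` and its method are illustrative, not Mahout's API) that emits the unordered item pairs from one user vector, represented here simply as an array of item IDs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Mahout code: shows how one user's row of
// preferred item IDs turns into item-item cooccurrence pairs.
final class CooccurrenceSketch {

  private CooccurrenceSketch() {}

  /** Returns every unordered pair of distinct positions in the user's item list. */
  static List<long[]> cooccurrences(long[] itemIdsOfOneUser) {
    List<long[]> pairs = new ArrayList<>();
    for (int i = 0; i < itemIdsOfOneUser.length; i++) {
      for (int j = i + 1; j < itemIdsOfOneUser.length; j++) {
        pairs.add(new long[] {itemIdsOfOneUser[i], itemIdsOfOneUser[j]});
      }
    }
    return pairs;
  }
}
```

A user with items {1, 2, 3} yields (1, 2), (1, 3) and (2, 3). A row with n items yields n(n-1)/2 pairs, so dropping entries from the user rows directly reduces the number of cooccurrences produced.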
Please note that the approach taken here is only a heuristic: if I understand
the code correctly, each mapper instance limits the number of cooccurrences on
its own, without knowing the counts seen by the other mappers.
I introduced a new job argument "maxCooccurrencesPerItem" with a default of 100.
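The hypothetical Java sketch below illustrates why such a limit is only a heuristic: each instance stands for one mapper and keeps its own per-item counters, so with several mappers an item can still be kept in more rows overall than the configured cap. The class name, its methods and the "first rows win" rule for deciding which preferences to drop are assumptions for illustration; the patch may choose differently. Only the default of 100 is taken from the maxCooccurrencesPerItem argument described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the code from MAHOUT-460.patch. One instance stands
// for one mapper: its counters are local, so separate instances (mappers)
// never see each other's counts.
final class PerMapperPruner {

  // Mirrors the default of the maxCooccurrencesPerItem job argument.
  static final int DEFAULT_MAX_COOCCURRENCES_PER_ITEM = 100;

  private final int maxPerItem;
  // itemId -> number of user rows (seen by this instance) in which the item was kept
  private final Map<Long, Integer> keptCounts = new HashMap<>();

  PerMapperPruner(int maxPerItem) {
    if (maxPerItem < 1) {
      throw new IllegalArgumentException("maxPerItem must be >= 1");
    }
    this.maxPerItem = maxPerItem;
  }

  PerMapperPruner() {
    this(DEFAULT_MAX_COOCCURRENCES_PER_ITEM);
  }

  /** Drops an item from a user row once this instance has already kept it maxPerItem times. */
  List<Long> prune(List<Long> itemIdsOfOneUser) {
    List<Long> kept = new ArrayList<>();
    for (Long itemId : itemIdsOfOneUser) {
      int seen = keptCounts.getOrDefault(itemId, 0);
      if (seen < maxPerItem) {
        keptCounts.put(itemId, seen + 1);
        kept.add(itemId);
      }
    }
    return kept;
  }
}
```

With two instances capped at 2 and three identical rows sent to each, item 1 survives in two rows per instance, i.e. in four rows in total: the global cap is only approximated, which is the behaviour the heuristic remark above refers to.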
> Add "maxPreferencesPerItemConsidered" option to
> o.a.m.cf.taste.hadoop.similarity.item.ItemSimilarityJob
> -------------------------------------------------------------------------------------------------------
>
> Key: MAHOUT-460
> URL: https://issues.apache.org/jira/browse/MAHOUT-460
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Reporter: Sebastian Schelter
> Attachments: MAHOUT-460.patch
>
>
> Because "cooccurrence algorithms ... scale in the square of the number of
> occurrences [of the] most popular item" (Ted wrote that in a recent mail), we should
> offer a parameter to the ItemSimilarityJob that limits the number of
> considered preferences per item. RecommenderJob already has such an option.
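A worked example of the quadratic cost: an all-pairs step over k entries produces k(k-1)/2 pairs, so 10,000 entries give 49,995,000 pairs while a cap of 100 bounds the count at 4,950. The Java sketch below shows one possible way a maxPreferencesPerItemConsidered-style cap could be applied to the preferences of one item, using reservoir sampling (Algorithm R) to keep a uniform random subset. The sampling strategy, the class `PreferenceCap` and its methods are assumptions; the issue only asks for a limit and does not say how the kept preferences are chosen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not Mahout's API: caps how many preferences of one item
// are considered, in the spirit of the proposed maxPreferencesPerItemConsidered option.
final class PreferenceCap {

  private PreferenceCap() {}

  /** Number of unordered pairs among k entries: k(k-1)/2, i.e. quadratic in k. */
  static long pairCount(long k) {
    return k * (k - 1) / 2;
  }

  /**
   * Reservoir sampling (Algorithm R): returns at most maxConsidered user IDs,
   * every input element having the same chance of being kept. Uniform
   * sampling is an assumption made for this sketch.
   */
  static List<Long> sample(List<Long> userIdsOfOneItem, int maxConsidered, Random random) {
    if (maxConsidered < 0) {
      throw new IllegalArgumentException("maxConsidered must be >= 0");
    }
    List<Long> reservoir = new ArrayList<>(Math.min(maxConsidered, userIdsOfOneItem.size()));
    int seen = 0;
    for (Long userId : userIdsOfOneItem) {
      seen++;
      if (reservoir.size() < maxConsidered) {
        // Fill the reservoir with the first maxConsidered elements.
        reservoir.add(userId);
      } else {
        // Replace a random slot with probability maxConsidered / seen.
        int slot = random.nextInt(seen);
        if (slot < maxConsidered) {
          reservoir.set(slot, userId);
        }
      }
    }
    return reservoir;
  }
}
```

Uniform sampling avoids depending on the order in which preferences arrive; simply keeping the first N preferences would be cheaper but order-dependent.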
--
This message is automatically generated by JIRA.