[
https://issues.apache.org/jira/browse/MAHOUT-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295596#comment-13295596
]
CodyInnowhere commented on MAHOUT-1032:
---------------------------------------
The data is indeed very sparse. I think I should do more pruning before trying
this algorithm again.
> AggregateAndRecommendReducer gets OOM in setup() method
> -------------------------------------------------------
>
> Key: MAHOUT-1032
> URL: https://issues.apache.org/jira/browse/MAHOUT-1032
> Project: Mahout
> Issue Type: Bug
> Components: Collaborative Filtering
> Affects Versions: 0.5, 0.6, 0.7, 0.8
> Environment: hadoop cluster with -Xmx set to 2G
> Reporter: CodyInnowhere
> Assignee: Sean Owen
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> This bug is actually caused by the very first job, itemIDIndex. That job
> maps each itemID to an integer index, and the later
> AggregateAndRecommendReducer tries to read all of those mappings into the
> OpenIntLongHashMap indexItemIDMap. For large data sets this fails: e.g., my
> test data set covers 100 million+ items (not that many for a large
> e-commerce website), and tasks run out of memory in the setup() method. I
> don't think the itemIDIndex job is necessary; without it, the final
> AggregateAndRecommend step wouldn't have to read all items into memory to
> do the reverse index mapping.
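For illustration, below is a minimal sketch of the setup() pattern the report
describes, not the actual Mahout source. The class name IndexItemIDMapLoader is
hypothetical, and plain IntWritable/LongWritable are assumed for the key/value
types of the itemIDIndex job's output; only OpenIntLongHashMap and the overall
load-everything-into-memory shape come from the report itself.

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.mahout.math.map.OpenIntLongHashMap;

/** Hypothetical helper showing the pattern that OOMs in setup(). */
public final class IndexItemIDMapLoader {

  private IndexItemIDMapLoader() {}

  /**
   * Loads the whole index->itemID mapping into memory. Each entry costs
   * roughly 12 bytes of raw key/value data plus open-addressing hash-table
   * overhead, so 100 million+ items easily exceed a 2G task heap.
   */
  public static OpenIntLongHashMap load(Configuration conf, Path itemIDIndexPath)
      throws IOException {
    OpenIntLongHashMap indexItemIDMap = new OpenIntLongHashMap();
    FileSystem fs = FileSystem.get(conf);
    IntWritable index = new IntWritable();
    LongWritable itemID = new LongWritable();
    // Read every (index, itemID) pair written by the itemIDIndex job.
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, itemIDIndexPath, conf);
    try {
      while (reader.next(index, itemID)) {
        indexItemIDMap.put(index.get(), itemID.get());
      }
    } finally {
      reader.close();
    }
    return indexItemIDMap;
  }
}
{code}

Whatever the exact writable types, the problem is structural: the map grows
linearly with the number of distinct items, so the reducer's heap, not the
data, becomes the limit.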