[ https://issues.apache.org/jira/browse/MAHOUT-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836102#comment-13836102 ]
Gokhan Capan commented on MAHOUT-1286:
--------------------------------------

Let's "Won't Fix" this issue. I think what we need to do is implement more sparse-matrix (or similar) data structures for different access patterns, rather than the current map-of-maps approach. The ideas would apply to the current DataModel based on two FastByIDMaps.

> Memory-efficient DataModel, supporting fast online updates and element-wise
> iteration
> -------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1286
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1286
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.9
>            Reporter: Peng Cheng
>              Labels: collaborative-filtering, datamodel, patch, recommender
>             Fix For: 0.9
>
>         Attachments: InMemoryDataModel.java, InMemoryDataModelTest.java,
> Semifinal-implementation-added.patch, benchmark.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Most DataModel implementations in the current CF component use hash maps to enable
> fast 2D indexing and updates. This is not memory-efficient for big data sets,
> e.g. the Netflix Prize dataset takes 11 GB of heap space as a FileDataModel.
> An improved implementation of DataModel should use a more compact data structure
> (like arrays); this can trade a little time complexity in 2D indexing for a
> vast improvement in memory efficiency. In addition, any online recommender or
> online-to-batch converted recommender will not be affected by this in the
> training process.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
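For illustration, here is a minimal Java sketch of the "compact arrays instead of hash maps" idea, assuming a read-only CSR (compressed sparse row) layout keyed by user. The class name `CompactPreferenceMatrix`, its methods, and the NaN-for-missing convention are hypothetical and are not part of Mahout's API. Each lookup costs two binary searches (O(log n)) instead of the expected O(1) hash lookups of a FastByIDMap, which is the time-for-memory trade the description mentions; storing preferences in primitive arrays avoids per-entry object and hash-table overhead. This sketch does not support the fast online updates the issue also asks for.

```java
import java.util.Arrays;
import java.util.Comparator;

/**
 * Hypothetical CSR-style preference store (a sketch, not Mahout's DataModel API).
 * Read-only: online updates are not supported.
 */
public final class CompactPreferenceMatrix {

  /** Callback for element-wise iteration over all stored preferences. */
  public interface PreferenceVisitor {
    void visit(long userID, long itemID, float value);
  }

  private final long[] userIDs;  // distinct user IDs, sorted ascending
  private final int[] rowStart;  // row r occupies [rowStart[r], rowStart[r + 1])
  private final long[] itemIDs;  // item IDs, sorted ascending within each row
  private final float[] values;  // preference values, parallel to itemIDs

  /** Builds from parallel triplet arrays; assumes no duplicate (user, item) pairs. */
  public CompactPreferenceMatrix(long[] users, long[] items, float[] prefs) {
    int n = users.length;
    if (items.length != n || prefs.length != n) {
      throw new IllegalArgumentException("triplet arrays must have equal length");
    }
    // Sort triplet positions by (userID, itemID).
    Integer[] order = new Integer[n];
    for (int i = 0; i < n; i++) {
      order[i] = i;
    }
    Arrays.sort(order, Comparator.<Integer>comparingLong(i -> users[i])
        .thenComparingLong(i -> items[i]));

    itemIDs = new long[n];
    values = new float[n];
    long[] distinctUsers = new long[n];
    int[] starts = new int[n + 1];
    int rows = 0;
    for (int k = 0; k < n; k++) {
      int i = order[k];
      if (rows == 0 || distinctUsers[rows - 1] != users[i]) {
        distinctUsers[rows] = users[i];  // a new user row begins at position k
        starts[rows] = k;
        rows++;
      }
      itemIDs[k] = items[i];
      values[k] = prefs[i];
    }
    starts[rows] = n;  // end sentinel for the last row
    userIDs = Arrays.copyOf(distinctUsers, rows);
    rowStart = Arrays.copyOf(starts, rows + 1);
  }

  /** Returns the stored value, or Float.NaN if the user or item is absent. */
  public float getPreferenceValue(long userID, long itemID) {
    int r = Arrays.binarySearch(userIDs, userID);  // O(log #users)
    if (r < 0) {
      return Float.NaN;
    }
    // O(log row length), searching only within this user's slice.
    int k = Arrays.binarySearch(itemIDs, rowStart[r], rowStart[r + 1], itemID);
    return k < 0 ? Float.NaN : values[k];
  }

  public int numUsers() {
    return userIDs.length;
  }

  public int numPreferences() {
    return values.length;
  }

  /** Visits every preference in (userID, itemID) order, without boxing. */
  public void forEachPreference(PreferenceVisitor visitor) {
    for (int r = 0; r < userIDs.length; r++) {
      for (int k = rowStart[r]; k < rowStart[r + 1]; k++) {
        visitor.visit(userIDs[r], itemIDs[k], values[k]);
      }
    }
  }

  public static void main(String[] args) {
    CompactPreferenceMatrix m = new CompactPreferenceMatrix(
        new long[] {42L, 7L, 42L, 7L},
        new long[] {100L, 300L, 5L, 100L},
        new float[] {4.0f, 2.5f, 3.0f, 5.0f});
    System.out.println(m.getPreferenceValue(42L, 5L));   // prints 3.0
    System.out.println(m.getPreferenceValue(7L, 999L));  // prints NaN
    // prints 7,100,5.0 / 7,300,2.5 / 42,5,3.0 / 42,100,4.0 (one per line)
    m.forEachPreference((u, i, v) -> System.out.println(u + "," + i + "," + v));
  }
}
```

A second, item-keyed copy of the same layout could serve item-based access patterns, in the spirit of the comment's suggestion of different structures for different access patterns.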