[jira] [Commented] (MAHOUT-1286) Memory-efficient DataModel, supporting fast online updates and element-wise iteration

Gokhan Capan (JIRA) Thu, 05 Sep 2013 05:23:57 -0700

    [ https://issues.apache.org/jira/browse/MAHOUT-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759021#comment-13759021 ]


Gokhan Capan commented on MAHOUT-1286:
--------------------------------------

There was a thread about changing the "int" indices and "double" values in 
matrices, but that change has too many consequences for us to deal with right 
now. Even if it is not an exact Matrix structure, we can start with 2d hash 
tables and move on from there later. 

Let's start this. I tried inserting the Netflix ratings into (i) a DataModel 
backed by 2 matrices and (ii) the one in this patch. The good news is that 
insert performance is good enough. I am going to try gets and iterations, too. 
Tomorrow I will start on the 2d hash table, based on your implementation and 
with a matrix-like interface, and I will share a GitHub link with you.
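
To make the idea concrete, here is a rough Java sketch of what a 2d hash table 
with a matrix-like interface could look like. It is an illustration only, not 
the code in the patch and not the planned implementation: the class name 
HashTable2D, its method names, and the use of java.util.HashMap are all 
assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a sparse 2d hash table with matrix-like get/set. */
final class HashTable2D {

  /** Callback used for element-wise iteration (hypothetical name). */
  interface EntryVisitor {
    void visit(long row, long col, double value);
  }

  // Outer key: row (e.g. a user ID); inner key: column (e.g. an item ID).
  private final Map<Long, Map<Long, Double>> rows = new HashMap<>();
  private int numEntries = 0;

  /** Inserts or overwrites the value stored at (row, col). */
  void set(long row, long col, double value) {
    Map<Long, Double> r = rows.computeIfAbsent(row, k -> new HashMap<>());
    if (r.put(col, value) == null) {
      numEntries++; // count only cells that did not exist before
    }
  }

  /** Returns the value at (row, col), or NaN if the cell is absent. */
  double get(long row, long col) {
    Map<Long, Double> r = rows.get(row);
    if (r == null) {
      return Double.NaN;
    }
    Double v = r.get(col);
    return v == null ? Double.NaN : v;
  }

  /** Number of stored cells. */
  int size() {
    return numEntries;
  }

  /** Visits every stored cell once, in no particular order. */
  void forEachEntry(EntryVisitor visitor) {
    for (Map.Entry<Long, Map<Long, Double>> r : rows.entrySet()) {
      for (Map.Entry<Long, Double> c : r.getValue().entrySet()) {
        visitor.visit(r.getKey(), c.getKey(), c.getValue());
      }
    }
  }
}
```

Boxed java.util maps like these are not memory-efficient, so this only shows 
the shape of the interface (set, get, element-wise iteration); a real version 
would presumably sit on primitive-keyed hash tables.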
                
> Memory-efficient DataModel, supporting fast online updates and element-wise 
> iteration
> -------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1286
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1286
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.9
>            Reporter: Peng Cheng
>              Labels: collaborative-filtering, datamodel, patch, recommender
>             Fix For: 0.9
>
>         Attachments: InMemoryDataModel.java, InMemoryDataModelTest.java, 
> Semifinal-implementation-added.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Most DataModel implementations in the current CF component use hash maps to 
> enable fast 2d indexing and updates. This is not memory-efficient for big 
> data sets; e.g. the Netflix Prize dataset takes 11G of heap space as a 
> FileDataModel.
> An improved implementation of DataModel should use a more compact data 
> structure (like arrays). This can trade a little time complexity in 2d 
> indexing for a vast improvement in memory efficiency. In addition, any online 
> recommender or online-to-batch converted recommender will not be affected by 
> this in the training process.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
