[ 
https://issues.apache.org/jira/browse/MAHOUT-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717659#comment-13717659
 ] 

Peng Cheng commented on MAHOUT-1286:
------------------------------------

Aye aye, I just did. It turns out that instances of
GenericUserPreferenceArray$PreferenceView have taken 1.7G. Quite unexpected,
right? Thanks a lot for the advice.
My next experiment will use GenericPreference[] directly; there will be
no more PreferenceArray.

Class Name                                                                     |    Objects |  Shallow Heap |    Retained Heap
-------------------------------------------------------------------------------------------------------------------------------
org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray$PreferenceView| 72,237,632 | 1,733,703,168 | >= 1,733,703,168
long[]                                                                         |    480,199 |   818,209,680 |   >= 818,209,680
float[]                                                                        |    480,190 |   410,563,592 |   >= 410,563,592
java.lang.Object[]                                                             |     18,230 |   361,525,488 | >= 2,443,647,088
org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray               |    480,189 |    15,366,048 | >= 1,237,456,672
java.util.ArrayList                                                            |     17,811 |       427,464 | >= 2,092,416,104
char[]                                                                         |      2,150 |       272,632 |       >= 272,632
byte[]                                                                         |        141 |        54,048 |        >= 54,048
java.lang.String                                                               |      2,119 |        50,856 |       >= 271,920
java.util.concurrent.ConcurrentHashMap$HashEntry                               |        673 |        21,536 |        >= 38,104
java.net.URL                                                                   |        229 |        14,656 |        >= 40,720
java.util.HashMap$Entry                                                        |        344 |        11,008 |        >= 68,760
-------------------------------------------------------------------------------------------------------------------------------
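
As a quick sanity check on the histogram, dividing shallow heap by object
count gives the per-instance cost of each class. The sketch below does that
for two rows using only the figures from the table; the class and method names
(HeapHistogramCheck, bytesPerObject) are made up for illustration. A plausible
reading of the 24-byte result is an object header plus an int index plus the
implicit outer-instance reference, rounded up to 8-byte alignment, but that
layout is an assumption about the JVM in use, not something the dump states.

```java
public class HeapHistogramCheck {

    // Average shallow size per instance, in bytes (integer division).
    static long bytesPerObject(long shallowHeapBytes, long objectCount) {
        return shallowHeapBytes / objectCount;
    }

    public static void main(String[] args) {
        // Figures copied from the histogram rows above.
        long viewBytes = bytesPerObject(1_733_703_168L, 72_237_632L);
        long arrayBytes = bytesPerObject(15_366_048L, 480_189L);

        // Both divisions are exact: 24 bytes per PreferenceView and
        // 32 bytes per GenericUserPreferenceArray.
        System.out.println("PreferenceView: " + viewBytes + " bytes each");
        System.out.println("GenericUserPreferenceArray: " + arrayBytes + " bytes each");
    }
}
```

So the 1.7G comes from roughly 72 million small wrapper objects at 24 bytes
apiece rather than from the preference data itself, which lives in the long[]
and float[] rows.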

                
> Memory-efficient DataModel, supporting fast online updates and element-wise 
> iteration
> -------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1286
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1286
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.9
>            Reporter: Peng Cheng
>            Assignee: Sean Owen
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Most DataModel implementations in the current CF component use hash maps to
> enable fast 2D indexing and updates. This is not memory-efficient for big
> data sets; e.g., the Netflix Prize dataset takes 11G of heap space as a
> FileDataModel.
> An improved DataModel implementation should use more compact data structures
> (like arrays). This trades a little time complexity in 2D indexing for a
> vast improvement in memory efficiency. In addition, any online recommender or
> online-to-batch converted recommender will not be affected by this in its
> training process.
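
The trade-off described in the issue can be illustrated with a minimal sketch:
one user's preferences held in two parallel primitive arrays sorted by item
ID, with lookup by binary search (O(log n) instead of a hash map's O(1)) and
no per-preference objects. This is not Mahout code and not the proposed
implementation; the class CompactPrefs and its methods are hypothetical names
chosen for the example.

```java
import java.util.Arrays;

/** Hypothetical sketch of array-backed preferences for one user. */
public class CompactPrefs {
    private final long[] itemIDs;  // sorted ascending
    private final float[] values;  // values[i] belongs to itemIDs[i]

    public CompactPrefs(long[] ids, float[] vals) {
        if (ids.length != vals.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        // Sort an index permutation by item ID, then copy both arrays in that order.
        Integer[] order = new Integer[ids.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Long.compare(ids[a], ids[b]));
        itemIDs = new long[ids.length];
        values = new float[ids.length];
        for (int i = 0; i < order.length; i++) {
            itemIDs[i] = ids[order[i]];
            values[i] = vals[order[i]];
        }
    }

    /** Returns the preference value, or NaN if this user has none for the item. */
    public float get(long itemID) {
        int idx = Arrays.binarySearch(itemIDs, itemID);
        return idx >= 0 ? values[idx] : Float.NaN;
    }

    /** Updates an existing preference in place; returns false if the item is absent. */
    public boolean set(long itemID, float value) {
        int idx = Arrays.binarySearch(itemIDs, itemID);
        if (idx < 0) {
            return false;
        }
        values[idx] = value;
        return true;
    }

    public int size() {
        return itemIDs.length;
    }

    public static void main(String[] args) {
        CompactPrefs prefs = new CompactPrefs(new long[] {30L, 10L, 20L},
                                              new float[] {3.0f, 1.0f, 2.0f});
        System.out.println(prefs.get(20L));   // value stored for item 20
        prefs.set(20L, 5.0f);                 // in-place online update
        System.out.println(prefs.get(20L));
    }
}
```

Updating an existing preference touches a single float, so online updates stay
cheap; inserting a new item would require shifting or reallocating the arrays,
which this sketch leaves out.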

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
