Re: [jira] [Commented] (MAHOUT-1286) Memory-efficient DataModel, supporting fast online updates and element-wise iteration

Peng Cheng Tue, 23 Jul 2013 15:07:35 -0700

That's exactly what I'm trying to do right now :) (I'm testing FastByIDArrayMap), but we probably have more problems than just HashMap. Based on the heap dump analysis result, PreferenceArray will probably be our next target. This is awesome, as your FactorizablePreferences didn't use it in the first place.


Yours Peng


On 13-07-23 05:46 PM, Sebastian Schelter wrote:

IMHO you will always have memory issues if you try to provide constant-time
random access. That's why I proposed creating a special memory-efficient
DataModel for sequential access.
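
A rough sketch of what such a sequential-access store could look like, assuming every preference is kept as one entry in three parallel primitive arrays (8 + 8 + 4 = 20 bytes of payload per preference) and consumers only ever scan it front to back. SequentialPrefs and Visitor are hypothetical names used for illustration; they are not Mahout classes and this is not the implementation proposed in the thread.

```java
/** Hypothetical flat, sequential-access preference store; not a Mahout class. */
final class SequentialPrefs {

    /** Callback invoked once per stored preference, in storage order. */
    interface Visitor {
        void visit(long userID, long itemID, float value);
    }

    private final long[] userIDs;
    private final long[] itemIDs;
    private final float[] values;

    SequentialPrefs(long[] userIDs, long[] itemIDs, float[] values) {
        if (userIDs.length != itemIDs.length || itemIDs.length != values.length) {
            throw new IllegalArgumentException("all three arrays must have equal length");
        }
        this.userIDs = userIDs;
        this.itemIDs = itemIDs;
        this.values = values;
    }

    int size() {
        return values.length;
    }

    /** Element-wise iteration: no hash maps and no per-preference objects. */
    void forEach(Visitor visitor) {
        for (int i = 0; i < values.length; i++) {
            visitor.visit(userIDs[i], itemIDs[i], values[i]);
        }
    }
}
```

A trainer that only needs one pass per iteration (for example an SGD-style factorizer) could consume this through forEach; anything that needs "all preferences of user X" in constant time would not be served by it.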


2013/7/23 Peng Cheng (JIRA) <[email protected]>

     [
https://issues.apache.org/jira/browse/MAHOUT-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717659#comment-13717659]

Peng Cheng commented on MAHOUT-1286:
------------------------------------

Aye aye, I just did. It turns out that instances of
PreferenceArray$PreferenceView have taken 1.7G. Quite unexpected, right?
Thanks a lot for the advice.
My next experiment will just use GenericPreference[] directly; there will
be no more PreferenceArray.

Class Name                                                                      |    Objects |  Shallow Heap |    Retained Heap
-------------------------------------------------------------------------------------------------------------------------------
org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray$PreferenceView | 72,237,632 | 1,733,703,168 | >= 1,733,703,168
long[]                                                                          |    480,199 |   818,209,680 |   >= 818,209,680
float[]                                                                         |    480,190 |   410,563,592 |   >= 410,563,592
java.lang.Object[]                                                              |     18,230 |   361,525,488 | >= 2,443,647,088
org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray                |    480,189 |    15,366,048 | >= 1,237,456,672
java.util.ArrayList                                                             |     17,811 |       427,464 | >= 2,092,416,104
char[]                                                                          |      2,150 |       272,632 |       >= 272,632
byte[]                                                                          |        141 |        54,048 |        >= 54,048
java.lang.String                                                                |      2,119 |        50,856 |       >= 271,920
java.util.concurrent.ConcurrentHashMap$HashEntry                                |        673 |        21,536 |        >= 38,104
java.net.URL                                                                    |        229 |        14,656 |        >= 40,720
java.util.HashMap$Entry                                                         |        344 |        11,008 |        >= 68,760
-------------------------------------------------------------------------------------------------------------------------------
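
For reference, dividing the shallow heap by the object count in the first row of the table gives the per-instance cost of a PreferenceView. A minimal sketch of that arithmetic follows; the class and method names are hypothetical, and only the two figures are taken from the heap dump. The 12-byte comparison figure assumes a preference could otherwise be held as one long item ID plus one float value in primitive arrays.

```java
/** Hypothetical helper for reading the heap dump table; not part of Mahout. */
final class HeapDumpMath {

    /** Average shallow size of one instance, in bytes. */
    static long bytesPerObject(long shallowHeapBytes, long objectCount) {
        return shallowHeapBytes / objectCount;
    }

    public static void main(String[] args) {
        long views = 72_237_632L;           // PreferenceView instances (from the table)
        long shallowHeap = 1_733_703_168L;  // their total shallow heap (from the table)

        long perView = bytesPerObject(shallowHeap, views);
        System.out.println(perView);        // prints 24

        // Assumption: raw payload of one preference in primitive arrays.
        long rawPayload = Long.BYTES + Float.BYTES;
        System.out.println(rawPayload);     // prints 12
    }
}
```

So each view object costs 24 bytes on top of the long[] and float[] data it points into, which is why dropping the per-preference objects looks like the obvious next step.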

Memory-efficient DataModel, supporting fast online updates and element-wise iteration
-------------------------------------------------------------------------------------

                 Key: MAHOUT-1286
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1286
             Project: Mahout
          Issue Type: Improvement
          Components: Collaborative Filtering
    Affects Versions: 0.9
            Reporter: Peng Cheng
            Assignee: Sean Owen
   Original Estimate: 336h
  Remaining Estimate: 336h

Most DataModel implementations in the current CF component use hash maps to
enable fast 2D indexing and updates. This is not memory-efficient for big
data sets, e.g. the Netflix Prize dataset takes 11G of heap space as a
FileDataModel.

An improved implementation of DataModel should use more compact data
structures (like arrays). This trades a little time complexity in 2D
indexing for a vast improvement in memory efficiency. In addition, any online
recommender or online-to-batch converted recommender will not be affected
by this in the training process.
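
One way to picture the trade-off described above, as a sketch only: keep each user's preferences in two parallel primitive arrays sorted by item ID, so a lookup costs O(log n) via binary search instead of O(1) via hashing, while memory per preference drops to the 12 bytes of raw payload. CompactUserPrefs is a hypothetical class written for illustration; it is not the MAHOUT-1286 patch and not an existing Mahout API.

```java
import java.util.Arrays;

/** Hypothetical compact store for one user's preferences; not a Mahout class. */
final class CompactUserPrefs {
    private final long[] itemIDs;  // sorted ascending
    private final float[] values;  // values[i] is the rating for itemIDs[i]

    CompactUserPrefs(long[] ids, float[] vals) {
        if (ids.length != vals.length) {
            throw new IllegalArgumentException("ids and vals must have equal length");
        }
        int n = ids.length;
        // Sort indices by item ID so both arrays can be reordered together.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Long.compare(ids[a], ids[b]));
        itemIDs = new long[n];
        values = new float[n];
        for (int i = 0; i < n; i++) {
            itemIDs[i] = ids[order[i]];
            values[i] = vals[order[i]];
        }
    }

    /** O(log n) lookup; returns NaN when the user has no preference for the item. */
    float get(long itemID) {
        int idx = Arrays.binarySearch(itemIDs, itemID);
        return idx >= 0 ? values[idx] : Float.NaN;
    }

    /** Online update of an existing preference; returns false if the item is absent. */
    boolean set(long itemID, float value) {
        int idx = Arrays.binarySearch(itemIDs, itemID);
        if (idx < 0) {
            return false;
        }
        values[idx] = value;
        return true;
    }

    int size() {
        return itemIDs.length;
    }

    // Element-wise iteration by position, without allocating view objects.
    long itemIDAt(int i) {
        return itemIDs[i];
    }

    float valueAt(int i) {
        return values[i];
    }
}
```

Inserting a brand-new item would require copying both arrays, so this layout favours workloads where updates mostly touch existing preferences.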

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA
administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
