I'll put up a more detailed review soon, but I'm basically +1 on this. I am a bit concerned about the many new data structures that have to be maintained in memory per partition on each worker task (if I am reading the diff right) to maintain the cache. I am happy to see that with the LRU turned on, the jobs with 2 compute threads are slightly faster. If these numbers are correct, Claudio has also shown us that the in-memory job with 2 compute threads takes longer than the single-threaded version! I'm hoping these tests were on EC2 :)
Thanks!

On Sun, Feb 3, 2013 at 6:54 AM, Claudio Martella (JIRA) <[email protected]> wrote:

> [ https://issues.apache.org/jira/browse/GIRAPH-461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Claudio Martella updated GIRAPH-461:
> ------------------------------------
>
> Attachment: GIRAPH-461.patch
>
> Fixed a missing line.
>
> > Convert static assignment of in-memory partitions with LRU cache
> > ----------------------------------------------------------------
> >
> > Key: GIRAPH-461
> > URL: https://issues.apache.org/jira/browse/GIRAPH-461
> > Project: Giraph
> > Issue Type: Sub-task
> > Components: graph
> > Reporter: Claudio Martella
> > Attachments: GIRAPH-461.patch, GIRAPH-461.patch, GIRAPH-461.patch
> >
> > Currently, the out-of-core partitions are assigned to memory or to disk
> > statically. Using an LRU cache should help keeping in-memory only the
> > partitions that are actively accessed, given a job that does not access all
> > the graph at each superstep (traversals) and a good data partitioning (non
> > random).
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
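
For anyone following the thread without the diff open, here is a rough sketch of the LRU idea from the issue description: keep at most N partitions in memory and spill the least recently accessed one when a new one comes in. This is not the code from GIRAPH-461.patch. The class name `LruPartitionCache`, the `spillToDisk` hook, and the `spilled` list are all hypothetical, and the sketch simply leans on `java.util.LinkedHashMap` in access order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical LRU cache of in-memory partitions (illustration only,
 * not Giraph's actual partition store).
 */
class LruPartitionCache<K, V> extends LinkedHashMap<K, V> {
    /** Maximum number of partitions allowed in memory (assumed parameter). */
    private final int maxInMemory;

    /** Ids of evicted partitions, recorded only to make the sketch observable. */
    final List<K> spilled = new ArrayList<>();

    LruPartitionCache(int maxInMemory) {
        // accessOrder = true: iteration order runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxInMemory = maxInMemory;
    }

    /** Called by LinkedHashMap after each insertion; returning true evicts the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() > maxInMemory) {
            spillToDisk(eldest.getKey(), eldest.getValue());
            return true;
        }
        return false;
    }

    /** Placeholder: a real store would serialize the partition to local disk here. */
    protected void spillToDisk(K partitionId, V partition) {
        spilled.add(partitionId);
    }
}
```

With a capacity of 2, inserting partitions 1 and 2, reading 1, and then inserting 3 evicts partition 2, because the read made 1 the most recently used. The per-partition bookkeeping here is just the linked-list node that `LinkedHashMap` already keeps per entry, which is the kind of overhead the memory concern above is about.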
