[ 
https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412968#comment-13412968
 ] 

Eli Reisman commented on GIRAPH-249:
------------------------------------

I've been looking at all this too for my internship. From Jakob's fantastic 
metrics patch (GIRAPH-232), I've learned that most of the memory danger is 
during INPUT_SUPERSTEP, when InputSplits become partitions and start getting 
moved around. Having a worker send out partitions as it builds them doesn't 
help, because that same worker is simultaneously receiving its own partitions 
from other workers as they read their InputSplits. That is the real memory 
danger we've seen.
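
To be clear about what "low on memory" means below: I just mean the JVM's own 
view of the heap. A rough sketch (a hypothetical helper, not anything in 
Giraph or the metrics patch) of the kind of check a worker could run while it 
loads InputSplits:

    // Hypothetical helper, not part of Giraph or GIRAPH-232: a crude
    // heap-pressure check a worker could run during INPUT_SUPERSTEP.
    public class HeapPressure {

        // Fraction of the max heap currently in use, straight from the JVM.
        public static double usedFraction() {
            Runtime rt = Runtime.getRuntime();
            long used = rt.totalMemory() - rt.freeMemory();
            return (double) used / rt.maxMemory();
        }

        // "Low on memory" = above some configurable threshold, e.g. 0.85.
        public static boolean lowOnMemory(double threshold) {
            return usedFraction() > threshold;
        }
    }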

So...what about spilling partitions to disk only when a worker is low on 
memory during INPUT_SUPERSTEP, and reloading them at the end of it, once the 
pressure of processing splits is over and the pressure of mutating the graph 
and message passing begins? That is when keeping "partitions in RAM" becomes 
such a win anyway. Best of all, no partition is mutated once it has been 
constructed and sent to its assigned worker during INPUT_SUPERSTEP, so 
anything spilled to disk will not need to come back into memory until the 
superstep is over.
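
Just to make the idea concrete, here is a rough sketch of the spill-then-reload 
flow. It assumes a serializable partition type and a local scratch directory, 
both placeholders; this is NOT Giraph's real partition store, just the shape of 
what I'm proposing:

    import java.io.*;
    import java.util.*;

    // Sketch only: spill partitions to local disk while memory is low during
    // INPUT_SUPERSTEP, then reload everything once the superstep is over.
    public class SpillingPartitionStore<P extends Serializable> {

        private final Map<Integer, P> inMemory = new HashMap<Integer, P>();
        private final Map<Integer, File> onDisk = new HashMap<Integer, File>();
        private final File scratchDir;

        public SpillingPartitionStore(File scratchDir) {
            this.scratchDir = scratchDir;
        }

        // Called as partitions arrive during INPUT_SUPERSTEP. Since a partition
        // is never mutated after it is built and shipped, a spilled one can sit
        // on disk untouched until the superstep ends.
        public void add(int partitionId, P partition, boolean memoryLow)
                throws IOException {
            if (memoryLow) {
                File f = new File(scratchDir, "partition-" + partitionId);
                ObjectOutputStream out =
                    new ObjectOutputStream(new FileOutputStream(f));
                try {
                    out.writeObject(partition);
                } finally {
                    out.close();
                }
                onDisk.put(partitionId, f);
            } else {
                inMemory.put(partitionId, partition);
            }
        }

        // Called once INPUT_SUPERSTEP is done and the input-split pressure is
        // gone: pull everything back into RAM before compute/messaging begins.
        @SuppressWarnings("unchecked")
        public void reloadAll() throws IOException, ClassNotFoundException {
            for (Map.Entry<Integer, File> e : onDisk.entrySet()) {
                ObjectInputStream in =
                    new ObjectInputStream(new FileInputStream(e.getValue()));
                try {
                    inMemory.put(e.getKey(), (P) in.readObject());
                } finally {
                    in.close();
                }
                e.getValue().delete();
            }
            onDisk.clear();
        }

        public P get(int partitionId) {
            return inMemory.get(partitionId);
        }
    }

The nice property is that the disk traffic is strictly write-once/read-once: 
nothing spilled during INPUT_SUPERSTEP has to come back until reloadAll().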

I have found that when GIRAPH-247 is used to even out the lumpiness of large 
social graph data, the partition sizes come out WAY more balanced than with 
the old vertex-centric approach, so that might also play well with this idea.

What do you think? If you try the metrics patch, you will see what I'm talking 
about with INPUT_SUPERSTEP; it's dramatic with a large input data set.

                
> Move part of the graph out-of-core when memory is low
> -----------------------------------------------------
>
>                 Key: GIRAPH-249
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-249
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Alessandro Presta
>            Assignee: Alessandro Presta
>         Attachments: GIRAPH-249.patch
>
>
> There has been some talk about Giraph's scaling limitations due to keeping 
> the whole graph and messages in RAM.
> We need to investigate methods to fall back to disk when running out of 
> memory, while gracefully degrading performance.
> This issue is for graph storage. Messages should probably be a separate 
> issue, although the interplay between the two is crucial.
> We should also discuss what our primary goals are here: completing a job 
> (albeit slowly) instead of failing when the graph is too big, while still 
> encouraging memory optimizations and high-memory clusters; or restructuring 
> Giraph to be as efficient as possible in disk mode, making it almost a 
> standard way of operating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
