[
https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415831#comment-13415831
]
Eli Reisman commented on GIRAPH-249:
------------------------------------
I did test against trunk. What I meant above is that I first tried a small
sample of data, compared to what I was running successfully on the other
patches, and it still died. I wish I could send you the metrics. (I can't, but
you could install GIRAPH-232 and set up a Graphite server on your local machine
to view them in your browser; it's dramatic what a great view under the hood
you get.) But yes, please take my word: it's all about the input superstep
right now. If we get through that better, the thing will scale a long way
before you have to worry about fixing the computation steps.
I agree about trying a lower memory ratio in the config. I would say you might
even be much safer at ".15". I see failures at 92% to 94% memory usage in my
metrics, no matter which patches are involved. If it gets above that, there is
no chance for that worker to survive the input step. 85% usage seems to be
totally safe.
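The thresholds above can be turned into a quick heap check. This is only a sketch: the class name, method names, and the "SAFE"/"WARN"/"DANGER" labels are hypothetical and not part of Giraph or the GIRAPH-249 patch, and the 0.85 and 0.92 cutoffs are simply the figures observed in the metrics described above. It uses only the standard java.lang.Runtime calls.

```java
public class MemoryRatioCheck {
    // Cutoffs taken from the observations in this comment, not Giraph constants:
    // about 85% heap usage looked safe, 92% and above coincided with failures.
    static final double SAFE_RATIO = 0.85;
    static final double DANGER_RATIO = 0.92;

    // Fraction of the maximum heap that is in use.
    static double usedRatio(long usedBytes, long maxBytes) {
        return (double) usedBytes / (double) maxBytes;
    }

    // Map a usage ratio onto the three bands described above.
    static String classify(double ratio) {
        if (ratio >= DANGER_RATIO) {
            return "DANGER";
        }
        if (ratio > SAFE_RATIO) {
            return "WARN";
        }
        return "SAFE";
    }

    // Usage ratio of the current JVM, as reported by java.lang.Runtime.
    static double currentHeapRatio() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return usedRatio(used, rt.maxMemory());
    }

    public static void main(String[] args) {
        double ratio = currentHeapRatio();
        System.out.printf("heap usage %.1f%% -> %s%n", ratio * 100, classify(ratio));
    }
}
```

Logging this once per worker during the input superstep would give a rough signal without a Graphite server, though it is no substitute for testing on a real cluster.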
I might be working on another project this week, but I will try to keep an eye
on this and help test more if you like. It will be almost impossible to know
whether this patch is doing what it should if you are not testing on a cluster.
It's really great code, nice work either way.
> Move part of the graph out-of-core when memory is low
> -----------------------------------------------------
>
> Key: GIRAPH-249
> URL: https://issues.apache.org/jira/browse/GIRAPH-249
> Project: Giraph
> Issue Type: Improvement
> Reporter: Alessandro Presta
> Assignee: Alessandro Presta
> Attachments: GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch,
> GIRAPH-249.patch, GIRAPH-249.patch
>
>
> There has been some talk about Giraph's scaling limitations due to keeping
> the whole graph and messages in RAM.
> We need to investigate methods to fall back to disk when running out of
> memory, while gracefully degrading performance.
> This issue is for graph storage. Messages should probably be a separate
> issue, although the interplay between the two is crucial.
> We should also discuss what our primary goals are here: completing a job
> (albeit slowly) instead of failing when the graph is too big, while still
> encouraging memory optimizations and high-memory clusters; or restructuring
> Giraph to be as efficient as possible in disk mode, making it almost a
> standard way of operating.