[
https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419531#comment-13419531
]
Claudio Martella commented on GIRAPH-249:
-----------------------------------------
Yes, agreed. We should definitely re-use objects: that should save memory and
reduce the time spent in GC.
I've been thinking about a good strategy for spilling the graph out-of-core. I
can't really find any smart way of doing it. How do you choose which partition
to spill to disk? Do you spill a piece of each partition or a whole partition?
In the first case, it gets tricky to compute the aggregate threshold across
multiple partitions. In the second case, it would make sense to make the
spiller aware of inactive-vertex statistics: you want to spill to disk the
partitions with many inactive vertices. Which brings me to the impact of
spilling partitions to disk. We'd end up scanning the partition and loading
only the vertices that (1) are active or (2) are inactive but were sent
messages in the previous superstep. It means you'd still waste a lot of IO,
mapreduce style.
This is very tricky and I'm not sure how much sense it makes. IMO, it makes
more sense to keep the constraint that the whole graph fits in memory and
eventually go 100% out-of-core with messages, whose strategy is quite
efficient and well understood, than to mix up pieces. Otherwise, I'd rather go
for just plain mapreduce. Just my two cents.
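
To make the whole-partition option concrete, here is a rough sketch of what I
mean by an inactive-aware spiller. It is only an illustration: SpillPolicy,
PartitionStats, pickPartitionToSpill and mustLoad are hypothetical names, not
existing Giraph classes or methods, and it assumes each worker tracks
per-partition vertex and inactive-vertex counts.

```java
import java.util.List;

// Hypothetical sketch: none of these names come from the Giraph codebase.
final class SpillPolicy {

    /** Per-partition counters a worker is assumed to track (illustrative only). */
    static final class PartitionStats {
        final int partitionId;
        final long vertexCount;
        final long inactiveCount;

        PartitionStats(int partitionId, long vertexCount, long inactiveCount) {
            this.partitionId = partitionId;
            this.vertexCount = vertexCount;
            this.inactiveCount = inactiveCount;
        }

        /** Fraction of vertices in this partition that have voted to halt. */
        double inactiveFraction() {
            return vertexCount == 0 ? 0.0 : (double) inactiveCount / vertexCount;
        }
    }

    /**
     * Picks the whole partition with the highest fraction of inactive
     * vertices. Returns -1 if there is nothing to choose from.
     */
    static int pickPartitionToSpill(List<PartitionStats> stats) {
        int best = -1;
        double bestFraction = -1.0;
        for (PartitionStats s : stats) {
            double fraction = s.inactiveFraction();
            if (fraction > bestFraction) {
                bestFraction = fraction;
                best = s.partitionId;
            }
        }
        return best;
    }

    /**
     * When scanning a spilled partition, a vertex only needs to be loaded
     * if it is active or if it was sent messages in the previous superstep.
     */
    static boolean mustLoad(boolean active, boolean hasIncomingMessages) {
        return active || hasIncomingMessages;
    }
}
```

Note that even with mustLoad filtering, the whole partition file is still
scanned sequentially, which is the IO cost I am worried about.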
> Move part of the graph out-of-core when memory is low
> -----------------------------------------------------
>
> Key: GIRAPH-249
> URL: https://issues.apache.org/jira/browse/GIRAPH-249
> Project: Giraph
> Issue Type: Improvement
> Reporter: Alessandro Presta
> Assignee: Alessandro Presta
> Attachments: GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch,
> GIRAPH-249.patch, GIRAPH-249.patch
>
>
> There has been some talk about Giraph's scaling limitations due to keeping
> the whole graph and messages in RAM.
> We need to investigate methods to fall back to disk when running out of
> memory, while gracefully degrading performance.
> This issue is for graph storage. Messages should probably be a separate
> issue, although the interplay between the two is crucial.
> We should also discuss what our primary goals are here: completing a job
> (albeit slowly) instead of failing when the graph is too big, while still
> encouraging memory optimizations and high-memory clusters; or restructuring
> Giraph to be as efficient as possible in disk mode, making it almost a
> standard way of operating.