[
https://issues.apache.org/jira/browse/GIRAPH-322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454229#comment-13454229
]
Eli Reisman commented on GIRAPH-322:
------------------------------------
No problem Hyunsik, this patch is undergoing change and isn't ready for action
yet. If you're playing with testing the test.sh script, go right ahead!
Maja: That's good to know about spill to disk. That is not what I saw before,
so I should definitely play with this some more; I must have set the params
wrong. The thing about disk spill here is that no one seems to bite on the idea
of replacing any existing solutions with Giraph if spill to disk is involved.
Most folks here seem already convinced that a solution that spills to disk too
much is not different enough from the existing solutions for a given use case
to warrant changing the approach. The use cases for Giraph here revolve around
scaling out on many machines and running in-memory. I am hoping that by getting
the params set right and using spill-to-disk as an emergency buffer when memory
is just too tight, we can gradually loosen up this view and let users decide
for themselves whether it's an option they want to exercise. As it stands now,
though, I need to pursue solutions that are at least feasible in memory first.
I am excited to see this might still be a viable option to make this work.
Thanks for the advice!
As far as where the crash occurs: the whole graph I'm using loads with no
problem and fits in memory. In my tests of an earlier messaging solution and
GIRAPH-314, amortizing over different numbers of supersteps, the in-memory
graph data was not where the errors were reported. They were coming on the
Netty receiving side as all the messages arrived at their destinations. When
the messages arrive, the computation is (for the most part) aggregating how
many times a 2nd-degree neighbor is encountered in incoming messages, so the
data we aggregate at each vertex does not grow as fast as the accumulated size
of the messages the vertex receives. That's why the amortizing helped so much:
if we can process each message as it arrives, we can GC it quickly, and the
local storage per vertex only grows when a new unique neighbor ID is
encountered in an incoming message.
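A minimal sketch of that per-vertex aggregation, with hypothetical names (this
is not Giraph's actual API): each incoming message carries a 2nd-degree
neighbor ID, and the vertex stores only a count per unique ID, so local state
grows with unique neighbors rather than with raw message volume.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Giraph's API: count how often each 2nd-degree
// neighbor ID appears in incoming messages. Each message is processed and
// discarded immediately, so it is GC-eligible right away; per-vertex state
// grows only when a new unique neighbor ID arrives.
public class NeighborCountSketch {
    private final Map<Long, Integer> counts = new HashMap<>();

    // Process one incoming message carrying a neighbor ID, then drop it.
    public void receive(long neighborId) {
        counts.merge(neighborId, 1, Integer::sum);
    }

    public int countFor(long neighborId) {
        return counts.getOrDefault(neighborId, 0);
    }

    public int uniqueNeighbors() {
        return counts.size();
    }
}
```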
So...we'll see what actually happens, this is an experiment after all, but so
far so good. The volume on the network, and the messages sitting on the sender
or receiver end before processing in a given superstep, seem to represent the
main memory-overload dangers. If we amortize message volume over supersteps and
de-duplicate transmissions, this pressure is relieved; how much remains to be
seen. I am still operating from a laptop and haven't been able to get set up
for testing yet, so no word on numbers. But I think your fix was just the right
thing to get past the error I was seeing, thanks!
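One way to picture the amortizing idea (my own illustrative scheme, not the
patch's actual code): split the broadcast targets into k slices, e.g. by vertex
ID modulo k, and send to only one slice per superstep. Total volume is
unchanged, but the messages in flight at any one superstep drop by roughly a
factor of k.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of amortizing a broadcast over k supersteps: at each
// superstep, send only to the slice of targets whose ID falls in this
// superstep's residue class. After k supersteps every target has been hit
// exactly once, but per-superstep in-flight volume is ~1/k of the total.
public class AmortizedBroadcast {
    public static List<Long> targetsForSuperstep(List<Long> allTargets,
                                                 long superstep, int k) {
        List<Long> slice = new ArrayList<>();
        for (long t : allTargets) {
            if (Math.floorMod(t, k) == Math.floorMod(superstep, k)) {
                slice.add(t);
            }
        }
        return slice;
    }
}
```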
Three things I want to try in another iteration on this patch right away:
1) Use the PartitionOwner to send one copy to each host worker and fan out
duplicate references to partitions on the remote owner, rather than grouping
them early as we discussed. That way, even when a use case does not combine
this with -Dhash.userPartitionCount, we can at least de-duplicate the volume of
messages on the wire to the same host all the time. If this goes well, it will
not be hard to open another JIRA and convert SendMessageCache similarly.
2) Make sure the solution is robust for non-idempotent messaging by replacing
the M -> Set<I> model with an M -> List<I> model, retaining duplicate
deliveries of the same message to the same vertex. In cases like PageRank,
where the same DoubleWritable might be sent to the same vertex from neighbors
in multiple partitions on the same sending worker, each copy has to be seen as
a "fresh delivery" of that value as the receiver iterates over its messages,
one per original sender.
3) Try it out, see what happens!
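A minimal sketch of the de-duplicated send structure from items 1 and 2, with
hypothetical names (this is not the actual SendMessageCache): one copy of the
message value per destination partition, paired with a List (not a Set) of
target vertex IDs, so duplicate deliveries to the same vertex survive for
non-idempotent algorithms like PageRank.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: for a single broadcast message value, group target
// vertex IDs by destination partition. Only one copy of the value goes on
// the wire per partition; the List retains duplicate target IDs, so the
// receiver still sees one "fresh delivery" per original sender.
public class PartitionGroupedSends {
    // partitionId -> target vertex IDs for this one message value
    private final Map<Integer, List<Long>> targetsByPartition = new HashMap<>();

    public void addTarget(int partitionId, long vertexId) {
        targetsByPartition
            .computeIfAbsent(partitionId, p -> new ArrayList<>())
            .add(vertexId); // a List keeps duplicates; a Set would drop them
    }

    // Copies of the message value on the wire: one per partition, not per target.
    public int wireCopies() {
        return targetsByPartition.size();
    }

    public List<Long> targetsFor(int partitionId) {
        return targetsByPartition.getOrDefault(partitionId, List.of());
    }
}
```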
Thanks for all the advice, I really appreciate your input. I'll have another
patch up soon to see if we're heading in the right direction with this...
> Run Length Encoding for Vertex#sendMessageToAllEdges might curb out of
> control message growth in large scale jobs
> -----------------------------------------------------------------------------------------------------------------
>
> Key: GIRAPH-322
> URL: https://issues.apache.org/jira/browse/GIRAPH-322
> Project: Giraph
> Issue Type: Improvement
> Components: bsp
> Affects Versions: 0.2.0
> Reporter: Eli Reisman
> Assignee: Eli Reisman
> Priority: Minor
> Fix For: 0.2.0
>
> Attachments: GIRAPH-322-1.patch, GIRAPH-322-2.patch,
> GIRAPH-322-3.patch
>
>
> Vertex#sendMessageToAllEdges is a case that goes against the grain of the
> data structures and code paths used to transport messages through a Giraph
> application and out on the network. While messages to a single vertex can be
> combined (and should be) in some applications that could make use of this
> broadcast messaging, the out of control message growth of algorithms like
> triangle closing means we need to de-duplicate messages bound for many
> vertices/partitions.
> This will be an evolving solution (this first patch is just the first step)
> and currently it does not present a robust solution for disk-spill message
> stores. I figure I can get some advice about that or it can be a follow-up
> JIRA if this turns out to be a fruitful pursuit. This first patch is also
> Netty-only and simply defaults to the old sendMessagesToAllEdges()
> implementation if USE_NETTY is false. All this can be cleaned up when we know
> this works and/or is worth pursuing.
> The idea is to send as few broadcast messages as possible by run-length
> encoding their delivery and only duplicating messages on the network when
> they are bound for different partitions. This works best when combined with
> "-Dhash.userPartitionCount=# of workers" so you don't do too much of that.
> If this shows promise I will report back and keep working on this. As it is,
> it represents an end-to-end solution, using Netty, for in-memory messaging.
> It won't break with spill to disk, but you do lose the de-duplicating effect.
> More to follow, comments/ideas welcome. I expect this to change a lot as I
> test it and ideas/suggestions crop up.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira