[
https://issues.apache.org/jira/browse/GIRAPH-322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453941#comment-13453941
]
Maja Kabiljo commented on GIRAPH-322:
-------------------------------------
So it works now? Please share results from the runs when you get them.
For limiting the number of open requests, it works like this: whenever we send
a request, we check how many requests we have for which we haven't received a
reply yet; if that number is above the limit, we just wait there. So yes,
vertex.compute execution will be paused. Since we send a reply only after a
request is processed, this way we also limit the number of unprocessed
requests on the receiving side. To use this option you need to set the
following two parameters:
giraph.waitForRequestsConfirmation=true
giraph.maxNumberOfOpenRequests=your_limit
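The throttling behaviour described above can be sketched roughly as follows. This is a minimal illustration, not Giraph's actual implementation; the class and method names here are hypothetical:

```java
// Minimal sketch of the open-request limit described above.
// The sender blocks (pausing vertex.compute) while the number of
// requests without a reply is at the configured limit; since the
// receiver replies only after processing a request, this also bounds
// unprocessed requests on the receiving side.
// Names are illustrative, not Giraph's real API.
public class OpenRequestThrottle {
    private final int maxOpenRequests; // e.g. giraph.maxNumberOfOpenRequests
    private int openRequests = 0;

    public OpenRequestThrottle(int maxOpenRequests) {
        this.maxOpenRequests = maxOpenRequests;
    }

    // Called before sending a request; waits while at the limit.
    public synchronized void beforeSend() {
        boolean interrupted = false;
        while (openRequests >= maxOpenRequests) {
            try {
                wait();
            } catch (InterruptedException e) {
                interrupted = true;
            }
        }
        openRequests++;
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }

    // Called when the receiver's reply arrives, i.e. after the request
    // has actually been processed on the receiving side.
    public synchronized void onReply() {
        openRequests--;
        notifyAll();
    }

    public synchronized int openRequestCount() {
        return openRequests;
    }
}
```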
I agree with everything you said about making this solution work with
out-of-core messaging. With the current approach there is duplication, and just
keeping references would hurt performance too much because of random accesses.
I think we'll have to figure out some clever solution in between. But let's
first see how this works with in-core stuff, and we can get to out-of-core
later.
Now, this is somewhat unrelated to this patch and more related to GIRAPH-314 and
your problem size. You are saying that you are able to keep in memory all the
maps from second-degree neighbours to the number of paths to them. And you were
also able to transfer all the messages before this solution, just by using
amortization. So the problem in the first place was not transferring all the
messages, but keeping all unprocessed messages in memory at the same time? Or am
I getting this wrong? If you can fit the above-mentioned maps in memory, I guess
the number of edges per worker is not big? This patch is very useful for a
certain kind of problem: when the messages are big objects, we don't have a
smart combiner, and each vertex has many neighbours - many more than the number
of workers (unless you have some smart partitioning strategy). I'm just trying
to get a sense of why this is the best approach for your particular problem, not
discussing the general idea here.
> Run Length Encoding for Vertex#sendMessageToAllEdges might curb out of
> control message growth in large scale jobs
> -----------------------------------------------------------------------------------------------------------------
>
> Key: GIRAPH-322
> URL: https://issues.apache.org/jira/browse/GIRAPH-322
> Project: Giraph
> Issue Type: Improvement
> Components: bsp
> Affects Versions: 0.2.0
> Reporter: Eli Reisman
> Assignee: Eli Reisman
> Priority: Minor
> Fix For: 0.2.0
>
> Attachments: GIRAPH-322-1.patch, GIRAPH-322-2.patch,
> GIRAPH-322-3.patch
>
>
> Vertex#sendMessageToAllEdges is a case that goes against the grain of the
> data structures and code paths used to transport messages through a Giraph
> application and out on the network. While messages to a single vertex can be
> combined (and should be) in some applications that could make use of this
> broadcast messaging, the out-of-control message growth of algorithms like
> triangle closing means we need to de-duplicate messages bound for many
> vertices/partitions.
> This will be an evolving solution (this first patch is just the first step)
> and currently it does not present a robust solution for disk-spill message
> stores. I figure I can get some advice about that or it can be a follow-up
> JIRA if this turns out to be a fruitful pursuit. This first patch is also
> Netty-only and simply defaults to the old sendMessageToAllEdges()
> implementation if USE_NETTY is false. All this can be cleaned up when we know
> this works and/or is worth pursuing.
> The idea is to send as few broadcast messages as possible by run-length
> encoding their delivery, duplicating a message on the network only when it is
> bound for different partitions. This also works best when combined with
> "-Dhash.userPartitionCount=# of workers" so you don't do too much of that.
> If this shows promise I will report back and keep working on this. As it is,
> it represents an end-to-end solution, using Netty, for in-memory messaging.
> It won't break with spill to disk, but you do lose the de-duplicating effect.
> More to follow, comments/ideas welcome. I expect this to change a lot as I
> test it and ideas/suggestions crop up.
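The per-partition de-duplication idea in the quoted description could be sketched roughly like this: instead of serializing one copy of the message per out-edge, group the destination vertex ids by partition and send the payload once per partition along with the id list. This is an illustrative sketch, not Giraph's actual classes; all names here are hypothetical:

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical sketch of de-duplicating a broadcast message: one copy
// of the payload per destination partition, plus the list of target
// vertex ids in that partition, instead of one copy per edge.
public class BroadcastBatcher<I, M> {
    // Maps a vertex id to its partition; stands in for the partitioner.
    private final Function<I, Integer> partitionOf;

    public BroadcastBatcher(Function<I, Integer> partitionOf) {
        this.partitionOf = partitionOf;
    }

    // One network request per partition: shared message plus target ids.
    public static class PartitionBroadcast<I, M> {
        public final M message;
        public final List<I> targetIds = new ArrayList<>();

        public PartitionBroadcast(M message) {
            this.message = message;
        }
    }

    // Build one broadcast per destination partition; the message payload
    // is shared by all targets in the same partition.
    public Map<Integer, PartitionBroadcast<I, M>> batch(M message, Iterable<I> targets) {
        Map<Integer, PartitionBroadcast<I, M>> byPartition = new HashMap<>();
        for (I id : targets) {
            int p = partitionOf.apply(id);
            byPartition.computeIfAbsent(p, k -> new PartitionBroadcast<>(message))
                       .targetIds.add(id);
        }
        return byPartition;
    }
}
```

With fewer partitions per worker (e.g. hash.userPartitionCount set to the number of workers), more targets collapse into each partition's single payload, which is why the description recommends that setting.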
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira