Eli Reisman created GIRAPH-328:
----------------------------------
Summary: Outgoing messages from current superstep should be
grouped at the sender by owning worker, not by partition
Key: GIRAPH-328
URL: https://issues.apache.org/jira/browse/GIRAPH-328
Project: Giraph
Issue Type: Improvement
Components: bsp, graph
Affects Versions: 0.2.0
Reporter: Eli Reisman
Assignee: Eli Reisman
Priority: Minor
Fix For: 0.2.0
Currently, outgoing messages created by the Vertex#compute() cycle on each
worker are stored and grouped by the partitionId on the destination worker to
which the messages belong. This results in messages being duplicated on the
wire per partition on a given receiving worker that has delivery vertices for
those messages.
By partitioning the outgoing, current-superstep messages by destination worker,
we can split them into partitions at insertion into a MessageStore on the
destination worker. What we trade in come compute time while inserting at the
receiver side, we gain in fine grained control over the real number of messages
each worker caches outbound for any given worker before flushing, and how those
flush messages are aggregated for delivery as well. Potentially, it allows for
a great reduction in duplicate messages sent in situations like
Vertex#sendMessageToAllEdges() -- see GIRAPH-322, GIRAPH-314. You get the idea.
This might be a poor idea, and it can certainly use some additional refinement,
but it passes mvn verify and may even run ;) It interoperates with the disk
spill code, but not as well as it could. Consider this a request for comment on
the idea (and the approach) rather than a finished product.
Comments/ideas/help welcome! Thanks
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira