[ https://issues.apache.org/jira/browse/GIRAPH-328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eli Reisman updated GIRAPH-328:
-------------------------------

    Attachment: GIRAPH-328-4.patch

Just a placeholder. Nearly done with the changes Avery asked for; I think I may have also fixed the partition-lookup bottleneck he mentioned along the way. However, when I don't look up partitions on the request receiver's end, I get a failure in the RequestFailureTest(s). Anyone know what the deal is? I'll keep working on this.

{code}
send2Requests(org.apache.giraph.comm.RequestFailureTest)  Time elapsed: 0.591 sec  <<< FAILURE!
java.lang.AssertionError: expected:<70> but was:<0>
	at org.junit.Assert.fail(Assert.java:58)
	at org.junit.Assert.failNotEquals(Assert.java:259)
	at org.junit.Assert.assertEquals(Assert.java:80)
	at org.junit.Assert.assertEquals(Assert.java:88)
	at org.apache.giraph.comm.RequestFailureTest.checkResult(RequestFailureTest.java:137)
	at org.apache.giraph.comm.RequestFailureTest.send2Requests(RequestFailureTest.java:165)
{code}

For reference, a rough sketch of the by-worker send cache described in the issue follows the quoted description below.

> Outgoing messages from current superstep should be grouped at the sender by owning worker, not by partition
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-328
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-328
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp, graph
>    Affects Versions: 0.2.0
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>            Priority: Minor
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-328-1.patch, GIRAPH-328-2.patch, GIRAPH-328-3.patch, GIRAPH-328-4.patch
>
>
> Currently, outgoing messages created by the Vertex#compute() cycle on each worker are stored and grouped by the partitionId on the destination worker to which the messages belong. This results in messages being duplicated on the wire per partition on a given receiving worker that has delivery vertices for those messages.
>
> By partitioning the outgoing, current-superstep messages by destination worker, we can split them into partitions at insertion into a MessageStore on the destination worker. What we trade in some compute time while inserting at the receiver side, we gain in fine-grained control over the real number of messages each worker caches outbound for any given worker before flushing, and over how those flushed messages are aggregated for delivery as well.
>
> Potentially, it allows for a great reduction in duplicate messages sent in situations like Vertex#sendMessageToAllEdges() -- see GIRAPH-322, GIRAPH-314. You get the idea.
>
> This might be a poor idea, and it can certainly use some additional refinement, but it passes mvn verify and may even run ;) It interoperates with the disk spill code, but not as well as it could. Consider this a request for comment on the idea (and the approach) rather than a finished product.
>
> Comments/ideas/help welcome! Thanks
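To make the idea concrete: below is a minimal, hypothetical sketch of a sender-side cache keyed by destination worker instead of by partition id. The class, field, and method names here (ByWorkerSendCache, WorkerId, addMessage, flushAll, flushThreshold) are made up for illustration and are not the classes in the attached patches; the point is only that the sender buckets by worker and the receiver does the partition split when it inserts into its MessageStore.

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Giraph's actual classes): an outgoing-message
 * cache keyed by destination worker rather than by partition id. Each
 * worker's bucket is flushed as a single request once it reaches a
 * threshold, so a message bound for several partitions on the same worker
 * is only buffered and sent once.
 */
public class ByWorkerSendCache<I, M> {

  /** Placeholder identity for a destination worker (task id, host:port, ...). */
  public static final class WorkerId {
    final int taskId;
    public WorkerId(int taskId) { this.taskId = taskId; }
    @Override public boolean equals(Object o) {
      return o instanceof WorkerId && ((WorkerId) o).taskId == taskId;
    }
    @Override public int hashCode() { return taskId; }
    @Override public String toString() { return "worker-" + taskId; }
  }

  /** One (destination vertex id, message) pair awaiting delivery. */
  public static final class Envelope<I, M> {
    final I targetVertexId;
    final M message;
    Envelope(I targetVertexId, M message) {
      this.targetVertexId = targetVertexId;
      this.message = message;
    }
  }

  /** How many messages to buffer per destination worker before flushing. */
  private final int flushThreshold;
  /** Outgoing messages grouped only by destination worker. */
  private final Map<WorkerId, List<Envelope<I, M>>> cache = new HashMap<>();

  public ByWorkerSendCache(int flushThreshold) {
    this.flushThreshold = flushThreshold;
  }

  /**
   * Buffer a message for a vertex owned by the given worker. Returns the
   * batch to send if that worker's bucket reached the flush threshold,
   * otherwise null. No partition id appears here: splitting by partition
   * happens on the receiver when it inserts into its MessageStore.
   */
  public List<Envelope<I, M>> addMessage(WorkerId worker, I targetVertexId, M message) {
    List<Envelope<I, M>> bucket =
        cache.computeIfAbsent(worker, w -> new ArrayList<>());
    bucket.add(new Envelope<>(targetVertexId, message));
    if (bucket.size() >= flushThreshold) {
      return cache.remove(worker);
    }
    return null;
  }

  /** Drain everything remaining at the end of the superstep. */
  public Map<WorkerId, List<Envelope<I, M>>> flushAll() {
    Map<WorkerId, List<Envelope<I, M>>> all = new HashMap<>(cache);
    cache.clear();
    return all;
  }
}
{code}

The receiver then looks up the owning partition as it inserts each message into its MessageStore, which is the trade described above: a little more receiver-side compute in exchange for fewer duplicate messages on the wire and per-worker (rather than per-partition) control over flush sizes.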