[ https://issues.apache.org/jira/browse/GIRAPH-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455093#comment-13455093 ]
Eli Reisman commented on GIRAPH-314:
------------------------------------

No problem, I welcome the input.

The combiner is not needed at the beginning, or is at most one extra step on the sending side, because we already combine the messages by using IntArrayListWritable instead of many IntWritables right from the get-go. On the receiver side, combiners don't help us much because we still have huge numbers of extra messages coming in over Netty all the time, as long as they are serialized and deserialized organized around Partition -> vertexid -> List<M>, and that's what GIRAPH-322 addresses.

As for the message limiting, as long as the sender does not keep iterating on compute() and we don't overwhelm the sender that way, it's a great idea. But once we serialize and deserialize to disk or anywhere else, we lose the single reference to each message and get back individual objects, which then have to be put into a sender-side combiner or other extra plumbing, or just sent out duplicated over Netty. And we're talking about degree(V)^2 messages for all V in G(V), so it's a lot to churn through in one superstep.

The amortizing is fast, and by avoiding the disk we leave open the possibility for GIRAPH-322 to manage the message growth without serializing and deserializing and ending up with a bunch of instances to send over the wire again, or random access on the disk. So I'm not convinced 314 + 322 are a good alternative, but they seem worth exploring at this point. If it turns out the only way to make large jobs on an application like 314 run to completion is to focus entirely on spill to disk, I will certainly embrace that route.
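To make the point about losing the single reference concrete, here is a rough sketch in plain Java. It is not Giraph code: `SharedReferenceSketch`, `roundTrip`, `roundTripEach`, and `countDistinctInstances` are hypothetical names, and `java.io` object serialization into a byte buffer stands in for Writable serialization and a spill to disk. The assumption is that each outgoing message is written and read back on its own, per destination.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone model; none of these names come from Giraph.
class SharedReferenceSketch {

    // Writes one message to a byte buffer and reads it back (stand-in for a spill).
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    // Round-trips every message separately, as if each were stored per destination.
    static <T extends Serializable> List<T> roundTripEach(List<T> messages) {
        List<T> restored = new ArrayList<>();
        for (T message : messages) {
            restored.add(roundTrip(message));
        }
        return restored;
    }

    // Counts distinct object instances by identity, not by equals().
    static int countDistinctInstances(List<?> messages) {
        Set<Object> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        identities.addAll(messages);
        return identities.size();
    }

    public static void main(String[] args) {
        ArrayList<Integer> neighborIds = new ArrayList<>(Arrays.asList(7, 8, 9, 10));

        // In memory, all four outgoing messages point at the same list object.
        List<ArrayList<Integer>> outgoing = Collections.nCopies(4, neighborIds);
        System.out.println("instances before: " + countDistinctInstances(outgoing));

        // After a per-message round trip, each message is its own copy.
        List<ArrayList<Integer>> restored = roundTripEach(outgoing);
        System.out.println("instances after:  " + countDistinctInstances(restored));
    }
}
```

In this model the in-memory outbox holds one list instance however many neighbors it is addressed to, while the restored outbox holds one instance per message, which is the duplication the comment above is worried about.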
> Implement better message grouping to improve performance in SimpleTriangleClosingVertex
> ---------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-314
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-314
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples
>    Affects Versions: 0.2.0
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>            Priority: Trivial
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-314-1.patch, GIRAPH-314-2.patch, GIRAPH-314-3.patch, GIRAPH-314-4.patch
>
>
> After running SimpleTriangleClosingVertex at scale I'm thinking the
> sendMessageToAllEdges() is pretty in the code, but it's not a good idea in
> practice, since each vertex V sends degree(V)^2 messages right in the first
> superstep in this algorithm. Could do something with a combiner etc., but just
> grouping messages by hand at the application level by using
> IntArrayListWritable again does the trick fine.
> Probably should have just done it this way before, but
> sendMessageToAllEdges() looked so nice. Sigh. Changed unit tests to reflect
> this new approach, passes mvn verify and cluster, etc.
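For a rough feel of the message counts in the description above, here is a small stand-alone Java model. It does not use the Giraph API: `MessageGroupingSketch`, `sendIndividually`, `sendGrouped`, and `countMessages` are hypothetical names, and plain collections stand in for IntWritable and IntArrayListWritable messages. It assumes a vertex sends its full neighbor list to every neighbor and that neighbor ids are distinct.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone model; none of these names come from Giraph.
class MessageGroupingSketch {

    // Ungrouped: every neighbor receives one single-id message per neighbor id.
    static Map<Integer, List<Integer>> sendIndividually(List<Integer> neighbors) {
        Map<Integer, List<Integer>> inbox = new HashMap<>();
        for (Integer target : neighbors) {
            List<Integer> received = inbox.computeIfAbsent(target, k -> new ArrayList<>());
            for (Integer id : neighbors) {
                received.add(id); // one message carrying one id
            }
        }
        return inbox;
    }

    // Grouped: every neighbor receives one message carrying the whole id list.
    static Map<Integer, List<List<Integer>>> sendGrouped(List<Integer> neighbors) {
        Map<Integer, List<List<Integer>>> inbox = new HashMap<>();
        for (Integer target : neighbors) {
            inbox.computeIfAbsent(target, k -> new ArrayList<>()).add(neighbors);
        }
        return inbox;
    }

    // Total number of messages delivered across all inboxes.
    static long countMessages(Map<Integer, ? extends List<?>> inbox) {
        long total = 0;
        for (List<?> messages : inbox.values()) {
            total += messages.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> neighbors = Arrays.asList(1, 2, 3, 4); // a vertex of degree 4
        System.out.println(countMessages(sendIndividually(neighbors))); // prints 16
        System.out.println(countMessages(sendGrouped(neighbors)));      // prints 4
    }
}
```

The number of ids shipped is the same either way; what grouping changes in this model is the message count per vertex, from degree(V)^2 down to degree(V).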