[ 
https://issues.apache.org/jira/browse/GIRAPH-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455093#comment-13455093
 ] 

Eli Reisman commented on GIRAPH-314:
------------------------------------

No problem, I welcome the input. A combiner is either not needed at the start
or is just an extra step on the sending side, because we already combine the
messages by using one IntArrayListWritable instead of many IntWritables right
from the get-go. On the receiver side, combiners don't help us much because we
still have incredible amounts of extra messages coming in over Netty all the
time, as long as they are serialized and deserialized around the Partition ->
vertexid -> List<M> organization, and that's what GIRAPH-322 addresses.
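
To make the message counts concrete, here is a minimal plain-Java sketch of the two sending styles. It does not use the Giraph API: a List<Integer> stands in for IntArrayListWritable, an Integer for IntWritable, and the inbox map, class name, and method names are all hypothetical. It also assumes the vertex's neighbor ids are distinct.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MessageGroupingSketch {
    /** Ungrouped: one single-id message per (recipient, neighbor id) pair. */
    public static Map<Integer, List<Integer>> sendUngrouped(List<Integer> neighbors) {
        Map<Integer, List<Integer>> inbox = new HashMap<>();
        for (int neighborId : neighbors) {
            // Stand-in for broadcasting one id to every edge of the vertex.
            for (int recipient : neighbors) {
                inbox.computeIfAbsent(recipient, k -> new ArrayList<>()).add(neighborId);
            }
        }
        return inbox;
    }

    /** Grouped: one list-valued message per recipient. */
    public static Map<Integer, List<List<Integer>>> sendGrouped(List<Integer> neighbors) {
        Map<Integer, List<List<Integer>>> inbox = new HashMap<>();
        // A single payload object, referenced once per recipient.
        List<Integer> payload = new ArrayList<>(neighbors);
        for (int recipient : neighbors) {
            inbox.computeIfAbsent(recipient, k -> new ArrayList<>()).add(payload);
        }
        return inbox;
    }

    /** Total number of messages sitting in all inboxes. */
    public static <M> long countMessages(Map<Integer, List<M>> inbox) {
        long total = 0;
        for (List<M> messages : inbox.values()) {
            total += messages.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> neighbors = Arrays.asList(1, 2, 3, 4); // degree(V) = 4
        System.out.println("ungrouped: " + countMessages(sendUngrouped(neighbors))); // 16
        System.out.println("grouped:   " + countMessages(sendGrouped(neighbors)));   // 4
    }
}
```

For a vertex of degree 4 the ungrouped style produces 16 messages and the grouped style 4, each carrying 4 ids. The number of ids on the wire is the same, but the number of message objects drops from degree(V)^2 to degree(V).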

As for the message limiting, as long as the sender does not keep iterating on
compute() and we don't overwhelm the sender that way, it's a great idea. But
once we serialize and deserialize to disk or anywhere else, we lose the single
reference to each message and get back individual objects, which then have
to go through a sender-side combiner or other extra plumbing, or just be sent
out duplicated over Netty. And we're talking about degree(V)^2 messages for all
V in G(V), so it's a lot to churn through in one superstep. The amortizing is
fast, and by avoiding the disk we leave open the possibility for GIRAPH-322 to
manage the message growth without serializing and deserializing and ending up
with a bunch of instances to send over the wire again, or with random access on
the disk. So I'm not convinced 314 + 322 are a good alternative, but they seem
worth exploring at this point. If it turns out the only way to make large jobs
on an application like 314 run to completion is to focus on spill to disk
entirely, I will certainly embrace that route.
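
The point about losing the single reference can be illustrated with a small plain-Java sketch. This is not Giraph's spill or Netty code: an int[] stands in for an IntArrayListWritable, a byte buffer stands in for the disk, and the class and method names are hypothetical. It only shows that one object queued for several recipients comes back as several separate instances after a naive serialize/deserialize round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class SharedReferenceSketch {
    /** Counts distinct object instances by identity, not by equality. */
    public static int distinctInstances(List<int[]> messages) {
        Set<int[]> seen = Collections.newSetFromMap(new IdentityHashMap<int[], Boolean>());
        seen.addAll(messages);
        return seen.size();
    }

    /** Writes every queued message to a buffer and reads them all back. */
    public static List<int[]> roundTrip(List<int[]> outbox) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                for (int[] message : outbox) {
                    out.writeInt(message.length); // length prefix
                    for (int id : message) {
                        out.writeInt(id);
                    }
                }
            }
            List<int[]> restored = new ArrayList<>();
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                for (int i = 0; i < outbox.size(); i++) {
                    int[] message = new int[in.readInt()];
                    for (int j = 0; j < message.length; j++) {
                        message[j] = in.readInt();
                    }
                    restored.add(message); // a fresh array every time
                }
            }
            return restored;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] neighborIds = {1, 2, 3, 4};
        List<int[]> outbox = new ArrayList<>();
        for (int i = 0; i < neighborIds.length; i++) {
            outbox.add(neighborIds); // same object queued once per recipient
        }
        System.out.println("before: " + distinctInstances(outbox));            // 1
        System.out.println("after:  " + distinctInstances(roundTrip(outbox))); // 4
    }
}
```

Before the round trip there is one payload instance; afterwards there are four equal but separate arrays. A real spill implementation could add its own deduplication, which is the kind of extra plumbing mentioned above.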



                
> Implement better message grouping to improve performance in 
> SimpleTriangleClosingVertex
> ---------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-314
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-314
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples
>    Affects Versions: 0.2.0
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>            Priority: Trivial
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-314-1.patch, GIRAPH-314-2.patch, 
> GIRAPH-314-3.patch, GIRAPH-314-4.patch
>
>
> After running SimpleTriangleClosingVertex at scale I'm thinking the 
> sendMessageToAllEdges() is pretty in the code, but it's not a good idea in 
> practice since each vertex V sends degree(V)^2 messages right in the first 
> superstep in this algorithm. Could do something with a combiner etc. but just 
> grouping messages by hand at the application level by using 
> IntArrayListWritable again does the trick fine.
> Probably should have just done it this way before, but 
> sendMessageToAllEdges() looked so nice. Sigh. Changed unit tests to reflect 
> this new approach, passes mvn verify and cluster, etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
