[ 
https://issues.apache.org/jira/browse/GIRAPH-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471936#comment-13471936
 ] 

Eli Reisman commented on GIRAPH-357:
------------------------------------

I have not used the combining features much. If you are combining messages 
at the sender side, the benefit should only come if messages sit in 
SendMessageCache long enough for several to queue up for the same 
destination, so that it's worth combining them, right? The more often you send a 
burst of cached messages, the less likely the combiner is to have enough 
messages on hand to actually save us some resources? And the kind of combining 
operation, and whether the messages landing in the cache are combinable at all, 
differs from algorithm to algorithm?

So, in my naive view of combining, it seems:

1. Running the combiner function on bundles of outgoing messages from the cache 
to a given worker might need to be tuned per-application?

2. Running it below some threshold of # of messages-per-outgoing-cache-bundle 
will always be silly/inefficient, such as combining on every 1-message send. 
BTW: when do we ever (in the current form) send just one message at a time? It 
seems like this could only happen on the final flush of the cache at the end of 
a superstep?
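
A minimal sketch of the threshold idea in point 2, in plain Java rather than 
actual Giraph code. The Combiner interface, the SUM combiner, and the 
minToCombine parameter are all hypothetical names made up for illustration; 
they are not the SendMessageCache API. With minToCombine = 2 this reduces to 
the "don't combine a single message" case from the issue description.

```java
import java.util.ArrayList;
import java.util.List;

class CombineGuardSketch {
    /** Hypothetical stand-in for a message combiner (not the Giraph interface). */
    interface Combiner {
        List<Double> combine(List<Double> messages);
    }

    /** Example combiner: collapses all messages into their sum. */
    static final Combiner SUM = messages -> {
        double total = 0.0;
        for (double m : messages) {
            total += m;
        }
        List<Double> out = new ArrayList<>(1);
        out.add(total);
        return out;
    };

    /**
     * Runs the combiner only when the bundle holds at least minToCombine
     * messages; otherwise returns the original list untouched, so no new
     * list or message object is allocated.
     */
    static List<Double> combineIfWorthwhile(
            List<Double> messages, Combiner combiner, int minToCombine) {
        if (messages.size() < minToCombine) {
            return messages;
        }
        return combiner.combine(messages);
    }

    public static void main(String[] args) {
        List<Double> single = new ArrayList<>(List.of(0.25));
        List<Double> several = new ArrayList<>(List.of(1.0, 2.0, 3.0));
        // The single-message bundle comes back as the very same list object.
        System.out.println(combineIfWorthwhile(single, SUM, 2) == single);
        // The three-message bundle is reduced to one combined message.
        System.out.println(combineIfWorthwhile(several, SUM, 2));
    }
}
```

Whether the threshold should be 2 or something larger would presumably depend 
on how expensive the combiner is for a given algorithm.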

So...would this be something we would tune with the "# of cached messages 
per-worker before flushing cache" GiraphConfiguration dash-D option, per 
application, rather than in code, assuming this algorithm needs a client-side 
combiner? If we always send X number of messages, the combiner should always 
have X or so to work with in the hopes of matching and reducing a few before 
serialization?
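
To make the "always send X messages" reasoning concrete, here is a toy model 
in plain Java (not Giraph code). It assumes a single outgoing cache that is 
flushed every flushSize messages, and that all messages to the same 
destination within one flush combine into a single message. The class and 
method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class FlushThresholdSketch {
    /**
     * Counts how many messages actually get sent when the cache is flushed
     * every flushSize messages and each flush combines messages that share
     * a destination into one.
     */
    static int messagesSent(int[] destinations, int flushSize) {
        int sent = 0;
        int cached = 0;
        Set<Integer> pendingDestinations = new HashSet<>();
        for (int dest : destinations) {
            pendingDestinations.add(dest);
            cached++;
            if (cached == flushSize) {
                // One combined message per distinct destination in this flush.
                sent += pendingDestinations.size();
                pendingDestinations.clear();
                cached = 0;
            }
        }
        // Final flush at the end of the superstep (may be a partial bundle).
        sent += pendingDestinations.size();
        return sent;
    }

    public static void main(String[] args) {
        int[] destinations = {1, 2, 1, 2, 1, 2, 1, 2};
        for (int flushSize : new int[] {1, 4, 8}) {
            System.out.println("flushSize=" + flushSize
                    + " sent=" + messagesSent(destinations, flushSize));
        }
    }
}
```

In this toy input, flushing after every message leaves nothing to combine, 
while larger flush sizes let more messages for the same destination meet in 
the cache. Real traffic patterns will of course be less regular.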

When you say "serialize" do you mean onto the network, or spill to disk for 
later sending at the end of the superstep? I'm assuming the former? One of the 
things I have been very aware of during GIRAPH-328/322 is that it's one thing 
to carefully keep a single reference to something on the send side (for 
example), but it's entirely another to innocently serialize it and end up with 
N unique copies at the far end after deserialization. Is there some overarching 
idea here about how to minimize this? One thing I like about the idea (just the 
idea so far!) of 322 is that on both the send and recv sides, the original 
message reference can be shared without N copies being created during ser/deser.
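
The "one reference in, N copies out" effect can be shown with a small 
self-contained Java sketch. It assumes a hypothetical Msg type with 
hand-written write/read methods over DataOutputStream/DataInputStream, in the 
general style of a Writable; it is not Giraph's actual message path.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class SharedReferenceSketch {
    /** Hypothetical message type with field-by-field serialization. */
    static final class Msg {
        final double value;

        Msg(double value) {
            this.value = value;
        }

        void write(DataOutputStream out) throws IOException {
            out.writeDouble(value);
        }

        static Msg read(DataInputStream in) throws IOException {
            return new Msg(in.readDouble());
        }
    }

    /** Serializes each list element in turn, then reads them all back. */
    static List<Msg> roundTrip(List<Msg> messages) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeInt(messages.size());
                for (Msg m : messages) {
                    m.write(out); // the same object is written once per slot
                }
            }
            List<Msg> result = new ArrayList<>();
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                int n = in.readInt();
                for (int i = 0; i < n; i++) {
                    result.add(Msg.read(in)); // a fresh object per slot
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Counts distinct object identities (not equal values) in the list. */
    static int distinctObjects(List<Msg> messages) {
        Set<Msg> identities =
                Collections.newSetFromMap(new IdentityHashMap<Msg, Boolean>());
        identities.addAll(messages);
        return identities.size();
    }

    public static void main(String[] args) {
        Msg shared = new Msg(0.5);
        List<Msg> sendSide = Collections.nCopies(3, shared);
        System.out.println("send side objects: " + distinctObjects(sendSide));
        System.out.println("recv side objects: "
                + distinctObjects(roundTrip(sendSide)));
    }
}
```

The send side holds one object referenced three times; after the round trip 
the receive side holds three separate objects with equal values, because 
nothing in the byte stream records that the slots were the same reference.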

                
> Don't try to combine if there is only one message
> -------------------------------------------------
>
>                 Key: GIRAPH-357
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-357
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Maja Kabiljo
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-357.patch
>
>
> In SendMessageCache, we call combiner even if we have just one message. 
> Combining is kind of expensive since we recreate the message object and the 
> list. With default settings and bigger graph, for PageRankBenchmark there is 
> 10-15% superstep speedup if we don't call it when we have a single message.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
