-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12252/
-----------------------------------------------------------

Review request for giraph.


Bugs: GIRAPH-704
    https://issues.apache.org/jira/browse/GIRAPH-704


Repository: giraph-git


Description
-------

I was investigating where the time/CPU is going in some applications, and
processing messages turned out to be one of the most expensive things we do. We
should provide better implementations using primitive maps whenever possible.
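To illustrate what primitive maps buy in the combiner case, here is a minimal sketch in which messages for int vertex ids are summed straight into an open-addressing int-to-float table, so there is no boxing and no per-message object. The class name PrimitiveIntFloatStore, the hand-rolled table and the float-sum combiner (as in PageRank) are assumptions made for this sketch. It is not the IntFloatMessageStore code in this diff.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Giraph's IntFloatMessageStore): one combined float
 * message per primitive int vertex id; the combiner is assumed to be a sum.
 */
final class PrimitiveIntFloatStore {
  /** Marks an empty slot; this id therefore cannot be used as a vertex id. */
  private static final int EMPTY = Integer.MIN_VALUE;

  private int[] keys;      // vertex ids, linear probing
  private float[] values;  // combined message per vertex
  private int size;        // number of vertices with a message

  PrimitiveIntFloatStore(int expectedVertices) {
    int capacity = 16;
    while (capacity < 2 * expectedVertices) {
      capacity <<= 1; // power of two so we can mask instead of using modulo
    }
    allocate(capacity);
  }

  private void allocate(int capacity) {
    keys = new int[capacity];
    Arrays.fill(keys, EMPTY);
    values = new float[capacity];
  }

  /** Spreads the bits of the id so consecutive ids do not cluster. */
  private static int mix(int id) {
    int h = id * 0x9E3779B9;
    return h ^ (h >>> 16);
  }

  /** Returns the slot holding vertexId, or the empty slot where it belongs. */
  private int slot(int vertexId) {
    int mask = keys.length - 1;
    int i = mix(vertexId) & mask;
    while (keys[i] != EMPTY && keys[i] != vertexId) {
      i = (i + 1) & mask;
    }
    return i;
  }

  /** Combines a message into the vertex's slot without boxing. */
  void addMessage(int vertexId, float message) {
    if (vertexId == EMPTY) {
      throw new IllegalArgumentException("reserved vertex id");
    }
    int i = slot(vertexId);
    if (keys[i] == vertexId) {
      values[i] += message; // sum combiner applied in place
      return;
    }
    keys[i] = vertexId;
    values[i] = message;
    if (++size * 2 > keys.length) {
      rehash(keys.length * 2); // keep the load factor at or below 0.5
    }
  }

  private void rehash(int newCapacity) {
    int[] oldKeys = keys;
    float[] oldValues = values;
    allocate(newCapacity);
    for (int j = 0; j < oldKeys.length; j++) {
      if (oldKeys[j] != EMPTY) {
        int i = slot(oldKeys[j]);
        keys[i] = oldKeys[j];
        values[i] = oldValues[j];
      }
    }
  }

  boolean hasMessage(int vertexId) {
    return vertexId != EMPTY && keys[slot(vertexId)] == vertexId;
  }

  float getMessage(int vertexId) {
    if (!hasMessage(vertexId)) {
      throw new IllegalStateException("no message for vertex " + vertexId);
    }
    return values[slot(vertexId)];
  }

  int numVerticesWithMessages() {
    return size;
  }

  public static void main(String[] args) {
    PrimitiveIntFloatStore store = new PrimitiveIntFloatStore(4);
    store.addMessage(7, 0.25f);
    store.addMessage(7, 0.5f);
    store.addMessage(42, 1.0f);
    System.out.println(store.getMessage(7));             // 0.75
    System.out.println(store.numVerticesWithMessages()); // 2
  }
}
```

The table keeps ids and values in two parallel primitive arrays, so adding a message is a hash, a probe and a float add.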

Here are some results from the PageRank benchmark, using 40 workers, 100m
vertices and 1k edges per vertex (2.5b edges per worker):
* Current code, with combiner: superstep 75s, 265m cpu ms
* IntFloatMessageStore: superstep 55s, 185m cpu ms
* Current code, without combiner: superstep 120s, 415m cpu ms
* IntByteArrayMessageStore: superstep 108s, 355m cpu ms
(I ran 3 supersteps. A run with 0 supersteps takes 26m cpu ms, so that should
be subtracted from all the numbers to get a fair comparison.)
So IntFloatMessageStore gives about 35% cpu and 25% elapsed time savings, and
IntByteArrayMessageStore about 15% cpu and 10% time. On a real huge graph, the
speedup with LongDoubleMessageStore was similar, and with
LongByteArrayMessageStore even a bit better.
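The savings can be rechecked with a few lines of arithmetic. This is only a sketch: the helper names are made up, and because the description gives a 0-superstep baseline only for CPU, elapsed times are compared without any baseline. Net of the 26m cpu ms baseline, the CPU savings come out at roughly 33% and 15%.

```java
import java.util.Locale;

/**
 * Hypothetical helper that redoes the benchmark arithmetic. CPU figures are in
 * millions of cpu ms; the 0-superstep cost is subtracted before comparing.
 * Elapsed times are compared as-is, since no time baseline is given.
 */
final class BenchmarkSavings {
  static final double ZERO_SUPERSTEP_CPU = 26; // m cpu ms, from the description

  /** Fraction saved going from `before` to `after`, net of a fixed baseline. */
  static double saving(double before, double after, double baseline) {
    return 1.0 - (after - baseline) / (before - baseline);
  }

  /** Formats a fraction as a percentage with one decimal place. */
  static String pct(double fraction) {
    return String.format(Locale.ROOT, "%.1f%%", 100 * fraction);
  }

  public static void main(String[] args) {
    long edgesPerWorker = 100_000_000L * 1_000L / 40; // vertices * edges / workers
    System.out.println(edgesPerWorker);                            // 2500000000
    System.out.println(pct(saving(265, 185, ZERO_SUPERSTEP_CPU))); // 33.5%
    System.out.println(pct(saving(75, 55, 0)));                    // 26.7%
    System.out.println(pct(saving(415, 355, ZERO_SUPERSTEP_CPU))); // 15.4%
    System.out.println(pct(saving(120, 108, 0)));                  // 10.0%
  }
}
```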
Also note that using the combiner is much worse. There is additional
serialization/deserialization there, but I am not sure that's enough to explain
such a huge difference. I tried sizing all the buffers properly, but it didn't
help. I will investigate this more later.

I implemented this so that the infrastructure chooses the appropriate message
store based on the vertex id type, message type and combiner. We could also
have a configuration option, but that becomes trickier with switchable
computations and combiners: we'd have to add a function to switch the message
store too. I'd rather wait until we come up with a better, general way to
switch things in the configuration, without adding specific methods for each.
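The selection rule described above might look roughly like this. It is a hypothetical sketch, not the InMemoryMessageStoreFactory change in this diff: Integer, Long, Float and Double stand in for Hadoop's IntWritable, LongWritable, FloatWritable and DoubleWritable so the example has no dependencies, and GENERIC_COMBINING is an assumed label for whichever existing store is used with a combiner. The pairing of the float/double stores with a combiner and of the byte-array stores without one follows the benchmark comparison.

```java
/**
 * Hypothetical sketch: pick a message store from the vertex id type, the
 * message type and whether a combiner is configured. Java boxed types stand
 * in for the Hadoop Writable classes.
 */
final class MessageStoreChooser {
  enum StoreKind {
    INT_FLOAT,          // IntFloatMessageStore
    LONG_DOUBLE,        // LongDoubleMessageStore
    INT_BYTE_ARRAY,     // IntByteArrayMessageStore
    LONG_BYTE_ARRAY,    // LongByteArrayMessageStore
    GENERIC_COMBINING,  // assumed: existing object-based store used with a combiner
    GENERIC_BYTE_ARRAY  // ByteArrayMessagesPerVertexStore
  }

  static StoreKind choose(Class<?> idClass, Class<?> messageClass, boolean hasCombiner) {
    if (hasCombiner) {
      // One combined primitive message per vertex only fits these pairs.
      if (idClass == Integer.class && messageClass == Float.class) {
        return StoreKind.INT_FLOAT;
      }
      if (idClass == Long.class && messageClass == Double.class) {
        return StoreKind.LONG_DOUBLE;
      }
      return StoreKind.GENERIC_COMBINING;
    }
    // Without a combiner messages stay serialized; only the id needs to be primitive.
    if (idClass == Integer.class) {
      return StoreKind.INT_BYTE_ARRAY;
    }
    if (idClass == Long.class) {
      return StoreKind.LONG_BYTE_ARRAY;
    }
    return StoreKind.GENERIC_BYTE_ARRAY;
  }

  public static void main(String[] args) {
    System.out.println(choose(Integer.class, Float.class, true)); // INT_FLOAT
    System.out.println(choose(Long.class, Double.class, false));  // LONG_BYTE_ARRAY
    System.out.println(choose(String.class, Float.class, true));  // GENERIC_COMBINING
  }
}
```

Deriving the store from the configured types keeps the choice automatic. The flip side, as noted above, is that a computation switch that changes the message type or the combiner would need this decision to be made again.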


Diffs
-----

  giraph-core/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 4b0f985
  giraph-core/src/main/java/org/apache/giraph/comm/ServerData.java affc260
  giraph-core/src/main/java/org/apache/giraph/comm/messages/ByteArrayMessagesPerVertexStore.java fecd7ee
  giraph-core/src/main/java/org/apache/giraph/comm/messages/InMemoryMessageStoreFactory.java ba8a005
  giraph-core/src/main/java/org/apache/giraph/comm/messages/out_of_core/PartitionDiskBackedMessageStore.java 4ae805a
  giraph-core/src/main/java/org/apache/giraph/comm/messages/primitives/IntByteArrayMessageStore.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/comm/messages/primitives/IntFloatMessageStore.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/comm/messages/primitives/LongByteArrayMessageStore.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/comm/messages/primitives/LongDoubleMessageStore.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/comm/messages/primitives/package-info.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdData.java 9b3f165
  giraph-core/src/main/java/org/apache/giraph/utils/WritableUtils.java c78d717
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java 89b6f9e
  giraph-core/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 35e6362
  giraph-core/src/test/java/org/apache/giraph/comm/RequestTest.java c8c09df
  giraph-core/src/test/java/org/apache/giraph/comm/messages/TestIntFloatPrimitiveMessageStores.java PRE-CREATION
  giraph-core/src/test/java/org/apache/giraph/comm/messages/TestLongDoublePrimitiveMessageStores.java PRE-CREATION

Diff: https://reviews.apache.org/r/12252/diff/


Testing
-------

Passes mvn clean verify; added tests for the new stores.
Tested on a real large graph with many compute and Netty threads, and verified
that the results are the same (for both LongDoubleMessageStore and
LongByteArrayMessageStore).
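The "results are the same" check can also be done in miniature as a differential test: push an identical random message stream through a boxed reference map and a primitive store, then compare per-vertex sums. This is a hypothetical sketch, not one of the tests added in this diff. It assumes dense vertex ids 0..n-1 so that a plain float[] can play the primitive store, and it relies on both sides adding the same floats in the same order, which is why exact equality is expected.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical differential check: boxed reference store vs. primitive store. */
final class StoreConsistencyCheck {
  static boolean sameResults(int numVertices, int numMessages, long seed) {
    Map<Integer, Float> reference = new HashMap<>(); // boxed keys and values
    float[] primitive = new float[numVertices];       // assumes dense ids 0..n-1
    boolean[] hasMessage = new boolean[numVertices];
    Random random = new Random(seed);
    for (int m = 0; m < numMessages; m++) {
      int target = random.nextInt(numVertices);
      float message = random.nextFloat();
      reference.merge(target, message, Float::sum); // sum combiner, boxed
      primitive[target] += message;                  // sum combiner, primitive
      hasMessage[target] = true;
    }
    for (int v = 0; v < numVertices; v++) {
      Float expected = reference.get(v);
      if (hasMessage[v] != (expected != null)) {
        return false; // one side saw a message for v and the other did not
      }
      if (expected != null && expected.floatValue() != primitive[v]) {
        return false; // same additions in the same order must match exactly
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(sameResults(1_000, 100_000, 42L)); // true
  }
}
```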


Thanks,

Maja Kabiljo
