-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12252/
-----------------------------------------------------------
Review request for giraph.
Bugs: GIRAPH-704
https://issues.apache.org/jira/browse/GIRAPH-704
Repository: giraph-git
Description
-------
I was investigating where the time/CPU is going in some applications, and
processing messages turned out to be one of the most expensive things we do. We
should provide better implementations using primitive maps whenever that's
possible.
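
To illustrate the primitive-map idea, here is a minimal, hypothetical sketch of
an int-id to float-message store that combines messages by summing as they
arrive. It is not the code from this patch: the class name, the hand-rolled
open-addressing table and the sum combiner are all assumptions made for the
example. It only shows why avoiding boxed keys and per-message objects can help.

```java
/** Hypothetical sketch: int vertex id -> summed float message, no boxing. */
final class IntFloatSumStoreSketch {
  private int[] ids;       // vertex ids, one per occupied slot
  private float[] sums;    // combined (summed) message per vertex id
  private boolean[] used;  // whether a slot is occupied
  private int size;

  IntFloatSumStoreSketch(int initialCapacity) {
    int capacity = 16;
    // Round up to a power of two so that masking can replace modulo.
    while (capacity < initialCapacity && capacity < (1 << 30)) {
      capacity <<= 1;
    }
    ids = new int[capacity];
    sums = new float[capacity];
    used = new boolean[capacity];
  }

  /** Linear probing: returns the slot holding vertexId, or the first free slot. */
  private int findSlot(int vertexId) {
    int mask = ids.length - 1;
    int hash = vertexId * 0x9E3779B9;
    int slot = (hash ^ (hash >>> 16)) & mask;
    while (used[slot] && ids[slot] != vertexId) {
      slot = (slot + 1) & mask;
    }
    return slot;
  }

  /** Adds a message, combining it with any message already stored for the vertex. */
  void addMessage(int vertexId, float message) {
    int slot = findSlot(vertexId);
    if (used[slot]) {
      sums[slot] += message;  // assumed combiner: simple sum
      return;
    }
    used[slot] = true;
    ids[slot] = vertexId;
    sums[slot] = message;
    size++;
    if (size * 2 > ids.length) {  // keep the load factor at or below 0.5
      grow();
    }
  }

  private void grow() {
    int[] oldIds = ids;
    float[] oldSums = sums;
    boolean[] oldUsed = used;
    ids = new int[oldIds.length * 2];
    sums = new float[oldIds.length * 2];
    used = new boolean[oldIds.length * 2];
    for (int i = 0; i < oldIds.length; i++) {
      if (oldUsed[i]) {
        int slot = findSlot(oldIds[i]);
        used[slot] = true;
        ids[slot] = oldIds[i];
        sums[slot] = oldSums[i];
      }
    }
  }

  boolean hasMessage(int vertexId) {
    return used[findSlot(vertexId)];
  }

  /** Returns the combined message, or 0 if the vertex received none. */
  float getMessage(int vertexId) {
    int slot = findSlot(vertexId);
    return used[slot] ? sums[slot] : 0f;
  }

  int size() {
    return size;
  }
}
```

A real store would also need thread safety, partitioning and serialization,
which this sketch leaves out on purpose.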
Here are some results of a PageRank benchmark, using 40 workers, 100m vertices
and 1k edges per vertex (2.5b edges per worker).
* Current code, with combiner: superstep 75s, 265m cpu ms
* IntFloatMessageStore: superstep 55s, 185m cpu ms
* Current code, without combiner: superstep 120s, 415m cpu ms
* IntByteArrayMessageStore: superstep 108s, 355m cpu ms
(I ran 3 supersteps; a run with 0 supersteps takes 26m cpu ms, so this should
be subtracted from all the cpu numbers to get a fair comparison.)
So IntFloatMessageStore saves about 35% cpu and 25% elapsed time, and
IntByteArrayMessageStore about 15% cpu and 10% time. On a real huge graph, the
speedup with LongDoubleMessageStore was similar, and with
LongByteArrayMessageStore even a bit better.
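
The savings arithmetic can be reproduced with a small sketch like the one
below. The class and method names are hypothetical, and it assumes that the 26m
cpu ms baseline is subtracted from the "before" cpu figure only, while
superstep times are compared directly (baseline 0). With the benchmark numbers
it gives roughly 33.5% and 15.4% cpu savings, and roughly 26.7% and 10% time
savings, in line with the rounded figures quoted.

```java
import java.util.Locale;

/** Hypothetical helper for the benchmark arithmetic; not part of the patch. */
final class SavingsSketch {
  /**
   * Relative saving in percent, after removing a fixed baseline cost that
   * both runs pay regardless of the message store.
   */
  static double savingsPercent(double before, double after, double baseline) {
    return 100.0 * (before - after) / (before - baseline);
  }

  public static void main(String[] args) {
    // cpu figures are in millions of cpu ms, with the 26m baseline removed
    System.out.printf(Locale.ROOT, "combiner cpu: %.1f%%%n",
        savingsPercent(265, 185, 26));
    System.out.printf(Locale.ROOT, "no combiner cpu: %.1f%%%n",
        savingsPercent(415, 355, 26));
    // superstep times are in seconds, compared without a baseline
    System.out.printf(Locale.ROOT, "combiner time: %.1f%%%n",
        savingsPercent(75, 55, 0));
    System.out.printf(Locale.ROOT, "no combiner time: %.1f%%%n",
        savingsPercent(120, 108, 0));
  }
}
```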
Also note that not using a combiner is much worse (120s vs. 75s per superstep).
We do have additional serialization/deserialization there, but I am not sure
that's enough to justify this huge difference. I tried sizing all the buffers
properly, but it didn't help. I will investigate this further later.
I implemented this so that the infrastructure chooses the appropriate message
store based on the vertex id type, the message type and the combiner. We could
also make it a configuration option, but this becomes trickier with switchable
computations and combiners: we'd have to add a function to switch the message
store too. I'd rather wait until we come up with a better, general solution for
switching things in configuration, without adding specific methods for each.
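
As a rough illustration of that selection logic, the hypothetical sketch below
picks a store kind from the vertex id class, the message class and whether a
combiner is set. Everything in it is an assumption for the example: the names
are invented, boxed Java types stand in for the Writable types, and the mapping
(combined stores when a combiner is present, byte-array stores otherwise) is
inferred from the benchmark pairing above, not taken from the patch.

```java
/** Hypothetical sketch only; names and rules are not the patch's API. */
final class MessageStoreChoiceSketch {
  enum Store { INT_FLOAT, LONG_DOUBLE, INT_BYTE_ARRAY, LONG_BYTE_ARRAY, GENERIC }

  static Store choose(Class<?> vertexIdClass, Class<?> messageClass,
      boolean hasCombiner) {
    boolean intId = vertexIdClass == Integer.class;
    boolean longId = vertexIdClass == Long.class;
    if (hasCombiner) {
      // One combined primitive value per vertex.
      if (intId && messageClass == Float.class) {
        return Store.INT_FLOAT;
      }
      if (longId && messageClass == Double.class) {
        return Store.LONG_DOUBLE;
      }
    } else {
      // No combiner: keep all serialized messages per vertex in a byte array.
      if (intId) {
        return Store.INT_BYTE_ARRAY;
      }
      if (longId) {
        return Store.LONG_BYTE_ARRAY;
      }
    }
    // Anything else falls back to the existing general-purpose stores.
    return Store.GENERIC;
  }
}
```

Keeping the choice in one place like this is what makes switchable
computations awkward: if the message type or combiner changes between
supersteps, the chosen store may have to change with it.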
Diffs
-----
giraph-core/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
4b0f985
giraph-core/src/main/java/org/apache/giraph/comm/ServerData.java affc260
giraph-core/src/main/java/org/apache/giraph/comm/messages/ByteArrayMessagesPerVertexStore.java
fecd7ee
giraph-core/src/main/java/org/apache/giraph/comm/messages/InMemoryMessageStoreFactory.java
ba8a005
giraph-core/src/main/java/org/apache/giraph/comm/messages/out_of_core/PartitionDiskBackedMessageStore.java
4ae805a
giraph-core/src/main/java/org/apache/giraph/comm/messages/primitives/IntByteArrayMessageStore.java
PRE-CREATION
giraph-core/src/main/java/org/apache/giraph/comm/messages/primitives/IntFloatMessageStore.java
PRE-CREATION
giraph-core/src/main/java/org/apache/giraph/comm/messages/primitives/LongByteArrayMessageStore.java
PRE-CREATION
giraph-core/src/main/java/org/apache/giraph/comm/messages/primitives/LongDoubleMessageStore.java
PRE-CREATION
giraph-core/src/main/java/org/apache/giraph/comm/messages/primitives/package-info.java
PRE-CREATION
giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdData.java
9b3f165
giraph-core/src/main/java/org/apache/giraph/utils/WritableUtils.java c78d717
giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java
89b6f9e
giraph-core/src/test/java/org/apache/giraph/comm/RequestFailureTest.java
35e6362
giraph-core/src/test/java/org/apache/giraph/comm/RequestTest.java c8c09df
giraph-core/src/test/java/org/apache/giraph/comm/messages/TestIntFloatPrimitiveMessageStores.java
PRE-CREATION
giraph-core/src/test/java/org/apache/giraph/comm/messages/TestLongDoublePrimitiveMessageStores.java
PRE-CREATION
Diff: https://reviews.apache.org/r/12252/diff/
Testing
-------
Passes mvn clean verify; added tests for the new stores.
Tested on a real large graph, with many compute and netty threads, verified that
results are the same (for both LongDoubleMessageStore and
LongByteArrayMessageStore).
Thanks,
Maja Kabiljo