-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9449/
-----------------------------------------------------------

(Updated Feb. 15, 2013, 1:24 a.m.)


Review request for giraph.


Changes
-------

Refactored with common abstractions for sending edges/messages, as per Maja's 
advice.

ByteArrayVertexIdData extended by ByteArrayVertexId(Messages/Edges)

SendCache extended by Send(Message/Edge)Cache

SendWorkerDataRequest extended by SendWorker(Messages/Edges)Request
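
To illustrate the shape of this refactoring, here is a minimal sketch (not the patch's code) of a generic per-worker cache with one subclass for messages and one for edges. All Sketch* names, the item-count threshold, and the payload types are assumptions made for illustration; the real classes work on serialized byte arrays.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical base class: buffers items of type T per destination worker. */
abstract class SketchSendCache<T> {
  private final Map<Integer, List<T>> buffers = new HashMap<>();
  private final int maxItemsPerWorker;

  SketchSendCache(int maxItemsPerWorker) {
    this.maxItemsPerWorker = maxItemsPerWorker;
  }

  /** Name of the request type a full buffer would be wrapped in. */
  abstract String requestName();

  /** Buffers one item; returns the full batch to send, or null if not full yet. */
  List<T> add(int workerId, T item) {
    List<T> buffer = buffers.computeIfAbsent(workerId, k -> new ArrayList<>());
    buffer.add(item);
    if (buffer.size() < maxItemsPerWorker) {
      return null;
    }
    // Buffer is full: hand it to the caller and start over for this worker.
    return buffers.remove(workerId);
  }
}

/** Message flavor: only the payload type and the request name differ. */
class SketchSendMessageCache extends SketchSendCache<String> {
  SketchSendMessageCache(int maxItemsPerWorker) {
    super(maxItemsPerWorker);
  }

  @Override
  String requestName() {
    return "SendWorkerMessagesRequest";
  }
}

/** Edge flavor: the payload is a {source, target} pair of vertex ids. */
class SketchSendEdgeCache extends SketchSendCache<long[]> {
  SketchSendEdgeCache(int maxItemsPerWorker) {
    super(maxItemsPerWorker);
  }

  @Override
  String requestName() {
    return "SendWorkerEdgesRequest";
  }
}
```

The buffering logic lives only in the base class, so the edge path reuses whatever the message path already does.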


Description
-------

This patch adds the following classes:
- SendWorkerEdgesRequest: a request used to send edges during the input 
superstep, similar to the corresponding one for messages
- SendEdgeCache: similar to SendMessageCache
- ByteArrayVertexIdEdges: serialized representation for lists of edges (for 
different source vertices), similar to the corresponding one for messages
- EdgeStore: a server-side structure that stores transient edges from incoming 
requests, and later moves them to the owning vertices.
- ByteArrayEdges: an edge list (for the same source vertex) stored as a 
byte-array. The standard way of iterating is by reusing Edge objects, but an 
alternative iterator that instantiates new objects is provided. Depending on 
the vertex implementation, we use one or the other.
This is a refactor of the byte-array code in RepresentativeVertex, which now 
contains an instance of ByteArrayEdges.
When setEdges() is called, RepresentativeVertex detects that the passed 
Iterable is actually an instance of ByteArrayEdges, and simply takes 
ownership of it (without iterating).
If using something like EdgeListVertex (which keeps references to the passed 
edges), we will use the alternative iterable (this is of course less 
memory-efficient).
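
A minimal sketch of the two iteration modes and of the ownership hand-off described above. This is not the patch's code: every Sketch* name, the long/float edge layout, and the copyingIterable() method name are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical mutable edge: target vertex id plus a weight. */
class SketchEdge {
  long target;
  float weight;

  static SketchEdge of(long target, float weight) {
    SketchEdge edge = new SketchEdge();
    edge.target = target;
    edge.weight = weight;
    return edge;
  }
}

/** Hypothetical stand-in for ByteArrayEdges: one vertex's edges packed in a byte[]. */
class SketchByteArrayEdges implements Iterable<SketchEdge> {
  private final byte[] bytes;
  private final int count;

  SketchByteArrayEdges(Iterable<SketchEdge> edges) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    int n = 0;
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      for (SketchEdge edge : edges) {
        // Fields are copied immediately, so a reused source object is safe here.
        out.writeLong(edge.target);
        out.writeFloat(edge.weight);
        n++;
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    bytes = buffer.toByteArray();
    count = n;
  }

  /** Standard iteration: a single Edge object is refilled on every next(). */
  @Override
  public Iterator<SketchEdge> iterator() {
    return new EdgeIterator(true);
  }

  /** Alternative iteration: every next() returns a newly allocated Edge. */
  Iterable<SketchEdge> copyingIterable() {
    return () -> new EdgeIterator(false);
  }

  private final class EdgeIterator implements Iterator<SketchEdge> {
    private final DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(bytes));
    private final SketchEdge shared = new SketchEdge();
    private final boolean reuse;
    private int read;

    EdgeIterator(boolean reuse) {
      this.reuse = reuse;
    }

    @Override
    public boolean hasNext() {
      return read < count;
    }

    @Override
    public SketchEdge next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      SketchEdge edge = reuse ? shared : new SketchEdge();
      try {
        edge.target = in.readLong();
        edge.weight = in.readFloat();
      } catch (IOException ex) {
        throw new UncheckedIOException(ex);
      }
      read++;
      return edge;
    }
  }
}

/** Byte-array-backed vertex: adopts a SketchByteArrayEdges without iterating. */
class SketchByteArrayVertex {
  private SketchByteArrayEdges edges;

  void setEdges(Iterable<SketchEdge> newEdges) {
    edges = newEdges instanceof SketchByteArrayEdges
        ? (SketchByteArrayEdges) newEdges          // take ownership, no copy
        : new SketchByteArrayEdges(newEdges);      // generic path: serialize
  }

  Iterable<SketchEdge> getEdges() {
    return edges;
  }
}

/** Vertex that keeps references to the passed edges, as EdgeListVertex does. */
class SketchEdgeListVertex {
  final List<SketchEdge> edges = new ArrayList<>();

  void setEdges(Iterable<SketchEdge> newEdges) {
    edges.clear();
    for (SketchEdge edge : newEdges) {
      edges.add(edge);
    }
  }
}
```

In this sketch, a reference-keeping vertex must be fed packed.copyingIterable(); feeding it the reusing iterator would leave it holding the same Edge object several times, all showing the last edge read.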

I've also renamed RepresentativeVertex to ByteArrayVertex because the old name 
was misleading: it doesn't need to be used with ByteArrayPartition, and it's 
perfectly fine to have multiple Vertex objects, each storing its edges in a 
byte-array.

Future work:

EdgeStore could become an interface in the future, allowing for different 
implementations (e.g. out-of-core) and handling permanent edge storage in place 
of Vertex. That way, we would have only one Vertex class, and pluggable storage 
implementations (which makes it easier to switch without changing user code).
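
A rough sketch of what such an interface might look like, with an in-memory implementation. This is purely hypothetical: in this patch EdgeStore is a concrete class, and the names, method signatures, and use of plain long ids below are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical interface; different implementations could plug in behind it. */
interface SketchEdgeStore {
  /** Buffers edges received in a request for the given source vertex. */
  void addEdges(long sourceId, List<Long> targetIds);

  /** Hands each source vertex its buffered edges, then forgets them. */
  void moveEdgesToVertices(BiConsumer<Long, List<Long>> vertexSink);
}

/** Simplest possible implementation: everything stays on the heap. */
class InMemorySketchEdgeStore implements SketchEdgeStore {
  private final Map<Long, List<Long>> transientEdges = new HashMap<>();

  @Override
  public synchronized void addEdges(long sourceId, List<Long> targetIds) {
    transientEdges
        .computeIfAbsent(sourceId, k -> new ArrayList<>())
        .addAll(targetIds);
  }

  @Override
  public synchronized void moveEdgesToVertices(
      BiConsumer<Long, List<Long>> vertexSink) {
    transientEdges.forEach(vertexSink);
    transientEdges.clear();
  }
}
```

An out-of-core variant would implement the same two methods but spill the buffered edges to disk; callers would not need to change.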


This addresses bug GIRAPH-515.
    https://issues.apache.org/jira/browse/GIRAPH-515


Diffs (updated)
-----

  
  giraph-core/src/main/java/org/apache/giraph/benchmark/ByteArrayVertexPageRankBenchmark.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/benchmark/MultiGraphByteArrayVertexPageRankBenchmark.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/benchmark/MultiGraphRepresentativeVertexPageRankBenchmark.java 96288323e6028e779113d2520ea9edad497bb0e1
  giraph-core/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 19b08bdb19df21b1dc56dad2cebb499222f9b19e
  giraph-core/src/main/java/org/apache/giraph/benchmark/RepresentativeVertexPageRankBenchmark.java 331ae41a2c0df6b124cbf33944b05f080b49ce94
  giraph-core/src/main/java/org/apache/giraph/comm/SendCache.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/comm/SendEdgeCache.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/comm/SendMessageCache.java 3cbf0eb4775fa3ff0b0351f247df87783bf05995
  giraph-core/src/main/java/org/apache/giraph/comm/ServerData.java 3655d79d8f249338da30ae2bb38b9cfd6b7b1f56
  giraph-core/src/main/java/org/apache/giraph/comm/WorkerClientRequestProcessor.java 0c043e29ae3160bbfc389c435427cf57010a91e1
  giraph-core/src/main/java/org/apache/giraph/comm/WorkerServer.java e60db5529b7fef0b16441ef88df7053d6856ffc5
  giraph-core/src/main/java/org/apache/giraph/comm/messages/ByteArrayMessagesPerVertexStore.java 65caa5d2777b90fa8e14bee7c8d69316d512c651
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClientRequestProcessor.java d4e919ed1aa1f977a2e487531f57b3a2fc0fad47
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java 1b7cc5410aa4d7e1b9ae4580dd5ed484e09ff7ed
  giraph-core/src/main/java/org/apache/giraph/comm/requests/RequestType.java aac00289f915f61e61334cdcd92c93c1ef3b5419
  giraph-core/src/main/java/org/apache/giraph/comm/requests/SendWorkerDataRequest.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/comm/requests/SendWorkerEdgesRequest.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/comm/requests/SendWorkerMessagesRequest.java 641c795521006c460138d6b3b6d9ceb3c3e7eccf
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 9e129efebe39c42bab9d59b3246055b79cdbdfa3
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 8797c0e80824558bf544650f7c896bddd3f873fb
  giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java 3e158afdc480656b3937508f5d86ec294bfa3b99
  giraph-core/src/main/java/org/apache/giraph/graph/EdgeStore.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/partition/ByteArrayPartition.java 12989180a4aabed19c3aefa52ef38ad6d7aa6851
  giraph-core/src/main/java/org/apache/giraph/partition/DiskBackedPartitionStore.java 844a229096005059e9cd05b5bf213d2afa1d41dd
  giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayEdges.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdData.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdEdges.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdMessages.java dea4229f10224edb30f59626d5987ea840e8a271
  giraph-core/src/main/java/org/apache/giraph/utils/VertexIdIterator.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/vertex/ByteArrayVertex.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/vertex/ByteArrayVertexBase.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/vertex/EdgeListVertex.java 9ae692fc00432e28f0b87f11ed5981e600c95019
  giraph-core/src/main/java/org/apache/giraph/vertex/MultiGraphByteArrayVertex.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/vertex/MultiGraphRepresentativeVertex.java 4733e2a6011ec8e1cc4eef1d2eb61abe777ec310
  giraph-core/src/main/java/org/apache/giraph/vertex/RepresentativeVertex.java f805007b8bb8f89e9388cf89c2e81f92328b2b1c
  giraph-core/src/main/java/org/apache/giraph/vertex/RepresentativeVertexBase.java 4de6ed85b499e74b04e93c3780324a6b9e9f2b83
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java fa3ab49f11d61352a5f6f69699375abd2bf1e527
  giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java bdf9f5705811340748172a70dc952493d5ececfc
  giraph-core/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 2845c90cbfd38f2f35e70e3b79767e1386d54a7e
  giraph-core/src/test/java/org/apache/giraph/comm/RequestTest.java d779fe46377eaa8fa2debf0836f975a30ec6e21f
  giraph-core/src/test/java/org/apache/giraph/utils/MockUtils.java 82dc2839d83f80ebcf52bad252886d50310eacc5
  giraph-core/src/test/java/org/apache/giraph/vertex/TestMultiGraphVertex.java a5a3545de7dc9e30ab0f30926122049fdbe1173b
  giraph-core/src/test/java/org/apache/giraph/vertex/TestMutableVertex.java ca4ba1a336f68b584c4fdbaf74be60dbe41644d5

Diff: https://reviews.apache.org/r/9449/diff/


Testing
-------

mvn verify

Tested on both benchmarks and real-world applications.
This typically brings memory requirements down a lot: in an application using 
a few hundred billion edges, which previously only ran with 300 workers, we're 
now able to run with 100 workers, with a lot of memory to spare, and it is even 
faster than before (from around 600s to 400s).


Thanks,

Alessandro Presta
