-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9449/#review16604
-----------------------------------------------------------


This is great, and the results are promising. Reusing the ideas from the 
improvements made to the messaging part is something we should have done some 
time ago. Just to clarify, do we still have the old code there for mutations?

I haven't gone through the whole patch yet, but my initial feeling is that we 
have a lot of code duplication here. For example, SendEdgeCache is just a copy 
of SendMessageCache, replacing M with E and ByteArrayVertexIdMessages with 
ByteArrayVertexIdEdges (which are also very similar).
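
To make the duplication point concrete, here is a rough sketch of how the two 
caches could share one generic class. This is not code from the patch: the 
class name SendCacheSketch, its methods, and the entry-count flush threshold 
are all hypothetical, and the real caches may well track sizes differently.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: a single cache parameterized on the payload type T,
 * so a message cache and an edge cache could reuse the same buffering logic.
 * I is the vertex id type; T would be M for messages or E for edges.
 */
class SendCacheSketch<I, T> {
  /** Buffered (vertex id, payload) pairs, grouped by destination partition. */
  private final Map<Integer, List<Map.Entry<I, T>>> buffers = new HashMap<>();
  /** Assumed flush threshold, expressed as a number of buffered entries. */
  private final int maxEntriesPerPartition;

  SendCacheSketch(int maxEntriesPerPartition) {
    this.maxEntriesPerPartition = maxEntriesPerPartition;
  }

  /**
   * Buffers one pair. Returns the whole batch for that partition once the
   * threshold is reached (removing it from the cache), otherwise null.
   */
  List<Map.Entry<I, T>> add(int partitionId, I vertexId, T payload) {
    List<Map.Entry<I, T>> buffer =
        buffers.computeIfAbsent(partitionId, k -> new ArrayList<>());
    buffer.add(new AbstractMap.SimpleImmutableEntry<>(vertexId, payload));
    if (buffer.size() >= maxEntriesPerPartition) {
      return buffers.remove(partitionId);
    }
    return null;
  }

  /** Removes and returns whatever is still buffered for a partition. */
  List<Map.Entry<I, T>> remove(int partitionId) {
    List<Map.Entry<I, T>> buffer = buffers.remove(partitionId);
    return buffer == null ? new ArrayList<>() : buffer;
  }
}
```

With something along these lines, the message and edge variants would differ 
only in the type argument and in the request they build from a flushed batch.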

- Maja Kabiljo


On Feb. 14, 2013, 6:34 p.m., Alessandro Presta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9449/
> -----------------------------------------------------------
> 
> (Updated Feb. 14, 2013, 6:34 p.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> This patch adds the following classes:
> - SendWorkerEdgesRequest: a request used to send edges during the input 
> superstep, similar to the corresponding one for messages
> - SendEdgeCache: similar to SendMessageCache
> - ByteArrayVertexIdEdges: serialized representation for lists of edges (for 
> different source vertices), similar to the corresponding one for messages
> - EdgeStore: a server-side structure that stores transient edges from 
> incoming requests, and later moves them to the owning vertices.
> - ByteArrayEdges: an edge list (for the same source vertex) stored as a 
> byte-array. The standard way of iterating is by reusing Edge objects, but an 
> alternative iterator that instantiates new objects is provided. Depending on 
> the vertex implementation, we use one or the other.
> This is a refactor of the byte-array code in RepresentativeVertex, which now 
> contains an instance of ByteArrayEdges.
> When calling setEdges(), RepresentativeVertex is smart enough to realize 
> that the passed Iterable is actually an instance of ByteArrayEdges, and 
> simply takes ownership of it (without iterating).
> If using something like EdgeListVertex (which keeps references to the passed 
> edges), we will use the alternative iterable (this is of course less 
> memory-efficient).
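
For readers following along, the difference between the two iteration modes 
described above can be sketched roughly as follows. This is an illustration 
only, not the patch's ByteArrayEdges: the class names ByteArrayEdgesSketch and 
MutableEdge, the copyingIterable() method, and the fixed long/float edge 
layout are assumptions made to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for an edge: target vertex id plus edge value. */
class MutableEdge {
  long targetId;
  float value;
}

/** Hypothetical sketch of an edge list serialized into one byte array. */
class ByteArrayEdgesSketch implements Iterable<MutableEdge> {
  private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
  private final DataOutputStream out = new DataOutputStream(bytes);
  private int count;

  /** Appends one serialized edge to the backing byte array. */
  void add(long targetId, float value) {
    try {
      out.writeLong(targetId);
      out.writeFloat(value);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    count++;
  }

  /** Default mode: every call to next() refills the same edge object. */
  @Override
  public Iterator<MutableEdge> iterator() {
    return new EdgeIterator(true);
  }

  /** Alternative mode: every call to next() returns a fresh object. */
  Iterable<MutableEdge> copyingIterable() {
    return () -> new EdgeIterator(false);
  }

  private class EdgeIterator implements Iterator<MutableEdge> {
    // toByteArray() copies the buffer; acceptable for a sketch only.
    private final DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
    private final int total = count;
    private final boolean reuse;
    private final MutableEdge shared = new MutableEdge();
    private int read;

    EdgeIterator(boolean reuse) {
      this.reuse = reuse;
    }

    @Override
    public boolean hasNext() {
      return read < total;
    }

    @Override
    public MutableEdge next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      MutableEdge edge = reuse ? shared : new MutableEdge();
      try {
        edge.targetId = in.readLong();
        edge.value = in.readFloat();
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
      read++;
      return edge;
    }
  }
}
```

A vertex that keeps references to the edges it is handed has to use the 
copying mode, since in the reusing mode every stored reference would end up 
pointing at the same object holding the last edge read.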
> 
> I've also renamed RepresentativeVertex to ByteArrayVertex because it was 
> misleading (it doesn't need to be used with ByteArrayPartition, it's 
> perfectly fine to have multiple Vertex objects, each storing its edges in a 
> byte-array).
> 
> Future work:
> 
> EdgeStore could become an interface in the future, allowing for different 
> implementations (e.g. out-of-core) and handling permanent edge storage in 
> place of Vertex. That way, we would have only one Vertex class, and pluggable 
> storage implementations (which makes it easier to switch without changing 
> user code).
> 
> 
> This addresses bug GIRAPH-515.
>     https://issues.apache.org/jira/browse/GIRAPH-515
> 
> 
> Diffs
> -----
> 
>   
> giraph-core/src/main/java/org/apache/giraph/benchmark/ByteArrayVertexPageRankBenchmark.java
>  PRE-CREATION 
>   
> giraph-core/src/main/java/org/apache/giraph/benchmark/MultiGraphByteArrayVertexPageRankBenchmark.java
>  PRE-CREATION 
>   
> giraph-core/src/main/java/org/apache/giraph/benchmark/MultiGraphRepresentativeVertexPageRankBenchmark.java
>  96288323e6028e779113d2520ea9edad497bb0e1 
>   
> giraph-core/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 
> 19b08bdb19df21b1dc56dad2cebb499222f9b19e 
>   
> giraph-core/src/main/java/org/apache/giraph/benchmark/RepresentativeVertexPageRankBenchmark.java
>  331ae41a2c0df6b124cbf33944b05f080b49ce94 
>   giraph-core/src/main/java/org/apache/giraph/comm/SendEdgeCache.java 
> PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/comm/ServerData.java 
> 3655d79d8f249338da30ae2bb38b9cfd6b7b1f56 
>   
> giraph-core/src/main/java/org/apache/giraph/comm/WorkerClientRequestProcessor.java
>  0c043e29ae3160bbfc389c435427cf57010a91e1 
>   giraph-core/src/main/java/org/apache/giraph/comm/WorkerServer.java 
> e60db5529b7fef0b16441ef88df7053d6856ffc5 
>   
> giraph-core/src/main/java/org/apache/giraph/comm/messages/ByteArrayMessagesPerVertexStore.java
>  65caa5d2777b90fa8e14bee7c8d69316d512c651 
>   
> giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClientRequestProcessor.java
>  d4e919ed1aa1f977a2e487531f57b3a2fc0fad47 
>   
> giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java 
> 1b7cc5410aa4d7e1b9ae4580dd5ed484e09ff7ed 
>   giraph-core/src/main/java/org/apache/giraph/comm/requests/RequestType.java 
> aac00289f915f61e61334cdcd92c93c1ef3b5419 
>   
> giraph-core/src/main/java/org/apache/giraph/comm/requests/SendWorkerEdgesRequest.java
>  PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 
> 9e129efebe39c42bab9d59b3246055b79cdbdfa3 
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 
> 8797c0e80824558bf544650f7c896bddd3f873fb 
>   
> giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java
>  3e158afdc480656b3937508f5d86ec294bfa3b99 
>   giraph-core/src/main/java/org/apache/giraph/graph/EdgeStore.java 
> PRE-CREATION 
>   
> giraph-core/src/main/java/org/apache/giraph/partition/ByteArrayPartition.java 
> 12989180a4aabed19c3aefa52ef38ad6d7aa6851 
>   
> giraph-core/src/main/java/org/apache/giraph/partition/DiskBackedPartitionStore.java
>  844a229096005059e9cd05b5bf213d2afa1d41dd 
>   giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayEdges.java 
> PRE-CREATION 
>   
> giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdEdges.java 
> PRE-CREATION 
>   
> giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdMessages.java
>  dea4229f10224edb30f59626d5987ea840e8a271 
>   giraph-core/src/main/java/org/apache/giraph/utils/VertexIdIterator.java 
> PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/vertex/ByteArrayVertex.java 
> PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/vertex/ByteArrayVertexBase.java 
> PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/vertex/EdgeListVertex.java 
> 9ae692fc00432e28f0b87f11ed5981e600c95019 
>   
> giraph-core/src/main/java/org/apache/giraph/vertex/MultiGraphByteArrayVertex.java
>  PRE-CREATION 
>   
> giraph-core/src/main/java/org/apache/giraph/vertex/MultiGraphRepresentativeVertex.java
>  4733e2a6011ec8e1cc4eef1d2eb61abe777ec310 
>   
> giraph-core/src/main/java/org/apache/giraph/vertex/RepresentativeVertex.java 
> f805007b8bb8f89e9388cf89c2e81f92328b2b1c 
>   
> giraph-core/src/main/java/org/apache/giraph/vertex/RepresentativeVertexBase.java
>  4de6ed85b499e74b04e93c3780324a6b9e9f2b83 
>   giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java 
> fa3ab49f11d61352a5f6f69699375abd2bf1e527 
>   
> giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java
>  bdf9f5705811340748172a70dc952493d5ececfc 
>   giraph-core/src/test/java/org/apache/giraph/utils/MockUtils.java 
> 82dc2839d83f80ebcf52bad252886d50310eacc5 
>   
> giraph-core/src/test/java/org/apache/giraph/vertex/TestMultiGraphVertex.java 
> a5a3545de7dc9e30ab0f30926122049fdbe1173b 
>   giraph-core/src/test/java/org/apache/giraph/vertex/TestMutableVertex.java 
> ca4ba1a336f68b584c4fdbaf74be60dbe41644d5 
> 
> Diff: https://reviews.apache.org/r/9449/diff/
> 
> 
> Testing
> -------
> 
> mvn verify
> 
> Tested on both benchmarks and real-world applications.
> This typically brings resource requirements down a lot: in an application 
> using a few hundred billion edges, which previously only ran with 300 
> workers, we're now able to run with 100 workers, with a lot of memory to 
> spare and even faster than before (from around 600s to 400s).
> 
> 
> Thanks,
> 
> Alessandro Presta
> 
>
