Takeshi Yamamuro created SPARK-5883:
---------------------------------------
Summary: Add compression scheme in VertexAttributeBlock for shipping vertices to edge partitions
Key: SPARK-5883
URL: https://issues.apache.org/jira/browse/SPARK-5883
Project: Spark
Issue Type: Improvement
Components: GraphX
Reporter: Takeshi Yamamuro

The size of the data shipped between vertex partitions and edge partitions is one of the major issues for better performance. SPARK-3649 indicated a ~10% performance gain in Pregel iterations from using custom serializers for ShuffledRDD. However, it is difficult to implement efficient serializers for ShuffledRDD inside GraphX because 1) how serializers are used in ShuffledRDD differs between SortShuffleManager and HashShuffleManager (see SPARK-3649) and 2) the type of 'VD' is unknown to GraphX.

Therefore, I think that compressing the shipped data inside GraphX (before it is passed into ShuffledRDD) is one of the better solutions. GraphX users register a user-defined serializer for VD, and GraphX then uses that serializer to compress the data shipped between vertex partitions and edge partitions.

My current patch applies this idea in ReplicatedVertexView#upgrade and ReplicatedVertexView#updateVertices. It could also be applied to ReplicatedVertexView#withActiveSet and VertexRDDImpl#aggregateUsingIndex.

I'm not sure that this design is acceptable, so any advice is welcome.
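
To make the proposal concrete, here is a minimal standalone sketch of the idea: a user-supplied serializer for VD is used to pack a block of (vertex id, attribute) pairs into a compressed byte array before shipping, and to unpack it on the receiving side. This is not the code in the patch. The names `VertexAttrSerializer`, `DoubleAttrSerializer`, and `VertexBlockCodec` are hypothetical and are not part of GraphX, and the use of DEFLATE from `java.util.zip` is an assumption made for illustration only.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import java.util.zip.{DeflaterOutputStream, InflaterInputStream}

// Hypothetical user-defined serializer for the vertex attribute type VD.
// Illustration only; GraphX does not define this trait.
trait VertexAttrSerializer[VD] {
  def write(out: DataOutputStream, value: VD): Unit
  def read(in: DataInputStream): VD
}

// Example serializer a user might register when VD is Double.
object DoubleAttrSerializer extends VertexAttrSerializer[Double] {
  def write(out: DataOutputStream, value: Double): Unit = out.writeDouble(value)
  def read(in: DataInputStream): Double = in.readDouble()
}

object VertexBlockCodec {
  // Pack parallel arrays of vertex ids and attributes into one DEFLATE-compressed byte array.
  def encode[VD](vids: Array[Long], attrs: Array[VD], ser: VertexAttrSerializer[VD]): Array[Byte] = {
    require(vids.length == attrs.length, "vids and attrs must have the same length")
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(new DeflaterOutputStream(bytes))
    out.writeInt(vids.length)
    var i = 0
    while (i < vids.length) {
      out.writeLong(vids(i))
      ser.write(out, attrs(i))
      i += 1
    }
    out.close() // closing finishes the deflater and flushes all remaining bytes
    bytes.toByteArray
  }

  // Unpack a byte array produced by encode back into (vertex id, attribute) pairs.
  def decode[VD](data: Array[Byte], ser: VertexAttrSerializer[VD]): Seq[(Long, VD)] = {
    val in = new DataInputStream(new InflaterInputStream(new ByteArrayInputStream(data)))
    try {
      val n = in.readInt()
      (0 until n).map { _ =>
        val vid = in.readLong()
        val attr = ser.read(in)
        (vid, attr)
      }
    } finally {
      in.close()
    }
  }
}
```

In this sketch the sender would call `VertexBlockCodec.encode(vids, attrs, DoubleAttrSerializer)` and ship the resulting `Array[Byte]`, and the receiver would call `decode` with the same serializer. Because the payload is an opaque byte array, the scheme does not depend on how the shuffle manager applies its own serializers. How much the data shrinks depends on the attribute values and would need to be measured.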