Takeshi Yamamuro created SPARK-5883:
---------------------------------------

             Summary: Add compression scheme in VertexAttributeBlock for 
shipping vertices to edge partitions
                 Key: SPARK-5883
                 URL: https://issues.apache.org/jira/browse/SPARK-5883
             Project: Spark
          Issue Type: Improvement
          Components: GraphX
            Reporter: Takeshi Yamamuro


The amount of data shipped between vertex partitions and edge partitions
is one of the major factors limiting GraphX performance.
SPARK-3649 reported a ~10% performance gain in Pregel iterations
from using custom serializers for ShuffledRDD.

However, it is hard to implement efficient serializers for ShuffledRDD
inside GraphX because 1) the way ShuffledRDD uses serializers differs
between SortShuffleManager and HashShuffleManager (see SPARK-3649),
and 2) the type of 'VD' is unknown to GraphX.

Therefore, I think that compressing the shipped data inside GraphX
(before it is passed into ShuffledRDD) is one of the better solutions.
GraphX users register a user-defined serializer for VD, and
GraphX then uses that serializer to compress the data shipped between
vertex partitions and edge partitions.
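
To make the idea concrete, here is a minimal, language-agnostic sketch
written in Python. It is not GraphX code and not the patch itself: the
names pack_block, unpack_block, encode_double and decode_double are
hypothetical, the byte layout is an assumption chosen for illustration,
and zlib stands in for whatever codec would actually be used. It only
shows the shape of the approach: a user-supplied serializer for VD turns
each attribute into bytes, and the whole block is compressed before it
would be handed to the shuffle.

```python
import struct
import zlib
from typing import Callable, List, Sequence, Tuple, TypeVar

VD = TypeVar("VD")  # vertex attribute type, unknown to the framework


def encode_double(attr: float) -> bytes:
    """Hypothetical user-registered serializer for VD = double."""
    return struct.pack(">d", attr)


def decode_double(buf: bytes) -> float:
    """Inverse of encode_double."""
    return struct.unpack(">d", buf)[0]


def pack_block(vids: Sequence[int], attrs: Sequence[VD],
               encode: Callable[[VD], bytes]) -> bytes:
    """Serialize one block of (vertex id, attribute) pairs, then compress it.

    Assumed layout before compression: a 4-byte count, then for each vertex
    an 8-byte id, a 4-byte payload length, and the encoded attribute.
    """
    if len(vids) != len(attrs):
        raise ValueError("vids and attrs must have the same length")
    out = bytearray(struct.pack(">i", len(vids)))
    for vid, attr in zip(vids, attrs):
        payload = encode(attr)
        out += struct.pack(">qi", vid, len(payload))
        out += payload
    return zlib.compress(bytes(out))


def unpack_block(blob: bytes,
                 decode: Callable[[bytes], VD]) -> Tuple[List[int], List[VD]]:
    """Decompress a block and rebuild the vertex ids and attributes."""
    raw = zlib.decompress(blob)
    (count,) = struct.unpack_from(">i", raw, 0)
    offset = 4
    vids: List[int] = []
    attrs: List[VD] = []
    for _ in range(count):
        vid, size = struct.unpack_from(">qi", raw, offset)
        offset += 12  # 8-byte id + 4-byte length
        vids.append(vid)
        attrs.append(decode(raw[offset:offset + size]))
        offset += size
    return vids, attrs
```

In this sketch the framework never needs to know the type of VD: it only
sees the bytes produced by the user's serializer, which is why the same
code path would work regardless of the shuffle manager in use.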

My current patch applies this idea in ReplicatedVertexView#upgrade
and ReplicatedVertexView#updateVertices.
It could also be applied to ReplicatedVertexView#withActiveSet
and VertexRDDImpl#aggregateUsingIndex.

I'm not sure whether this design is acceptable, so any advice is welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

