Andrew Ray created SPARK-8718:
---------------------------------

             Summary: Improve EdgePartition2D for non perfect square number of 
partitions
                 Key: SPARK-8718
                 URL: https://issues.apache.org/jira/browse/SPARK-8718
             Project: Spark
          Issue Type: Improvement
          Components: GraphX
            Reporter: Andrew Ray
            Priority: Minor


The current implementation of EdgePartition2D has a major limitation:

bq. One of the limitations of this approach is that the number of machines must 
either be a perfect square. We partially address this limitation by computing 
the machine assignment to the next largest perfect square and then mapping back 
down to the actual number of machines. Unfortunately, this can also lead to 
work imbalance and so it is suggested that a perfect square is used.

To remove this limitation I'm proposing the following code change. It allows us 
to partition into any number of evenly sized bins while maintaining the 
property that any vertex will only need to be replicated at most 2 * 
sqrt(numParts) times. To maintain current behavior for perfect squares we use 
the old algorithm in that case, although this could be removed if we dont care 
about producing the exact same result.

See this IPython notebook for a visualization of what is being proposed 
[https://github.com/aray/e2d/blob/master/EdgePartition2D.ipynb] and download it 
to interactively change the number of partitions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to