GitHub user weiwee opened a pull request:

    https://github.com/apache/spark/pull/20821

    [SPARK-23678][GraphX]  a more efficient partition strategy

    ## What changes were proposed in this pull request?
    
    Add a new partition strategy, EdgePartitionTriangle, with several advantages:
    
    1. A tighter bound on vertex replication: sqrt(2 * numParts), about 23% lower than the 2 * sqrt(numParts) bound of the EdgePartition2D partition strategy. This reduces the shuffle size in several operations, such as aggregateMessages and triplets.
    2. All edges between two vertices are colocated, regardless of direction.
    3. The same work balance as EdgePartition2D.
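
To make the replication bound and the colocation property concrete, here is a minimal sketch of one way a triangular edge partitioning could work. It is an illustration under stated assumptions, not the code in this pull request: the function names `triangle_side` and `triangle_partition` are hypothetical, vertices are assigned to blocks with a plain modulo rather than a mixing hash, and partitions beyond the largest triangular number that fits in `num_parts` are simply left unused.

```python
import math


def triangle_side(num_parts: int) -> int:
    """Largest k such that k * (k + 1) / 2 <= num_parts (hypothetical helper)."""
    return (math.isqrt(8 * num_parts + 1) - 1) // 2


def triangle_partition(src: int, dst: int, num_parts: int) -> int:
    """Map an edge to a cell of a k x k upper triangle (diagonal included).

    Assumption: vertices are bucketed into k blocks by plain modulo; a real
    strategy would likely mix the vertex id first to avoid skew.
    """
    k = triangle_side(num_parts)
    a, b = src % k, dst % k
    # Sorting the two block ids makes the result independent of edge direction.
    lo, hi = min(a, b), max(a, b)
    # Row-major index of cell (lo, hi) in the triangle: 0 .. k*(k+1)/2 - 1.
    return hi * (hi + 1) // 2 + lo


def replication(vertex: int, neighbors, num_parts: int) -> int:
    """Number of distinct partitions that hold at least one edge of `vertex`."""
    return len({triangle_partition(vertex, n, num_parts) for n in neighbors})


if __name__ == "__main__":
    num_parts = 36
    print("blocks per side:", triangle_side(num_parts))
    print("triangle bound sqrt(2 * numParts):", math.sqrt(2 * num_parts))
    print("2D bound 2 * sqrt(numParts):", 2 * math.sqrt(num_parts))
    print("replication of vertex 0:", replication(0, range(100), num_parts))
```

In this sketch a vertex in block `a` can only appear in the k cells that pair `a` with some block, and k is below sqrt(2 * numParts), which is where the bound in item 1 comes from. Because the two block ids are sorted before indexing, the edges (u, v) and (v, u) always land in the same partition, matching item 2.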
    
    ## How was this patch tested?
    
    Manual tests; see https://github.com/weiwee/edgePartitionTri/blob/master/EdgePartitionTriangle.ipynb

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/weiwee/spark edge-partition-triangle

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20821.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20821
    
----
commit 05df5c809f91c59e45bb22411a8a5828f3e30512
Author: wenbinwei <wenbinwei@...>
Date:   2018-03-14T07:28:18Z

    add new partition strategy: EdgePartitionTriangle

commit 200b1716fe90604f8068ba5309c7673e5586b1cd
Author: wenbinwei <wenbinwei@...>
Date:   2018-03-14T07:30:48Z

    add case clause EdgePartitionTriangle to method fromString

----

