GitHub user weiwee opened a pull request: https://github.com/apache/spark/pull/20821

[SPARK-23678][GraphX] a more efficient partition strategy

## What changes were proposed in this pull request?

Add a new partition strategy with several advantages:

1. A tighter bound on vertex replication, sqrt(2 * numParts), which is about 23% lower than the 2 * sqrt(numParts) bound of the EdgePartition2D partition strategy. This reduces the shuffle size in several operations, such as aggregateMessages and triplets.
2. All edges between two vertices are colocated, regardless of direction.
3. The same work balance as EdgePartition2D.

## How was this patch tested?

Manual tests, see https://github.com/weiwee/edgePartitionTri/blob/master/EdgePartitionTriangle.ipynb

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/weiwee/spark edge-partition-triangle

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20821.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20821

----
commit 05df5c809f91c59e45bb22411a8a5828f3e30512
Author: wenbinwei <wenbinwei@...>
Date:   2018-03-14T07:28:18Z

    add new partition strategy: EdgePartitionTriangle

commit 200b1716fe90604f8068ba5309c7673e5586b1cd
Author: wenbinwei <wenbinwei@...>
Date:   2018-03-14T07:30:48Z

    add case clause EdgePartitionTriangle to method fromString

----
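
To make the replication bound and the colocation property from the description concrete, here is a minimal Python sketch. It is not the code in this pull request: `replication_bounds`, `triangle_side`, and `triangle_partition` are hypothetical names, and the mapping assumes the simplest possible scheme, in which vertices are hashed into k blocks and each edge is sent to the upper-triangle cell for its unordered block pair, with k chosen so that k * (k + 1) / 2 <= numParts. It makes no attempt to reproduce the work balance claimed in item 3.

```python
import math


def replication_bounds(num_parts):
    """Return the two bounds quoted in the description as (triangle, grid)."""
    triangle = math.sqrt(2 * num_parts)  # bound stated for the new strategy
    grid = 2 * math.sqrt(num_parts)      # bound stated for EdgePartition2D
    return triangle, grid


def triangle_side(num_parts):
    """Largest k such that k * (k + 1) / 2 <= num_parts."""
    return (math.isqrt(8 * num_parts + 1) - 1) // 2


def triangle_partition(src, dst, num_parts):
    """Hypothetical mapping of an edge to a cell of an upper-triangular grid.

    Assumption: a plain modulo stands in for a real vertex hash.
    """
    k = triangle_side(num_parts)
    a, b = src % k, dst % k
    # Sorting the block pair makes (src, dst) and (dst, src) land together.
    row, col = min(a, b), max(a, b)
    # Cells are numbered row by row; row r holds k - r cells and therefore
    # starts at index r * k - r * (r - 1) / 2.
    return row * k - row * (row - 1) // 2 + (col - row)


if __name__ == "__main__":
    for n in (16, 64, 256):
        tri, grid = replication_bounds(n)
        print(n, round(tri, 2), round(grid, 2))
    # Both directions of an edge map to the same partition.
    print(triangle_partition(3, 7, 10) == triangle_partition(7, 3, 10))
```

In this sketch a vertex in block a can only appear in the k cells that pair a with some block, and k * (k + 1) / 2 <= numParts implies k < sqrt(2 * numParts), which is where a bound of that shape comes from. The two quoted expressions differ by a constant factor of 1 / sqrt(2), roughly 0.707; the percentage given in the description is the author's own figure and is not re-derived here. When numParts is not a triangular number, this simplified mapping leaves some partitions unused.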