[jira] [Updated] (SPARK-3523) GraphX graph partitioning strategy

Larry Xiao (JIRA) Sun, 14 Sep 2014 19:20:07 -0700

     [ 
https://issues.apache.org/jira/browse/SPARK-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Larry Xiao updated SPARK-3523:
------------------------------
    Description: 
We implemented some algorithms for partitioning on GraphX, and evaluated. And 
find the partitioning has space of improving. Seek opinion and advice.

h5. Motivation
* Graph in real world follow power law. Eg. On twitter 1% of the vertices are 
adjacent to nearly half of the edges.
* For high-degree vertex, one vertex concentrates vast resources. So the 
workload on few high-degree vertex should be decomposed by all machines
*  For low-degree vertex, The computation on one vertex is  quite small. Thus 
should exploit the locality of the computation on low-degree vertex.

h5. Algorithm Description
* HybridCut 
!https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/HybridCut.png|width=360!
* HybridCutPlus 
!https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/HybridCutPlus.png|width=360!
* Greedy BiCut
  * a heuristic algorithm for bipartite

h5. Result
* 
!https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/FactorBalance.png|width=360!
 The left Y axis is replication factor, right axis is the balance (measured 
using CV, coefficient of variation) of either vertices or edges of all 
partitions. The balance of edges can infer computation balance, and the balance 
of vertices can infer communication balance.
* 
!https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/Shuffle.png|width=360!
 This is an example of a balanced partitioning's saving on communication.
* 
!https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/Bipartite.png|width=360!
* in-2.0-1m is a generated power law graph with alpha equals 2.0

h5. Code
* 
https://github.com/larryxiao/spark/blob/GraphX/graphx/src/main/scala/org/apache/spark/graphx/impl/GraphImpl.scala#L173
* Because the implementation breaks the current separation using 
PartitionStrategy.scala, so need to think of a way to support access to graph.

h5. Reference
- Bipartite-oriented Distributed Graph Partitioning for Big Learning.
- PowerLyra : Differentiated Graph Computation and Partitioning on Skewed Graphs

  was:
We implemented some algorithms for partitioning on GraphX, and evaluated. And 
find the partitioning has space of improving. Seek opinion and advice.

h5. Motivation
* Graph in real world follow power law. Eg. On twitter 1% of the vertices are 
adjacent to nearly half of the edges.
* For high-degree vertex, one vertex concentrates vast resources. So the 
workload on few high-degree vertex should be decomposed by all machines
*  For low-degree vertex, The computation on one vertex is  quite small. Thus 
should exploit the locality of the computation on low-degree vertex.

h5. Algorithm Description
* HybridCut 
!https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/HybridCut.png|width=360!
* HybridCutPlus 
!https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/HybridCutPlus.png|width=360!
* Greedy BiCut
  * a heuristic algorithm for bipartite

h5. Result
!https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/FactorBalance.png|width=360!
!https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/Bipartite.png|width=360!
!https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/Shuffle.png|width=360!

h5. Code
* 
https://github.com/larryxiao/spark/blob/GraphX/graphx/src/main/scala/org/apache/spark/graphx/impl/GraphImpl.scala#L173
* Because the implementation breaks the current separation using 
PartitionStrategy.scala, so need to think of a way to support access to graph.

h5. Reference
- Bipartite-oriented Distributed Graph Partitioning for Big Learning.
- PowerLyra : Differentiated Graph Computation and Partitioning on Skewed Graphs


> GraphX graph partitioning strategy
> ----------------------------------
>
>                 Key: SPARK-3523
>                 URL: https://issues.apache.org/jira/browse/SPARK-3523
>             Project: Spark
>          Issue Type: Improvement
>          Components: GraphX
>    Affects Versions: 1.0.2
>            Reporter: Larry Xiao
>
> We implemented some algorithms for partitioning on GraphX, and evaluated. And 
> find the partitioning has space of improving. Seek opinion and advice.
> h5. Motivation
> * Graph in real world follow power law. Eg. On twitter 1% of the vertices are 
> adjacent to nearly half of the edges.
> * For high-degree vertex, one vertex concentrates vast resources. So the 
> workload on few high-degree vertex should be decomposed by all machines
> *  For low-degree vertex, The computation on one vertex is  quite small. Thus 
> should exploit the locality of the computation on low-degree vertex.
> h5. Algorithm Description
> * HybridCut 
> !https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/HybridCut.png|width=360!
> * HybridCutPlus 
> !https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/HybridCutPlus.png|width=360!
> * Greedy BiCut
>   * a heuristic algorithm for bipartite
> h5. Result
> * 
> !https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/FactorBalance.png|width=360!
>  The left Y axis is replication factor, right axis is the balance (measured 
> using CV, coefficient of variation) of either vertices or edges of all 
> partitions. The balance of edges can infer computation balance, and the 
> balance of vertices can infer communication balance.
> * 
> !https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/Shuffle.png|width=360!
>  This is an example of a balanced partitioning's saving on communication.
> * 
> !https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/Bipartite.png|width=360!
> * in-2.0-1m is a generated power law graph with alpha equals 2.0
> h5. Code
> * 
> https://github.com/larryxiao/spark/blob/GraphX/graphx/src/main/scala/org/apache/spark/graphx/impl/GraphImpl.scala#L173
> * Because the implementation breaks the current separation using 
> PartitionStrategy.scala, so need to think of a way to support access to graph.
> h5. Reference
> - Bipartite-oriented Distributed Graph Partitioning for Big Learning.
> - PowerLyra : Differentiated Graph Computation and Partitioning on Skewed 
> Graphs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-3523) GraphX graph partitioning strategy

Reply via email to