This is an automated email from the ASF dual-hosted git repository.

fmcquillan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git
The following commit(s) were added to refs/heads/master by this push:
     new 874d189  add comment to graph user docs to distribute edge table by source vertex id
874d189 is described below

commit 874d1892c5e35436c6e5bfc46ad9983a6587b159
Author: Frank McQuillan <fmcquil...@pivotal.io>
AuthorDate: Fri May 17 14:10:30 2019 -0700

    add comment to graph user docs to distribute edge table by source vertex id
---
 src/ports/postgres/modules/graph/apsp.sql_in     | 2 ++
 src/ports/postgres/modules/graph/bfs.sql_in      | 3 +++
 src/ports/postgres/modules/graph/hits.sql_in     | 3 +++
 src/ports/postgres/modules/graph/pagerank.sql_in | 3 +++
 src/ports/postgres/modules/graph/sssp.sql_in     | 3 +++
 src/ports/postgres/modules/graph/wcc.sql_in      | 5 +++--
 6 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/src/ports/postgres/modules/graph/apsp.sql_in b/src/ports/postgres/modules/graph/apsp.sql_in
index c7bf210..7cd77d3 100644
--- a/src/ports/postgres/modules/graph/apsp.sql_in
+++ b/src/ports/postgres/modules/graph/apsp.sql_in
@@ -55,6 +55,8 @@ for this implementation is O(V^2 * E) where V is the number of vertices and
 E is the number of edges. In practice, run-time will be generally be much less than this, but it depends on the graph.
+On a Greenplum cluster, the edge table should be distributed
+by the source vertex id column for better performance.
 @anchor apsp
 @par APSP
diff --git a/src/ports/postgres/modules/graph/bfs.sql_in b/src/ports/postgres/modules/graph/bfs.sql_in
index c1c27fe..ea991fa 100644
--- a/src/ports/postgres/modules/graph/bfs.sql_in
+++ b/src/ports/postgres/modules/graph/bfs.sql_in
@@ -130,6 +130,9 @@ and a single BFS result is generated.
 </dl>
+@note On a Greenplum cluster, the edge table should be distributed
+by the source vertex id column for better performance.
+
 @anchor notes
 @par Notes
diff --git a/src/ports/postgres/modules/graph/hits.sql_in b/src/ports/postgres/modules/graph/hits.sql_in
index 96a507c..83f838d 100644
--- a/src/ports/postgres/modules/graph/hits.sql_in
+++ b/src/ports/postgres/modules/graph/hits.sql_in
@@ -127,6 +127,9 @@ parameter.
 </dl>
+@note On a Greenplum cluster, the edge table should be distributed
+by the source vertex id column for better performance.
+
 @anchor notes
 @par Notes
diff --git a/src/ports/postgres/modules/graph/pagerank.sql_in b/src/ports/postgres/modules/graph/pagerank.sql_in
index b81b58e..cd239bd 100644
--- a/src/ports/postgres/modules/graph/pagerank.sql_in
+++ b/src/ports/postgres/modules/graph/pagerank.sql_in
@@ -132,6 +132,9 @@ for personalized PageRank. When this parameter is provided, personalized PageRan
 will run. In the absence of this parameter, regular PageRank will run.
 </dl>
+@note On a Greenplum cluster, the edge table should be distributed
+by the source vertex id column for better performance.
+
 @anchor examples
 @examp
diff --git a/src/ports/postgres/modules/graph/sssp.sql_in b/src/ports/postgres/modules/graph/sssp.sql_in
index 372f1fb..8175624 100644
--- a/src/ports/postgres/modules/graph/sssp.sql_in
+++ b/src/ports/postgres/modules/graph/sssp.sql_in
@@ -104,6 +104,9 @@ A summary table named <out_table>_summary is also created. This is an internal t
 <dd>TEXT, default = NULL. List of columns used to group the input into discrete subgraphs. These columns must exist in the edge table. When this value is null, no grouping is used and a single SSSP result is generated. </dd>
 </dl>
+@note On a Greenplum cluster, the edge table should be distributed
+by the source vertex id column for better performance.
+
 @par Path Retrieval
 The path retrieval function returns the shortest path from the
diff --git a/src/ports/postgres/modules/graph/wcc.sql_in b/src/ports/postgres/modules/graph/wcc.sql_in
index 1c3808b..bc6ce7a 100644
--- a/src/ports/postgres/modules/graph/wcc.sql_in
+++ b/src/ports/postgres/modules/graph/wcc.sql_in
@@ -115,8 +115,9 @@ weakly connected components are generated for all data
 </dl>
-@note On Greenplum cluster, the edge table should be distributed on the src
-column for better performance. In addition, the user should note that this
+@note On a Greenplum cluster, the edge table should be distributed
+by the source vertex id column for better performance.
+In addition, the user should note that this
 function creates a duplicate of the edge table (on Greenplum cluster) for
 better performance.
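
For readers wondering what the added note means in practice, here is a minimal sketch of distributing an edge table by its source vertex id on Greenplum. The table name `edge` and the columns `src`, `dest`, and `weight` are assumptions chosen for illustration, not taken from this commit. The `DISTRIBUTED BY` clause is Greenplum-specific and is rejected by plain PostgreSQL.

```sql
-- Hypothetical edge table; the names edge, src, dest, and weight are
-- assumptions for illustration only.
CREATE TABLE edge (
    src    INTEGER,   -- source vertex id
    dest   INTEGER,   -- destination vertex id
    weight FLOAT8     -- edge weight
) DISTRIBUTED BY (src);  -- Greenplum only: hash-distribute rows by source vertex id

-- An existing edge table can instead be redistributed in place (Greenplum only).
ALTER TABLE edge SET DISTRIBUTED BY (src);
```

Either statement on its own is enough; which one applies depends on whether the edge table already exists.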