date:20180328

[GitHub] madlib pull request #252: leftover minor RF user doc update

2018-03-28 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/madlib/pull/252


---

[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084

2018-03-28 Thread jingyimei

Github user jingyimei commented on a diff in the pull request:

https://github.com/apache/madlib/pull/244#discussion_r177916814
  
--- Diff: src/ports/postgres/modules/graph/test/pagerank.sql_in ---
@@ -95,6 +101,49 @@ SELECT assert(relative_error(SUM(pagerank), 1) < 
0.1,
 ) FROM pagerank_gr_out WHERE user_id=2;
 
 
+-- Tests for Personalized Page Rank
+
+-- Test without grouping 
+
+DROP TABLE IF EXISTS pagerank_ppr_out;
+DROP TABLE IF EXISTS pagerank_ppr_out_summary;
+SELECT pagerank(
+ 'vertex',-- Vertex table
+ 'id',-- Vertix id column
+ '"EDGE"',  -- "EDGE" table
+ 'src=src, dest=dest', -- "EDGE" args
+ 'pagerank_ppr_out', -- Output table of PageRank
+ NULL,  -- Default damping factor (0.85)
+ NULL,  -- Default max iters (100)
+ NULL,  -- Default Threshold 
+ NULL, -- Grouping column
+'{1,3}'); -- Personlized Nodes
+
+
+-- View the PageRank of all vertices, sorted by their scores.
+SELECT assert(relative_error(SUM(pagerank), 1) < 0.00124,
--- End diff --

Is this 0.00124 threshold based on the current test result? Can we make it smaller?
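
For context on the tolerance question, here is a minimal sketch of the invariant the assertion checks. It is a plain-Python power iteration, not MADlib's implementation; the function name, the toy graph, and the choice to return the mass of dangling nodes to the personalized nodes are all assumptions made for illustration. In this idealized setting the scores sum to 1 up to floating-point error; how tight a threshold the real test can use depends on implementation details not shown in this thread.

```python
def personalized_pagerank(edges, nodes, seeds, damping=0.85,
                          max_iter=100, tol=1e-10):
    """Power iteration where random jumps land only on the seed nodes."""
    out_links = {v: [] for v in nodes}
    for src, dest in edges:
        out_links[src].append(dest)

    # Jump distribution: uniform over the seeds, zero elsewhere.
    jump = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    rank = dict(jump)  # start with all mass on the seeds

    for _ in range(max_iter):
        new_rank = {v: (1.0 - damping) * jump[v] for v in nodes}
        for v in nodes:
            if out_links[v]:
                share = damping * rank[v] / len(out_links[v])
                for dest in out_links[v]:
                    new_rank[dest] += share
            else:
                # Dangling node (assumption): hand its mass back to the seeds.
                for s in seeds:
                    new_rank[s] += damping * rank[v] / len(seeds)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy graph, unrelated to the test tables in the PR.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 1)]
scores = personalized_pagerank(edges, nodes, seeds=[1, 3])

total = sum(scores.values())
relative_error = abs(total - 1.0) / 1.0
```

Each iteration redistributes exactly the mass it starts with, so `relative_error` stays at the level of rounding noise here.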


---

[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084

2018-03-28 Thread jingyimei

Github user jingyimei commented on a diff in the pull request:

https://github.com/apache/madlib/pull/244#discussion_r177899442
  
--- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
@@ -211,19 +261,30 @@ def pagerank(schema_madlib, vertex_table, vertex_id, 
edge_table, edge_args,
 distinct_grp_table, grouping_cols_list)
 # Find number of vertices in each group, this is the 
normalizing factor
 # for computing the random_prob
+where_clause_ppr = ''
+if nodes_of_interest > 0:
+where_clause_ppr = """where __vertices__ = 
ANY(ARRAY{nodes_of_interest})""".format(
+**locals())
+random_prob_grp = 1.0 - damping_factor
+init_prob_grp = 1.0 / len(nodes_of_interest)
+else:
+random_prob_grp  = 
"""{rand_damp}/COUNT(__vertices__)::DOUBLE PRECISION
+ """.format(**locals())
+init_prob_grp  =  """1/COUNT(__vertices__)::DOUBLE 
PRECISION""".format(
+**locals())
+
 plpy.execute("DROP TABLE IF EXISTS 
{0}".format(vertices_per_group))
 plpy.execute("""CREATE TEMP TABLE {vertices_per_group} AS
 SELECT {distinct_grp_table}.*,
-1/COUNT(__vertices__)::DOUBLE PRECISION AS {init_pr},
-{rand_damp}/COUNT(__vertices__)::DOUBLE PRECISION
-AS {random_prob}
+{init_prob_grp} AS {init_pr},
+{random_prob_grp} as {random_prob}
 FROM {distinct_grp_table} INNER JOIN (
 SELECT {grouping_cols}, {src} AS __vertices__
 FROM {edge_temp_table}
 UNION
 SELECT {grouping_cols}, {dest} FROM 
{edge_temp_table}
 ){subq}
-ON {grouping_where_clause}
+ON {grouping_where_clause} {where_clause_ppr}
--- End diff --

Put `{where_clause_ppr}` on a new line.


---

[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084

2018-03-28 Thread jingyimei

Github user jingyimei commented on a diff in the pull request:

https://github.com/apache/madlib/pull/244#discussion_r177912288
  
--- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
@@ -527,14 +615,55 @@ def pagerank(schema_madlib, vertex_table, vertex_id, 
edge_table, edge_args,
 """.format(**locals()))
 
 # Step 4: Cleanup
-plpy.execute("""DROP TABLE IF EXISTS {0},{1},{2},{3},{4},{5},{6}
+plpy.execute("""DROP TABLE IF EXISTS 
{0},{1},{2},{3},{4},{5},{6},{7}
 """.format(out_cnts, edge_temp_table, cur, message, cur_unconv,
-   message_unconv, nodes_with_no_incoming_edges))
+   message_unconv, nodes_with_no_incoming_edges, 
personalized_nodes))
--- End diff --

This "personalized_nodes" table is not created before this DROP statement.


---

[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084

2018-03-28 Thread jingyimei

Github user jingyimei commented on a diff in the pull request:

https://github.com/apache/madlib/pull/244#discussion_r177897977
  
--- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
@@ -211,19 +261,30 @@ def pagerank(schema_madlib, vertex_table, vertex_id, 
edge_table, edge_args,
 distinct_grp_table, grouping_cols_list)
 # Find number of vertices in each group, this is the 
normalizing factor
 # for computing the random_prob
+where_clause_ppr = ''
+if nodes_of_interest > 0:
+where_clause_ppr = """where __vertices__ = 
ANY(ARRAY{nodes_of_interest})""".format(
+**locals())
+random_prob_grp = 1.0 - damping_factor
+init_prob_grp = 1.0 / len(nodes_of_interest)
--- End diff --

Is `len(nodes_of_interest) == total_ppr_nodes`? If so, we could reuse 
`total_ppr_nodes` here instead of computing the length again.


---

[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084

2018-03-28 Thread jingyimei

Github user jingyimei commented on a diff in the pull request:

https://github.com/apache/madlib/pull/244#discussion_r177910146
  
--- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
@@ -211,19 +261,30 @@ def pagerank(schema_madlib, vertex_table, vertex_id, 
edge_table, edge_args,
 distinct_grp_table, grouping_cols_list)
 # Find number of vertices in each group, this is the 
normalizing factor
 # for computing the random_prob
+where_clause_ppr = ''
+if nodes_of_interest > 0:
+where_clause_ppr = """where __vertices__ = 
ANY(ARRAY{nodes_of_interest})""".format(
--- End diff --

After consulting with QP, `__vertices__ = ANY(ARRAY{nodes_of_interest})` 
works exactly the same as `__vertices__ IN (nodes_of_interest)`, and the latter 
may look simpler.

Besides, since we use this condition in multiple places, I am wondering if 
a join clause is faster: we create a temp table that saves the special node ids 
and join this temp table with the vertex table by vertex id. QP suggested 
trying both and seeing which one runs faster.
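
The two variants discussed above can be compared with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL/Greenplum (an assumption: SQLite has no `ANY(ARRAY[...])`, so only the `IN` form and the join form are shown), and the table contents are hypothetical. It only demonstrates that both forms select the same rows; it says nothing about which is faster on the real platforms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vertex (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO vertex VALUES (?)", [(i,) for i in range(7)])

nodes_of_interest = [1, 3]

# Variant 1: filter with an inline IN (...) list.
placeholders = ",".join("?" for _ in nodes_of_interest)
in_rows = conn.execute(
    "SELECT id FROM vertex WHERE id IN ({0}) ORDER BY id".format(placeholders),
    nodes_of_interest,
).fetchall()

# Variant 2: store the special node ids in a temp table and join on vertex id.
conn.execute("CREATE TEMP TABLE personalized_nodes (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO personalized_nodes VALUES (?)",
    [(n,) for n in nodes_of_interest],
)
join_rows = conn.execute(
    "SELECT v.id FROM vertex v "
    "JOIN personalized_nodes p ON v.id = p.id ORDER BY v.id"
).fetchall()
```

Timing both forms on a realistically sized vertex table, as QP suggested, would be the way to decide between them.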


---

[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084

2018-03-28 Thread jingyimei

Github user jingyimei commented on a diff in the pull request:

https://github.com/apache/madlib/pull/244#discussion_r177851780
  
--- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
@@ -44,29 +44,62 @@ from utilities.utilities import add_postfix
 from utilities.utilities import extract_keyvalue_params
 from utilities.utilities import unique_string, split_quoted_delimited_str
 from utilities.utilities import is_platform_pg
+from utilities.utilities import py_list_to_sql_string
 
 from utilities.validate_args import columns_exist_in_table, 
get_cols_and_types
 from utilities.validate_args import table_exists
 
+
 def validate_pagerank_args(schema_madlib, vertex_table, vertex_id, 
edge_table,
edge_params, out_table, damping_factor, 
max_iter,
-   threshold, grouping_cols_list):
+   threshold, grouping_cols_list, 
nodes_of_interest):
 """
 Function to validate input parameters for PageRank
 """
 validate_graph_coding(vertex_table, vertex_id, edge_table, edge_params,
   out_table, 'PageRank')
-## Validate args such as threshold and max_iter
+# Validate args such as threshold and max_iter
 validate_params_for_link_analysis(schema_madlib, "PageRank",
-threshold, max_iter,
-edge_table, grouping_cols_list)
+  threshold, max_iter,
+  edge_table, grouping_cols_list)
 _assert(damping_factor >= 0.0 and damping_factor <= 1.0,
 "PageRank: Invalid damping factor value ({0}), must be between 
0 and 1.".
 format(damping_factor))
 
-
-def pagerank(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
- out_table, damping_factor, max_iter, threshold, 
grouping_cols, **kwargs):
+# Validate against the givin set of nodes for Personalized Page Rank
+if nodes_of_interest:
+nodes_of_interest_count = len(nodes_of_interest)
+vertices_count = plpy.execute("""
+   SELECT count(DISTINCT({vertex_id})) AS cnt FROM 
{vertex_table}
+   WHERE {vertex_id} = ANY(ARRAY{nodes_of_interest})
+   """.format(**locals()))[0]["cnt"]
+# Check to see if the given set of nodes exist in vertex table
+if vertices_count != len(nodes_of_interest):
+plpy.error("PageRank: Invalid value for {0}, must be a subset 
of the vertex_table".format(
--- End diff --

This query tests several invalid scenarios, including duplicate nodes in 
nodes_of_interest. In the error msg, maybe we can say "Invalid value for {0}, 
must be a subset of the vertex_table without duplicate nodes".
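
The reason the count comparison also catches duplicates: `count(DISTINCT ...)` can never exceed the number of distinct ids, while `len(nodes_of_interest)` counts repeats. A minimal sketch of separating the two failure modes so each gets its own message follows; the helper names and message wording are hypothetical, and a plain Python set stands in for the vertex table.

```python
def find_duplicates(nodes_of_interest):
    """Return the ids that occur more than once, in first-repeat order."""
    seen = set()
    duplicates = []
    for node in nodes_of_interest:
        if node in seen and node not in duplicates:
            duplicates.append(node)
        seen.add(node)
    return duplicates


def check_nodes_of_interest(nodes_of_interest, vertex_ids):
    """Return an error message, or None when the input is valid."""
    duplicates = find_duplicates(nodes_of_interest)
    if duplicates:
        return "PageRank: duplicate nodes in nodes_of_interest: {0}".format(
            duplicates)
    missing = [n for n in nodes_of_interest if n not in vertex_ids]
    if missing:
        return "PageRank: nodes not found in the vertex table: {0}".format(
            missing)
    return None


vertex_ids = {0, 1, 2, 3}  # stand-in for the ids in the vertex table
ok = check_nodes_of_interest([1, 3], vertex_ids)
dup_error = check_nodes_of_interest([1, 3, 1], vertex_ids)
missing_error = check_nodes_of_interest([1, 9], vertex_ids)
```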


---

[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084

2018-03-28 Thread jingyimei

Github user jingyimei commented on a diff in the pull request:

https://github.com/apache/madlib/pull/244#discussion_r177894976
  
--- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
@@ -211,19 +261,30 @@ def pagerank(schema_madlib, vertex_table, vertex_id, 
edge_table, edge_args,
 distinct_grp_table, grouping_cols_list)
 # Find number of vertices in each group, this is the 
normalizing factor
 # for computing the random_prob
+where_clause_ppr = ''
+if nodes_of_interest > 0:
--- End diff --

`if nodes_of_interest:`  or `if total_ppr_nodes > 0:` 
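
A short sketch of why the suggested forms are safer. Comparing a list with `0` does not test for emptiness: CPython 2 orders mismatched types in a way that makes a list compare greater than any integer, and Python 3 raises `TypeError`. A truthiness test handles both `None` and an empty list. The helper name below is hypothetical.

```python
def has_nodes_of_interest(nodes_of_interest):
    # None and [] are both falsy, so one truthiness test covers
    # "argument not given" and "empty list" alike.
    return bool(nodes_of_interest)


with_nodes = has_nodes_of_interest([1, 3])
empty_list = has_nodes_of_interest([])
not_given = has_nodes_of_interest(None)
```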


---

[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084

2018-03-28 Thread jingyimei

Github user jingyimei commented on a diff in the pull request:

https://github.com/apache/madlib/pull/244#discussion_r177915601
  
--- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
@@ -647,6 +778,26 @@ SELECT * FROM pagerank_out ORDER BY user_id, pagerank 
DESC;
 -- View the summary table to find the number of iterations required for
 -- convergence for each group.
 SELECT * FROM pagerank_out_summary;
+
+-- Compute the Personalized PageRank:
+DROP TABLE IF EXISTS pagerank_out, pagerank_out_summary;
+SELECT madlib.pagerank(
+   'vertex', -- Vertex table
+   'id', -- Vertix id column
+   'edge',   -- Edge table
+   'src=src, dest=dest', -- Comma delimted string of 
edge arguments
+   'pagerank_out',   -- Output table of PageRank
+NULL,-- Default damping factor 
(0.85)
+NULL,-- Default max iters (100)
+NULL,-- Default Threshold
+NULL,-- No Grouping
--- End diff --

Move those NULLs one space to the left.


---

[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084

2018-03-28 Thread jingyimei

Github user jingyimei commented on a diff in the pull request:

https://github.com/apache/madlib/pull/244#discussion_r177914251
  
--- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
@@ -149,25 +186,37 @@ def pagerank(schema_madlib, vertex_table, vertex_id, 
edge_table, edge_args,
 out_cnts = unique_string(desp='out_cnts')
 out_cnts_cnt = unique_string(desp='cnt')
 v1 = unique_string(desp='v1')
+personalized_nodes = unique_string(desp='personalized_nodes')
 
 if is_platform_pg():
 cur_distribution = cnts_distribution = ''
 else:
-cur_distribution = cnts_distribution = \
-"DISTRIBUTED BY ({0}{1})".format(
-grouping_cols_comma, vertex_id)
+cur_distribution = cnts_distribution = "DISTRIBUTED BY 
({0}{1})".format(
+grouping_cols_comma, vertex_id)
 cur_join_clause = """{edge_temp_table}.{dest} = {cur}.{vertex_id}
 """.format(**locals())
 out_cnts_join_clause = """{out_cnts}.{vertex_id} =
 {edge_temp_table}.{src} """.format(**locals())
 v1_join_clause = """{v1}.{vertex_id} = {edge_temp_table}.{src}
 """.format(**locals())
 
+# Get query params for Personalized Page Rank.
+ppr_params = get_query_params_for_ppr(nodes_of_interest, 
damping_factor,
--- End diff --

Is it better to check `if nodes_of_interest` before calling 
get_query_params_for_ppr instead of checking it in get_query_params_for_ppr?


---

[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084

2018-03-28 Thread jingyimei

Github user jingyimei commented on a diff in the pull request:

https://github.com/apache/madlib/pull/244#discussion_r177914961
  
--- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
@@ -551,14 +680,16 @@ def pagerank_help(schema_madlib, message, **kwargs):
 message.lower() in ("usage", "help", "?"):
 help_string = "Get from method below"
 help_string = get_graph_usage(schema_madlib, 'PageRank',
-"""out_table TEXT, -- Name of the output table for PageRank
+  """out_table TEXT, -- Name of 
the output table for PageRank
 damping_factor DOUBLE PRECISION, -- Damping factor in random surfer 
model
  -- (DEFAULT = 0.85)
 max_iter  INTEGER, -- Maximum iteration number (DEFAULT = 100)
 threshold DOUBLE PRECISION, -- Stopping criteria (DEFAULT = 
1/(N*1000),
 -- N is number of vertices in the 
graph)
-grouping_col  TEXT -- Comma separated column names to group on
+grouping_col  TEXT, -- Comma separated column names to group on
-- (DEFAULT = NULL, no grouping)
+nodes_of_interest ARRAY OF INTEGER -- A comma seperated list of 
vertices
+  or nodes for personalized page 
rank.
 """) + """
 
--- End diff --

Indent the left side, and move the comments (--) to the right.


---

[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084

2018-03-28 Thread jingyimei

Github user jingyimei commented on a diff in the pull request:

https://github.com/apache/madlib/pull/244#discussion_r177892625
  
--- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
@@ -44,29 +44,62 @@ from utilities.utilities import add_postfix
 from utilities.utilities import extract_keyvalue_params
 from utilities.utilities import unique_string, split_quoted_delimited_str
 from utilities.utilities import is_platform_pg
+from utilities.utilities import py_list_to_sql_string
 
 from utilities.validate_args import columns_exist_in_table, 
get_cols_and_types
 from utilities.validate_args import table_exists
 
+
 def validate_pagerank_args(schema_madlib, vertex_table, vertex_id, 
edge_table,
edge_params, out_table, damping_factor, 
max_iter,
-   threshold, grouping_cols_list):
+   threshold, grouping_cols_list, 
nodes_of_interest):
 """
 Function to validate input parameters for PageRank
 """
 validate_graph_coding(vertex_table, vertex_id, edge_table, edge_params,
   out_table, 'PageRank')
-## Validate args such as threshold and max_iter
+# Validate args such as threshold and max_iter
 validate_params_for_link_analysis(schema_madlib, "PageRank",
-threshold, max_iter,
-edge_table, grouping_cols_list)
+  threshold, max_iter,
+  edge_table, grouping_cols_list)
 _assert(damping_factor >= 0.0 and damping_factor <= 1.0,
 "PageRank: Invalid damping factor value ({0}), must be between 
0 and 1.".
 format(damping_factor))
 
-
-def pagerank(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
- out_table, damping_factor, max_iter, threshold, 
grouping_cols, **kwargs):
+# Validate against the givin set of nodes for Personalized Page Rank
+if nodes_of_interest:
+nodes_of_interest_count = len(nodes_of_interest)
+vertices_count = plpy.execute("""
+   SELECT count(DISTINCT({vertex_id})) AS cnt FROM 
{vertex_table}
+   WHERE {vertex_id} = ANY(ARRAY{nodes_of_interest})
+   """.format(**locals()))[0]["cnt"]
+# Check to see if the given set of nodes exist in vertex table
+if vertices_count != len(nodes_of_interest):
+plpy.error("PageRank: Invalid value for {0}, must be a subset 
of the vertex_table".format(
+nodes_of_interest))
+# Validate given set of nodes against each user group.
+# If all the given nodes are not present in the user group
+# then throw an error.
+if grouping_cols_list:
+missing_user_grps = ''
+grp_by_column = get_table_qualified_col_str(
+edge_table, grouping_cols_list)
+grps_without_nodes = plpy.execute("""
+   SELECT {grp_by_column} FROM {edge_table}
+   WHERE src = ANY(ARRAY{nodes_of_interest}) group by 
{grp_by_column}
+   having count(DISTINCT(src)) != {nodes_of_interest_count}
+   """.format(**locals()))
+for row in range(grps_without_nodes.nrows()):
+missing_user_grps += 
str(grps_without_nodes[row]['user_id'])
+if row < grps_without_nodes.nrows() - 1:
+missing_user_grps += ' ,'
+if grps_without_nodes.nrows() > 0:
+plpy.error("Nodes for Personalizaed Page Rank are missing 
from these groups: {0} ".format(
+missing_user_grps))
+
--- End diff --

Here some similar things are tested twice. When `if nodes_of_interest` holds, 
there is a `count` operation in line 73 and one test in line 77 (this is for 
the case without grouping). Then, when `if grouping_cols_list` holds, another 
`count` and `compare` happen in line 90 per group. There might be a way to 
simplify the logic here so that for grouping, we don't need to do it twice. 
Besides, if the above query really slows down performance a lot, I would think 
about simplifying it by not giving a list of groups missing special nodes but 
just a warning (optional, depending on how expensive the above query is).
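
One possible shape for a single-pass check is sketched below in plain Python over an in-memory edge list. This is an illustration of the idea only, not a drop-in replacement for the SQL in the diff: the function name and data are hypothetical, and unlike the quoted query (which looks at `src` only) it treats a node as present in a group if it appears as either endpoint of an edge.

```python
from collections import defaultdict


def groups_missing_nodes(edges, nodes_of_interest):
    """Return the groups that lack at least one node of interest.

    edges is an iterable of (group, src, dest) tuples.
    """
    wanted = set(nodes_of_interest)
    found_per_group = defaultdict(set)
    for group, src, dest in edges:
        # Record which wanted nodes this edge contributes to its group.
        found_per_group[group].update(wanted.intersection((src, dest)))
    return sorted(g for g, found in found_per_group.items() if found != wanted)


edges = [
    (1, 1, 2), (1, 3, 1),  # group 1 contains both nodes 1 and 3
    (2, 1, 2), (2, 2, 4),  # group 2 contains node 1 but not node 3
]
missing = groups_missing_nodes(edges, [1, 3])
# Build the group list for the error message without manual comma handling.
message = ", ".join(str(g) for g in missing)
```

A group that contains none of the requested nodes is reported as well, since its found set stays empty.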


---

[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084

2018-03-28 Thread jingyimei

Github user jingyimei commented on a diff in the pull request:

https://github.com/apache/madlib/pull/244#discussion_r177916983
  
--- Diff: src/ports/postgres/modules/graph/test/pagerank.sql_in ---
@@ -95,6 +101,49 @@ SELECT assert(relative_error(SUM(pagerank), 1) < 
0.1,
 ) FROM pagerank_gr_out WHERE user_id=2;
 
 
+-- Tests for Personalized Page Rank
+
+-- Test without grouping 
+
+DROP TABLE IF EXISTS pagerank_ppr_out;
+DROP TABLE IF EXISTS pagerank_ppr_out_summary;
+SELECT pagerank(
+ 'vertex',-- Vertex table
+ 'id',-- Vertix id column
+ '"EDGE"',  -- "EDGE" table
+ 'src=src, dest=dest', -- "EDGE" args
+ 'pagerank_ppr_out', -- Output table of PageRank
+ NULL,  -- Default damping factor (0.85)
+ NULL,  -- Default max iters (100)
+ NULL,  -- Default Threshold 
+ NULL, -- Grouping column
+'{1,3}'); -- Personlized Nodes
+
+
+-- View the PageRank of all vertices, sorted by their scores.
+SELECT assert(relative_error(SUM(pagerank), 1) < 0.00124,
+'PageRank: Scores do not sum up to 1.'
+) FROM pagerank_ppr_out;
+
+
+-- Test with grouping 
+
+DROP TABLE IF EXISTS pagerank_ppr_grp_out;
+DROP TABLE IF EXISTS pagerank_ppr_grp_out_summary;
+SELECT pagerank(
+ 'vertex',-- Vertex table
+ 'id',-- Vertix id column
+ '"EDGE"',  -- "EDGE" table
+ 'src=src, dest=dest', -- "EDGE" args
+ 'pagerank_ppr_grp_out', -- Output table of PageRank
+ NULL,  -- Default damping factor (0.85)
+ NULL,  -- Default max iters (100)
+ NULL,  -- Default Threshold 
+ 'user_id', -- Grouping column
+'{1,3}'); -- Personlized Nodes
+
+SELECT assert(count(*) = 14, 'Tuple count for Pagerank out table != 14') 
FROM pagerank_ppr_grp_out;
--- End diff --

Can we do a similar assertion here by group?


---

[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084

2018-03-28 Thread jingyimei

Github user jingyimei commented on a diff in the pull request:

https://github.com/apache/madlib/pull/244#discussion_r177917620
  
--- Diff: src/ports/postgres/modules/graph/pagerank.sql_in ---
@@ -273,6 +278,48 @@ SELECT * FROM pagerank_out_summary ORDER BY user_id;
 (2 rows)
 
 
+-# Example of Personalized Page Rank with Nodes {2,4}
+
+DROP TABLE IF EXISTS pagerank_out, pagerank_out_summary;
+SELECT madlib.pagerank(
+   'vertex', -- Vertex table
+   'id', -- Vertix id column
+   'edge',   -- Edge table
+   'src=src, dest=dest', -- Comma delimted string of 
edge arguments
+   'pagerank_out',   -- Output table of PageRank 
+NULL,-- Default damping factor 
(0.85)
+NULL,-- Default max iters (100)
+NULL,-- Default Threshold 
+NULL,-- No Grouping 
+   '{2,4}'); -- Personlized Nodes
--- End diff --

Great


---

[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084

2018-03-28 Thread jingyimei

Github user jingyimei commented on a diff in the pull request:

https://github.com/apache/madlib/pull/244#discussion_r177915929
  
--- Diff: src/ports/postgres/modules/graph/test/pagerank.sql_in ---
@@ -66,7 +66,12 @@ SELECT pagerank(
  'id',-- Vertix id column
  '"EDGE"',  -- "EDGE" table
  'src=src, dest=dest', -- "EDGE" args
- 'pagerank_out'); -- Output table of PageRank
+ 'pagerank_out',-- Output table of PageRank
+  NULL, -- Default damping factor (0.85)
+  NULL, -- Default max iters (100)
+  NULL, -- Default Threshold 
+  NULL, -- No Grouping 
+ NULL); -- Personlized Nodes
--- End diff --

In this case, we can remove the last 5 NULLs since they are all optional.


---

[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084

2018-03-28 Thread jingyimei

Github user jingyimei commented on a diff in the pull request:

https://github.com/apache/madlib/pull/244#discussion_r177893734
  
--- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
@@ -122,12 +158,13 @@ def pagerank(schema_madlib, vertex_table, vertex_id, 
edge_table, edge_args,
 grouping_where_clause = ''
 group_by_clause = ''
 random_prob = ''
+ppr_join_clause = ''
 
 edge_temp_table = unique_string(desp='temp_edge')
 grouping_cols_comma = grouping_cols + ',' if grouping_cols else ''
 distribution = ('' if is_platform_pg() else
 "DISTRIBUTED BY ({0}{1})".format(
-grouping_cols_comma, dest))
+grouping_cols_comma, dest))
--- End diff --

Maybe align this with the line above, or move the line above back to the 
current indentation.


---

[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084

2018-03-28 Thread jingyimei

Github user jingyimei commented on a diff in the pull request:

https://github.com/apache/madlib/pull/244#discussion_r177917195
  
--- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
@@ -149,25 +164,39 @@ def pagerank(schema_madlib, vertex_table, vertex_id, 
edge_table, edge_args,
 out_cnts = unique_string(desp='out_cnts')
 out_cnts_cnt = unique_string(desp='cnt')
 v1 = unique_string(desp='v1')
+personalized_nodes = unique_string(desp='personalized_nodes')
 
 if is_platform_pg():
 cur_distribution = cnts_distribution = ''
 else:
-cur_distribution = cnts_distribution = \
-"DISTRIBUTED BY ({0}{1})".format(
-grouping_cols_comma, vertex_id)
+cur_distribution = cnts_distribution = "DISTRIBUTED BY 
({0}{1})".format(
+grouping_cols_comma, vertex_id)
 cur_join_clause = """{edge_temp_table}.{dest} = {cur}.{vertex_id}
 """.format(**locals())
 out_cnts_join_clause = """{out_cnts}.{vertex_id} =
 {edge_temp_table}.{src} """.format(**locals())
 v1_join_clause = """{v1}.{vertex_id} = {edge_temp_table}.{src}
 """.format(**locals())
 
+# Get query params for Personalized Page Rank.
+ppr_params = get_query_params_for_ppr(nodes_of_interest, 
damping_factor,
+  ppr_join_clause, vertex_id,
+  edge_temp_table, 
vertex_table, cur_distribution,
+  personalized_nodes)
+total_ppr_nodes = ppr_params[0]
+random_jump_prob_ppr = ppr_params[1]
+ppr_join_clause = ppr_params[2]
+
 random_probability = (1.0 - damping_factor) / n_vertices
+if total_ppr_nodes > 0:
+random_jump_prob = random_jump_prob_ppr
+else:
+random_jump_prob = random_probability
--- End diff --

Got it.


---

[GitHub] madlib issue #253: MLP: Add install check tests for minibatch with grouping

2018-03-28 Thread asfgit

Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/253
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/413/



---

[GitHub] madlib pull request #253: MLP: Add install check tests for minibatch with gr...

2018-03-28 Thread kaknikhil

GitHub user kaknikhil opened a pull request:

https://github.com/apache/madlib/pull/253

MLP: Add install check tests for minibatch with grouping

This PR adds install check tests for MLP minibatch with grouping. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/madlib/madlib feature/mlp-minibatch-grouping

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/madlib/pull/253.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #253


commit fe0bc93d83fe295658d689f4957eb9d12a513c23
Author: Nikhil Kak 
Date:   2018-03-26T18:55:25Z

MLP: Add install check tests for minibatch with grouping




---

[GitHub] madlib issue #252: leftover minor RF user doc update

2018-03-28 Thread asfgit

Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/252
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/412/



---

[GitHub] madlib pull request #252: leftover minor RF user doc update

2018-03-28 Thread fmcquillan99

GitHub user fmcquillan99 opened a pull request:

https://github.com/apache/madlib/pull/252

leftover minor RF user doc update

A few remaining RF user doc changes I missed in 

https://github.com/apache/madlib/commit/7f3aae92f2d84bf7e4501ac5efec1ebfc7a80834

Also added links to 2 previous versions that were missing on the front page 
of the user docs.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fmcquillan99/apache-madlib doc-tree-1dot14-v2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/madlib/pull/252.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #252






---
