[GitHub] madlib pull request #252: leftover minor RF user doc update
Github user asfgit closed the pull request at: https://github.com/apache/madlib/pull/252 ---
[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084
Github user jingyimei commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/244#discussion_r177916814

    --- Diff: src/ports/postgres/modules/graph/test/pagerank.sql_in ---
    @@ -95,6 +101,49 @@ SELECT assert(relative_error(SUM(pagerank), 1) < 0.1,
     ) FROM pagerank_gr_out WHERE user_id=2;
    +-- Tests for Personalized Page Rank
    +
    +-- Test without grouping
    +
    +DROP TABLE IF EXISTS pagerank_ppr_out;
    +DROP TABLE IF EXISTS pagerank_ppr_out_summary;
    +SELECT pagerank(
    + 'vertex',-- Vertex table
    + 'id',-- Vertix id column
    + '"EDGE"', -- "EDGE" table
    + 'src=src, dest=dest', -- "EDGE" args
    + 'pagerank_ppr_out', -- Output table of PageRank
    + NULL, -- Default damping factor (0.85)
    + NULL, -- Default max iters (100)
    + NULL, -- Default Threshold
    + NULL, -- Grouping column
    +'{1,3}'); -- Personlized Nodes
    +
    +
    +-- View the PageRank of all vertices, sorted by their scores.
    +SELECT assert(relative_error(SUM(pagerank), 1) < 0.00124,
    --- End diff --

    Is this 0.00124 based on the current test result? Can we make it smaller?

---
[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084
Github user jingyimei commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/244#discussion_r177899442

    --- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
    @@ -211,19 +261,30 @@ def pagerank(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
     distinct_grp_table, grouping_cols_list)
     # Find number of vertices in each group, this is the normalizing factor
     # for computing the random_prob
    +where_clause_ppr = ''
    +if nodes_of_interest > 0:
    +    where_clause_ppr = """where __vertices__ = ANY(ARRAY{nodes_of_interest})""".format(
    +        **locals())
    +    random_prob_grp = 1.0 - damping_factor
    +    init_prob_grp = 1.0 / len(nodes_of_interest)
    +else:
    +    random_prob_grp = """{rand_damp}/COUNT(__vertices__)::DOUBLE PRECISION
    +        """.format(**locals())
    +    init_prob_grp = """1/COUNT(__vertices__)::DOUBLE PRECISION""".format(
    +        **locals())
    +
     plpy.execute("DROP TABLE IF EXISTS {0}".format(vertices_per_group))
     plpy.execute("""CREATE TEMP TABLE {vertices_per_group} AS
     SELECT {distinct_grp_table}.*,
    -1/COUNT(__vertices__)::DOUBLE PRECISION AS {init_pr},
    -{rand_damp}/COUNT(__vertices__)::DOUBLE PRECISION
    -AS {random_prob}
    +{init_prob_grp} AS {init_pr},
    +{random_prob_grp} as {random_prob}
     FROM {distinct_grp_table} INNER JOIN (
     SELECT {grouping_cols}, {src} AS __vertices__
     FROM {edge_temp_table}
     UNION
     SELECT {grouping_cols}, {dest} FROM {edge_temp_table}
     ){subq}
    -ON {grouping_where_clause}
    +ON {grouping_where_clause} {where_clause_ppr}
    --- End diff --

    Put {where_clause_ppr} on a new line.

---
[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084
Github user jingyimei commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/244#discussion_r177912288

    --- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
    @@ -527,14 +615,55 @@ def pagerank(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
     """.format(**locals()))

     # Step 4: Cleanup
    -plpy.execute("""DROP TABLE IF EXISTS {0},{1},{2},{3},{4},{5},{6}
    +plpy.execute("""DROP TABLE IF EXISTS {0},{1},{2},{3},{4},{5},{6},{7}
     """.format(out_cnts, edge_temp_table, cur, message, cur_unconv,
    - message_unconv, nodes_with_no_incoming_edges))
    + message_unconv, nodes_with_no_incoming_edges, personalized_nodes))
    --- End diff --

    This "personalized_nodes" table is not created before this DROP.

---
[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084
Github user jingyimei commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/244#discussion_r177897977

    --- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
    @@ -211,19 +261,30 @@ def pagerank(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
     distinct_grp_table, grouping_cols_list)
     # Find number of vertices in each group, this is the normalizing factor
     # for computing the random_prob
    +where_clause_ppr = ''
    +if nodes_of_interest > 0:
    +    where_clause_ppr = """where __vertices__ = ANY(ARRAY{nodes_of_interest})""".format(
    +        **locals())
    +    random_prob_grp = 1.0 - damping_factor
    +    init_prob_grp = 1.0 / len(nodes_of_interest)
    --- End diff --

    Is `len(nodes_of_interest) == total_ppr_nodes`? If so, we can reuse
    `total_ppr_nodes` so that we don't need to run O(n) again.

---
[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084
Github user jingyimei commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/244#discussion_r177910146

    --- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
    @@ -211,19 +261,30 @@ def pagerank(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
     distinct_grp_table, grouping_cols_list)
     # Find number of vertices in each group, this is the normalizing factor
     # for computing the random_prob
    +where_clause_ppr = ''
    +if nodes_of_interest > 0:
    +    where_clause_ppr = """where __vertices__ = ANY(ARRAY{nodes_of_interest})""".format(
    --- End diff --

    After consulting with QP, `__vertices__ = ANY(ARRAY{nodes_of_interest})`
    works exactly the same as `__vertices__ in (nodes_of_interest)`, and the
    latter may look simpler.

    Besides, since we use this condition in multiple places, I am wondering
    whether a join would be faster: we create a temp table that stores the
    special node ids and join it with the vertex table on vertex id. QP
    suggested trying both to see which one runs faster.

---
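The three options raised in this comment can be sketched as small SQL-building helpers. This is only an illustration and is not taken from the PR: the function names, the `ppr_nodes` temp table name, and the restriction to integer node ids are all assumptions.

```python
def _id_list(node_ids):
    """Render node ids as a comma-separated SQL list (integer ids assumed)."""
    return ", ".join(str(int(n)) for n in node_ids)


def any_array_predicate(column, node_ids):
    """Predicate in the `= ANY(ARRAY[...])` form used in the diff."""
    return "{0} = ANY(ARRAY[{1}])".format(column, _id_list(node_ids))


def in_list_predicate(column, node_ids):
    """Equivalent predicate in the `IN (...)` form."""
    return "{0} IN ({1})".format(column, _id_list(node_ids))


def temp_table_join_sql(vertex_table, vertex_id, node_ids,
                        temp_table="ppr_nodes"):
    """Statements for the join alternative (hypothetical table name):
    materialize the ids once, then join against the vertex table."""
    create = "CREATE TEMP TABLE {0} AS SELECT unnest(ARRAY[{1}]) AS {2}".format(
        temp_table, _id_list(node_ids), vertex_id)
    join = "SELECT v.* FROM {0} v INNER JOIN {1} p ON v.{2} = p.{2}".format(
        vertex_table, temp_table, vertex_id)
    return [create, join]
```

Which form runs faster depends on the planner and the data, so, as the comment says, both would need to be timed on a real graph.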
[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084
Github user jingyimei commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/244#discussion_r177851780

    --- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
    @@ -44,29 +44,62 @@ from utilities.utilities import add_postfix
     from utilities.utilities import extract_keyvalue_params
     from utilities.utilities import unique_string, split_quoted_delimited_str
     from utilities.utilities import is_platform_pg
    +from utilities.utilities import py_list_to_sql_string
     from utilities.validate_args import columns_exist_in_table, get_cols_and_types
     from utilities.validate_args import table_exists

    +
     def validate_pagerank_args(schema_madlib, vertex_table, vertex_id, edge_table,
         edge_params, out_table, damping_factor, max_iter,
    -    threshold, grouping_cols_list):
    +    threshold, grouping_cols_list, nodes_of_interest):
         """
         Function to validate input parameters for PageRank
         """
         validate_graph_coding(vertex_table, vertex_id, edge_table, edge_params,
             out_table, 'PageRank')
    -    ## Validate args such as threshold and max_iter
    +    # Validate args such as threshold and max_iter
         validate_params_for_link_analysis(schema_madlib, "PageRank",
    -threshold, max_iter,
    -edge_table, grouping_cols_list)
    + threshold, max_iter,
    + edge_table, grouping_cols_list)
         _assert(damping_factor >= 0.0 and damping_factor <= 1.0,
             "PageRank: Invalid damping factor value ({0}), must be between 0 and 1.".
             format(damping_factor))
    -
    -def pagerank(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
    -    out_table, damping_factor, max_iter, threshold, grouping_cols, **kwargs):
    +    # Validate against the givin set of nodes for Personalized Page Rank
    +    if nodes_of_interest:
    +        nodes_of_interest_count = len(nodes_of_interest)
    +        vertices_count = plpy.execute("""
    +            SELECT count(DISTINCT({vertex_id})) AS cnt FROM {vertex_table}
    +            WHERE {vertex_id} = ANY(ARRAY{nodes_of_interest})
    +            """.format(**locals()))[0]["cnt"]
    +        # Check to see if the given set of nodes exist in vertex table
    +        if vertices_count != len(nodes_of_interest):
    +            plpy.error("PageRank: Invalid value for {0}, must be a subset of the vertex_table".format(
    --- End diff --

    This query catches several invalid scenarios, including duplicate nodes
    in nodes_of_interest. Maybe the error message could say "Invalid value
    for {0}, must be a subset of the vertex_table without duplicate nodes".

---
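To see why the count comparison also rejects duplicates, here is a minimal in-memory sketch. It assumes the vertex ids are available as a Python collection standing in for the vertex table; the helper names `count_check_fails` and `classify_nodes` are hypothetical and do not appear in the PR.

```python
def count_check_fails(nodes_of_interest, vertex_ids):
    """Mirror the comparison in the diff: the number of distinct matching
    vertices versus the length of the supplied list."""
    distinct_matches = len(set(nodes_of_interest) & set(vertex_ids))
    return distinct_matches != len(nodes_of_interest)


def classify_nodes(nodes_of_interest, vertex_ids):
    """Separate the two failure causes so an error message can name them."""
    known = set(vertex_ids)
    seen = set()
    duplicates = []
    missing = []
    for node in nodes_of_interest:
        if node in seen:
            # Repeated entry: record it once and move on.
            if node not in duplicates:
                duplicates.append(node)
            continue
        seen.add(node)
        if node not in known:
            missing.append(node)
    return duplicates, missing
```

A duplicated node makes the list longer than the distinct count even when every node exists, which is why the suggested wording mentions duplicates explicitly.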
[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084
Github user jingyimei commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/244#discussion_r177894976

    --- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
    @@ -211,19 +261,30 @@ def pagerank(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
     distinct_grp_table, grouping_cols_list)
     # Find number of vertices in each group, this is the normalizing factor
     # for computing the random_prob
    +where_clause_ppr = ''
    +if nodes_of_interest > 0:
    --- End diff --

    `if nodes_of_interest:` or `if total_ppr_nodes > 0:`

---
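A short sketch of why the suggested forms are preferable, assuming `nodes_of_interest` is a Python list or `None` (the helper names are hypothetical). Comparing a list with an integer is not an emptiness test: on Python 3 it raises `TypeError`, and on CPython 2 even an empty list compares greater than 0.

```python
def has_personalization(nodes_of_interest):
    """Return True only when a non-empty collection of nodes was supplied."""
    # Truthiness handles both None and the empty list.
    return bool(nodes_of_interest)


def list_vs_int_comparison_raises():
    """Report whether `list > 0` is rejected by the running interpreter."""
    try:
        [] > 0
    except TypeError:
        return True
    return False
```

`has_personalization([1, 3])` is true, while `has_personalization([])` and `has_personalization(None)` are both false, which matches the intent of the check in the diff.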
[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084
Github user jingyimei commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/244#discussion_r177915601

    --- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
    @@ -647,6 +778,26 @@ SELECT * FROM pagerank_out ORDER BY user_id, pagerank DESC;
     -- View the summary table to find the number of iterations required for
     -- convergence for each group.
     SELECT * FROM pagerank_out_summary;
    +
    +-- Compute the Personalized PageRank:
    +DROP TABLE IF EXISTS pagerank_out, pagerank_out_summary;
    +SELECT madlib.pagerank(
    + 'vertex', -- Vertex table
    + 'id', -- Vertix id column
    + 'edge', -- Edge table
    + 'src=src, dest=dest', -- Comma delimted string of edge arguments
    + 'pagerank_out', -- Output table of PageRank
    +NULL,-- Default damping factor (0.85)
    +NULL,-- Default max iters (100)
    +NULL,-- Default Threshold
    +NULL,-- No Grouping
    --- End diff --

    Move those NULLs one space to the left.

---
[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084
Github user jingyimei commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/244#discussion_r177914251

    --- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
    @@ -149,25 +186,37 @@ def pagerank(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
     out_cnts = unique_string(desp='out_cnts')
     out_cnts_cnt = unique_string(desp='cnt')
     v1 = unique_string(desp='v1')
    +personalized_nodes = unique_string(desp='personalized_nodes')
     if is_platform_pg():
         cur_distribution = cnts_distribution = ''
     else:
    -    cur_distribution = cnts_distribution = \
    -        "DISTRIBUTED BY ({0}{1})".format(
    -        grouping_cols_comma, vertex_id)
    +    cur_distribution = cnts_distribution = "DISTRIBUTED BY ({0}{1})".format(
    +        grouping_cols_comma, vertex_id)
     cur_join_clause = """{edge_temp_table}.{dest} = {cur}.{vertex_id}
         """.format(**locals())
     out_cnts_join_clause = """{out_cnts}.{vertex_id} = {edge_temp_table}.{src}
         """.format(**locals())
     v1_join_clause = """{v1}.{vertex_id} = {edge_temp_table}.{src}
         """.format(**locals())
    +# Get query params for Personalized Page Rank.
    +ppr_params = get_query_params_for_ppr(nodes_of_interest, damping_factor,
    --- End diff --

    Would it be better to check `if nodes_of_interest` before calling
    get_query_params_for_ppr, instead of checking it inside
    get_query_params_for_ppr?

---
[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084
Github user jingyimei commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/244#discussion_r177914961

    --- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
    @@ -551,14 +680,16 @@ def pagerank_help(schema_madlib, message, **kwargs):
     message.lower() in ("usage", "help", "?"):
     help_string = "Get from method below"
     help_string = get_graph_usage(schema_madlib, 'PageRank',
    -"""out_table TEXT, -- Name of the output table for PageRank
    + """out_table TEXT, -- Name of the output table for PageRank
     damping_factor DOUBLE PRECISION, -- Damping factor in random surfer model
     -- (DEFAULT = 0.85)
     max_iter INTEGER, -- Maximum iteration number (DEFAULT = 100)
     threshold DOUBLE PRECISION, -- Stopping criteria (DEFAULT = 1/(N*1000),
     -- N is number of vertices in the graph)
    -grouping_col TEXT -- Comma separated column names to group on
    +grouping_col TEXT, -- Comma separated column names to group on
     -- (DEFAULT = NULL, no grouping)
    +nodes_of_interest ARRAY OF INTEGER -- A comma seperated list of vertices
    + or nodes for personalized page rank.
     """) + """
    --- End diff --

    Indent the left side, and indent the comment (--) to the right.

---
[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084
Github user jingyimei commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/244#discussion_r177892625

    --- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
    @@ -44,29 +44,62 @@ from utilities.utilities import add_postfix
     from utilities.utilities import extract_keyvalue_params
     from utilities.utilities import unique_string, split_quoted_delimited_str
     from utilities.utilities import is_platform_pg
    +from utilities.utilities import py_list_to_sql_string
     from utilities.validate_args import columns_exist_in_table, get_cols_and_types
     from utilities.validate_args import table_exists

    +
     def validate_pagerank_args(schema_madlib, vertex_table, vertex_id, edge_table,
         edge_params, out_table, damping_factor, max_iter,
    -    threshold, grouping_cols_list):
    +    threshold, grouping_cols_list, nodes_of_interest):
         """
         Function to validate input parameters for PageRank
         """
         validate_graph_coding(vertex_table, vertex_id, edge_table, edge_params,
             out_table, 'PageRank')
    -    ## Validate args such as threshold and max_iter
    +    # Validate args such as threshold and max_iter
         validate_params_for_link_analysis(schema_madlib, "PageRank",
    -threshold, max_iter,
    -edge_table, grouping_cols_list)
    + threshold, max_iter,
    + edge_table, grouping_cols_list)
         _assert(damping_factor >= 0.0 and damping_factor <= 1.0,
             "PageRank: Invalid damping factor value ({0}), must be between 0 and 1.".
             format(damping_factor))
    -
    -def pagerank(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
    -    out_table, damping_factor, max_iter, threshold, grouping_cols, **kwargs):
    +    # Validate against the givin set of nodes for Personalized Page Rank
    +    if nodes_of_interest:
    +        nodes_of_interest_count = len(nodes_of_interest)
    +        vertices_count = plpy.execute("""
    +            SELECT count(DISTINCT({vertex_id})) AS cnt FROM {vertex_table}
    +            WHERE {vertex_id} = ANY(ARRAY{nodes_of_interest})
    +            """.format(**locals()))[0]["cnt"]
    +        # Check to see if the given set of nodes exist in vertex table
    +        if vertices_count != len(nodes_of_interest):
    +            plpy.error("PageRank: Invalid value for {0}, must be a subset of the vertex_table".format(
    +                nodes_of_interest))
    +        # Validate given set of nodes against each user group.
    +        # If all the given nodes are not present in the user group
    +        # then throw an error.
    +        if grouping_cols_list:
    +            missing_user_grps = ''
    +            grp_by_column = get_table_qualified_col_str(
    +                edge_table, grouping_cols_list)
    +            grps_without_nodes = plpy.execute("""
    +                SELECT {grp_by_column} FROM {edge_table}
    +                WHERE src = ANY(ARRAY{nodes_of_interest}) group by {grp_by_column}
    +                having count(DISTINCT(src)) != {nodes_of_interest_count}
    +                """.format(**locals()))
    +            for row in range(grps_without_nodes.nrows()):
    +                missing_user_grps += str(grps_without_nodes[row]['user_id'])
    +                if row < grps_without_nodes.nrows() - 1:
    +                    missing_user_grps += ' ,'
    +            if grps_without_nodes.nrows() > 0:
    +                plpy.error("Nodes for Personalizaed Page Rank are missing from these groups: {0} ".format(
    +                    missing_user_grps))
    +
    --- End diff --

    Here some similar things are tested twice. When `if nodes_of_interest`
    holds, there is a `count` operation in line 73 and a test in line 77
    (this is the case without grouping). Then, when `if grouping_cols_list`
    holds, another `count` and `compare` happen in line 90 for each group.
    There might be a way to simplify the logic so that, with grouping, we
    don't need to do it twice.

    Besides, if the above query really slows down performance a lot, I would
    consider simplifying it: instead of listing the groups that are missing
    special nodes, just give a warning (optional, depending on how expensive
    the above query is).

---
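One way to picture a single per-group pass is the in-memory sketch below. It assumes the edges are available as `(group, src, dest)` tuples standing in for the edge table; the helper names are hypothetical and this is not the PR's SQL. Unlike the query in the diff, whose `WHERE` clause removes a group's rows before `HAVING` can count them, this sketch also reports groups in which none of the nodes of interest appear as a source.

```python
from collections import defaultdict


def groups_missing_nodes(edges, nodes_of_interest):
    """Return the sorted groups whose sources do not cover every node of
    interest. `edges` is an iterable of (group, src, dest) tuples."""
    wanted = set(nodes_of_interest)
    sources_by_group = defaultdict(set)
    for group, src, _dest in edges:
        sources_by_group[group].add(src)
    return sorted(group for group, sources in sources_by_group.items()
                  if not wanted <= sources)


def missing_groups_message(groups):
    """Build the error text with str.join instead of manual separators."""
    return ("Nodes for Personalized Page Rank are missing from these "
            "groups: {0}".format(", ".join(str(g) for g in groups)))
```

Because the group key is taken from the tuple itself, the sketch does not depend on a grouping column named `user_id`, which the loop in the diff reads by name.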
[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084
Github user jingyimei commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/244#discussion_r177916983

    --- Diff: src/ports/postgres/modules/graph/test/pagerank.sql_in ---
    @@ -95,6 +101,49 @@ SELECT assert(relative_error(SUM(pagerank), 1) < 0.1,
     ) FROM pagerank_gr_out WHERE user_id=2;
    +-- Tests for Personalized Page Rank
    +
    +-- Test without grouping
    +
    +DROP TABLE IF EXISTS pagerank_ppr_out;
    +DROP TABLE IF EXISTS pagerank_ppr_out_summary;
    +SELECT pagerank(
    + 'vertex',-- Vertex table
    + 'id',-- Vertix id column
    + '"EDGE"', -- "EDGE" table
    + 'src=src, dest=dest', -- "EDGE" args
    + 'pagerank_ppr_out', -- Output table of PageRank
    + NULL, -- Default damping factor (0.85)
    + NULL, -- Default max iters (100)
    + NULL, -- Default Threshold
    + NULL, -- Grouping column
    +'{1,3}'); -- Personlized Nodes
    +
    +
    +-- View the PageRank of all vertices, sorted by their scores.
    +SELECT assert(relative_error(SUM(pagerank), 1) < 0.00124,
    +'PageRank: Scores do not sum up to 1.'
    +) FROM pagerank_ppr_out;
    +
    +
    +-- Test with grouping
    +
    +DROP TABLE IF EXISTS pagerank_ppr_grp_out;
    +DROP TABLE IF EXISTS pagerank_ppr_grp_out_summary;
    +SELECT pagerank(
    + 'vertex',-- Vertex table
    + 'id',-- Vertix id column
    + '"EDGE"', -- "EDGE" table
    + 'src=src, dest=dest', -- "EDGE" args
    + 'pagerank_ppr_grp_out', -- Output table of PageRank
    + NULL, -- Default damping factor (0.85)
    + NULL, -- Default max iters (100)
    + NULL, -- Default Threshold
    + 'user_id', -- Grouping column
    +'{1,3}'); -- Personlized Nodes
    +
    +SELECT assert(count(*) = 14, 'Tuple count for Pagerank out table != 14') FROM pagerank_ppr_grp_out;
    --- End diff --

    Can we do a similar assertion here by group?

---
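A per-group version of the "scores sum to 1" assertion could look like the sketch below. It works on `(group, score)` pairs in memory rather than on the output table, and it assumes that `relative_error(approx, value)` means `|approx - value| / |value|`; that definition and the helper names are assumptions made for illustration.

```python
from collections import defaultdict


def relative_error(approx, value):
    """Assumed definition: absolute difference scaled by the reference value."""
    return abs(approx - value) / abs(value)


def groups_not_summing_to_one(rows, tolerance):
    """Return the sorted groups whose scores miss 1 by at least `tolerance`.
    `rows` is an iterable of (group, score) pairs."""
    totals = defaultdict(float)
    for group, score in rows:
        totals[group] += score
    return sorted(group for group, total in totals.items()
                  if relative_error(total, 1.0) >= tolerance)
```

An empty result plays the role of a passing assertion for every group, and the same tolerance can be tightened to see which groups fail first.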
[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084
Github user jingyimei commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/244#discussion_r177917620

    --- Diff: src/ports/postgres/modules/graph/pagerank.sql_in ---
    @@ -273,6 +278,48 @@ SELECT * FROM pagerank_out_summary ORDER BY user_id;
     (2 rows)
    +-# Example of Personalized Page Rank with Nodes {2,4}
    +
    +DROP TABLE IF EXISTS pagerank_out, pagerank_out_summary;
    +SELECT madlib.pagerank(
    + 'vertex', -- Vertex table
    + 'id', -- Vertix id column
    + 'edge', -- Edge table
    + 'src=src, dest=dest', -- Comma delimted string of edge arguments
    + 'pagerank_out', -- Output table of PageRank
    +NULL,-- Default damping factor (0.85)
    +NULL,-- Default max iters (100)
    +NULL,-- Default Threshold
    +NULL,-- No Grouping
    + '{2,4}'); -- Personlized Nodes
    --- End diff --

    Great

---
[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084
Github user jingyimei commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/244#discussion_r177915929

    --- Diff: src/ports/postgres/modules/graph/test/pagerank.sql_in ---
    @@ -66,7 +66,12 @@ SELECT pagerank(
     'id',-- Vertix id column
     '"EDGE"', -- "EDGE" table
     'src=src, dest=dest', -- "EDGE" args
    - 'pagerank_out'); -- Output table of PageRank
    + 'pagerank_out',-- Output table of PageRank
    + NULL, -- Default damping factor (0.85)
    + NULL, -- Default max iters (100)
    + NULL, -- Default Threshold
    + NULL, -- No Grouping
    + NULL); -- Personlized Nodes
    --- End diff --

    In this case, we can remove the last 5 NULLs since they are all optional.

---
[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084
Github user jingyimei commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/244#discussion_r177893734

    --- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
    @@ -122,12 +158,13 @@ def pagerank(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
     grouping_where_clause = ''
     group_by_clause = ''
     random_prob = ''
    +ppr_join_clause = ''

     edge_temp_table = unique_string(desp='temp_edge')
     grouping_cols_comma = grouping_cols + ',' if grouping_cols else ''
     distribution = ('' if is_platform_pg() else "DISTRIBUTED BY ({0}{1})".format(
    -grouping_cols_comma, dest))
    +grouping_cols_comma, dest))
    --- End diff --

    Maybe indent this to match the line above, or move the line above back
    to this position.

---
[GitHub] madlib pull request #244: Changes for Personalized Page Rank : Jira:1084
Github user jingyimei commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/244#discussion_r177917195

    --- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
    @@ -149,25 +164,39 @@ def pagerank(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
     out_cnts = unique_string(desp='out_cnts')
     out_cnts_cnt = unique_string(desp='cnt')
     v1 = unique_string(desp='v1')
    +personalized_nodes = unique_string(desp='personalized_nodes')
     if is_platform_pg():
         cur_distribution = cnts_distribution = ''
     else:
    -    cur_distribution = cnts_distribution = \
    -        "DISTRIBUTED BY ({0}{1})".format(
    -        grouping_cols_comma, vertex_id)
    +    cur_distribution = cnts_distribution = "DISTRIBUTED BY ({0}{1})".format(
    +        grouping_cols_comma, vertex_id)
     cur_join_clause = """{edge_temp_table}.{dest} = {cur}.{vertex_id}
         """.format(**locals())
     out_cnts_join_clause = """{out_cnts}.{vertex_id} = {edge_temp_table}.{src}
         """.format(**locals())
     v1_join_clause = """{v1}.{vertex_id} = {edge_temp_table}.{src}
         """.format(**locals())
    +# Get query params for Personalized Page Rank.
    +ppr_params = get_query_params_for_ppr(nodes_of_interest, damping_factor,
    + ppr_join_clause, vertex_id,
    + edge_temp_table, vertex_table, cur_distribution,
    + personalized_nodes)
    +total_ppr_nodes = ppr_params[0]
    +random_jump_prob_ppr = ppr_params[1]
    +ppr_join_clause = ppr_params[2]
    +
     random_probability = (1.0 - damping_factor) / n_vertices
    +if total_ppr_nodes > 0:
    +    random_jump_prob = random_jump_prob_ppr
    +else:
    +    random_jump_prob = random_probability
    --- End diff --

    Got it.

---
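The switch between the two random-jump probabilities in this diff corresponds to changing where the random surfer teleports. The sketch below is a textbook power iteration in plain Python, given only for orientation: it is not the PR's SQL implementation, the function name and signature are assumptions, and it normalizes the jump over the nodes of interest, which may differ from how the PR scales `random_jump_prob_ppr`.

```python
def personalized_pagerank(edges, vertices, nodes_of_interest=None,
                          damping_factor=0.85, max_iter=100, threshold=1e-9):
    """Power iteration over (src, dest) edges. Random jumps land on the
    nodes of interest when given, otherwise uniformly on all vertices."""
    vertices = list(vertices)
    targets = list(nodes_of_interest) if nodes_of_interest else vertices
    jump = {v: 0.0 for v in vertices}
    for v in targets:
        jump[v] = 1.0 / len(targets)

    out_links = {v: [] for v in vertices}
    for src, dest in edges:
        out_links[src].append(dest)

    rank = dict(jump)  # start from the jump distribution
    for _ in range(max_iter):
        # Rank held by vertices without out-links is sent along the jump vector.
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        base = 1.0 - damping_factor + damping_factor * dangling
        new_rank = {v: base * jump[v] for v in vertices}
        for src in vertices:
            if out_links[src]:
                share = damping_factor * rank[src] / len(out_links[src])
                for dest in out_links[src]:
                    new_rank[dest] += share
        delta = max(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if delta < threshold:
            break
    return rank
```

With `nodes_of_interest=[1]` on the edges `(1, 2), (2, 1), (3, 1)`, vertex 3 ends with a score of zero because nothing links or jumps to it, while the non-personalized run gives it a small positive score from uniform jumps.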
[GitHub] madlib issue #253: MLP: Add install check tests for minibatch with grouping
Github user asfgit commented on the issue: https://github.com/apache/madlib/pull/253 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/madlib-pr-build/413/ ---
[GitHub] madlib pull request #253: MLP: Add install check tests for minibatch with gr...
GitHub user kaknikhil opened a pull request:

    https://github.com/apache/madlib/pull/253

    MLP: Add install check tests for minibatch with grouping

    This PR adds install check tests for MLP minibatch with grouping.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/madlib/madlib feature/mlp-minibatch-grouping

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/253.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #253

commit fe0bc93d83fe295658d689f4957eb9d12a513c23
Author: Nikhil Kak
Date: 2018-03-26T18:55:25Z

    MLP: Add install check tests for minibatch with grouping

---
[GitHub] madlib issue #252: leftover minor RF user doc update
Github user asfgit commented on the issue: https://github.com/apache/madlib/pull/252 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/madlib-pr-build/412/ ---
[GitHub] madlib pull request #252: leftover minor RF user doc update
GitHub user fmcquillan99 opened a pull request:

    https://github.com/apache/madlib/pull/252

    leftover minor RF user doc update

    A few remaining RF user doc changes I missed in
    https://github.com/apache/madlib/commit/7f3aae92f2d84bf7e4501ac5efec1ebfc7a80834

    Also added links to 2 previous versions that were missing on the front
    page of the user docs.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fmcquillan99/apache-madlib doc-tree-1dot14-v2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/252.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #252

---