[GitHub] incubator-madlib issue #174: Change test_train_split to train_test_split
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/174 I'll make the change with the merge. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-madlib pull request #170: Multiple: Add quoted input params for te...
Github user iyerr3 closed the pull request at: https://github.com/apache/incubator-madlib/pull/170
[GitHub] incubator-madlib issue #170: Multiple: Add quoted input params for tests
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/170 Merged with commit f1aa9af
[GitHub] incubator-madlib pull request #173: Measures: Use outer join for in-out degr...
GitHub user iyerr3 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/173

    Measures: Use outer join for in-out degrees computation

    JIRA: MADLIB-1073

    Commit 06788cc added the graph measure functions described in the JIRA. This commit fixes a bug from that commit in the graph_vertex_degrees function. The bug led to results not containing vertices that either had 0 in-degree or out-degree.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib bugfix/in_out_degrees

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/173.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #173

commit f3697fdaaebeb851dfa23a0503c2c143c54f7f69
Author: Rahul Iyer
Date: 2017-08-18T23:19:39Z

    Measures: Use outer join for in-out degrees computation

    JIRA: MADLIB-1073

    Commit 06788cc added the graph measure functions described in the JIRA. This commit fixes a bug from that commit in the graph_vertex_degrees function. The bug led to results not containing vertices that either had 0 in-degree or out-degree.
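The effect of the inner versus outer join described in this pull request can be illustrated outside the database. The following is a minimal sketch in plain Python, not the MADlib implementation: the function name `vertex_degrees` and the sample edge list are hypothetical, and dictionaries stand in for the in-degree and out-degree tables.

```python
from collections import Counter


def vertex_degrees(edges):
    """Return (inner, outer) mappings of vertex -> (in_degree, out_degree).

    `edges` is a list of (src, dest) pairs. `inner` mimics an inner join of
    the in-degree and out-degree results; `outer` mimics a full outer join.
    """
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dest for _, dest in edges)

    # Inner join: only vertices that appear in BOTH results survive, so a
    # vertex with 0 in-degree or 0 out-degree is silently dropped.
    both = set(in_deg) & set(out_deg)
    inner = dict((v, (in_deg[v], out_deg[v])) for v in both)

    # Full outer join: vertices from EITHER result survive, and the missing
    # side is filled in with 0.
    either = set(in_deg) | set(out_deg)
    outer = dict((v, (in_deg.get(v, 0), out_deg.get(v, 0))) for v in either)
    return inner, outer


# Vertex 1 has no incoming edges and vertex 3 has no outgoing edges.
edges = [(1, 2), (2, 3), (1, 3)]
inner, outer = vertex_degrees(edges)
```

With this edge list, `inner` contains only vertex 2, while `outer` also reports vertices 1 and 3 with a zero on the missing side, which is the behavior the fix aims for.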
[GitHub] incubator-madlib pull request #171: DT: Correctly encode unseen categorical ...
GitHub user iyerr3 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/171

    DT: Correctly encode unseen categorical features

    Changes applied in commit a2f4740 added an option to treat NULL values as a new category. This was applied by changing the encoding process of categorical features to add a new value at the end of the list of values. The intention with the commit was to treat new unseen, non-null values equivalent to NULL. The encoding process, however, still encoded the unseen categorical value as -1, which is interpreted as NULL in underlying functions. This commit updates this process to correctly use the last index as the encoding for the unseen/NULL value.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib bugfix/dt_unseen_encoding

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/171.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #171

commit 9f8722f410974ef27564623f44891ac2f95fd487
Author: Rahul Iyer
Date: 2017-08-18T16:06:20Z

    DT: Correctly encode unseen categorical features

    Changes applied in commit a2f4740 added an option to treat NULL values as a new category. This was applied by changing the encoding process of categorical features to add a new value at the end of the list of values. The intention with the commit was to treat new unseen, non-null values equivalent to NULL. The encoding process, however, still encoded the unseen categorical value as -1, which is interpreted as NULL in underlying functions. This commit updates this process to correctly use the last index as the encoding for the unseen/NULL value.
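The encoding rule described in this pull request can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the decision tree module's code: the function `encode_category`, the flag `null_as_category`, and the proxy level name `'__NULL__'` are all hypothetical.

```python
def encode_category(value, levels, null_as_category=True):
    """Map a categorical value to its integer index in `levels`.

    Assumption for this sketch: when `null_as_category` is set, the last
    entry of `levels` is the proxy level that was appended for NULL.
    """
    if value is not None and value in levels:
        return levels.index(value)
    if null_as_category:
        # NULL and unseen values share the proxy level at the last index.
        return len(levels) - 1
    # Without the option, -1 is the marker that is interpreted as NULL.
    return -1


# '__NULL__' is a made-up name for the appended proxy level.
levels = ['red', 'green', '__NULL__']
seen = encode_category('green', levels)
unseen = encode_category('blue', levels)
null_value = encode_category(None, levels)
legacy_unseen = encode_category('blue', ['red', 'green'], null_as_category=False)
```

The point of the sketch is that `unseen` and `null_value` resolve to the same index, whereas returning -1 for an unseen value would bypass the new category.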
[GitHub] incubator-madlib pull request #170: Multiple: Add quoted input params for te...
GitHub user iyerr3 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/170

    Multiple: Add quoted input params for tests

    This commit updates install-check tests to use quoted string inputs.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib chore/validation_quoted_char

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/170.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #170

commit caed030fc0a17fb96bd0ecbe7ebc898fde9bbc35
Author: Rahul Iyer
Date: 2017-08-17T05:16:03Z

    Multiple: Add quoted input params for tests

    This commit updates install-check tests to use quoted string inputs.
[GitHub] incubator-madlib issue #166: Sample: test_train_split
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/166 +1
[GitHub] incubator-madlib pull request #168: Code refactoring for KNN
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/168#discussion_r133629865

    --- Diff: src/ports/postgres/modules/knn/knn.py_in ---
    @@ -127,5 +124,102 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, label_colum
                 "Data type '{0}' is not a valid type for column '{1}' in table '{2}'.".format(colType, id_column_name, test_source))
         return k
    -# --
    -m4_changequote(, )
    +
    +
    +
    +
    +def knn(schema_madlib, point_source, point_column_name, label_column_name,
    +        test_source, test_column_name, id_column_name, output_table, operation, k):
    +
    +    """
    +        KNN function to find the K Nearest neighbours
    +        Args:
    +            @param schema_madlib        Name of the Madlib Schema
    +            @param point_source         Training data table
    +            @param point_column_name    Name of the column with training data points.
    +            @param label_column_name    Name of the column with labels/values of training data points.
    +            @param test_source          Name of the table containing the test data points.
    +            @param test_column_name     Name of the column with testing data points.
    +            @param id_column_name       Name of the column having ids of data points in test data table.
    +            @param output_table         Name of the table to store final results.
    +            @param k                    default: 1. Number of nearest neighbors to consider
    +
    +
    +        Returns:
    +            VARCHAR                     Name of the output table.
    +    """
    +
    +
    +    oldClientMinMessages = plpy.execute("SELECT setting FROM pg_settings WHERE name = 'client_min_messages'")[0]['setting'];
    +
    +    plpy.execute("SET client_min_messages TO warning");
    +
    +
    +    k_val = knn_validate_src(schema_madlib, point_source, point_column_name,
    +                             label_column_name, test_source,
    +                             test_column_name, id_column_name,
    +                             output_table, operation, k)
    +
    +
    +    plpy.execute("SELECT {schema_madlib}.create_schema_pg_temp()".format(schema_madlib = schema_madlib));
    +
    +    x_temp_table = unique_string(desp='x_temp_table')
    +    y_temp_table = unique_string(desp='y_temp_table')
    +    label_column_name_unique = unique_string(desp='label_column_name_unique')
    +    test_id = unique_string(desp='test_id')
    +
    +    convert_boolean_to_int = '';
    +    if operation == 'c':
    +        convert_boolean_to_int = '::INTEGER';
    +
    +    madlib_knn_interm = unique_string(desp='madlib_knn_interm')
    +
    +    plpy.execute("""DROP TABLE IF EXISTS pg_temp.{madlib_knn_interm}""".format(**locals()));
    +    plpy.execute(
    +        """
    +        CREATE TEMP TABLE pg_temp.{madlib_knn_interm} AS
    +        SELECT *
    +        FROM
    +            (
    +            SELECT row_number() over (partition by {test_id} order by dist) AS r , {x_temp_table}.*
    +            FROM
    +                (
    +                SELECT test.{id_column_name} AS {test_id} , {schema_madlib}.squared_dist_norm2(train.{point_column_name} ,test.{test_column_name}) AS dist, train.{label_column_name} {convert_boolean_to_int} AS {label_column_name_unique}
    +                FROM {point_source} AS train, {test_source} AS test
    +                ) {x_temp_table}
    +            ){y_temp_table}
    +        WHERE {y_temp_table}.r <= {k_val}""".format(**locals()));
    +
    +    if operation == 'c':
    +        plpy.execute(
    +            """
    +            CREATE TABLE {output_table} AS
    +            SELECT {test_id} AS id, {test_column_name} , {schema_madlib}.mode({label_column_name_unique}) AS prediction
    +            FROM pg_temp.{madlib_knn_interm} join {test_source} ON {test_id} = {id_column_name}
    +            GROUP BY {test_id} , {test_column_name}""".format(**locals()))
    +
    +
    +    else:
    +        plpy.execute(
    +            """
    +            CREATE TABLE {output_table} AS
    +            SELECT {test_id} AS id, {test_column_name} , avg( {label_column_name_unique} ) AS prediction
    +            FROM
    +            pg_temp.{madlib_knn_interm} join {test_source} on {test_id} ={id_column_name}
    +            GROUP BY {test_id} , {test_column_name}
    +            ORDER BY {test_id}""".format(**locals()))
    +
    +
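The SQL in the quoted diff ranks training points per test point by squared distance, keeps the `k` closest, and then aggregates their labels with `mode` for classification (`operation == 'c'`) or `avg` otherwise. The same logic can be sketched in plain Python for a single test point. This is an illustrative sketch only: the names `squared_dist` and `knn_predict` and the sample data are hypothetical and are not part of the pull request.

```python
from collections import Counter


def squared_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train, test_point, k=1, operation='c'):
    """Predict a label for `test_point` from `train`, a list of (point, label).

    operation 'c' returns the most common label among the k nearest points;
    any other value returns the average of their labels.
    """
    is_classification = (operation == 'c')
    # Equivalent of row_number() ... ORDER BY dist, then keeping r <= k.
    nearest = sorted(train, key=lambda row: squared_dist(row[0], test_point))[:k]
    labels = [label for _, label in nearest]
    if is_classification:
        return Counter(labels).most_common(1)[0][0]
    return sum(labels) / float(len(labels))


train = [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((5.0, 5.0), 1), ((6.0, 5.0), 1)]
predicted_class = knn_predict(train, (0.5, 0.5), k=3, operation='c')
predicted_value = knn_predict(train, (0.5, 0.5), k=3, operation='r')
```

Unlike the SQL, which computes the full cross product of training and test rows in one statement, this sketch handles one test point at a time, and it makes no attempt to break ties between equally common labels.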
[GitHub] incubator-madlib pull request #168: Code refactoring for KNN
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/168#discussion_r133628535

    --- Diff: src/ports/postgres/modules/knn/knn.py_in ---
    @@ -127,5 +124,102 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, label_colum
                 "Data type '{0}' is not a valid type for column '{1}' in table '{2}'.".format(colType, id_column_name, test_source))
         return k
    -# --
    -m4_changequote(, )
    +
    +
    +
    +
    +def knn(schema_madlib, point_source, point_column_name, label_column_name,
    +        test_source, test_column_name, id_column_name, output_table, operation, k):
    +
    +    """
    +        KNN function to find the K Nearest neighbours
    +        Args:
    +            @param schema_madlib        Name of the Madlib Schema
    +            @param point_source         Training data table
    +            @param point_column_name    Name of the column with training data points.
    +            @param label_column_name    Name of the column with labels/values of training data points.
    +            @param test_source          Name of the table containing the test data points.
    +            @param test_column_name     Name of the column with testing data points.
    +            @param id_column_name       Name of the column having ids of data points in test data table.
    +            @param output_table         Name of the table to store final results.
    --- End diff --

    Missing details for `operation`
[GitHub] incubator-madlib pull request #168: Code refactoring for KNN
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/168#discussion_r133628699

    --- Diff: src/ports/postgres/modules/knn/knn.py_in ---
    @@ -127,5 +124,102 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, label_colum
                 "Data type '{0}' is not a valid type for column '{1}' in table '{2}'.".format(colType, id_column_name, test_source))
         return k
    -# --
    -m4_changequote(, )
    +
    +
    +
    +
    +def knn(schema_madlib, point_source, point_column_name, label_column_name,
    +        test_source, test_column_name, id_column_name, output_table, operation, k):
    +
    +    """
    +        KNN function to find the K Nearest neighbours
    +        Args:
    +            @param schema_madlib        Name of the Madlib Schema
    +            @param point_source         Training data table
    +            @param point_column_name    Name of the column with training data points.
    +            @param label_column_name    Name of the column with labels/values of training data points.
    +            @param test_source          Name of the table containing the test data points.
    +            @param test_column_name     Name of the column with testing data points.
    +            @param id_column_name       Name of the column having ids of data points in test data table.
    +            @param output_table         Name of the table to store final results.
    +            @param k                    default: 1. Number of nearest neighbors to consider
    +
    +
    +        Returns:
    +            VARCHAR                     Name of the output table.
    +    """
    +
    +
    +    oldClientMinMessages = plpy.execute("SELECT setting FROM pg_settings WHERE name = 'client_min_messages'")[0]['setting'];
    +
    +    plpy.execute("SET client_min_messages TO warning");
    +
    +
    +    k_val = knn_validate_src(schema_madlib, point_source, point_column_name,
    +                             label_column_name, test_source,
    +                             test_column_name, id_column_name,
    +                             output_table, operation, k)
    +
    +
    +    plpy.execute("SELECT {schema_madlib}.create_schema_pg_temp()".format(schema_madlib = schema_madlib));
    +
    +    x_temp_table = unique_string(desp='x_temp_table')
    +    y_temp_table = unique_string(desp='y_temp_table')
    +    label_column_name_unique = unique_string(desp='label_column_name_unique')
    +    test_id = unique_string(desp='test_id')
    +
    +    convert_boolean_to_int = '';
    +    if operation == 'c':
    --- End diff --

    Since this comparison is used multiple times, better to create a boolean flag that is equal to this comparison.
[GitHub] incubator-madlib pull request #168: Code refactoring for KNN
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/168#discussion_r133628425

    --- Diff: src/ports/postgres/modules/knn/knn.py_in ---
    @@ -127,5 +124,102 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, label_colum
                 "Data type '{0}' is not a valid type for column '{1}' in table '{2}'.".format(colType, id_column_name, test_source))
         return k
    -# --
    -m4_changequote(, )
    +
    +
    +
    +
    +def knn(schema_madlib, point_source, point_column_name, label_column_name,
    +        test_source, test_column_name, id_column_name, output_table, operation, k):
    +
    +    """
    +        KNN function to find the K Nearest neighbours
    +        Args:
    +            @param schema_madlib        Name of the Madlib Schema
    +            @param point_source         Training data table
    +            @param point_column_name    Name of the column with training data points.
    +            @param label_column_name    Name of the column with labels/values of training data points.
    +            @param test_source          Name of the table containing the test data points.
    +            @param test_column_name     Name of the column with testing data points.
    +            @param id_column_name       Name of the column having ids of data points in test data table.
    +            @param output_table         Name of the table to store final results.
    +            @param k                    default: 1. Number of nearest neighbors to consider
    +
    +
    +        Returns:
    +            VARCHAR                     Name of the output table.
    +    """
    +
    +
    +    oldClientMinMessages = plpy.execute("SELECT setting FROM pg_settings WHERE name = 'client_min_messages'")[0]['setting'];
    --- End diff --

    Better to use the context manager: `with MinWarning('warning'):`
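The review comment above prefers a context manager over saving and restoring `client_min_messages` by hand. The general pattern can be sketched as follows. This is not MADlib's `MinWarning` implementation: `FakeSession` is a hypothetical stand-in for a database session, and `min_messages` is a made-up name used only to show the save, set, and restore steps.

```python
from contextlib import contextmanager


class FakeSession(object):
    """Hypothetical stand-in for a database session holding one setting."""

    def __init__(self):
        self.settings = {'client_min_messages': 'notice'}

    def show(self, name):
        return self.settings[name]

    def set(self, name, value):
        self.settings[name] = value


@contextmanager
def min_messages(session, level):
    """Temporarily set client_min_messages, restoring it on exit."""
    old_level = session.show('client_min_messages')
    session.set('client_min_messages', level)
    try:
        yield
    finally:
        # Runs on normal exit and when the body raises an exception.
        session.set('client_min_messages', old_level)


session = FakeSession()
with min_messages(session, 'warning'):
    inside = session.show('client_min_messages')
after = session.show('client_min_messages')
```

The benefit over the manual version is that the original setting is restored even if a query inside the block fails, without the caller having to remember a cleanup step.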
[GitHub] incubator-madlib pull request #165: Release 1.12: Version numbering and upgr...
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/165#discussion_r133625244

    --- Diff: src/madpack/changelist.yaml ---
    @@ -9,27 +9,39 @@
     # file installed on the upgrade version. All other files (that don't have
     # updates), are cleaned up to remove object replacements
     new module:
    -    # - Changes from 1.10.0 to 1.11
    -    pagerank:
    +    # - Changes from 1.11 to 1.12
    +    mlp:
    +    apsp:
    +    bfs:
    +    measures:
    +    wcc:
    +    stratified_sample:
     # Changes in the types (UDT) including removal and modification
     udt:
    -
     # List of the UDF changes that affect the user externally. This includes change
     # in function name, return type, argument order or types, or removal of
     # the function. In each case, the original function is as good as removed and a
     # new function is created. In such cases, we should abort the upgrade if there
     # are user views dependent on this function, since the original function will
     # not be present in the upgraded version.
     udf:
    -    # - Changes from 1.10.0 to 1.11 --
    -    - __build_tree:
    +    # - Changes from 1.11 to 1.12 --
    +    - tree_train:
    --- End diff --

    Are these necessary because of the change in parameter name?
[GitHub] incubator-madlib issue #167: Update RELEASE_NOTES for v1.12 release
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/167 test this please
[GitHub] incubator-madlib issue #167: Update RELEASE_NOTES for v1.12 release
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/167 please retest
[GitHub] incubator-madlib issue #163: MADLIB-1118. Change tolerance to 1e-2 (from 1e-...
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/163 Jenkins please retest.
[GitHub] incubator-madlib issue #163: MADLIB-1118. Change tolerance to 1e-2 (from 1e-...
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/163

The change looks good. A few comments:
- An alternative to changing the threshold is to reduce the max number of iterations. Even with the lower threshold, we're not necessarily guaranteed quicker completion.
- The log file can be accessed even when the test passes by adding the `-vl` option to the `madpack install-check` command. The options indicate `-v: Verbose` and `-l: Keep logs`.
- The install-check itself also provides the run time for execution of the whole file. However, `\timing` is needed if the run time for individual queries is desired.

@njayaram2 Any idea why the asserts on `log_likelihood` are commented out?
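The first point above, that a tolerance only bounds run time indirectly while an iteration cap bounds it directly, can be seen in a toy fixed-point loop. This sketch is generic and unrelated to the module under test: the function `iterate` and the update rule `step` are hypothetical.

```python
def iterate(step, x0, tolerance, max_iter):
    """Apply `step` repeatedly until the change is below `tolerance`
    or `max_iter` iterations have run. Returns (value, iterations_used)."""
    x = x0
    for n in range(1, max_iter + 1):
        new_x = step(x)
        if abs(new_x - x) < tolerance:
            return new_x, n
        x = new_x
    return x, max_iter


def step(x):
    # Toy update that converges to 2.0, halving the change each iteration.
    return 0.5 * x + 1.0


loose = iterate(step, 0.0, tolerance=1e-2, max_iter=100)
tight = iterate(step, 0.0, tolerance=1e-3, max_iter=100)
capped = iterate(step, 0.0, tolerance=1e-3, max_iter=5)
```

How many iterations a given tolerance takes depends on how quickly the method converges on the data at hand, whereas `max_iter` is a hard upper bound regardless of convergence behavior.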
[GitHub] incubator-madlib pull request #156: DT: Add option to treat NULL as category
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/156#discussion_r131048326

    --- Diff: src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in ---
    @@ -825,22 +855,34 @@ def _get_bins(schema_madlib, training_table_name,
                         ({{col}})::text AS levels,
                         {{order_fun}} AS dep_avg
                     FROM {training_table_name}
    -                WHERE {filter_null}
    -                    AND {{col}} is not NULL
    +                WHERE {filter_str}
                     GROUP BY {{col}}
    +                {union_null_proxy}
                 ) s
             ) s1
             WHERE array_upper(levels, 1) > 1
             """.format(training_table_name=training_table_name,
    -                   filter_null=filter_null)
    +                   filter_str=filter_str,
    +                   union_null_proxy=union_null_proxy)
    +
    +        all_col_expressions = {}
    +        for col in cat_features:
    +            if col in boolean_cats:
    +                all_col_expressions[col] = ("(CASE WHEN " + col +
    +                                            " THEN 'True' ELSE 'False' END)")
    +            else:
    +                # if null_proxy is not None:
    +                #     all_col_expressions[col] = ("COALESCE({0}, {1})".
    +                #                                 format(col, null_proxy))
    +                # else:
    --- End diff --

    Not needed - will take it out before merging.
[GitHub] incubator-madlib issue #157: Multiple: Check optimizer_control before updati...
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/157 Added more details in the 2nd commit message which will be used in the final merge.
[GitHub] incubator-madlib pull request #157: Multiple: Check optimizer_control before...
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/157#discussion_r130724667

    --- Diff: src/ports/postgres/modules/utilities/control.py_in ---
    @@ -24,56 +25,66 @@ HAS_FUNCTION_PROPERTIES = m4_ifdef(, , , , )
    -class EnableOptimizer(object):
    +class OptimizerControl(object):
         """
         @brief: A wrapper that enables/disables the optimizer and
             then sets it back to the original value on exit
         """
    -    def __init__(self, to_enable=True):
    -        self.to_enable = to_enable
    +    def __init__(self, enable=True, error_on_fail=False):
    +        self.to_enable = enable
    +        self.error_on_fail = error_on_fail
             self.optimizer_enabled = False
    -        # we depend on the fact that all GPDB/HAWQ versions that have the
    +
    +        # use the fact that all GPDB/HAWQ versions that have the
             # optimizer also define function properties
             self.guc_exists = True if HAS_FUNCTION_PROPERTIES else False
         def __enter__(self):
    -        # we depend on the fact that all GPDB/HAWQ versions that have the ORCA
    -        # optimizer also define function properties
             if self.guc_exists:
    -            optimizer = plpy.execute("show optimizer")[0]["optimizer"]
    -            self.optimizer_enabled = True if optimizer == 'on' else False
    -            plpy.execute("set optimizer={0}".format(('off', 'on')[self.to_enable]))
    +            # check if allowed to change the GUC
    +            self.optimizer_control = bool(strtobool(
    +                plpy.execute("show optimizer_control")[0]["optimizer_control"]))
    --- End diff --

    Good point. Added exception handling for such situations.
[GitHub] incubator-madlib pull request #157: Multiple: Check optimizer_control before...
GitHub user iyerr3 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/157

    Multiple: Check optimizer_control before updating optimizer

    JIRA: MADLIB-1109

    This is applicable only for the Greenplum and HAWQ platforms: We disable/enable ORCA using the 'optimizer' GUC in some functions for performance reasons. GPDB/HAWQ has another GUC 'optimizer_control' which allows the user to disable updates to the 'optimizer' GUC. Updating 'optimizer' when 'optimizer_control = off' leads to an ugly error. This commit adds a check for the value of 'optimizer_control' and updates 'optimizer' only if 'optimizer_control = on'.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib bugfix/optimizer_control

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/157.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #157

commit b86eab834e0f56f5fcb501bf1ef50556000afe8b
Author: Rahul Iyer
Date: 2017-08-01T18:01:05Z

    Multiple: Check optimizer_control before updating optimizer

    JIRA: MADLIB-1109

    This is applicable only for the Greenplum and HAWQ platforms: We disable/enable ORCA using the 'optimizer' GUC in some functions for performance reasons. GPDB/HAWQ has another GUC 'optimizer_control' which allows the user to disable updates to the 'optimizer' GUC. Updating 'optimizer' when 'optimizer_control = off' leads to an ugly error. This commit adds a check for the value of 'optimizer_control' and updates 'optimizer' only if 'optimizer_control = on'.
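The check described in this pull request, changing 'optimizer' only when 'optimizer_control' permits it and restoring the original value afterwards, can be sketched with a small context manager. This is a simplified illustration, not the `OptimizerControl` class from the commit: `FakeDB` is a hypothetical in-memory stand-in for the database, and `OptimizerGuard` is a made-up name.

```python
class FakeDB(object):
    """Hypothetical stand-in for the database: GUC values live in a dict."""

    def __init__(self, optimizer='on', optimizer_control='on'):
        self.gucs = {'optimizer': optimizer,
                     'optimizer_control': optimizer_control}

    def show(self, name):
        return self.gucs[name]

    def set(self, name, value):
        # Mimic the error raised when updates to 'optimizer' are disabled.
        if name == 'optimizer' and self.gucs['optimizer_control'] != 'on':
            raise RuntimeError("cannot change 'optimizer'")
        self.gucs[name] = value


class OptimizerGuard(object):
    """Set 'optimizer' for the duration of a block, if that is allowed."""

    def __init__(self, db, enable=True):
        self.db = db
        self.target = 'on' if enable else 'off'
        self.original = None  # stays None when nothing was changed

    def __enter__(self):
        if self.db.show('optimizer_control') == 'on':
            self.original = self.db.show('optimizer')
            self.db.set('optimizer', self.target)
        return self

    def __exit__(self, *exc_info):
        if self.original is not None:
            self.db.set('optimizer', self.original)
        return False  # never swallow exceptions from the block


db = FakeDB(optimizer='on', optimizer_control='on')
with OptimizerGuard(db, enable=False):
    during = db.show('optimizer')
restored = db.show('optimizer')

locked = FakeDB(optimizer='on', optimizer_control='off')
with OptimizerGuard(locked, enable=False):
    during_locked = locked.show('optimizer')
```

When `optimizer_control` is off, the guard leaves the setting untouched instead of triggering the error, which is the behavior the commit message describes.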
[GitHub] incubator-madlib issue #154: Graph/bugs
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/154 Looks good to merge.
[GitHub] incubator-madlib pull request #154: Graph/bugs
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/154#discussion_r130525989

    --- Diff: src/ports/postgres/modules/graph/bfs.py_in ---
    @@ -47,8 +48,8 @@ def _validate_bfs(vertex_table, vertex_id, edge_table, edge_params,
             out_table,'BFS')
         _assert((max_distance >= 0) and isinstance(max_distance,int),
    -        """Graph BFS: Invalid max_distance type or value ({0}), must be integer,
    -        be greater than or equal to 0 and be less than max allowable integer
    +        """Graph BFS: Invalid max_distance type or value ({0}), must be integer,
    +        be greater than or equal to 0 and be less than max allowable integer
             (2147483647).""".
    --- End diff --

    This can be replaced with 'INT_MAX'
[GitHub] incubator-madlib pull request #152: Feature/graph measures 1
Github user iyerr3 closed the pull request at: https://github.com/apache/incubator-madlib/pull/152
[GitHub] incubator-madlib issue #152: Feature/graph measures 1
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/152 Merged with 06788cc
[GitHub] incubator-madlib pull request #156: DT: Add option to treat NULL as category
GitHub user iyerr3 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/156

    DT: Add option to treat NULL as category

    This commit adds an option to treat NULL as a level in the categorical feature. The level is added as a string (instead of a NULL value) to ensure MADlib arrays don't have NULLs in them during the binning procedure.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib feature/dt_null_handling

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/156.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #156
[GitHub] incubator-madlib pull request #155: Feature: Weakly connected components hel...
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/155#discussion_r129800078

    --- Diff: src/ports/postgres/modules/graph/wcc.py_in ---
    @@ -102,7 +115,8 @@ def wcc(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
         toupdate = unique_string(desp='toupdate')
         temp_out_table = unique_string(desp='tempout')
    -    distribution = '' if is_platform_pg() else "DISTRIBUTED BY ({0})".format(vertex_id)
    +    distribution = '' if is_platform_pg(
    --- End diff --

    Please don't do this! If you want multiple lines, then:
    ```
    ('' if is_platform_pg() else
     "DISTRIBUTED BY ({0})".format(vertex_id))
    ```
[GitHub] incubator-madlib pull request #155: Feature: Weakly connected components hel...
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/155#discussion_r129799618

    --- Diff: src/ports/postgres/modules/graph/wcc.py_in ---
    @@ -81,18 +88,24 @@ def wcc(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
         plpy.execute('SET client_min_messages TO warning')
         params_types = {'src': str, 'dest': str}
         default_args = {'src': 'src', 'dest': 'dest'}
    -    edge_params = extract_keyvalue_params(edge_args, params_types, default_args)
    +    edge_params = extract_keyvalue_params(
    +        edge_args, params_types, default_args)
    -    # populate default values for optional params if null
    +    # populate default values for optional params if null, and prepare data
    +    # to be written into the summary table (*_st variable names)
         if vertex_id is None:
    --- End diff --

    `if not vertex_id`?
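The difference between the two checks in the comment above matters for empty strings: `is None` only catches a missing argument, while `not` also catches `''`. A minimal sketch, in which both function names and the fallback value `'id'` are assumptions made for illustration:

```python
def default_if_none(vertex_id):
    """Fall back to 'id' only when the argument is None."""
    return 'id' if vertex_id is None else vertex_id


def default_if_falsy(vertex_id):
    """Fall back to 'id' when the argument is None or an empty string."""
    return 'id' if not vertex_id else vertex_id


strict_empty = default_if_none('')
falsy_empty = default_if_falsy('')
```

With the `is None` form an empty string would pass through and reach the generated SQL unchanged, whereas the `not` form replaces it with the default.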
[GitHub] incubator-madlib pull request #155: Feature: Weakly connected components hel...
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/155#discussion_r129800397

    --- Diff: src/ports/postgres/modules/graph/wcc.py_in ---
    @@ -322,10 +617,51 @@ def wcc_help(schema_madlib, message, **kwargs):
         help_string = get_graph_usage(
             schema_madlib, 'Weakly Connected Components',
    -"""out_table TEXT, -- Output table of weakly connected components
    -grouping_col TEXT -- Comma separated column names to group on
    - -- (DEFAULT = NULL, no grouping)
    -""")
    +"""out_table TEXT, -- Output table of weakly connected components
    +grouping_col TEXT -- Comma separated column names to group on
    + -- (DEFAULT = NULL, no grouping)
    +""") + """
    +
    +Once the above function is used to obtain the out_table, it can be used to
    --- End diff --

    Text below looks nice.
[GitHub] incubator-madlib pull request #155: Feature: Weakly connected components hel...
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/155#discussion_r129801476

    --- Diff: src/ports/postgres/modules/utilities/validate_args.py_in ---
    @@ -344,6 +344,50 @@ def get_cols_and_types(tbl):
         return list(zip(col_names, col_types))
     # -
    +def get_col_type(tbl, col):
    --- End diff --

    There's lots of code overlap with `get_cols_and_types`. We should avoid
    that redundancy. Also, does `get_expr_type` not work for this need?
[GitHub] incubator-madlib pull request #155: Feature: Weakly connected components hel...
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/155#discussion_r129799415

    --- Diff: src/ports/postgres/modules/graph/graph_utils.py_in ---
    @@ -71,6 +70,43 @@ def _grp_from_table(tbl, grp_list):
                         for i in grp_list])
    +def validate_output_and_summary_tables(model_out_table, module_name,
    --- End diff --

    The docstring and errors look good. I suggest removing the exclamation
    mark (`!`) from the end of the error messages.
[GitHub] incubator-madlib pull request #154: Graph/bugs
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/154#discussion_r129797208

    --- Diff: src/ports/postgres/modules/graph/apsp.py_in ---
    @@ -72,22 +73,25 @@ def graph_apsp(schema_madlib, vertex_table, vertex_id, edge_table,
         edge_params = extract_keyvalue_params(edge_args, params_types, default_args)
         # Prepare the input for recording in the summary table
    -    if vertex_id is None:
    +    if (vertex_id is None) or (vertex_id == ''):
             v_st = "NULL"
             vertex_id = "id"
         else:
             v_st = vertex_id
    -    if edge_args is None:
    +
    +    if (edge_args is None) or (edge_args == ''):
             e_st = "NULL"
         else:
             e_st = edge_args
    -    if grouping_cols is None:
    +
    +    if (grouping_cols is None) or (grouping_cols == ''):
             g_st = "NULL"
             glist = None
         else:
             g_st = grouping_cols
             glist = split_quoted_delimited_str(grouping_cols)
    +
    --- End diff --

    Don't need the extra line.
[GitHub] incubator-madlib pull request #154: Graph/bugs
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/154#discussion_r129797605

    --- Diff: src/ports/postgres/modules/graph/bfs.py_in ---
    @@ -125,35 +127,39 @@ def graph_bfs(schema_madlib, vertex_table, vertex_id, edge_table,
             default_args)
         # Prepare the input for recording in the summary table
    -    if vertex_id is None:
    -        v_st= "NULL"
    +    if (vertex_id is None) or (vertex_id == ''):
    --- End diff --

    `if not vertex_id` (similar change in multiple places)
[GitHub] incubator-madlib pull request #154: Graph/bugs
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/154#discussion_r129797985

    --- Diff: src/ports/postgres/modules/graph/bfs.py_in ---
    @@ -169,7 +175,7 @@ def graph_bfs(schema_madlib, vertex_table, vertex_id, edge_table,
         if grouping_cols is not None and grouping_cols is not '':
    --- End diff --

    `if grouping_cols`
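The `if grouping_cols` and `if not vertex_id` suggestions in this review rest on Python's truth-value rules: both `None` and `''` are falsy. The sketch below compares the explicit checks from the diffs with the shorter forms. The helper names are hypothetical and exist only for this illustration. Note also that `is not ''` compares object identity rather than value, so its result depends on string interning, which is an implementation detail.

```python
def has_value_explicit(param):
    # Spelled-out form, using equality (!=) rather than identity (is not).
    return param is not None and param != ''


def has_value(param):
    # Truthiness form: None and '' are both falsy.
    return bool(param)


def is_missing_explicit(param):
    # Form used in the apsp.py_in and bfs.py_in diffs.
    return (param is None) or (param == '')


def is_missing(param):
    # Form suggested in the review.
    return not param


for candidate in (None, '', 'grp1, grp2'):
    print(repr(candidate), has_value(candidate), is_missing(candidate))
```

The two forms agree for `None` and for strings. They differ for other falsy values such as `0` or `[]`, so the shorter form is only a drop-in replacement when the parameter is known to be a string or `None`.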
[GitHub] incubator-madlib pull request #154: Graph/bugs
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/154#discussion_r129796954

    --- Diff: src/ports/postgres/modules/graph/apsp.py_in ---
    @@ -72,22 +73,25 @@ def graph_apsp(schema_madlib, vertex_table, vertex_id, edge_table,
         edge_params = extract_keyvalue_params(edge_args, params_types, default_args)
         # Prepare the input for recording in the summary table
    -    if vertex_id is None:
    +    if (vertex_id is None) or (vertex_id == ''):
             v_st = "NULL"
             vertex_id = "id"
         else:
             v_st = vertex_id
    -    if edge_args is None:
    +
    +    if (edge_args is None) or (edge_args == ''):
    --- End diff --

    `if not edge_args`
[GitHub] incubator-madlib pull request #154: Graph/bugs
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/154#discussion_r129796988

    --- Diff: src/ports/postgres/modules/graph/apsp.py_in ---
    @@ -72,22 +73,25 @@ def graph_apsp(schema_madlib, vertex_table, vertex_id, edge_table,
         edge_params = extract_keyvalue_params(edge_args, params_types, default_args)
         # Prepare the input for recording in the summary table
    -    if vertex_id is None:
    +    if (vertex_id is None) or (vertex_id == ''):
             v_st = "NULL"
             vertex_id = "id"
         else:
             v_st = vertex_id
    -    if edge_args is None:
    +
    +    if (edge_args is None) or (edge_args == ''):
             e_st = "NULL"
         else:
             e_st = edge_args
    -    if grouping_cols is None:
    +
    +    if (grouping_cols is None) or (grouping_cols == ''):
    --- End diff --

    `if not grouping_cols`
[GitHub] incubator-madlib pull request #154: Graph/bugs
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/154#discussion_r129797858

    --- Diff: src/ports/postgres/modules/graph/bfs.py_in ---
    @@ -125,35 +127,39 @@ def graph_bfs(schema_madlib, vertex_table, vertex_id, edge_table,
             default_args)
         # Prepare the input for recording in the summary table
    -    if vertex_id is None:
    -        v_st= "NULL"
    +    if (vertex_id is None) or (vertex_id == ''):
    +        v_st = "NULL"
    --- End diff --

    I suggest using `''` to indicate no input. `"NULL"` as a string is
    different from `NULL`.
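The distinction between the string `"NULL"` and an SQL `NULL` can be seen in a small experiment. This is only a sketch: it uses Python's built-in `sqlite3` module as a stand-in for the PostgreSQL or Greenplum database that MADlib actually targets, and the `summary` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE summary (vertex_id TEXT)")

# The string 'NULL' is stored as ordinary four-character text,
# whereas Python's None is stored as an SQL NULL.
conn.executemany("INSERT INTO summary VALUES (?)", [("NULL",), (None,)])

rows = conn.execute(
    "SELECT vertex_id, vertex_id IS NULL FROM summary ORDER BY rowid"
).fetchall()
print(rows)
conn.close()
```

Only the second row satisfies `IS NULL`, so a summary table that stores the text `"NULL"` cannot be queried as if the value were missing.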
[GitHub] incubator-madlib pull request #154: Graph/bugs
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/154#discussion_r129796905

    --- Diff: src/ports/postgres/modules/graph/apsp.py_in ---
    @@ -72,22 +73,25 @@ def graph_apsp(schema_madlib, vertex_table, vertex_id, edge_table,
         edge_params = extract_keyvalue_params(edge_args, params_types, default_args)
         # Prepare the input for recording in the summary table
    -    if vertex_id is None:
    +    if (vertex_id is None) or (vertex_id == ''):
    --- End diff --

    `if not vertex_id`
[GitHub] incubator-madlib pull request #154: Graph/bugs
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/154#discussion_r129797098

    --- Diff: src/ports/postgres/modules/graph/apsp.py_in ---
    @@ -163,15 +167,16 @@ def graph_apsp(schema_madlib, vertex_table, vertex_id, edge_table,
         # We keep a summary table to keep track of the parameters used for this
         # APSP run. This table is used in the path finding function to eliminate
         # the need for repetition.
    -    plpy.execute(""" CREATE TABLE {out_table}_summary (
    +    summary_table = add_postfix(out_table,"_summary")
    --- End diff --

    space after comma (in multiple places)
[GitHub] incubator-madlib pull request #154: Graph/bugs
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/154#discussion_r129798317

    --- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
    @@ -84,11 +85,11 @@ def pagerank(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
         edge_params = extract_keyvalue_params(edge_args, params_types, default_args)
         # populate default values for optional params if null
    -    if damping_factor is None:
    +    if (damping_factor is None) or (damping_factor == ''):
             damping_factor = 0.85
    -    if max_iter is None:
    +    if (max_iter is None) or (max_iter == ''):
    --- End diff --

    Why are `max_iter` and `damping_factor` strings?
[GitHub] incubator-madlib issue #152: Feature/graph measures 1
Github user iyerr3 commented on the issue:

    https://github.com/apache/incubator-madlib/pull/152

    Force-pushed after rebasing onto master.
[GitHub] incubator-madlib pull request #149: MLP: Multilayer Perceptron
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/149#discussion_r128586055

    --- Diff: src/modules/convex/task/mlp.hpp ---
    @@ -0,0 +1,334 @@
    +/* --- *//**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements. See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership. The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied. See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + *
    + * @file mlp.hpp
    + *
    + * This file contains objective function related computation, which is called
    + * by classes in algo/, e.g., loss, gradient functions
    + *
    + *//* --- */
    +
    +#ifndef MADLIB_MODULES_CONVEX_TASK_MLP_HPP_
    +#define MADLIB_MODULES_CONVEX_TASK_MLP_HPP_
    +
    +namespace madlib {
    +
    +namespace modules {
    +
    +namespace convex {
    +
    +// Use Eigen
    +using namespace madlib::dbal::eigen_integration;
    +
    +template
    +class MLP {
    +public:
    +    typedef Model model_type;
    +    typedef Tuple tuple_type;
    +    typedef typename Tuple::independent_variables_type
    +        independent_variables_type;
    +    typedef typename Tuple::dependent_variable_type dependent_variable_type;
    +
    +    static void gradientInPlace(
    +        model_type &model,
    +        const independent_variables_type &y,
    +        const dependent_variable_type &z,
    +        const double &stepsize);
    +
    +    static double loss(
    +        const model_type &model,
    +        const independent_variables_type &y,
    +        const dependent_variable_type &z);
    +
    +    static ColumnVector predict(
    +        const model_type &model,
    +        const independent_variables_type &y,
    +        const bool get_class);
    +
    +    const static int RELU = 0;
    +    const static int SIGMOID = 1;
    +    const static int TANH = 2;
    +
    +    static double sigmoid(const double &xi) {
    +        return 1. / (1. + std::exp(-xi));
    +    }
    +
    +    static double relu(const double &xi) {
    +        return xi*(xi>0);
    +    }
    +
    +    static double tanh(const double &xi) {
    +        return std::tanh(xi);
    +    }
    +
    +
    +private:
    +
    +    static double sigmoidDerivative(const double &xi) {
    +        double value = sigmoid(xi);
    +        return value * (1. - value);
    +    }
    +
    +    static double reluDerivative(const double &xi) {
    +        return xi>0;
    +    }
    +
    +    static double tanhDerivative(const double &xi) {
    +        double value = tanh(xi);
    +        return 1-value*value;
    +    }
    +
    +    static void feedForward(
    +        const model_type &model,
    +        const independent_variables_type &y,
    +        std::vector &net,
    +        std::vector &x);
    +
    +    static void endLayerDeltaError(
    +        const std::vector &net,
    +        const std::vector &x,
    +        const dependent_variable_type &z,
    +        ColumnVector &delta_N);
    +
    +    static void errorBackPropagation(
    +        const ColumnVector &delta_N,
    +        const std::vector &net,
    +        const model_type &model,
    +        std::vector &delta);
    +};
    +
    +template
    +void
    +MLP::gradientInPlace(
    +        model_type &model,
    +        const independent_variables_type &y,
    +        const dependent_variable_type &z,
    +        const double &stepsize) {
    +    (void) model;
    +    (void) z;
    +    (void) y;
    +    (void) stepsize;
    +    std::vector net;
    +    std::vector x;
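The activation functions and derivatives quoted in the diff above can be checked numerically. The following is a Python transcription for illustration only, not the MADlib C++ code. The `numeric_derivative` helper is an addition made here to compare each closed-form derivative against a central finite difference.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    # Same trick as the C++ code: the boolean (x > 0) acts as 0 or 1.
    return x * (x > 0)


def tanh(x):
    return math.tanh(x)


def sigmoid_derivative(x):
    value = sigmoid(x)
    return value * (1.0 - value)


def relu_derivative(x):
    return float(x > 0)


def tanh_derivative(x):
    value = tanh(x)
    return 1.0 - value * value


def numeric_derivative(f, x, h=1e-6):
    # Central finite difference, used only as a sanity check.
    return (f(x + h) - f(x - h)) / (2.0 * h)


pairs = [(sigmoid, sigmoid_derivative),
         (relu, relu_derivative),
         (tanh, tanh_derivative)]
for func, deriv in pairs:
    print(func.__name__, deriv(0.5), numeric_derivative(func, 0.5))
```

The ReLU derivative is not defined at exactly 0. The expression `xi>0` in the quoted code evaluates to 0 there, and this sketch keeps that convention.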
[GitHub] incubator-madlib pull request #149: MLP: Multilayer Perceptron
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/149#discussion_r128583687

    --- Diff: doc/mainpage.dox.in ---
    @@ -195,6 +195,9 @@ complete matrix stored as a distributed table.
         @defgroup grp_robust Robust Variance
     @}
    +@defgroup grp_mlp Multilayer Perceptron
    --- End diff --

    This needs to go above `Regression ...` to keep it ordered
    alphabetically.
[GitHub] incubator-madlib pull request #149: MLP: Multilayer Perceptron
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/149#discussion_r128588139

    --- Diff: src/ports/postgres/modules/utilities/validate_args.py_in ---
    @@ -458,6 +458,22 @@ def scalar_col_has_no_null(tbl, col):
     # -
    +def array_col_dimension(tbl, col):
    +    """
    +    What is the dimension of this array column
    +    """
    +    if tbl is None or tbl.lower() == 'null':
    +        plpy.error('Input error: Table name (NULL) is invalid')
    +    if col is None or col.lower() == 'null':
    --- End diff --

    IMO we shouldn't be checking for the string 'NULL'. There are multiple
    strings that are invalid table/column names - why make 'null' an
    exception?
[GitHub] incubator-madlib pull request #149: MLP: Multilayer Perceptron
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/149#discussion_r128583465

    --- Diff: doc/design/modules/neural-network.tex ---
    @@ -0,0 +1,195 @@
    +% Licensed to the Apache Software Foundation (ASF) under one
    +% or more contributor license agreements. See the NOTICE file
    +% distributed with this work for additional information
    +% regarding copyright ownership. The ASF licenses this file
    +% to you under the Apache License, Version 2.0 (the
    +% "License"); you may not use this file except in compliance
    +% with the License. You may obtain a copy of the License at
    +
    +% http://www.apache.org/licenses/LICENSE-2.0
    +
    +% Unless required by applicable law or agreed to in writing,
    +% software distributed under the License is distributed on an
    +% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +% KIND, either express or implied. See the License for the
    +% specific language governing permissions and limitations
    +% under the License.
    +
    +% When using TeXShop on the Mac, let it know the root document.
    +% The following must be one of the first 20 lines.
    +% !TEX root = ../design.tex
    +
    +\chapter{Neural Network}
    +
    --- End diff --

    Let's add `\item[Authors] {Xixuan Feng}` here
[GitHub] incubator-madlib pull request #149: MLP: Multilayer Perceptron
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/149#discussion_r128583284

    --- Diff: .gitignore ---
    @@ -1,5 +1,6 @@
     # Ignore build directory
     /build*
    +/build-docker*
    --- End diff --

    Does the `build*` not cover `build-docker*`?
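The question above can be checked with shell-style glob matching. This sketch uses Python's `fnmatch` as an approximation of `.gitignore` pattern matching: it does not model the leading `/` (which anchors the pattern to the repository root) or other gitignore-specific rules, and the directory names are illustrative.

```python
from fnmatch import fnmatch

# Illustrative top-level directory names, not taken from the repository.
names = ["build", "build-docker", "build-docker-9.6"]

# A trailing '*' matches any suffix, including one starting with '-docker'.
covered = {name: fnmatch(name, "build*") for name in names}
print(covered)
```

Under this approximation every name matched by `build-docker*` is already matched by `build*`, which is the redundancy the comment points at.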
[GitHub] incubator-madlib pull request #101: Multiple: Add casting to allow compilati...
Github user iyerr3 closed the pull request at:

    https://github.com/apache/incubator-madlib/pull/101
[GitHub] incubator-madlib issue #101: Multiple: Add casting to allow compilation with...
Github user iyerr3 commented on the issue:

    https://github.com/apache/incubator-madlib/pull/101

    Closing this for now. Will revisit this in the future.
[GitHub] incubator-madlib pull request #152: Feature/graph measures 1
GitHub user iyerr3 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/152

    Feature/graph measures 1

    Note: This PR will have to be rebased after #148 is merged.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib feature/graph_measures_1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/152.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #152

commit 6461061cfe0b161e0e19eef0d9e7e62c470136a1
Author: Rahul Iyer
Date: 2017-07-08T05:23:18Z

    Graph: Update Python code to follow PEP-8

    - Changed indentation to use spaces instead of tabs
    - Updated to PEP-8 guidelines
    - Updated to follow style guide convention
    - Refactored few functions to clean code and design

commit 60b0774b71bce90f88f1ddb1573295e6adf0706a
Author: Rahul Iyer
Date: 2017-06-29T20:31:55Z

    Graph: Add initial set of centrality measures

    JIRA: MADLIB-1073

    This commit adds the following measures:
    - Closeness (uses APSP)
    - Graph diameter (uses APSP)
    - Average path length (uses APSP)
    - In/out degrees
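To make the APSP-based measures listed in the commit message concrete, here is a small pure-Python sketch on a toy directed graph. It is not the MADlib implementation, which runs in SQL inside the database. The definitions used are assumptions made for this illustration: diameter is taken as the longest finite shortest path, and average path length as the mean over reachable ordered pairs of distinct vertices. Closeness is left out because its normalization varies between definitions.

```python
from itertools import product

INF = float("inf")


def apsp(vertices, edges):
    """All-pairs shortest paths via Floyd-Warshall on (src, dest, weight) edges."""
    dist = {(u, v): (0 if u == v else INF)
            for u, v in product(vertices, repeat=2)}
    for src, dest, weight in edges:
        dist[src, dest] = min(dist[src, dest], weight)
    # product() varies its first element slowest, so k is the outer loop.
    for k, i, j in product(vertices, repeat=3):
        if dist[i, k] + dist[k, j] < dist[i, j]:
            dist[i, j] = dist[i, k] + dist[k, j]
    return dist


vertices = [0, 1, 2, 3]
edges = [(0, 1, 1), (1, 2, 1), (2, 0, 1), (2, 3, 1)]

dist = apsp(vertices, edges)
finite = [d for (u, v), d in dist.items() if u != v and d < INF]
diameter = max(finite)
average_path_length = sum(finite) / len(finite)

# Start every vertex at zero so vertices without in- or out-edges still appear.
in_degree = {v: 0 for v in vertices}
out_degree = {v: 0 for v in vertices}
for src, dest, _ in edges:
    out_degree[src] += 1
    in_degree[dest] += 1

print(diameter, average_path_length, in_degree, out_degree)
```

Vertex 3 has no outgoing edges. Initializing the degree maps from the vertex list, rather than from the edge list alone, is what keeps it in the result with an out-degree of 0.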
[GitHub] incubator-madlib pull request #148: Graph: Update Python code to follow PEP-...
GitHub user iyerr3 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/148

    Graph: Update Python code to follow PEP-8

    - Changed indentation to use spaces instead of tabs
    - Updated to PEP-8 guidelines
    - Updated to follow style guide convention
    - Refactored few functions to clean code and design

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib refactor/graph_cleanup

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/148.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #148

commit 13764a0be553e8889c4720843d781dfd2e02a573
Author: Rahul Iyer
Date: 2017-07-08T05:23:18Z

    Graph: Update Python code to follow PEP-8

    - Changed indentation to use spaces instead of tabs
    - Updated to PEP-8 guidelines
    - Updated to follow style guide convention
    - Refactored few functions to clean code and design
[GitHub] incubator-madlib issue #143: Sample: Add stratified sampling
Github user iyerr3 commented on the issue:

    https://github.com/apache/incubator-madlib/pull/143

    LGTM
[GitHub] incubator-madlib issue #142: DT: Include NULL rows in count for termination ...
Github user iyerr3 commented on the issue:

    https://github.com/apache/incubator-madlib/pull/142

    I ran the RF example three times: twice I got 100% and once I got 13/14
    (~93%). I guess there's some randomness there (which is expected).
[GitHub] incubator-madlib pull request #142: DT: Include NULL rows in count for termi...
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/142#discussion_r123622385

    --- Diff: src/modules/recursive_partitioning/DT_impl.hpp ---
    @@ -486,8 +485,18 @@ DecisionTree::expand(const Accumulator &state,
             Index stats_i = static_cast(state.stats_lookup(i));
             assert(stats_i >= 0);
    -        // 1. Set the prediction for current node from stats of all rows
    -        predictions.row(current) = state.node_stats.row(stats_i);
    +        if (statCount(predictions.row(current)) !=
    +                statCount(state.node_stats.row(stats_i))){
    +            // Predictions for each node is set by its parent using stats
    +            // recorded while training parent node. These stats do not include
    +            // rows that had a NULL value for the primary split feature.
    +            // The NULL count is included in the 'node_stats' while training
    +            // current node. Further, presence of NULL rows indicate that
    +            // stats used for deciding 'children_wont_split' are inaccurate.
    +            // Hence avoid using the flag to decide termination.
    +            predictions.row(current) = state.node_stats.row(stats_i);
    +            children_wont_split = false;
    +        }
    --- End diff --

    - `children_wont_split` is **one** of the factors that determines if
      training should stop after current iteration.
      `children_wont_split=true` implies training stops;
      `children_wont_split=false` implies other flags determine termination.
    - The lines 516-547 are finding the best feature to split on and are
      necessary - independent of `children_wont_split` and independent of
      the result of line 490. I could exchange sections 1 and 2 since
      they're independent, if that helps in reading the code.
[GitHub] incubator-madlib issue #142: DT: Include NULL rows in count for termination ...
Github user iyerr3 commented on the issue:

    https://github.com/apache/incubator-madlib/pull/142

    I need more info on the RF docs discrepancy.
    - Which example is giving the lower than 100% training accuracy?
    - How do the trees look?
    - Can we replicate in decision tree since that is easier to debug?

    On its own, less than 100% accuracy is not wrong, but if the tree is
    not as long as it should be (i.e. prematurely terminating) then a
    problem has been introduced here.
[GitHub] incubator-madlib pull request #142: DT: Include NULL rows in count for termi...
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/142#discussion_r123597014

    --- Diff: src/modules/recursive_partitioning/DT_impl.hpp ---
    @@ -486,8 +485,18 @@ DecisionTree::expand(const Accumulator &state,
             Index stats_i = static_cast(state.stats_lookup(i));
             assert(stats_i >= 0);
    -        // 1. Set the prediction for current node from stats of all rows
    -        predictions.row(current) = state.node_stats.row(stats_i);
    +        if (statCount(predictions.row(current)) !=
    +                statCount(state.node_stats.row(stats_i))){
    +            // Predictions for each node is set by its parent using stats
    +            // recorded while training parent node. These stats do not include
    +            // rows that had a NULL value for the primary split feature.
    +            // The NULL count is included in the 'node_stats' while training
    +            // current node. Further, presence of NULL rows indicate that
    +            // stats used for deciding 'children_wont_split' are inaccurate.
    +            // Hence avoid using the flag to decide termination.
    +            predictions.row(current) = state.node_stats.row(stats_i);
    +            children_wont_split = false;
    +        }
    --- End diff --

    The if statement is basically checking if a NULL row is present in the
    `current` node and if yes, then the predictions for that node is
    updated. I've added an explanation in the comments for both statements
    on why they're needed. If the explanation is not clear then please add
    more details on what would help you understand.
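The check described above can be sketched outside the C++ code. The following Python is a simplified illustration, not the MADlib implementation: `stat_count` is a hypothetical stand-in for `statCount` that simply sums per-class row counts, and the stats are plain lists rather than matrix rows.

```python
def stat_count(stats):
    # Hypothetical stand-in for statCount: total rows across class counts.
    return sum(stats)


def refresh_prediction(prediction, node_stats, children_wont_split):
    """Return the (prediction, children_wont_split) pair to use for a node.

    `prediction` was set by the parent and excludes rows whose primary split
    feature was NULL; `node_stats` was gathered at the node itself and
    includes them. A count mismatch therefore signals NULL rows.
    """
    if stat_count(prediction) != stat_count(node_stats):
        # The parent's stats were incomplete: take the node's own stats and
        # do not let the (possibly wrong) flag end training.
        return list(node_stats), False
    return prediction, children_wont_split


print(refresh_prediction([3, 2], [4, 2], True))
print(refresh_prediction([3, 2], [3, 2], True))
```

In the first call one row with a NULL split value reached the node, so the prediction is refreshed and the flag is cleared. In the second call the counts agree and both values pass through unchanged.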
[GitHub] incubator-madlib pull request #142: DT: Include NULL rows in count for termi...
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/142#discussion_r123596546

    --- Diff: src/modules/recursive_partitioning/DT_impl.hpp ---
    @@ -446,21 +446,20 @@ DecisionTree::updatePrimarySplit(
         predictions.row(falseChild(node_index)) = false_stats;
         // true_stats and false_stats only include the tuples for which the primary
    -    // split is NULL. The number of tuples in these stats need to be stored to
    +    // split is not NULL. The number of tuples in these stats need to be stored to
         // compute a majority branch during surrogate training.
         uint64_t true_count = statCount(true_stats);
         uint64_t false_count = statCount(false_stats);
    -    nonnull_split_count(node_index*2) = static_cast(true_count);
    -    nonnull_split_count(node_index*2 + 1) = static_cast(false_count);
    -
    -    // current node's children won't split if,
    -    // 1. children are pure (responses are too similar to split further)
    -    // 2. children are too small to split further (count < min_split)
    -    bool children_wont_split = (isChildPure(true_stats) &&
    -                                isChildPure(false_stats) &&
    -                                true_count < min_split &&
    -                                false_count < min_split
    -                                );
    +    nonnull_split_count(trueChild(node_index)) = static_cast(true_count);
    +    nonnull_split_count(falseChild(node_index)) = static_cast(false_count);
    +
    +    // current node's child won't split if,
    +    // 1. child is pure (responses are too similar to split further) OR
    --- End diff --

    I've added a new commit with more explanation. The short answer is that
    the previous logic was incorrect and resulted in longer trees.
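The difference between the removed condition and the per-child rule can be illustrated with booleans. The old rule below mirrors the removed lines in the diff. The new rule is an assumption: the quoted diff is cut off after "OR", so the second criterion (child smaller than `min_split`) and the way the two children are combined are inferred from the removed comment, not copied from the commit.

```python
def old_children_wont_split(true_pure, false_pure,
                            true_count, false_count, min_split):
    # Removed logic: every condition had to hold at once.
    return (true_pure and false_pure and
            true_count < min_split and false_count < min_split)


def child_wont_split(pure, count, min_split):
    # Assumed per-child rule: pure OR too small to split further.
    return pure or count < min_split


def new_children_wont_split(true_pure, false_pure,
                            true_count, false_count, min_split):
    # Assumption: training can stop only if neither child would split.
    return (child_wont_split(true_pure, true_count, min_split) and
            child_wont_split(false_pure, false_count, min_split))


# Both children are pure but large: nothing is left to split.
case = dict(true_pure=True, false_pure=True,
            true_count=50, false_count=60, min_split=20)
print(old_children_wont_split(**case), new_children_wont_split(**case))
```

With the old rule the flag stays false for two pure but large children, so training continues, which is consistent with the comment that the previous logic produced longer trees.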
[GitHub] incubator-madlib issue #141: Graph: Add Breadth-first Search algorithm with ...
Github user iyerr3 commented on the issue:

    https://github.com/apache/incubator-madlib/pull/141

    jenkins, ok to test
[GitHub] incubator-madlib pull request #142: DT: Include NULL rows in count for termi...
GitHub user iyerr3 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/142

    DT: Include NULL rows in count for termination check

    When the primary split feature for a node is computed, the statistics of rows going to the true and false side don't include the rows that have NULL value for this split feature. These "NULL" rows can only be included in the statistics during the next pass when surrogates have been trained. This commit ensures that in the presence of NULL rows, we don't terminate prematurely by comparing with a lower count.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib bugfix/dt_accurate_termination

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/142.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #142

commit 7213d3008d657df323a577111355aae6354ef663
Author: Rahul Iyer
Date: 2017-06-21T06:31:06Z

    DT: Include NULL rows in count for termination check

    When the primary split feature for a node is computed, the statistics of rows going to the true and false side don't include the rows that have NULL value for this split feature. These "NULL" rows can only be included in the statistics during the next pass when surrogates have been trained. This commit ensures that in the presence of NULL rows, we don't terminate prematurely by comparing with a lower count.
[GitHub] incubator-madlib issue #138: Summary: Add param to determine num of cols per...
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/138 @rashmi815 Fixed the issues - please check if it looks good.
[GitHub] incubator-madlib pull request #139: Sketch: Promote sketch methods to top-le...
GitHub user iyerr3 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/139

    Sketch: Promote sketch methods to top-level

    JIRA: MADLIB-1120

    This commit fixes some of the documentation for sketch and moves the module out of "Early stage development".

    Closes #139

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib feature/sketch_top_level

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/139.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #139

commit 6a672d48683f6997ff16831bb11841263a54de9e
Author: Rahul Iyer
Date: 2017-06-06T23:09:30Z

    Sketch: Promote sketch methods to top-level

    JIRA: MADLIB-1120

    This commit fixes some of the documentation for sketch and moves the module out of "Early stage development".

    Closes #139
[GitHub] incubator-madlib pull request #138: Summary: Add param to determine num of c...
GitHub user iyerr3 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/138

    Summary: Add param to determine num of cols per run

    JIRA: MADLIB-1117

    Summary used a hard-coded parameter of a maximum of 15 columns per run. This was put in place to avoid out-of-memory errors in most cases. This, however, limits the run time since higher number of columns can be summarized in a single run for a simpler data set (one which leads to smaller sketch data structures).

    This commit adds a new parameter allowing users to set this limit, while retaining the old default of 15 columns.

    Closes #138

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib feature/summary_add_parameter

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/138.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #138

commit 1cca783b63111d004662f314cef67e9be8bb9a92
Author: Rahul Iyer
Date: 2017-06-05T23:36:50Z

    Summary: Add param to determine num of cols per run

    JIRA: MADLIB-1117

    Summary used a hard-coded parameter of a maximum of 15 columns per run. This was put in place to avoid out-of-memory errors in most cases. This, however, limits the run time since higher number of columns can be summarized in a single run for a simpler data set (one which leads to smaller sketch data structures).

    This commit adds a new parameter allowing users to set this limit, while retaining the old default of 15 columns.

    Closes #138
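Editorial sketch of the batching idea described in this pull request: the columns to summarize are split into runs of at most a fixed size, with 15 as the default. The function and the parameter name `ncols_per_run` are hypothetical and chosen for illustration; the actual MADlib parameter name and implementation are not shown in this message.

```python
def batch_columns(columns, ncols_per_run=15):
    """Split `columns` into consecutive runs of at most `ncols_per_run`.

    `ncols_per_run` is a hypothetical name; 15 mirrors the default
    mentioned in the pull request description.
    """
    if ncols_per_run < 1:
        raise ValueError("ncols_per_run must be a positive integer")
    return [columns[i:i + ncols_per_run]
            for i in range(0, len(columns), ncols_per_run)]


cols = ["col_{0}".format(i) for i in range(40)]
print([len(b) for b in batch_columns(cols)])        # [15, 15, 10]
print(len(batch_columns(cols, ncols_per_run=40)))   # 1
```

With the default, a 40-column table needs three runs. Raising the limit for a simpler data set brings that down to one run, at the cost of holding more sketch state in memory at once.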
[GitHub] incubator-madlib pull request #135: Sketch: Remove per-tuple checks
GitHub user iyerr3 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/135

    Sketch: Remove per-tuple checks

    Some of the sketch functions have checks running for each tuple in their aggregate. These checks include invalid transition state and invalid types for input data. The checks are important for the functions if run outside an aggregate context, but are a waste of cycles when called as an agg.

    The checks include caql calls that were estimated to eat a large chunk of the runtime. This work removes these checks - the average time saved is estimated to be around 35% for datasets ranging in size from 10 million to 1 billion tuples.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib bugfix/sketch_catalog_checks

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/135.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #135

commit 617408a73ef32f25d2f8ae72ce3e9bc78cd10a4a
Author: Rahul Iyer
Date: 2017-05-16T22:38:08Z

    Sketch: Remove per-tuple checks

    Some of the sketch functions have checks running for each tuple in their aggregate. These checks include invalid transition state and invalid types for input data. The checks are important for the functions if run outside an aggregate context, but are a waste of cycles when called as an agg.

    The checks include caql calls that were estimated to eat a large chunk of the runtime. This work removes these checks - the average time saved is estimated to be around 35% for datasets ranging in size from 10 million to 1 billion tuples.
[GitHub] incubator-madlib pull request #134: Two DT/RF enhancements
GitHub user iyerr3 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/134

    Two DT/RF enhancements

    This PR contains two separate but correlated pieces of work. The 1st commit is a bugfix to filter NULL values in the dependent column. The 2nd commit adds support for array columns as features in DT and RF.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib feature/dt_array_feature_support

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/134.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #134
[GitHub] incubator-madlib pull request #133: Build: Add CDATA block to avoid invalid ...
GitHub user iyerr3 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/133

    Build: Add CDATA block to avoid invalid xml

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib infra/extract_failed_result

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/133.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #133

commit 798fc9ebf9027d9f44e8482b0ecb2acd2edb3f02
Author: Rahul Iyer
Date: 2017-05-11T00:22:12Z

    Build: Add CDATA block to avoid invalid xml
[GitHub] incubator-madlib pull request #132: DT/RF: Allow array input for features
Github user iyerr3 closed the pull request at: https://github.com/apache/incubator-madlib/pull/132
[GitHub] incubator-madlib pull request #131: RF: Filter NULL dependent values in OOB
Github user iyerr3 closed the pull request at: https://github.com/apache/incubator-madlib/pull/131
[GitHub] incubator-madlib pull request #132: DT/RF: Allow array input for features
GitHub user iyerr3 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/132

    DT/RF: Allow array input for features

    JIRA: MADLIB-965

    Currently array columns are not allowed features in decision tree and random forest train functions. This commit adds support for a mixed list of features: arrays and individual columns of multiple types can be combined into a single list. Each array is expanded to treat each element of the array as a feature.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib feature/dt_array_feature_support

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/132.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #132

commit 2f1ddee5ab957684988dac575627760a1dfd67bb
Author: Rahul Iyer
Date: 2017-05-09T21:50:52Z

    DT/RF: Allow array input for features

    JIRA: MADLIB-965

    Currently array columns are not allowed features in decision tree and random forest train functions. This commit adds support for a mixed list of features: arrays and individual columns of multiple types can be combined into a single list. Each array is expanded to treat each element of the array as a feature.
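Editorial sketch of the expansion step described in this pull request, where each element of an array column becomes its own feature. The helper name, the `array_lengths` argument, and the column names are hypothetical; how MADlib actually discovers array lengths and names the expanded features is not shown in this message.

```python
def expand_features(features, array_lengths):
    """Expand array-valued features into one SQL expression per element.

    features      : list of column names or expressions
    array_lengths : dict mapping each array feature to its element count
                    (assumed to be known up front for this sketch)
    """
    expanded = []
    for name in features:
        n = array_lengths.get(name)
        if n is None:
            # Scalar feature: keep as-is.
            expanded.append(name)
        else:
            # PostgreSQL arrays are 1-indexed by default.
            expanded.extend("({0})[{1}]".format(name, i)
                            for i in range(1, n + 1))
    return expanded


print(expand_features(["age", "scores", "city"], {"scores": 3}))
# ['age', '(scores)[1]', '(scores)[2]', '(scores)[3]', 'city']
```

The mixed list keeps its order, so scalar and array features can be combined freely, as the description states.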
[GitHub] incubator-madlib pull request #131: RF: Filter NULL dependent values in OOB
GitHub user iyerr3 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/131

    RF: Filter NULL dependent values in OOB

    JIRA: MADLIB-1097

    Added `filter_null` string obtained from decision_tree.py into the OOB view to exclude rows that have NULL dependent values.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib bugfix/rf_null_dep_values

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/131.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #131

commit 9b45ecaaadb9e0d4999dc49e72df8a97cb7692d2
Author: Rahul Iyer
Date: 2017-05-04T00:07:55Z

    RF: Filter NULL dependent values in OOB

    JIRA: MADLIB-1097

    Added `filter_null` string obtained from decision_tree.py into the OOB view to exclude rows that have NULL dependent values.
[GitHub] incubator-madlib pull request #129: DT/RF: Allow expressions in feature list
GitHub user iyerr3 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/129

    DT/RF: Allow expressions in feature list

    JIRA: MADLIB-1087

    Changes:
    - Add numeric as a continuous type
    - Get data type of features from an expression instead of the table column names
    - Update to allow expressions in the feature list

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib bugfix/rf_feature_input

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/129.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #129

commit 4d18b07d69ae20475254245d65798b61edce1f31
Author: Rahul Iyer
Date: 2017-05-02T19:39:52Z

    DT/RF: Allow expressions in feature list

    JIRA: MADLIB-1087

    Changes:
    - Add numeric as a continuous type
    - Get data type of features from an expression instead of the table column names
    - Update to allow expressions in the feature list
[GitHub] incubator-madlib pull request #125: DT: Include rows with NULL features in t...
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/125#discussion_r113581023

    --- Diff: src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in ---
    @@ -1582,38 +1578,17 @@ def _create_summary_table(
     #
    -def _get_filter_str(schema_madlib, cat_features, con_features,
    -boolean_cats, dependent_variable,
    -grouping_cols, max_n_surr=0):
    +def _get_filter_str(dependent_variable, grouping_cols):
    --- End diff --

    You're right. Updated now.
[GitHub] incubator-madlib pull request #125: DT: Include rows with NULL features in t...
GitHub user iyerr3 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/125

    DT: Include rows with NULL features in training

    JIRA: MADLIB-1095

    This commit enables the capability of decision tree to include rows with NULL feature values in the training dataset. Features that have NULL values are not used during the training of respective row, but the features with non-null values can be used.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib bugfix/dt_null_rows

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/125.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #125

commit 7d41ee5f091c5aa56580095b555a6722b519f009
Author: Rahul Iyer
Date: 2017-04-26T05:15:35Z

    DT: Include rows with NULL features in training

    JIRA: MADLIB-1095

    This commit enables the capability of decision tree to include rows with NULL feature values in the training dataset. Features that have NULL values are not used during the training of respective row, but the features with non-null values can be used.
[GitHub] incubator-madlib issue #120: DT: Assign memory only for reachable nodes
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/120 @ivannovick you're right.
[GitHub] incubator-madlib issue #120: DT: Assign memory only for reachable nodes
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/120 ok to test
[GitHub] incubator-madlib issue #120: DT: Assign memory only for reachable nodes
Github user iyerr3 commented on the issue:

    https://github.com/apache/incubator-madlib/pull/120

@ivannovick Here's an example of how much memory can be saved. The numbers below are for a tree built for the [Poker hand dataset](https://archive.ics.uci.edu/ml/datasets/Poker+Hand), mostly using default parameters and setting `min_split` and `min_bucket` to `1` to build a deep tree.

Currently, memory is allocated for the maximum possible number of nodes, but only a fraction of that is actually used (indicated in the `% usage` column). As expected, this `%` decreases as depth increases, since some nodes stop branching further. With this work, memory will be allocated at each depth only for the fraction of nodes that are actually used.

Note that this is for a specific problem/data set and results will vary for other datasets. In general, problems that are so big that they hit the 1 GB agg state limit within a couple of tree levels will not benefit from this.

| depth | Nodes used | Max nodes (2^k - 1) | % usage |
|---|---|---|---|
| 2 | 3 | 3 | 100 |
| 3 | 7 | 7 | 100 |
| 4 | 11 | 15 | 73.3 |
| 5 | 17 | 31 | 54.8 |
| 6 | 25 | 63 | 39.7 |
| 7 | 41 | 127 | 32.2 |
| 8 | 73 | 255 | 28.6 |
| 9 | 135 | 511 | 26.4 |
| 10 | 251 | 1023 | 24.5 |
| 11 | 447 | 2047 | 21.8 |
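Editorial sketch showing how the `% usage` column follows from the other two: a complete binary tree of depth k (in the table's convention) has 2^k - 1 nodes, and usage is the ratio of nodes used to that maximum. The node counts are copied from the table; the function names are made up for this illustration. Values may differ from the table in the last digit depending on whether the ratio is rounded or truncated.

```python
# "Nodes used" per depth, copied from the table in the comment.
nodes_used = {2: 3, 3: 7, 4: 11, 5: 17, 6: 25,
              7: 41, 8: 73, 9: 135, 10: 251, 11: 447}


def max_nodes(depth):
    # Node count of a complete binary tree, using the table's 2^k - 1.
    return 2 ** depth - 1


def usage_percent(depth, used):
    return 100.0 * used / max_nodes(depth)


for depth in sorted(nodes_used):
    used = nodes_used[depth]
    print("{0:>5} {1:>6} {2:>6} {3:6.1f}".format(
        depth, used, max_nodes(depth), usage_percent(depth, used)))
```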
[GitHub] incubator-madlib issue #124: Bugfix/jenkins xml report
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/124 retest this please.
[GitHub] incubator-madlib issue #124: Bugfix/jenkins xml report
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/124 Note: Only two files have changed with this commit. For some reason github/master has not updated to upstream (apache).
[GitHub] incubator-madlib pull request #124: Bugfix/jenkins xml report
GitHub user iyerr3 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/124

    Bugfix/jenkins xml report

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib bugfix/jenkins_xml_report

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/124.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #124

commit 0d815f2ba3b8421c32a9bfbd7b334285d83fa347
Author: Roman Shaposhnik
Date: 2017-04-20T18:02:43Z

    MADLIB-1076. Review LICENSE file and README.md

    Closes #123

commit 6a5f60a8a1bce8ce00689f9c11e0636f00fb612e
Author: Rahul Iyer
Date: 2017-04-21T01:01:05Z

    Jenkins: Get failure message from install-check FAIL
[GitHub] incubator-madlib pull request #123: MADLIB-1076. Review LICENSE file and REA...
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/123#discussion_r112568270

    --- Diff: licenses/MADlib.txt ---
    @@ -1,10 +0,0 @@
    -Portions of this software Copyright (c) 2010-2013 by EMC Corporation. All rights reserved.
    --- End diff --

    Thanks, Roman. Symlink would be the best option if we have to keep the file. Alternatively, we can change the `"${CMAKE_SOURCE_DIR}/licenses/MADlib.txt"` in `deploy/PackageMaker/CMakeLists.txt` to `"${CMAKE_SOURCE_DIR}/LICENSE"` and remove this file.
[GitHub] incubator-madlib issue #123: MADLIB-1076. Review LICENSE file and README.md
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/123 Jenkins, OK to test.
[GitHub] incubator-madlib issue #116: Unnest 2d array
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/116 Jenkins, OK to test.
[GitHub] incubator-madlib pull request #123: MADLIB-1076. Review LICENSE file and REA...
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/123#discussion_r112544856

    --- Diff: licenses/MADlib.txt ---
    @@ -1,10 +0,0 @@
    -Portions of this software Copyright (c) 2010-2013 by EMC Corporation. All rights reserved.
    --- End diff --

    Is this file necessary?
[GitHub] incubator-madlib pull request #123: MADLIB-1076. Review LICENSE file and REA...
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/123#discussion_r112544703

    --- Diff: src/CMakeLists.txt ---
    @@ -18,10 +18,10 @@ set(BITBUCKET_BASE_URL
     "${MADLIB_REDIRECT_PREFIX}https://bitbucket.org"
     CACHE STRING
     "Base URL for Bitbucket projects. May be overridden for testing purposes.")
    -set(GITHUB_MADLIB_BASE_URL
    +set(EIGEN_BASE_URL
    --- End diff --

    Since this is an Eigen-specific link, please include the `eigen/archive` part from line 55 in the URL itself. That way it's used specifically for that purpose.
[GitHub] incubator-madlib pull request #119: Multiple: Minor changes for GPDB5 and HA...
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/119#discussion_r112535321

    --- Diff: src/ports/postgres/modules/elastic_net/test/elastic_net_install_check.sql_in ---
    @@ -840,27 +840,27 @@ SELECT elastic_net_train(
     SELECT * FROM house_en;
     SELECT * FROM house_en_summary;
    -DROP TABLE if exists house_en, house_en_summary, house_en_cv;
    -SELECT elastic_net_train(
    -'lin_housing_wi',
    -'house_en',
    -'y',
    -'x',
    -'gaussian',
    -0.1,
    -0.2,
    -True,
    -NULL,
    -'fista',
    -$$ eta = 2, max_stepsize = 0.5, use_active_set = f,
    - n_folds = 3, validation_result=house_en_cv,
    - n_lambdas = 3, alpha = {0, 0.1, 1},
    - warmup = True, warmup_lambdas = {10, 1, 0.1}
    -$$,
    -NULL,
    -100,
    -1e-6
    -);
    -SELECT * FROM house_en;
    -SELECT * FROM house_en_summary;
    -SELECT * FROM house_en_cv;
    +-- DROP TABLE if exists house_en, house_en_summary, house_en_cv;
    --- End diff --

    Please add a comment here (and all other tests commented out) on why this is done. If there is a JIRA that tracks the progress of these fixes then include that here as well.
[GitHub] incubator-madlib pull request #119: Multiple: Minor changes for GPDB5 and HA...
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/119#discussion_r112534924

    --- Diff: src/ports/postgres/modules/graph/sssp.py_in ---
    @@ -314,9 +314,13 @@ def graph_sssp(schema_madlib, vertex_table, vertex_id, edge_table,
     {checkg_oo}) UNION SELECT {grp_comma} id, {weight}, parent FROM {oldupdate};
    - DROP TABLE {out_table};
    - ALTER TABLE {temp_table} RENAME TO {out_table};
    - CREATE TABLE {temp_table} AS (
    + """
    + plpy.execute(sql.format(**locals()))
    + sql = "DROP TABLE {out_table}"
    + plpy.execute(sql.format(**locals()))
    --- End diff --

    The above two lines can easily be merged into a single statement (same for the ones below). Also avoid using locals() if there are only a few variables in the format list. Explicitly adding the variables makes it easy to see their usage.
[GitHub] incubator-madlib pull request #119: Multiple: Minor changes for GPDB5 and HA...
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/119#discussion_r112535164

    --- Diff: src/ports/postgres/modules/graph/sssp.py_in ---
    @@ -432,11 +433,17 @@ def graph_sssp(schema_madlib, vertex_table, vertex_id, edge_table,
     SELECT 1 FROM {oldupdate} as oldupdate WHERE {checkg_oo_sub}
    - );
    - DROP TABLE {out_table};
    - ALTER TABLE {temp_table} RENAME TO {out_table};"""
    -
    - plpy.execute(sql_del.format(**locals()))
    + );"""
    + plpy.execute(sql_del.format(**locals()))
    + sql_del = "DROP TABLE {out_table}"
    + plpy.execute(sql_del.format(**locals()))
    + sql_del = "ALTER TABLE {temp_table} RENAME TO {out_table};"
    + plpy.execute(sql_del.format(**locals()))
    --- End diff --

    Same as previous comment - all these `plpy.execute` can directly run the string, simplifying the code.
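Editorial sketch of the style the two review comments on `sssp.py_in` ask for: build each statement inline and pass only the variables it needs to `format`, instead of assigning a string and formatting it with `**locals()`. The helper name and the table names are hypothetical. `plpy` exists only inside PL/Python, so the sketch takes the execute function as an argument to stay runnable on its own; inside the database one would pass `plpy.execute`.

```python
def swap_in_temp_table(execute, out_table, temp_table):
    """Replace `out_table` with `temp_table`, one statement per call.

    `execute` is any callable that runs a SQL string; in PL/Python this
    would be `plpy.execute`. Each call formats its string inline and
    names the substituted variables explicitly.
    """
    execute("DROP TABLE {0}".format(out_table))
    execute("ALTER TABLE {temp} RENAME TO {out}".format(
        temp=temp_table, out=out_table))


# Stand-in for plpy.execute that records the statements.
executed = []
swap_in_temp_table(executed.append, "sssp_out", "sssp_tmp")
for stmt in executed:
    print(stmt)
# DROP TABLE sssp_out
# ALTER TABLE sssp_tmp RENAME TO sssp_out
```

Naming the substituted variables at the call site makes it clear which values reach the SQL text, which is the readability point raised in the review.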
[GitHub] incubator-madlib issue #117: Decision Tree: Update defaults for max_depth, n...
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/117 The docs/latest corresponds to latest release (1.10) and won't be updated till the next release. We also have [docs/master](http://madlib.incubator.apache.org/docs/master/), which can reflect these changes. We haven't updated those since the 1.10 release - I can update if you're looking for the changes reflected online.
[GitHub] incubator-madlib issue #120: DT: Assign memory only for reachable nodes
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/120 @ivannovick Those are good questions. Short answer is it's problem (data) dependent - the memory reduction depends on how sparse the tree is. I can run some experiments on public classification/regression data and give quantitative numbers on how much we would save in a typical case.
[GitHub] incubator-madlib pull request #120: DT: Assign memory only for reachable nod...
GitHub user iyerr3 opened a pull request:

https://github.com/apache/incubator-madlib/pull/120

DT: Assign memory only for reachable nodes

JIRA: MADLIB-1057

TreeAccumulator assigns a matrix to track the statistics of rows reaching the last layer of nodes. This matrix assumes a complete tree and assigns memory for all nodes. As the tree gets deeper, most of the nodes are unreachable, resulting in excessive wasted memory.

This commit reduces that waste by only assigning memory for nodes that are reachable and accessing them through a lookup table.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/iyerr3/incubator-madlib feature/dt_reduce_memory

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-madlib/pull/120.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #120

commit b1cea55925ee1e3f6569d2d7aafac16e608c43b3
Author: Rahul Iyer
Date: 2017-04-15T00:54:31Z

Initial commit for sparser stats matrices

commit a0875f23ff69f22462a227b500612965976e0358
Author: Rahul Iyer
Date: 2017-04-18T20:38:04Z

Build lookup index vector

commit 67cb1b121a4829f4840f33f7cdc7eabe839ec343
Author: Rahul Iyer
Date: 2017-04-19T00:39:24Z

Remove warnings
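The lookup-table idea in this description can be sketched in a few lines of Python. This is not the TreeAccumulator code; the function names, the `stats_per_node` parameter, the use of -1 to mark unreachable nodes, and the sample node positions are all assumptions made for illustration.

```python
def build_stats_lookup(reachable, n_last_layer_slots, stats_per_node):
    """Allocate statistics rows only for reachable last-layer nodes.

    reachable          -- 0-based positions in the last layer that rows can reach
    n_last_layer_slots -- number of positions a complete tree would have there
    stats_per_node     -- number of statistics tracked per node
    """
    lookup = [-1] * n_last_layer_slots      # -1 marks an unreachable node
    reachable_nodes = sorted(set(reachable))
    for row, node in enumerate(reachable_nodes):
        lookup[node] = row                  # node position -> compact row index
    stats = [[0.0] * stats_per_node for _ in reachable_nodes]
    return lookup, stats


def add_stat(lookup, stats, node, stat_index, value):
    """Accumulate a value for a node by going through the lookup table."""
    row = lookup[node]
    if row < 0:
        raise KeyError("node %d is not reachable" % node)
    stats[row][stat_index] += value


n_slots = 2 ** 10                           # last layer of a complete binary tree
lookup, stats = build_stats_lookup([3, 17, 900], n_slots, stats_per_node=4)
add_stat(lookup, stats, node=17, stat_index=2, value=1.5)
print(len(stats), "rows allocated instead of", n_slots)
```

The dense layout would hold one row per slot; here the lookup vector still has one integer per slot, but the wider statistics rows exist only for the three reachable nodes.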
[GitHub] incubator-madlib issue #119: Multiple: Minor changes for GPDB5 and HAWQ2.2 s...
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/119 OK to test
[GitHub] incubator-madlib pull request #116: Unnest 2d array
Github user iyerr3 commented on a diff in the pull request:

https://github.com/apache/incubator-madlib/pull/116#discussion_r112040453

--- Diff: methods/array_ops/src/pg_gp/array_ops.sql_in ---
@@ -636,3 +663,30 @@ CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.array_cum_prod(x anyarray) RETURNS anya
 AS 'MODULE_PATHNAME', 'array_cum_prod'
 LANGUAGE C IMMUTABLE
 m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `NO SQL', `');
+
+/**
+ * @brief This function takes a 2-D array as the input and unnests it
+ *by one level.
+ *It returns a set of 1-D arrays that correspond to rows of
+ *the input array as well as an ID column with values corresponding
+ *to positions occupied by the 1-D arrays within the 2-D array.
+ *
+ * @param x Array x
+ * @returns Set of 1-D arrays that corrspond to rows of x and an ID column.
+ *
+ */
+CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.array_unnest_2d_to_1d(x ANYARRAY, OUT unnest_2d_to_1d_id BIGINT, OUT unnest_2d_to_1d_result ANYARRAY)
+RETURNS SETOF RECORD
--- End diff --

The "unnest_2d_to_1d_id" can be an INTEGER, we won't have indices bigger than that. Also maybe call it just "id" or "row_id/row_num"?
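For readers unfamiliar with the function under review, the behavior described in its doc comment can be illustrated with a small Python sketch. The Python function below only mirrors the documented semantics; starting the IDs at 1 is an assumption (the doc comment says only that IDs correspond to row positions), and the `start` parameter is hypothetical.

```python
def array_unnest_2d_to_1d(x, start=1):
    """Yield (row_id, row) pairs, one per row of a 2-D list.

    start -- first ID to assign; 1 is assumed here, not confirmed by the source.
    """
    for row_id, row in enumerate(x, start=start):
        yield row_id, list(row)


for row_id, row in array_unnest_2d_to_1d([[1, 2, 3], [4, 5, 6]]):
    print(row_id, row)
```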
[GitHub] incubator-madlib issue #117: Decision Tree: Update defaults for max_depth, n...
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/117 Reading the commit description again, I would rephrase it as "Reduce the defaults for max_depth to 7 and num_splits to 20 to **minimize the chances of running out of memory** when initializing tree for problems with many features or with features with many categorical values."
[GitHub] incubator-madlib issue #117: Decision Tree: Update defaults for max_depth, n...
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/117 No, that's a separate JIRA: MADLIB-1057 <https://issues.apache.org/jira/browse/MADLIB-1057>. This one is just about setting the defaults to more reasonable values considering the data that users have shared. The commit is a little more than just changing two numbers since I updated the way these defaults are set. Previously they were set in the overloaded function declarations (in SQL). I changed this to set the defaults in the main function definition, eliminating redundancy. Thanks, Rahul
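The refactoring described here, resolving defaults in one place rather than in each overload, can be sketched as follows. This is an illustration in Python, not the MADlib SQL or Python source: `tree_train` and its return value are hypothetical, and only the default values 7 and 20 come from the pull request description.

```python
DEFAULT_MAX_DEPTH = 7     # value stated in the pull request description
DEFAULT_NUM_SPLITS = 20   # value stated in the pull request description


def tree_train(training_table, max_depth=None, num_splits=None):
    """Hypothetical main entry point.

    Callers that omit a parameter pass None; the default is resolved
    here and nowhere else, so there is a single place to change it.
    """
    if max_depth is None:
        max_depth = DEFAULT_MAX_DEPTH
    if num_splits is None:
        num_splits = DEFAULT_NUM_SPLITS
    return {"training_table": training_table,
            "max_depth": max_depth,
            "num_splits": num_splits}


print(tree_train("patients"))
print(tree_train("patients", max_depth=3))
```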
[GitHub] incubator-madlib pull request #117: Decision Tree: Update defaults for max_d...
GitHub user iyerr3 opened a pull request:

https://github.com/apache/incubator-madlib/pull/117

Decision Tree: Update defaults for max_depth, num_splits

Reduce the defaults for max_depth to 7 and num_splits to 20 to ensure we don't run out of memory when initializing tree for problems with many features or with features with many categorical values.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/iyerr3/incubator-madlib master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-madlib/pull/117.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #117

commit 352d0c260722980f59a9e0b1e74a0650d0436c29
Author: Rahul Iyer
Date: 2017-04-18T18:53:36Z

Decision Tree: Update defaults for max_depth, num_splits

Reduce the defaults for max_depth to 7 and num_splits to 20 to ensure we don't run out of memory when initializing tree for problems with many features or with features with many categorical values.
[GitHub] incubator-madlib issue #115: Task: Skip install-check for pmml
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/115 +1
[GitHub] incubator-madlib issue #115: Task: Skip install-check for pmml
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/115 jenkins, ok to test
[GitHub] incubator-madlib pull request #111: Decision Tree: Multiple fixes - pruning,...
GitHub user iyerr3 opened a pull request:

https://github.com/apache/incubator-madlib/pull/111

Decision Tree: Multiple fixes - pruning, tree_depth, viz

Commit includes following changes:
- Pruning is not performed when cp = 0 (default behavior)
- Integer categorical variable is treated as ordered and hence is not re-ordered (using the response variable)
- Visualization is improved: nodes with categorical feature splits only provide the last value in the split, instead of the complete list. This is consistent with the visualization in scikit-learn.
- A particular bug is fixed: User input of max_depth starts from 0 and the internal tree_depth starts from 1. This change was not taken into account when tree train termination was checked.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/iyerr3/incubator-madlib bugfix/dt_accuracy_test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-madlib/pull/111.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #111

commit b29f43c56b772325d70f6c2bdaf7660837c32153
Author: Rahul Iyer
Date: 2017-04-04T21:55:49Z

Decision Tree: Multiple fixes - pruning, tree_depth, viz

Commit includes following changes:
- Pruning is not performed when cp = 0 (default behavior)
- Integer categorical variable is treated as ordered and hence is not re-ordered (using the response variable)
- Visualization is improved: nodes with categorical feature splits only provide the last value in the split, instead of the complete list. This is consistent with the visualization in scikit-learn.
- A particular bug is fixed: User input of max_depth starts from 0 and the internal tree_depth starts from 1. This change was not taken into account when tree train termination was checked.
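The max_depth fix in pull request #111 concerns a mismatch between two depth conventions. The sketch below shows one plausible form of that off-by-one, assuming (as the description states) that the user's max_depth counts the root as depth 0 while the internal tree_depth counts it as 1. The function names are hypothetical, and the actual MADlib termination check may differ.

```python
def should_stop_unadjusted(tree_depth, max_depth):
    # Compares the 1-based internal depth directly with the 0-based user limit.
    return tree_depth >= max_depth


def should_stop_adjusted(tree_depth, max_depth):
    # Converts the internal depth to the user's 0-based convention first.
    return (tree_depth - 1) >= max_depth


def grow(should_stop, max_depth):
    """Return the internal tree_depth at which training terminates."""
    tree_depth = 1                      # internal convention: a lone root is 1
    while not should_stop(tree_depth, max_depth):
        tree_depth += 1
    return tree_depth


max_depth = 2                           # user convention: root is depth 0
print("unadjusted check stops at tree_depth", grow(should_stop_unadjusted, max_depth))
print("adjusted check stops at tree_depth", grow(should_stop_adjusted, max_depth))
```

With max_depth = 2 the user expects levels 0, 1 and 2, which is an internal tree_depth of 3; the unadjusted comparison stops one level short.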
[GitHub] incubator-madlib pull request #110: Build: Update pom version for rat check
Github user iyerr3 closed the pull request at: https://github.com/apache/incubator-madlib/pull/110
[GitHub] incubator-madlib issue #108: Pivot: Add support for array output
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/108 ok to test
[GitHub] incubator-madlib issue #110: Build: Update pom version for rat check
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/110 retest this please
[GitHub] incubator-madlib issue #110: Build: Update pom version for rat check
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/110 retest this please
[GitHub] incubator-madlib issue #110: Build: Update pom version for rat check
Github user iyerr3 commented on the issue: https://github.com/apache/incubator-madlib/pull/110 retest this please