[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for class...
Github user asfgit closed the pull request at: https://github.com/apache/madlib/pull/250 ---
[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for class...
Github user njayaram2 commented on a diff in the pull request: https://github.com/apache/madlib/pull/250#discussion_r178167691 --- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in --- @@ -667,7 +678,8 @@ def _validate_dependent_var(source_table, dependent_varname, if is_classification: # Currently, classification doesn't accept an --- End diff -- Yes, that's correct. ---
[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for class...
Github user njayaram2 commented on a diff in the pull request: https://github.com/apache/madlib/pull/250#discussion_r178168389 --- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in --- @@ -856,8 +868,16 @@ def mlp_predict(schema_madlib, model_table, data_table, id_col_name, activation = _get_activation_index(summary['activation']) layer_sizes = PY2SQL( summary['layer_sizes'], array_type="DOUBLE PRECISION") -is_classification = int(summary["is_classification"]) is_response = int(pred_type == 'response') +is_classification = int(summary["is_classification"]) +classes = summary['classes'] +# Set a flag to indicate that it is a classification model, with an array +# as the dependent var. The only scenario where classification allows for +# an array dep var is when the user has provided a one-hot encoded dep var +# during training, and mlp_classification does not one-hot encode +# (and hence classes column in model's summary table is NULL). +array_dep_var_for_classification = int(is_classification and not classes) --- End diff -- Will change it to `is_dep_var_an_array_for_classification` ---
[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for class...
Github user njayaram2 commented on a diff in the pull request: https://github.com/apache/madlib/pull/250#discussion_r178168247 --- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in --- @@ -667,7 +678,8 @@ def _validate_dependent_var(source_table, dependent_varname, if is_classification: # Currently, classification doesn't accept an # array for dep type in IGD -_assert("[]" not in expr_type and expr_type in classification_types, +_assert(("[]" in expr_type and expr_type[:-2] in int_types) \ --- End diff -- - One-hot encoding is normally supposed to be an integer array, hence was checking only for int type. But, that assumption might be too strict, so will make changes to allow for other numeric types too. - Good catch. It might be a good idea to disallow anything greater than a 1-D array. Will make an error check for the same. ---
[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for class...
Github user njayaram2 commented on a diff in the pull request: https://github.com/apache/madlib/pull/250#discussion_r178168263 --- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in --- @@ -667,7 +678,8 @@ def _validate_dependent_var(source_table, dependent_varname, if is_classification: # Currently, classification doesn't accept an # array for dep type in IGD -_assert("[]" not in expr_type and expr_type in classification_types, +_assert(("[]" in expr_type and expr_type[:-2] in int_types) \ +or expr_type in classification_types, "Dependent variable column should be of type: " --- End diff -- Will change it to cover both the cases. ---
[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for class...
Github user kaknikhil commented on a diff in the pull request: https://github.com/apache/madlib/pull/250#discussion_r177912997 --- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in --- @@ -667,7 +678,8 @@ def _validate_dependent_var(source_table, dependent_varname, if is_classification: # Currently, classification doesn't accept an --- End diff -- This comment is not relevant anymore. ---
[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for class...
Github user kaknikhil commented on a diff in the pull request: https://github.com/apache/madlib/pull/250#discussion_r177912480 --- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in --- @@ -667,7 +678,8 @@ def _validate_dependent_var(source_table, dependent_varname, if is_classification: # Currently, classification doesn't accept an # array for dep type in IGD -_assert("[]" not in expr_type and expr_type in classification_types, +_assert(("[]" in expr_type and expr_type[:-2] in int_types) \ --- End diff -- why are we checking for only int types and not all numeric types ? ---
[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for class...
Github user kaknikhil commented on a diff in the pull request: https://github.com/apache/madlib/pull/250#discussion_r177913869 --- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in --- @@ -856,8 +868,16 @@ def mlp_predict(schema_madlib, model_table, data_table, id_col_name, activation = _get_activation_index(summary['activation']) layer_sizes = PY2SQL( summary['layer_sizes'], array_type="DOUBLE PRECISION") -is_classification = int(summary["is_classification"]) is_response = int(pred_type == 'response') +is_classification = int(summary["is_classification"]) +classes = summary['classes'] +# Set a flag to indicate that it is a classification model, with an array +# as the dependent var. The only scenario where classification allows for +# an array dep var is when the user has provided a one-hot encoded dep var +# during training, and mlp_classification does not one-hot encode +# (and hence classes column in model's summary table is NULL). +array_dep_var_for_classification = int(is_classification and not classes) --- End diff -- can we change the name of `array_dep_var_for_classification` to something like`is_dep_var_an_array` so that it's clear that it's a flag. ---
[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for class...
GitHub user njayaram2 opened a pull request: https://github.com/apache/madlib/pull/250 MLP: Allow one-hot encoded dependent var for classification JIRA:MADLIB-1222 MLP currently automatically encodes categorical variables for classification but does not allow already encoded arrays for dependent variables in mlp_classification. This commit lets users have an already encoded array for the dependent variable and train a model. You can merge this pull request into a Git repository by running: $ git pull https://github.com/madlib/madlib feature/mlp/support-encoded-dep-var Alternatively you can review and apply these changes as the patch at: https://github.com/apache/madlib/pull/250.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #250 commit f5a87dee6bc8f27c1a13a4921ea726b391b1813d Author: Nandish JayaramDate: 2018-03-20T22:43:25Z MLP: Allow one-hot encoded dependent var for classification JIRA:MADLIB-1222 MLP currently automatically encodes categorical variables for classification but does not allow already encoded arrays for dependent variables in mlp_classification. This commit lets users have an already encoded array for the dependent variable and train a model. ---