[madlib] 02/04: DL: Fix validation in fit, fit multiple, evaluate and predict
This is an automated email from the ASF dual-hosted git repository.

nkak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git

commit bdc67ec12f0263deaac5e2728f0c01521bd3b9ea
Author: Nikhil Kak
AuthorDate: Fri Jan 22 16:43:01 2021 -0800

    DL: Fix validation in fit, fit multiple, evaluate and predict

    JIRA: MADLIB-1464

    Previously, calling fit/fit_multiple/evaluate/predict with invalid input
    and output tables (null or missing) printed the wrong error message. This
    commit refactors the code so that the expected error message is printed.

    Refactored the validator code so that the fit multiple class no longer
    needs to create the info and summary table names. The validator now
    creates them, and the validator object can be used to get the table
    names. This makes it easier to validate all the tables inside the
    validator class.

    This commit also moves all the validation code inside the validator
    class, except for the source table validation. The source table must be
    validated before calling get_data_distribution_per_segment, which itself
    has to be called before the validator constructor.

    To test this, we created a plpython function that asserts that a query
    failed with the expected error message, and added a couple of wrapper
    functions on top of it that test for null input and output tables.

    Co-authored-by: Ekta Khanna
---
 .../modules/deep_learning/madlib_keras.py_in          |  63 ++
 .../madlib_keras_fit_multiple_model.py_in             |  82 +++-
 .../deep_learning/madlib_keras_predict.py_in          |   3 +-
 .../deep_learning/madlib_keras_validator.py_in        | 222 ++---
 .../test/madlib_keras_evaluate.sql_in                 |   9 +
 .../deep_learning/test/madlib_keras_fit.sql_in        |  42
 .../test/madlib_keras_model_selection.sql_in          |  37
 .../test/madlib_keras_multi_io.sql_in                 |  25 +++
 .../deep_learning/test/madlib_keras_predict.sql_in    |  20 ++
 .../test/madlib_keras_predict_byom.sql_in             |  27 +++
 .../test/unit_tests/test_madlib_keras.py_in           |  33 ++-
 .../postgres/modules/utilities/utilities.sql_in       |  26 +++
 12 files changed, 355 insertions(+), 234 deletions(-)

diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras.py_in b/src/ports/postgres/modules/deep_learning/madlib_keras.py_in
index 49892b6..c4f8611 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras.py_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras.py_in
@@ -103,6 +103,7 @@ def fit(schema_madlib, source_table, model, model_arch_table,
     fit_params = "" if not fit_params else fit_params
     _assert(compile_params, "Compile parameters cannot be empty or NULL.")
 
+    input_tbl_valid(source_table, module_name)
     segments_per_host = get_data_distribution_per_segment(source_table)
     use_gpus = use_gpus if use_gpus else False
     if use_gpus:
@@ -114,51 +115,27 @@ def fit(schema_madlib, source_table, model, model_arch_table,
     if object_table is not None:
         object_table = "{0}.{1}".format(schema_madlib, quote_ident(object_table))
-
-    source_summary_table = add_postfix(source_table, "_summary")
-    input_tbl_valid(source_summary_table, module_name)
-    src_summary_dict = get_source_summary_table_dict(source_summary_table)
-
-    columns_dict = {}
-    columns_dict['mb_dep_var_cols'] = src_summary_dict['dependent_varname']
-    columns_dict['mb_indep_var_cols'] = src_summary_dict['independent_varname']
-    columns_dict['dep_shape_cols'] = [add_postfix(i, "_shape") for i in columns_dict['mb_dep_var_cols']]
-    columns_dict['ind_shape_cols'] = [add_postfix(i, "_shape") for i in columns_dict['mb_indep_var_cols']]
-
-    multi_dep_count = len(columns_dict['mb_dep_var_cols'])
-    val_dep_var = None
-    val_ind_var = None
-
-    val_dep_shape_cols = None
-    val_ind_shape_cols = None
-    if validation_table:
-        validation_summary_table = add_postfix(validation_table, "_summary")
-        input_tbl_valid(validation_summary_table, module_name)
-        val_summary_dict = get_source_summary_table_dict(validation_summary_table)
-
-        val_dep_var = val_summary_dict['dependent_varname']
-        val_ind_var = val_summary_dict['independent_varname']
-        val_dep_shape_cols = [add_postfix(i, "_shape") for i in val_dep_var]
-        val_ind_shape_cols = [add_postfix(i, "_shape") for i in val_ind_var]
-
     fit_validator = FitInputValidator(
         source_table, validation_table, model, model_arch_table, model_id,
-        columns_dict['mb_dep_var_cols'], columns_dict['mb_indep_var_cols'],
-        columns_dict['dep_shape_cols'], columns_dict['ind_shape_cols'],
         num_iterations, metrics_compute_frequency, warm_start,
-        use_gpus, accessible_gpus_for_seg, object_table,
-        val_dep_var, val_ind_va
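
The refactoring described in the commit message (the validator derives the summary and info table names itself, and all table checks live in one place) can be sketched roughly as follows. This is a simplified, self-contained illustration and not MADlib's code: `TablesValidator`, `ValidationError`, the `existing_tables` argument and the error strings are all hypothetical, and a real implementation would consult the database catalog rather than a Python set.

```python
# Hypothetical sketch: none of these names or messages are MADlib's actual API.

class ValidationError(Exception):
    """Raised when a table argument is unusable."""


def _require_name(name, role):
    # Check for NULL/empty first, so that we never build a derived name
    # such as None + "_summary" and report a confusing error for it.
    if name is None or not str(name).strip():
        raise ValidationError("{0} table name is NULL or empty".format(role))


class TablesValidator(object):
    def __init__(self, source_table, output_table, existing_tables):
        existing = set(existing_tables)  # stand-in for a catalog lookup

        _require_name(source_table, "Input")
        _require_name(output_table, "Output")

        # The validator owns the derived names; callers read them from here.
        self.source_table = source_table
        self.source_summary_table = source_table + "_summary"
        self.output_table = output_table
        self.output_summary_table = output_table + "_summary"
        self.output_info_table = output_table + "_info"

        # Input tables must already exist.
        for tbl in (self.source_table, self.source_summary_table):
            if tbl not in existing:
                raise ValidationError(
                    "Input table '{0}' does not exist".format(tbl))

        # Output tables must not exist yet.
        for tbl in (self.output_table, self.output_summary_table,
                    self.output_info_table):
            if tbl in existing:
                raise ValidationError(
                    "Output table '{0}' already exists".format(tbl))


validator = TablesValidator("train", "model", ["train", "train_summary"])
print(validator.source_summary_table)
print(validator.output_info_table)
```

Keeping the name derivation inside the validator means the calling code only passes the user-supplied names and reads the derived ones back, so a null or missing table is always reported by the same check.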
[madlib] 02/04: DL: Fix validation in fit, fit multiple, evaluate and predict
This is an automated email from the ASF dual-hosted git repository.

nkak pushed a commit to branch dl/fit-mult-null-table-rebase-in-progress
in repository https://gitbox.apache.org/repos/asf/madlib.git

commit 17073577229c185a7a2ab453776e1ac781374655
Author: Nikhil Kak
AuthorDate: Fri Jan 22 16:43:01 2021 -0800

    DL: Fix validation in fit, fit multiple, evaluate and predict

    JIRA: MADLIB-1464

    Previously, calling fit/fit_multiple/evaluate/predict with invalid input
    and output tables (null or missing) printed the wrong error message. This
    commit refactors the code so that the expected error message is printed.

    Refactored the validator code so that the fit multiple class no longer
    needs to create the info and summary table names. The validator now
    creates them, and the validator object can be used to get the table
    names. This makes it easier to validate all the tables inside the
    validator class.

    This commit also moves all the validation code inside the validator
    class, except for the source table validation. The source table must be
    validated before calling get_data_distribution_per_segment, which itself
    has to be called before the validator constructor.

    To test this, we created a plpython function that asserts that a query
    failed with the expected error message, and added a couple of wrapper
    functions on top of it that test for null input and output tables.

    Co-authored-by: Ekta Khanna
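
The testing approach mentioned in the commit message (a helper that asserts a call failed with the expected error message, plus thin wrappers for the null input and null output table cases) might look something like the plain-Python sketch below. It is only an analogy: the actual helper in the commit is a plpython function that runs a SQL query, whereas here `assert_fails_with`, the two wrappers, `toy_fit` and the error strings are all hypothetical stand-ins.

```python
# Hypothetical sketch in plain Python; the commit's real helper is a plpython
# function operating on SQL queries, and these names are illustrative only.

def assert_fails_with(func, expected_msg, *args, **kwargs):
    """Call func and assert that it raises an error containing expected_msg."""
    try:
        func(*args, **kwargs)
    except Exception as exc:
        if expected_msg not in str(exc):
            raise AssertionError(
                "Expected error containing {0!r}, got {1!r}".format(
                    expected_msg, str(exc)))
        return str(exc)
    raise AssertionError(
        "Call succeeded but was expected to fail with {0!r}".format(
            expected_msg))


def assert_null_input_rejected(func, output_table):
    # Wrapper: pass None as the input table and expect the input-table error.
    return assert_fails_with(func, "Input table name is NULL",
                             None, output_table)


def assert_null_output_rejected(func, source_table):
    # Wrapper: pass None as the output table and expect the output-table error.
    return assert_fails_with(func, "Output table name is NULL",
                             source_table, None)


def toy_fit(source_table, output_table):
    """Stand-in for a function under test that validates its table names."""
    if source_table is None:
        raise ValueError("Input table name is NULL or empty")
    if output_table is None:
        raise ValueError("Output table name is NULL or empty")
    return "ok"


print(assert_null_input_rejected(toy_fit, "model"))
print(assert_null_output_rejected(toy_fit, "train"))
```

Matching on the message text rather than only on "the call failed" is what lets a test catch the case this commit fixes, where the call did fail but with the wrong message.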