GitHub user kaknikhil opened a pull request: https://github.com/apache/madlib/pull/281
    Support special characters in MLP, minibatch preprocessor and encode_categorical

    JIRA: MADLIB-1237
    JIRA: MADLIB-1238
    JIRA: MADLIB-1243

    A module that needs to support special characters must call
    quote_literal() on every column value that needs to be escaped and
    quoted. The resulting list can then be passed to the
    py_list_to_sql_string function.

    We also created a function called get_distinct_col_levels, which
    calls quote_literal and returns a list of escaped column levels.
    The output of this function can then be safely passed to
    py_list_to_sql_string with long_format set to True.

    Co-Authored-by: Jingyi Mei <j...@pivotal.io>
    Co-Authored-by: Rahul Iyer <ri...@apache.org>
    Co-Authored-by: Arvind Sridhar <asrid...@pivotal.io>

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/madlib/madlib bug_minibatch_preprocessor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/281.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #281

----
commit 4713b24eac1c27ba09cd1152e502b02bb1e13da4
Author: Jingyi Mei <jmei@...>
Date:   2018-05-23T23:29:54Z

    MLP+Minibatch Preprocessing: Support special characters

    JIRA: MADLIB-1237
    JIRA: MADLIB-1238

    This commit enables special character support for column names and
    column values in MLP and the minibatch preprocessor.

    We decided to use the following strategy for supporting special
    characters:

    A module that needs to support special characters must call
    quote_literal() on every column value that needs to be escaped and
    quoted. The resulting list can then be passed to the
    py_list_to_sql_string function.

    We also created a function called get_distinct_col_levels, which
    calls quote_literal and returns a list of escaped column levels.
    The output of this function can then be safely passed to
    py_list_to_sql_string with long_format set to True.

    Co-Authored-by: Jingyi Mei <j...@pivotal.io>
    Co-Authored-by: Rahul Iyer <ri...@apache.org>
    Co-Authored-by: Arvind Sridhar <asrid...@pivotal.io>

commit d24cdfe1dbdcfe8ba2379a70a52cafeeba994c0e
Author: Arvind Sridhar <asridhar@...>
Date:   2018-05-24T00:02:43Z

    Encode categorical variables: handling special characters

    JIRA: MADLIB-1238
    JIRA: MADLIB-1243

    This commit handles special characters in column names and column
    values. It also adds install check test cases to cover these
    scenarios.

    Co-Authored-by: Jingyi Mei <j...@pivotal.io>
    Co-Authored-by: Arvind Sridhar <asrid...@pivotal.io>

commit 262e796a9cb17c612c1e844ee7354be1abd11f5d
Author: Nikhil Kak <nkak@...>
Date:   2018-06-18T21:30:20Z

    Cleanup: Remove unnecessary unit tests.

    All the unit tests in utilities.py_in were moved to
    test_utilities.py_in.

----
---
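
For readers who want to see the quoting strategy from the description in isolation, here is a rough standalone sketch. It is not the code in this pull request: every function below is a hypothetical stand-in written only for illustration. quote_literal_standin approximates what PostgreSQL's quote_literal() does to a text value, distinct_levels_standin plays the role the description gives to get_distinct_col_levels, and to_sql_array_standin assumes that the long format of py_list_to_sql_string is an ARRAY[...] constructor. The real helpers run inside the database and may differ in detail.

```python
# Hypothetical stand-ins for illustration only; none of this is MADlib code.


def quote_literal_standin(value):
    """Approximate PostgreSQL quote_literal() for a single text value.

    Single quotes and backslashes are doubled; if a backslash is present,
    the literal is written in escape-string form (E'...').
    """
    text = str(value)
    has_backslash = "\\" in text
    escaped = text.replace("\\", "\\\\").replace("'", "''")
    prefix = "E" if has_backslash else ""
    return "{0}'{1}'".format(prefix, escaped)


def distinct_levels_standin(values):
    """Return the sorted distinct values, each already escaped and quoted."""
    return [quote_literal_standin(v) for v in sorted(set(values))]


def to_sql_array_standin(quoted_items, element_type="text"):
    """Join pre-quoted items into an ARRAY[...] constructor.

    Assumption: this mirrors the 'long format' mentioned in the description.
    The items must already be quoted, because no escaping happens here.
    """
    return "ARRAY[{0}]::{1}[]".format(",".join(quoted_items), element_type)


# Column values containing a quote, double quotes, and a comma.
levels = distinct_levels_standin(["it's", 'say "hi"', "a,b", "it's"])
sql_array = to_sql_array_standin(levels)
print(sql_array)
```

The point of the split is that escaping happens exactly once, at the place where the raw values are read, so the function that assembles the SQL array can join its input verbatim without knowing anything about the values.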