[GitHub] madlib pull request #260: minibatch preprocessor improvements
Github user asfgit closed the pull request at: https://github.com/apache/madlib/pull/260 ---
[GitHub] madlib pull request #260: minibatch preprocessor improvements
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/260#discussion_r180851151

    --- Diff: src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in ---
    @@ -387,6 +397,7 @@ class MiniBatchStandardizer:
                 ) as {ind_colname}
             FROM {source_table}
             """.format(
    +            standardized_table = self.standardized_table,
    --- End diff --

    yes, #259 also made a few changes related to this. Will update all that are remaining

---
[GitHub] madlib pull request #260: minibatch preprocessor improvements
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/260#discussion_r180603956

    --- Diff: src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in ---
    @@ -397,8 +408,9 @@ class MiniBatchStandardizer:
                 x_std_dev_str = self.x_std_dev_str)
             return query

    -    def _get_query_for_standardizing_with_grouping(self):
    +    def _create_table_for_standardizing_with_grouping(self):
    --- End diff --

    Why was the method name changed? The older name seems to be more apt,
    since this function is still returning the query, and not executing it
    (the same for `_create_table_for_standardizing_without_grouping()` too).

---
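The naming point raised in this comment can be illustrated with a minimal sketch. It is not the MADlib code: the class name, the table names, the simplified SQL, and the `execute` callable (a stand-in for whatever actually runs SQL in the database) are all assumptions made for illustration. It only shows why a method that returns a query string reads better with a `_get_query_...` name, while a `create` name fits a method that actually runs the statement.

```python
class StandardizerSketch:
    """Illustrative only; not the MADlib MiniBatchStandardizer."""

    def __init__(self, source_table, standardized_table, execute):
        self.source_table = source_table
        self.standardized_table = standardized_table
        # Hypothetical callable that runs a SQL string (assumption).
        self.execute = execute

    def _get_query_for_standardizing(self):
        # Only builds and returns the SQL text; nothing is executed here.
        # The SELECT is a placeholder, not the real standardization query.
        return """
            CREATE TEMP TABLE {standardized_table} AS
            SELECT * FROM {source_table}
            """.format(standardized_table=self.standardized_table,
                       source_table=self.source_table)

    def create_standardized_table(self):
        # Runs the statement, so a "create" name is accurate for this method.
        self.execute(self._get_query_for_standardizing())


# Usage: record executed statements in a list instead of hitting a database.
executed = []
sketch = StandardizerSketch("src", "src_standardized", executed.append)
query = sketch._get_query_for_standardizing()  # no side effect
sketch.create_standardized_table()             # the statement is run here
```

Keeping query construction separate from execution also makes the generated SQL easy to inspect in a unit test without a database connection.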
[GitHub] madlib pull request #260: minibatch preprocessor improvements
GitHub user kaknikhil opened a pull request:

    https://github.com/apache/madlib/pull/260

    minibatch preprocessor improvements

    This PR makes two improvements to the preprocessor code:
    1. Check for all character types for dependent col
    2. Create temp table for standardization.

    See the commit for more details

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/madlib/madlib feature/minibatch-preprocessing-improvements

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/260.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #260

commit d5e996a1eb3ea1d28151b48e435f40a3a764aa51
Author: Nikhil Kak
Date:   2018-04-06T18:35:16Z

    Utilities: Add functions for postgres character/boolean type comparison.

    This commit adds two functions to check if a given type matches one of
    the predefined postgres character or boolean types.

commit 0f6ca99f4de32f1a235fed612d3b74bf822ef3f9
Author: Nikhil Kak
Date:   2018-04-06T18:42:41Z

    MiniBatch Preprocessor: Check for all character types for dependent col

    This commit enables support for dependent column type to be any of the
    postgres character types instead of just `text`.

commit e3462580b7d43589c8a52244029e056ce182a529
Author: Nikhil Kak
Date:   2018-04-06T20:55:46Z

    Minibatch Preprocessor: Create temp table for standardization.

    We did a few experiments and the results proved that creating a temp
    table for standardization is faster than using a subquery. This commit
    now creates a temp table for the standardization.
    Before this commit, we were calling the `utils_normalize_data` function
    inside the main query but now we create a temp table from the output of
    `utils_normalize_data` and use the table in the main query.

---
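The first two commits describe checking a column type against predefined postgres character and boolean types. A rough sketch of what such checks might look like follows. The function names, the helper `_base_type`, and the exact sets of type names are assumptions for illustration; the lists and names used in the actual commit may differ.

```python
import re

# Assumed sets of type names (illustrative; not copied from the commit).
CHARACTER_TYPES = {"text", "varchar", "character varying", "char", "character"}
BOOLEAN_TYPES = {"boolean", "bool"}


def _base_type(type_name):
    # Drop a length modifier such as "(10)", collapse whitespace, lowercase.
    stripped = re.sub(r"\(.*?\)", "", type_name)
    return " ".join(stripped.lower().split())


def is_character_type(type_name):
    # True if the type is one of the assumed character types.
    return _base_type(type_name) in CHARACTER_TYPES


def is_boolean_type(type_name):
    # True if the type is one of the assumed boolean types.
    return _base_type(type_name) in BOOLEAN_TYPES
```

With helpers like these, a dependent column declared as `varchar(10)` or `character varying` would be treated the same way as one declared as `text`, which is the behavior the second commit describes.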