[GitHub] madlib pull request #260: minibatch preprocessor improvements

kaknikhil Tue, 10 Apr 2018 13:46:27 -0700

GitHub user kaknikhil opened a pull request:

    https://github.com/apache/madlib/pull/260


    minibatch preprocessor improvements

    This PR makes two improvements to the minibatch preprocessor code:
    
    1. Check for all character types for dependent col
    2. Create temp table for standardization.
    
    See the individual commits for more details.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/madlib/madlib feature/minibatch-preprocessing-improvements

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/260.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #260
    
----
commit d5e996a1eb3ea1d28151b48e435f40a3a764aa51
Author: Nikhil Kak <nkak@...>
Date:   2018-04-06T18:35:16Z

    Utilities: Add functions for postgres character/boolean type comparison.
    
    This commit adds two functions that check whether a given type matches one of
    the predefined Postgres character or boolean types.
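A minimal Python sketch of what such type-check helpers could look like. The function names `is_psql_char_type` and `is_psql_boolean_type` and the exact type lists are assumptions for illustration, not the code from the commit. The character list covers the standard Postgres character types (`text`, `varchar`/`character varying`, `char`/`character`) plus `bpchar`, the internal name Postgres reports for `character(n)`.

```python
# Hypothetical helpers: names and type lists are illustrative assumptions,
# not MADlib's actual implementation.
PG_CHAR_TYPES = frozenset([
    "text",
    "varchar", "character varying",
    "char", "character",
    "bpchar",  # internal name Postgres uses for character(n)
])
PG_BOOL_TYPES = frozenset(["boolean", "bool"])


def _base_type(type_name):
    """Lower-case the name and drop a length modifier such as '(10)'."""
    return type_name.strip().lower().split("(")[0].strip()


def is_psql_char_type(type_name):
    """True if type_name is one of the Postgres character types."""
    return _base_type(type_name) in PG_CHAR_TYPES


def is_psql_boolean_type(type_name):
    """True if type_name is the Postgres boolean type (or its alias)."""
    return _base_type(type_name) in PG_BOOL_TYPES
```

Note that an array type such as `text[]` is not matched, since its base name keeps the `[]` suffix.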

commit 0f6ca99f4de32f1a235fed612d3b74bf822ef3f9
Author: Nikhil Kak <nkak@...>
Date:   2018-04-06T18:42:41Z

    MiniBatch Preprocessor: Check for all character types for dependent col
    
    This commit allows the dependent column to be of any Postgres character
    type instead of just `text`.
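The before/after behavior of the dependent-column check can be sketched as follows. This is a hypothetical illustration: `dependent_is_character_old` and `dependent_is_character_new` are names invented here, and `is_psql_char_type` is a stand-in for the helper added in the previous commit, whose real name and signature may differ.

```python
# Hypothetical sketch of the dependent-column type check before and after
# this commit; all function names are illustrative assumptions.
PG_CHAR_TYPES = frozenset([
    "text", "varchar", "character varying", "char", "character", "bpchar",
])


def is_psql_char_type(type_name):
    """True if type_name (optionally with a length modifier) is a character type."""
    base = type_name.strip().lower().split("(")[0].strip()
    return base in PG_CHAR_TYPES


def dependent_is_character_old(dep_type):
    # Before: only the literal type name 'text' was recognized.
    return dep_type.strip().lower() == "text"


def dependent_is_character_new(dep_type):
    # After: any Postgres character type is recognized.
    return is_psql_char_type(dep_type)
```

With the old check, a `varchar` or `character(1)` dependent column would not be treated as a character column; with the new check it is.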

commit e3462580b7d43589c8a52244029e056ce182a529
Author: Nikhil Kak <nkak@...>
Date:   2018-04-06T20:55:46Z

    Minibatch Preprocessor: Create temp table for standardization.
    
    We ran a few experiments, and the results showed that creating a temp table
    for standardization is faster than using a subquery, so this commit now
    creates a temp table for the standardization.
    Before this commit, we called the `utils_normalize_data` function inside
    the main query; now we create a temp table from the output of
    `utils_normalize_data` and use that table in the main query.
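The structural difference between the two approaches can be sketched by building the SQL statements as strings. This is a hypothetical illustration: the table names, the standardization `SELECT`, and the final `DROP` are assumptions, and the real code builds its temp table from the output of `utils_normalize_data`, whose arguments are not shown here.

```python
# Hypothetical sketch contrasting the two query shapes; table names, the
# standardization SELECT, and the cleanup step are illustrative assumptions.


def subquery_plan(standardize_sql, main_select):
    """Before: the standardization query is inlined as a subquery."""
    return [f"{main_select} FROM ({standardize_sql}) AS std_sub"]


def temp_table_plan(standardize_sql, main_select, temp_table):
    """After: materialize the standardized rows once, then query the table."""
    return [
        f"CREATE TEMP TABLE {temp_table} AS {standardize_sql}",
        f"{main_select} FROM {temp_table}",
        # Assumed cleanup; temp tables are also dropped at session end.
        f"DROP TABLE IF EXISTS {temp_table}",
    ]


# Example inputs (hypothetical names and constants).
std_sql = "SELECT id, (x - 5.0) / 2.0 AS x_std FROM source_tbl"
main = "SELECT id, x_std"
before = subquery_plan(std_sql, main)
after = temp_table_plan(std_sql, main, "std_tmp")
```

One plausible reason for the speedup is that the planner then works with a plain materialized table rather than a nested function call inside the main query. The commit itself only reports the experimental result, not the cause.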

----

