GitHub user kaknikhil opened a pull request:
https://github.com/apache/madlib/pull/241
MiniBatch Pre-Processor: Add new module minibatch_preprocessing
JIRA: MADLIB-1200
MiniBatch Preprocessor is a utility function to pre-process the input
data for use with models that support mini-batching as an optimization.
The main purpose of the function is to prepare the training data for
mini-batching algorithms:
1. If the dependent variable is boolean or text, one-hot encode it;
numeric dependent variables are left as is.
2. Typecast the independent variables to double precision[].
3. Based on the buffer size, pack the dependent and independent
variables of each buffer into a single tuple representing that
buffer (a sketch follows this list).
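For illustration only, here is a minimal Python sketch of those three steps
(one-hot encoding the dependent variable, casting the independent variables
to floating point, and packing rows into buffer-sized tuples). The function
and column names are assumptions and this is not the module's actual code:

    def one_hot_encode(value, categories):
        """Encode a boolean/text dependent value as a one-hot vector."""
        return [1 if value == c else 0 for c in categories]

    def pack_buffers(rows, dep_col, indep_cols, buffer_size):
        """Pack rows into buffers: each output tuple holds the one-hot
        encoded dependent values and the independent variable matrix
        for one buffer."""
        categories = sorted({r[dep_col] for r in rows})
        buffers = []
        for start in range(0, len(rows), buffer_size):
            chunk = rows[start:start + buffer_size]
            # cast the independent variables to floating point (double precision[])
            indep = [[float(r[c]) for c in indep_cols] for r in chunk]
            dep = [one_hot_encode(r[dep_col], categories) for r in chunk]
            buffers.append((dep, indep))
        return buffers

    rows = [
        {"y": "yes", "x1": 1, "x2": 2.5},
        {"y": "no",  "x1": 0, "x2": 1.0},
        {"y": "yes", "x1": 3, "x2": 0.5},
    ]
    print(pack_buffers(rows, "y", ["x1", "x2"], buffer_size=2))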
Notes
1. Null values in the independent and dependent variables are ignored.
2. The input is standardized before packing (see the sketch below).
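The standardization in note 2 amounts to column-wise z-scoring of the
independent variables before they are packed. A rough sketch, again with
assumed names rather than the module's actual code:

    def standardize(matrix):
        """Column-wise (x - mean) / std, guarding against a zero std."""
        cols = list(zip(*matrix))
        means = [sum(c) / len(c) for c in cols]
        stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
                for c, m in zip(cols, means)]
        return [[(v - m) / s for v, m, s in zip(row, means, stds)]
                for row in matrix]

    print(standardize([[1.0, 2.5], [0.0, 1.0], [3.0, 0.5]]))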
Other changes:
1. Removed the leading __ from methods in utils_regularization.py:
renamed __utils_ind_var_scales and __utils_ind_var_scales_grouping
so that they can be accessed from within a class, more specifically
the minibatch_preprocessing module.
2. Added a new function for regex matching and refactored elastic_net.py_in
to use it (a sketch follows this list).
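The regex helper in item 2 could look roughly like the following; the name,
signature, and case handling here are assumptions for illustration, not the
function actually added by this PR:

    import re

    def matches_pattern(text, pattern):
        """Return True when text matches pattern, else False (illustrative)."""
        if text is None:
            return False
        return bool(re.match(pattern, text, flags=re.IGNORECASE))

    print(matches_pattern("gaussian", r"^(gaussian|linear)$"))  # True
    print(matches_pattern("binomial", r"^(gaussian|linear)$"))  # False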
Co-authored-by: Rahul Iyer <[email protected]>
Co-authored-by: Jingyi Mei <[email protected]>
Co-authored-by: Nandish Jayaram <[email protected]>
Co-authored-by: Orhan Kislal <[email protected]>
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/madlib/madlib feature/minibatch_preprocessing
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/madlib/pull/241.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #241
----
commit 7e89d4097d1d889adfa2eff3ed6217c75b519427
Author: Nikhil Kak <nkak@...>
Date: 2018-01-24T20:01:40Z
MiniBatch Pre-Processor: Add new module minibatch_preprocessing
JIRA: MADLIB-1200
MiniBatch Preprocessor is a utility function to pre-process the input
data for use with models that support mini-batching as an optimization.
The main purpose of the function is to prepare the training data for
mini-batching algorithms:
1. If the dependent variable is boolean or text, one-hot encode it;
numeric dependent variables are left as is.
2. Typecast the independent variables to double precision[].
3. Based on the buffer size, pack the dependent and independent
variables of each buffer into a single tuple representing that buffer.
Notes
1. Null values in the independent and dependent variables are ignored.
2. The input is standardized before packing.
Other changes:
1. Removed the leading __ from methods in utils_regularization.py:
renamed __utils_ind_var_scales and __utils_ind_var_scales_grouping
so that they can be accessed from within a class, more specifically
the minibatch_preprocessing module.
2. Added a new function for regex matching and refactored elastic_net.py_in
to use it.
Co-authored-by: Rahul Iyer <[email protected]>
Co-authored-by: Jingyi Mei <[email protected]>
Co-authored-by: Nandish Jayaram <[email protected]>
Co-authored-by: Orhan Kislal <[email protected]>
----
---