GitHub user njayaram2 opened a pull request:

    https://github.com/apache/madlib/pull/230

    Balanced sets final

    Refactor code, and add keep_null parameter.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/njayaram2/madlib balanced_sets_final

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/230.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #230
    
----
commit 87f6ffa4c9d1fcfafda5735adc7b76561dec6d9b
Author: Swatisoni <soniswati.2010@...>
Date:   2018-01-10T20:07:36Z

    Balance datasets : re-sampling technique
    
    JIRA:MADLIB-1168
    
    Additional Authors:
    Orhan Kislal okis...@pivotal.io
    Jingyi Mei j...@pivotal.io
    
    Balanced datasets Phase 1 and Phase 2 implementation, which performs
    balanced sampling using the following re-sampling techniques:
        1. Under-sampling the majority class(es), with and without
           replacement
        2. Over-sampling the minority class
        3. Combining over- and under-sampling
            - Uniform sampling of all classes (default case)
        4. Create ensemble balanced sets
            - Re-sampling given a comma-delimited string of specific
              classes and their respective sample sizes
        5. Install check (IC) tests
    Balanced sampling with grouping functionality will be implemented in phase 3
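
The first three techniques listed in the commit message above can be illustrated with a minimal Python sketch. This is not the MADlib implementation (the PR builds SQL queries inside the database); the function name `resample_balanced`, its parameters, and the strategy names are hypothetical and chosen only for illustration. The sketch assumes "uniform" means every class is resampled to the same size while the total row count stays roughly unchanged.

```python
import random
from collections import Counter, defaultdict


def resample_balanced(rows, class_of, strategy="uniform",
                      with_replacement=False, seed=None):
    """Hypothetical helper: resample rows so every class has the same size."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row in rows:
        by_class[class_of(row)].append(row)
    if not by_class:
        return []

    sizes = [len(members) for members in by_class.values()]
    if strategy == "undersample":
        target = min(sizes)                # shrink every class to the smallest
    elif strategy == "oversample":
        target = max(sizes)                # grow every class to the largest
    elif strategy == "uniform":
        target = sum(sizes) // len(sizes)  # equal share of the original total
    else:
        raise ValueError("unknown strategy: %r" % (strategy,))

    balanced = []
    for members in by_class.values():
        if target == len(members):
            balanced.extend(members)       # already at the target size
        elif target < len(members) and not with_replacement:
            balanced.extend(rng.sample(members, target))
        else:
            # Growing a class always needs replacement; shrinking uses it
            # only when the caller asks for it.
            balanced.extend(rng.choices(members, k=target))
    return balanced


# Usage: 90 rows of class "a" and 10 rows of class "b".
rows = [("a", i) for i in range(90)] + [("b", i) for i in range(10)]
for strategy in ("undersample", "oversample", "uniform"):
    sample = resample_balanced(rows, lambda r: r[0], strategy, seed=0)
    print(strategy, Counter(label for label, _ in sample))
```

With this input, each class ends up with 10 rows under "undersample", 90 under "oversample", and 50 under "uniform".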

commit 40d1275504a107e7ae8809ab7f37f0aaa8ed0799
Author: Jingyi Mei <jmei@...>
Date:   2018-01-12T00:33:51Z

    troubleshoot float issue

commit 01ade3c8dbc108237aec0866060f6ee5acaacaac
Author: Jingyi Mei <jmei@...>
Date:   2018-01-12T17:54:18Z

    troubleshoot float issue 2

commit 276b3b8628488eaa281688e5115658cc8318abfa
Author: Jingyi Mei <jmei@...>
Date:   2018-01-22T19:25:07Z

    Wip for refactor

commit f97e2742bb5a0150328798db064c3ab21c335def
Author: Rahul Iyer <riyer@...>
Date:   2018-01-23T01:13:16Z

    Refactor sampling strategy and class counts

commit 445cbfe4d79c48c103b36f1d7bafdd171afb390b
Author: Nandish Jayaram <njayaram@...>
Date:   2018-01-23T02:04:40Z

    refactor WIP

commit a0db130061bf23e857ae85d602db7f6937c71c58
Author: Nandish Jayaram <njayaram@...>
Date:   2018-01-23T20:50:40Z

    update some code and add unit test cases for it

commit c15c5b537aaf233b6da906a988f8f7257fb9e83c
Author: Nandish Jayaram <njayaram@...>
Date:   2018-01-23T23:31:08Z

    handle all cases in _get_target_class_sizes

commit 1e4165eae46200109219920df276774c2b44ec29
Author: Rahul Iyer <riyer@...>
Date:   2018-01-24T01:26:59Z

    Refactor get_target_sizes function

commit 36e4b75edc48ed3e7c7f64ad1944b0d12db191a4
Author: Nandish Jayaram <njayaram@...>
Date:   2018-01-24T18:29:12Z

    add comments, rename some variable, and undo changes to utilities.py_in
    from a previous commit

commit e643e6d8a70ad211152a6a41b64e6b35eee31f32
Author: Nandish Jayaram <njayaram@...>
Date:   2018-01-24T23:26:06Z

    done with creating strategy specific count dict, and subquery for no sample

commit b50a13330e3b7ceff71165f1385d7117f0f1a047
Author: Rahul Iyer <riyer@...>
Date:   2018-01-25T21:42:46Z

    Add with_replacement subquery generation

commit b37a775af9e816cded3e3f35b8ada3bb8e9fbcf0
Author: Nandish Jayaram <njayaram@...>
Date:   2018-01-26T00:29:23Z

    most coding done???? yet to test.

commit 02113b98b670e0118d1766215f8d9619d951c2d3
Author: Nandish Jayaram <njayaram@...>
Date:   2018-01-26T19:02:29Z

    fix issues wip

commit 714db2a68afc325292748b3146afc07d81ab813e
Author: Rahul Iyer <riyer@...>
Date:   2018-01-26T20:26:23Z

    Fix some errors to pass IC, update docs

commit 1dc6c0600b171a475b74e19fa5d39fbd070313ec
Author: Nandish Jayaram <njayaram@...>
Date:   2018-01-27T01:27:46Z

    replace Poisson based sampling with row_number()

commit d67a4f6e926aa1a5189716bb3f06ab53ef0d0cc4
Author: Nandish Jayaram <njayaram@...>
Date:   2018-01-30T01:35:02Z

    add new param for keep_null, code to handle that scenario, and some install
    check test cases to test it

commit cc8f6cd8816470c68df9beccc6141fa2fad4a62c
Author: Nandish Jayaram <njayaram@...>
Date:   2018-01-30T20:00:39Z

    Add more validations, test cases, and rename a function

commit f4d02c67a106dea902a5926fe2cc266ab9d44e0f
Author: Nandish Jayaram <njayaram@...>
Date:   2018-01-30T22:35:30Z

    reverting changes to stratified_sample.sql_in

----

