GitHub user njayaram2 opened a pull request: https://github.com/apache/madlib/pull/230

    Balanced sets final

    Refactor code, and add keep_null parameter.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/njayaram2/madlib balanced_sets_final

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/230.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #230

----
commit 87f6ffa4c9d1fcfafda5735adc7b76561dec6d9b
Author: Swatisoni <soniswati.2010@...>
Date:   2018-01-10T20:07:36Z

    Balance datasets : re-sampling technique

    JIRA:MADLIB-1168

    Additional Authors:
    Orhan Kislal okis...@pivotal.io
    Jingyi Mei j...@pivotal.io

    Balanced datasets Phase 1 and Phase 2 implementation which performs
    balanced sampling in following specified re-sampling techniques
    1. Under-sampling the majority class(es), with- and without replacement
    2. Over-sampling the minority class
    3. Combining over- and under-sampling - Uniform sampling of all classes (default case)
    4. Create ensemble balanced sets - Re-sampling given comma-delimited string of specific class and respective sample sizes
    5. IC tests

    Balanced sampling with grouping functionality will be implemented in phase 3

commit 40d1275504a107e7ae8809ab7f37f0aaa8ed0799
Author: Jingyi Mei <jmei@...>
Date:   2018-01-12T00:33:51Z

    troubleshoot float issue

commit 01ade3c8dbc108237aec0866060f6ee5acaacaac
Author: Jingyi Mei <jmei@...>
Date:   2018-01-12T17:54:18Z

    troubleshoot float issue 2

commit 276b3b8628488eaa281688e5115658cc8318abfa
Author: Jingyi Mei <jmei@...>
Date:   2018-01-22T19:25:07Z

    Wip for refactor

commit f97e2742bb5a0150328798db064c3ab21c335def
Author: Rahul Iyer <riyer@...>
Date:   2018-01-23T01:13:16Z

    Refactor sampling strategy and class counts

commit 445cbfe4d79c48c103b36f1d7bafdd171afb390b
Author: Nandish Jayaram <njayaram@...>
Date:   2018-01-23T02:04:40Z

    refactor WIP

commit a0db130061bf23e857ae85d602db7f6937c71c58
Author: Nandish Jayaram <njayaram@...>
Date:   2018-01-23T20:50:40Z

    update some code and add unit test cases for it

commit c15c5b537aaf233b6da906a988f8f7257fb9e83c
Author: Nandish Jayaram <njayaram@...>
Date:   2018-01-23T23:31:08Z

    handle all cases in _get_target_class_sizes

commit 1e4165eae46200109219920df276774c2b44ec29
Author: Rahul Iyer <riyer@...>
Date:   2018-01-24T01:26:59Z

    Refactor get_target_sizes function

commit 36e4b75edc48ed3e7c7f64ad1944b0d12db191a4
Author: Nandish Jayaram <njayaram@...>
Date:   2018-01-24T18:29:12Z

    add comments, rename some variable, and undo changes to utilities.py_in from a previous commit

commit e643e6d8a70ad211152a6a41b64e6b35eee31f32
Author: Nandish Jayaram <njayaram@...>
Date:   2018-01-24T23:26:06Z

    done with creating strategy specific count dict, and subquery for no sample

commit b50a13330e3b7ceff71165f1385d7117f0f1a047
Author: Rahul Iyer <riyer@...>
Date:   2018-01-25T21:42:46Z

    Add with_replacement subquery generation

commit b37a775af9e816cded3e3f35b8ada3bb8e9fbcf0
Author: Nandish Jayaram <njayaram@...>
Date:   2018-01-26T00:29:23Z

    most coding done???? yet to test.

commit 02113b98b670e0118d1766215f8d9619d951c2d3
Author: Nandish Jayaram <njayaram@...>
Date:   2018-01-26T19:02:29Z

    fix issues wip

commit 714db2a68afc325292748b3146afc07d81ab813e
Author: Rahul Iyer <riyer@...>
Date:   2018-01-26T20:26:23Z

    Fix some errors to pass IC, update docs

commit 1dc6c0600b171a475b74e19fa5d39fbd070313ec
Author: Nandish Jayaram <njayaram@...>
Date:   2018-01-27T01:27:46Z

    replace Poisson based sampling with row_number()

commit d67a4f6e926aa1a5189716bb3f06ab53ef0d0cc4
Author: Nandish Jayaram <njayaram@...>
Date:   2018-01-30T01:35:02Z

    add new param for keep_null, code to handle that scenario, and some install check test cases to test it

commit cc8f6cd8816470c68df9beccc6141fa2fad4a62c
Author: Nandish Jayaram <njayaram@...>
Date:   2018-01-30T20:00:39Z

    Add more validations, test cases, and rename a function

commit f4d02c67a106dea902a5926fe2cc266ab9d44e0f
Author: Nandish Jayaram <njayaram@...>
Date:   2018-01-30T22:35:30Z

    reverting changes to stratified_sample.sql_in

----
---
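
For readers new to the feature, the "uniform sampling of all classes" case named in the first commit message can be pictured with the rough Python sketch below. It is not the MADlib implementation, which builds SQL subqueries inside the database. The function name `balance_uniform`, the `class_of` callback, and the choice of target size (total rows divided by the number of classes) are assumptions made only for illustration.

```python
import random
from collections import defaultdict


def balance_uniform(rows, class_of, rng=None):
    """Resample rows so that every class ends up with the same count.

    Hypothetical helper, not part of MADlib. The per-class target is
    assumed to be total rows // number of classes.
    """
    if not rows:
        return []
    rng = rng or random.Random()

    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row in rows:
        by_class[class_of(row)].append(row)

    target = len(rows) // len(by_class)

    balanced = []
    for members in by_class.values():
        if len(members) >= target:
            # Majority class: under-sample without replacement.
            balanced.extend(rng.sample(members, target))
        else:
            # Minority class: over-sample with replacement.
            balanced.extend(rng.choices(members, k=target))
    return balanced


if __name__ == "__main__":
    data = [("a", i) for i in range(90)] + [("b", i) for i in range(10)]
    out = balance_uniform(data, lambda r: r[0], random.Random(0))
    print(len(out))
```

With 90 rows of class "a" and 10 of class "b", the sketch draws 50 rows from each class: "a" is under-sampled and "b" is over-sampled, so both resampling techniques are combined in one pass.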