-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16911/
-----------------------------------------------------------
(Updated Jan. 25, 2014, 8:10 p.m.)


Review request for DataFu and Matthew Hayes.


Changes
-------

Make the change backward compatible. The old tests were moved to SimpleRandomSampleTestOld and marked deprecated.


Repository: datafu


Description
-------

In the current implementation, SRS (simple random sampling) takes the sampling probability in the constructor of the UDF, while SRSWR (simple random sampling with replacement) takes the sample size in the function call. The attached patch updates SRS to make it consistent with SRSWR.

After the patch, SRS takes as inputs a bag of items, a desired sampling probability, and optionally a lower bound on the population size. SRSWR takes as inputs a bag of items, a desired sample size, and optionally a lower bound on the population size.

Another benefit of the patch is that users no longer have to create multiple instances of the UDF to sample with different probabilities.

Because the patch is a rewrite of the UDF, it is easier to review the new file directly than the diff.


Diffs (updated)
-----

  src/java/datafu/pig/sampling/SimpleRandomSample.java aff088a
  test/pig/datafu/test/pig/sampling/SimpleRandomSampleTest.java 7a0ced2
  test/pig/datafu/test/pig/sampling/SimpleRandomSampleTestOld.java PRE-CREATION

Diff: https://reviews.apache.org/r/16911/diff/


Testing
-------

unit tests


Thanks,

Xiangrui Meng
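The two calling conventions in the description (SRS: bag plus sampling probability; SRSWR: bag plus sample size) can be contrasted with a small in-memory sketch in plain Java. This is not DataFu's implementation, which operates on Pig bags; the class `SamplingSketch` and both method names are hypothetical. The sketch assumes that "sampling with probability p" means drawing ceil(p * n) items without replacement, and it omits the optional population-size lower bound because an in-memory list already knows its own size.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical in-memory sketch (not DataFu's UDF code) contrasting the two
 * calling conventions: SRS takes a sampling probability per call, SRSWR takes
 * a sample size per call.
 */
public class SamplingSketch {

    /**
     * Assumed SRS semantics: a simple random sample without replacement of
     * ceil(p * n) items, drawn with a partial Fisher-Yates shuffle.
     * Note: p * n is computed in floating point, so values such as 0.07 * 100
     * may round up by one item.
     */
    public static <T> List<T> simpleRandomSample(List<T> items, double p, Random rng) {
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]: " + p);
        }
        int n = items.size();
        int k = (int) Math.ceil(p * n);
        List<T> pool = new ArrayList<>(items); // copy so the input is not reordered
        // For each of the first k slots, swap in a uniformly chosen item
        // from the not-yet-selected tail [i, n).
        for (int i = 0; i < k; i++) {
            int j = i + rng.nextInt(n - i);
            Collections.swap(pool, i, j);
        }
        return new ArrayList<>(pool.subList(0, k));
    }

    /** SRSWR semantics: exactly k items drawn uniformly with replacement. */
    public static <T> List<T> simpleRandomSampleWithReplacement(List<T> items, int k, Random rng) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative: " + k);
        }
        if (items.isEmpty() && k > 0) {
            throw new IllegalArgumentException("cannot sample from an empty list");
        }
        List<T> sample = new ArrayList<>(k);
        for (int i = 0; i < k; i++) {
            sample.add(items.get(rng.nextInt(items.size())));
        }
        return sample;
    }

    public static void main(String[] args) {
        List<Integer> bag = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            bag.add(i);
        }
        Random rng = new Random(42);
        // One "instance" serves any probability: the rate is a call argument.
        System.out.println(simpleRandomSample(bag, 0.25, rng));
        System.out.println(simpleRandomSample(bag, 0.10, rng));
        System.out.println(simpleRandomSampleWithReplacement(bag, 5, rng));
    }
}
```

Passing the probability or size as a call argument, rather than fixing it in a constructor, is what lets a single definition serve several sampling rates, which is the benefit the description cites.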
