-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16911/
-----------------------------------------------------------

(Updated Jan. 25, 2014, 8:10 p.m.)


Review request for DataFu and Matthew Hayes.


Changes
-------

Make the change backward compatible. The old tests were moved to 
SimpleRandomSampleTestOld and marked deprecated.


Repository: datafu


Description
-------

In the current implementation, SimpleRandomSample (SRS) takes the sampling 
probability as a constructor argument of the UDF, while SRSWR (simple random 
sampling with replacement) takes the sample size as an argument of the function 
call. The attached patch updates SRS to make it consistent with SRSWR.
After the patch, SRS takes as input a bag of items, a desired sampling 
probability, and, optionally, a lower bound on the population size, while SRSWR 
takes a bag of items, a desired sample size, and, optionally, a lower bound on 
the population size.
Another benefit of the patch is that users no longer have to create multiple 
instances of the UDF to sample with different probabilities.
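For illustration only, here is a minimal single-machine sketch of the two call shapes described above: sampling by probability versus sampling by size. It is not the DataFu implementation; the class and method names (SamplingSketch, srs, srswr) are hypothetical. It assumes SRS returns exactly ceil(p * n) items, and it assumes the optional population lower bound is used to set accept/reject thresholds on uniform random keys, along the lines of scalable simple random sampling (ScaSRS) with a failure rate of about e^-10. A full-sort fallback keeps the result an exact simple random sample even when the thresholds miss. The lower bound is omitted from srswr here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; names are illustrative and not the DataFu API.
public class SamplingSketch {

  // An item paired with the uniform random key that decides whether it is sampled.
  private record Keyed<T>(double key, T item) {}

  /**
   * Simple random sample without replacement of exactly ceil(p * n) items, where n is
   * the bag size. The optional population lower bound (null if absent) only tunes the
   * accept/reject thresholds (assumed ScaSRS-style bounds, failure rate about e^-10);
   * the fallback keeps the result exact even when the thresholds miss.
   */
  public static <T> List<T> srs(List<T> bag, double p, Long populationLowerBound, Random rng) {
    if (p < 0.0 || p > 1.0) {
      throw new IllegalArgumentException("p must be in [0, 1]");
    }
    int n = bag.size();
    int k = (int) Math.ceil(p * n);
    if (k == 0) {
      return new ArrayList<>();
    }
    // Thresholds computed from a smaller n are looser, so a lower bound is safe to use.
    long nBound = populationLowerBound == null
        ? n
        : Math.max(1L, Math.min(populationLowerBound, n));
    double g1 = 10.0 / nBound;         // -log(delta) / n with delta = e^-10
    double g2 = 20.0 / (3.0 * nBound); // -2 log(delta) / (3 n)
    double accept = Math.max(0.0, p + g2 - Math.sqrt(g2 * g2 + 3.0 * g2 * p));
    double reject = Math.min(1.0, p + g1 + Math.sqrt(g1 * g1 + 2.0 * g1 * p));

    List<T> sample = new ArrayList<>(k);
    List<Keyed<T>> waitlist = new ArrayList<>();
    for (T item : bag) {
      double key = rng.nextDouble();
      if (key < accept) {
        sample.add(item);                     // key is below every waitlisted key
      } else if (key < reject) {
        waitlist.add(new Keyed<>(key, item)); // undecided, resolved by sorting below
      }                                       // keys >= reject are dropped right away
    }
    int missing = k - sample.size();
    if (missing < 0 || missing > waitlist.size()) {
      // Rare failure: redraw keys for the whole bag and keep the k smallest.
      List<Keyed<T>> all = new ArrayList<>(n);
      for (T item : bag) {
        all.add(new Keyed<>(rng.nextDouble(), item));
      }
      return smallestKeys(all, k);
    }
    sample.addAll(smallestKeys(waitlist, missing));
    return sample;
  }

  /** Sample with replacement: `size` independent uniform draws from the bag. */
  public static <T> List<T> srswr(List<T> bag, int size, Random rng) {
    if (size < 0) {
      throw new IllegalArgumentException("size must be non-negative");
    }
    List<T> sample = new ArrayList<>(size);
    if (bag.isEmpty()) {
      return sample;
    }
    for (int i = 0; i < size; i++) {
      sample.add(bag.get(rng.nextInt(bag.size())));
    }
    return sample;
  }

  // Sorts by key and returns the items with the `count` smallest keys.
  private static <T> List<T> smallestKeys(List<Keyed<T>> keyed, int count) {
    keyed.sort((a, b) -> Double.compare(a.key(), b.key()));
    List<T> out = new ArrayList<>(count);
    for (int i = 0; i < count; i++) {
      out.add(keyed.get(i).item());
    }
    return out;
  }
}
```

Because the probability (or size) is a per-call argument, a single instance can serve several rates, e.g. `srs(bag, 0.1, null, rng)` followed by `srs(bag, 0.5, null, rng)`. Only the waitlisted items are sorted, which is the point of the thresholds: most items are accepted or dropped as soon as their key is drawn.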

Since this is a rewrite of the UDF, it is easier to review the new file directly 
rather than the diff.


Diffs (updated)
-----

  src/java/datafu/pig/sampling/SimpleRandomSample.java aff088a 
  test/pig/datafu/test/pig/sampling/SimpleRandomSampleTest.java 7a0ced2 
  test/pig/datafu/test/pig/sampling/SimpleRandomSampleTestOld.java PRE-CREATION 

Diff: https://reviews.apache.org/r/16911/diff/


Testing
-------

unit tests


Thanks,

Xiangrui Meng
