-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16911/#review32156
-----------------------------------------------------------


The recent changes look good to me.  It seems like this change to the UDF could 
be made in a backwards compatible way though.  A probability passed into the 
constructor could be treated as a default value if no probability is passed 
when the UDF is invoked.  

- Matthew Hayes


On Jan. 17, 2014, 2:49 a.m., Xiangrui Meng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16911/
> -----------------------------------------------------------
> 
> (Updated Jan. 17, 2014, 2:49 a.m.)
> 
> 
> Review request for DataFu and Matthew Hayes.
> 
> 
> Repository: datafu
> 
> 
> Description
> -------
> 
> In the current implementation, SRS takes the sampling probability in the 
> constructor of the UDF, while SRSWR takes the sample size in the function 
> call. The attached patch updates SRS to make it consistent with SRSWR.
> After the patch, SRS takes a bag of items, a desired sampling probability, 
> and optionally a lower bound of the size of the population as the inputs, 
> while SRSWR takes a bag of items, a desired sample size, and optionally a 
> lower bound of the size of the population as the inputs.
> Another benefit of the patch is that user doesn't have to create multiple 
> instances of the UDF to sample with different probabilities.
> 
> It is a rewrite of the UDF, so better check the new file directly instead of 
> the diff.
> 
> 
> Diffs
> -----
> 
>   src/java/datafu/pig/sampling/SimpleRandomSample.java 98d4e58 
>   test/pig/datafu/test/pig/sampling/SimpleRandomSampleTest.java 8ca9d5d 
> 
> Diff: https://reviews.apache.org/r/16911/diff/
> 
> 
> Testing
> -------
> 
> unit tests
> 
> 
> Thanks,
> 
> Xiangrui Meng
> 
>

Reply via email to