----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16911/#review32199 -----------------------------------------------------------
+1 to Matt's comment. This could be made backwards compatible. Otherwise it looks good. - Sam Shah On Jan. 17, 2014, 2:49 a.m., Xiangrui Meng wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/16911/ > ----------------------------------------------------------- > > (Updated Jan. 17, 2014, 2:49 a.m.) > > > Review request for DataFu and Matthew Hayes. > > > Repository: datafu > > > Description > ------- > > In the current implementation, SRS takes the sampling probability in the > constructor of the UDF, while SRSWR takes the sample size in the function > call. The attached patch updates SRS to make it consistent with SRSWR. > After the patch, SRS takes a bag of items, a desired sampling probability, > and optionally a lower bound of the size of the population as the inputs, > while SRSWR takes a bag of items, a desired sample size, and optionally a > lower bound of the size of the population as the inputs. > Another benefit of the patch is that user doesn't have to create multiple > instances of the UDF to sample with different probabilities. > > It is a rewrite of the UDF, so better check the new file directly instead of > the diff. > > > Diffs > ----- > > src/java/datafu/pig/sampling/SimpleRandomSample.java 98d4e58 > test/pig/datafu/test/pig/sampling/SimpleRandomSampleTest.java 8ca9d5d > > Diff: https://reviews.apache.org/r/16911/diff/ > > > Testing > ------- > > unit tests > > > Thanks, > > Xiangrui Meng > >