Xiangrui Meng created DATAFU-5:
----------------------------------
Summary: Update SimpleRandomSample (SRS) to be consistent with
SimpleRandomSampleWithReplacement (SRSWR)
Key: DATAFU-5
URL: https://issues.apache.org/jira/browse/DATAFU-5
Project: DataFu
Issue Type: Improvement
Reporter: Xiangrui Meng
In the current implementation, SRS takes the sampling probability in the
constructor of the UDF, while SRSWR takes the sample size in the function call.
The attached patch updates SRS to make it consistent with SRSWR.
After the patch, SRS takes a bag of items, a desired sampling probability, and
optionally a lower bound on the size of the population as inputs, while
SRSWR takes a bag of items, a desired sample size, and optionally a lower bound
on the size of the population as inputs.
Another benefit of the patch is that users no longer have to create multiple
instances of the UDF to sample with different probabilities.
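
To illustrate the calling convention described above, here is a minimal Python
sketch. It is not the DataFu implementation or its Pig interface: the function
names, the exact sample size of ceil(p * n), and the use of Python's random
module are assumptions made for illustration only, and the optional lower
bound on the population size is left out.

```python
import math
import random


def simple_random_sample(items, p, rng=random):
    """Hypothetical SRS: sample without replacement.

    The sampling probability p is an argument of the call,
    not of a constructor.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    items = list(items)
    # Assumption: the sample has exactly ceil(p * n) items.
    k = math.ceil(p * len(items))
    return rng.sample(items, k)


def simple_random_sample_with_replacement(items, s, rng=random):
    """Hypothetical SRSWR: draw s items with replacement.

    The sample size s is an argument of the call.
    """
    if s < 0:
        raise ValueError("s must be non-negative")
    items = list(items)
    if not items:
        return []
    return rng.choices(items, k=s)


data = list(range(100))

# The same function serves different probabilities; no second instance needed.
half = simple_random_sample(data, 0.5)
quarter = simple_random_sample(data, 0.25)

# With replacement, the sample may be larger than the population.
resampled = simple_random_sample_with_replacement(data, 150)
```

Because both functions take their sampling parameter per call, one definition
covers every probability or sample size, which is the consistency the patch
aims for.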
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)