[ https://issues.apache.org/jira/browse/DATAFU-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872886#comment-13872886 ]
Matthew Hayes edited comment on DATAFU-5 at 1/16/14 12:46 AM:
--------------------------------------------------------------
https://reviews.apache.org/r/16911/
was (Author: matterhayes):
https://reviews.apache.org/r/16895/
> Update SimpleRandomSample (SRS) to be consistent with
> SimpleRandomSampleWithReplacement (SRSWR)
> -----------------------------------------------------------------------------------------------
>
> Key: DATAFU-5
> URL: https://issues.apache.org/jira/browse/DATAFU-5
> Project: DataFu
> Issue Type: Improvement
> Reporter: Xiangrui Meng
> Attachments: DATAFU-5.patch
>
>
> In the current implementation, SRS takes the sampling probability in the
> constructor of the UDF, while SRSWR takes the sample size in the function
> call. The attached patch updates SRS to make it consistent with SRSWR.
> After the patch, SRS takes a bag of items, a desired sampling probability,
> and optionally a lower bound on the population size as inputs, while SRSWR
> takes a bag of items, a desired sample size, and optionally a lower bound
> on the population size as inputs.
> Another benefit of the patch is that users no longer have to create multiple
> instances of the UDF to sample with different probabilities.
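
The interface change described in the issue can be illustrated with a small Python sketch. This is not DataFu code and does not reproduce its sampling algorithm: the names SampleWithFixedProbability and sample_with_probability are hypothetical, and drawing ceil(p * n) items with random.sample is an assumption made only to keep the example short. It contrasts fixing the probability at construction time with passing it on each call.

```python
import math
import random


class SampleWithFixedProbability:
    """Constructor style: the probability is fixed when the object is built."""

    def __init__(self, p):
        self.p = p

    def __call__(self, items, rng=random):
        # Assumed semantics for illustration: draw ceil(p * n) items.
        k = math.ceil(self.p * len(items))
        return rng.sample(items, k)


def sample_with_probability(items, p, rng=random):
    """Call-time style: the probability is an argument of each call."""
    k = math.ceil(p * len(items))
    return rng.sample(items, k)


items = list(range(100))

# Constructor style: one instance is needed per probability.
sample_quarter = SampleWithFixedProbability(0.25)
sample_half = SampleWithFixedProbability(0.5)
quarter_old = sample_quarter(items)
half_old = sample_half(items)

# Call-time style: a single function serves every probability.
quarter_new = sample_with_probability(items, 0.25)
half_new = sample_with_probability(items, 0.5)
```

With the call-time style, the probability can also vary per input, which the constructor style cannot express without creating more instances.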
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)