[ https://issues.apache.org/jira/browse/DATAFU-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883179#comment-13883179 ]
Xiangrui Meng commented on DATAFU-5:
------------------------------------
Thanks for correcting the header!
> Update SimpleRandomSample (SRS) to be consistent with
> SimpleRandomSampleWithReplacement (SRSWR)
> -----------------------------------------------------------------------------------------------
>
> Key: DATAFU-5
> URL: https://issues.apache.org/jira/browse/DATAFU-5
> Project: DataFu
> Issue Type: Improvement
> Reporter: Xiangrui Meng
> Attachments:
> 0001-update-SimpleRandomSample-to-be-consistent-with-Simp.patch,
> DATAFU-5.patch, DATAFU-5.patch, DATAFU-5.patch
>
>
> In the current implementation, SRS takes the sampling probability in the
> constructor of the UDF, while SRSWR takes the sample size in the function
> call. The attached patch updates SRS to make it consistent with SRSWR.
> After the patch, SRS takes as inputs a bag of items, a desired sampling
> probability, and optionally a lower bound on the size of the population,
> while SRSWR takes a bag of items, a desired sample size, and optionally a
> lower bound on the size of the population.
> Another benefit of the patch is that users no longer have to create
> multiple instances of the UDF to sample with different probabilities.
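
The call shape described above can be sketched in Python. This is only an illustration of the interface, not DataFu's implementation: the function names, the `rng` parameter, the choice of ceil(p * n) as the sample size, and the way the optional lower bound is merely validated are all assumptions made for this example.

```python
import math
import random


def _check_lower_bound(population_size, lower_bound):
    # Hypothetical check: a lower bound on the population size
    # cannot exceed the actual population size.
    if lower_bound is not None and lower_bound > population_size:
        raise ValueError("lower_bound exceeds the population size")


def simple_random_sample(items, p, lower_bound=None, rng=random):
    """Sample without replacement; p is passed per call, not at construction."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    items = list(items)
    _check_lower_bound(len(items), lower_bound)
    # Assumption for this sketch: the sample size is ceil(p * n).
    k = math.ceil(p * len(items))
    return rng.sample(items, k)


def simple_random_sample_with_replacement(items, size, lower_bound=None, rng=random):
    """Sample with replacement; the sample size is passed per call."""
    if size < 0:
        raise ValueError("size must be non-negative")
    items = list(items)
    _check_lower_bound(len(items), lower_bound)
    if not items:
        # Nothing to draw from; return an empty sample in this sketch.
        return []
    return rng.choices(items, k=size)


if __name__ == "__main__":
    bag = list(range(100))
    # One function serves several probabilities; no per-probability instance.
    for p in (0.01, 0.1, 0.5):
        print(p, len(simple_random_sample(bag, p, lower_bound=50)))
    print(len(simple_random_sample_with_replacement(bag, 20, lower_bound=50)))
```

Because the probability is an argument rather than constructor state, the same function can be reused with different probabilities, which mirrors the benefit described in the issue.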
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)