[ https://issues.apache.org/jira/browse/DATAFU-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874361#comment-13874361 ]
Xiangrui Meng commented on DATAFU-5:
------------------------------------
I cannot upload the generated diff to the review board, so I uploaded
a normal diff.
> Update SimpleRandomSample (SRS) to be consistent with
> SimpleRandomSampleWithReplacement (SRSWR)
> -----------------------------------------------------------------------------------------------
>
> Key: DATAFU-5
> URL: https://issues.apache.org/jira/browse/DATAFU-5
> Project: DataFu
> Issue Type: Improvement
> Reporter: Xiangrui Meng
> Attachments: DATAFU-5.patch
>
>
> In the current implementation, SRS takes the sampling probability in the
> constructor of the UDF, while SRSWR takes the sample size in the function
> call. The attached patch updates SRS to make it consistent with SRSWR.
> After the patch, SRS takes a bag of items, a desired sampling probability,
> and optionally a lower bound on the size of the population as inputs,
> while SRSWR takes a bag of items, a desired sample size, and optionally a
> lower bound on the size of the population as inputs.
> Another benefit of the patch is that users no longer have to create multiple
> instances of the UDF to sample with different probabilities.
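
The interface change described above can be illustrated with a rough sketch. This is not DataFu code: the class and function names are hypothetical, the sample size is assumed to be ceil(p * n), the optional lower bound on the population size is left out, and Python's random.sample stands in for the actual sampling algorithm. It only shows the difference between fixing the probability at construction time and passing it with each call.

```python
import math
import random


class ConstructorBoundSampler:
    """Old style (hypothetical): the probability is fixed at construction."""

    def __init__(self, p, seed=None):
        self.p = p
        self.rng = random.Random(seed)

    def sample(self, items):
        # Assumed sample size: ceil(p * n), drawn without replacement.
        k = math.ceil(self.p * len(items))
        return self.rng.sample(items, k)


def simple_random_sample(items, p, rng=None):
    """New style (hypothetical): the probability is an argument of each call."""
    rng = rng or random.Random()
    k = math.ceil(p * len(items))
    return rng.sample(items, k)


if __name__ == "__main__":
    population = list(range(100))

    # Old style: one instance per probability.
    quarter = ConstructorBoundSampler(0.25)
    half = ConstructorBoundSampler(0.5)
    print(len(quarter.sample(population)), len(half.sample(population)))

    # New style: one function, probability chosen per call.
    print(len(simple_random_sample(population, 0.25)),
          len(simple_random_sample(population, 0.5)))
```

With the call-time form, a single definition serves every probability, which is the second benefit the description mentions.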