[ https://issues.apache.org/jira/browse/DATAFU-5?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew Hayes updated DATAFU-5:
-------------------------------

    Assignee: Xiangrui Meng

> Update SimpleRandomSample (SRS) to be consistent with SimpleRandomSampleWithReplacement (SRSWR)
> -----------------------------------------------------------------------------------------------
>
>                 Key: DATAFU-5
>                 URL: https://issues.apache.org/jira/browse/DATAFU-5
>             Project: DataFu
>          Issue Type: Improvement
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>         Attachments: 
> 0001-update-SimpleRandomSample-to-be-consistent-with-Simp.patch, 
> DATAFU-5.patch, DATAFU-5.patch, DATAFU-5.patch
>
>
> In the current implementation, SRS takes the sampling probability in the
> constructor of the UDF, while SRSWR takes the sample size in the function
> call. The attached patch updates SRS to make it consistent with SRSWR.
> After the patch, SRS takes a bag of items, a desired sampling probability,
> and optionally a lower bound on the population size as inputs, while
> SRSWR takes a bag of items, a desired sample size, and optionally a lower
> bound on the population size as inputs.
> Another benefit of the patch is that users don't have to create multiple
> instances of the UDF to sample with different probabilities.
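
To illustrate the design point (probability as a per-call argument rather than a constructor argument), here is a minimal sketch. It is hypothetical: the class `PerCallSampler` and its `sample` method are not part of DataFu, and it uses plain Bernoulli sampling (keep each item independently with probability p) rather than the algorithm DataFu's SimpleRandomSample actually implements. It only shows how a single instance can serve calls with different probabilities.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only: NOT DataFu's SimpleRandomSample.
// The sampling probability is supplied on each call, so one instance
// can be reused for any probability.
public class PerCallSampler {
    private final Random rng;

    public PerCallSampler(long seed) {
        this.rng = new Random(seed);
    }

    /** Keeps each item independently with probability p (Bernoulli sampling). */
    public <T> List<T> sample(List<T> items, double p) {
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]: " + p);
        }
        List<T> out = new ArrayList<>();
        for (T item : items) {
            // nextDouble() is in [0, 1), so p = 1.0 keeps everything and p = 0.0 keeps nothing.
            if (rng.nextDouble() < p) {
                out.add(item);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> bag = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            bag.add(i);
        }
        PerCallSampler sampler = new PerCallSampler(42L);
        // One instance, two different probabilities.
        System.out.println(sampler.sample(bag, 0.01).size());
        System.out.println(sampler.sample(bag, 0.5).size());
    }
}
```

With a constructor-bound probability, the two calls in `main` would instead need two separate sampler instances.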



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
