[ 
https://issues.apache.org/jira/browse/DATAFU-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874361#comment-13874361
 ] 

Xiangrui Meng edited comment on DATAFU-5 at 1/17/14 2:56 AM:
-------------------------------------------------------------

See the attached patch for the second revision.


was (Author: mengxr):
I cannot upload the generated diff to the review board, so I uploaded
a normal diff.



> Update SimpleRandomSample (SRS) to be consistent with 
> SimpleRandomSampleWithReplacement (SRSWR)
> -----------------------------------------------------------------------------------------------
>
>                 Key: DATAFU-5
>                 URL: https://issues.apache.org/jira/browse/DATAFU-5
>             Project: DataFu
>          Issue Type: Improvement
>            Reporter: Xiangrui Meng
>         Attachments: DATAFU-5.patch, DATAFU-5.patch
>
>
> In the current implementation, SRS takes the sampling probability in the 
> constructor of the UDF, while SRSWR takes the sample size in the function 
> call. The attached patch updates SRS to make it consistent with SRSWR. 
> After the patch, SRS takes a bag of items, a desired sampling probability, 
> and optionally a lower bound on the size of the population as inputs, 
> while SRSWR takes a bag of items, a desired sample size, and optionally a 
> lower bound on the size of the population as inputs.
> Another benefit of the patch is that users no longer have to create multiple 
> instances of the UDF to sample with different probabilities. 
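
The interface change described above can be illustrated with a minimal
sketch. This is not DataFu code and not the algorithm in the patch: the class
names, the `seed` parameter, and the choice to draw exactly ceil(p * n) items
with Python's `random.sample` are all assumptions made for illustration, and
the optional population lower bound is not modeled.

```python
import math
import random


def _draw(rng, items, p):
    """Draw ceil(p * n) distinct items (an illustrative stand-in, not DataFu's method)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    items = list(items)
    k = math.ceil(p * len(items))
    return rng.sample(items, k)


class ConstructorProbabilitySampler:
    """Hypothetical 'before' shape: the probability is fixed at construction."""

    def __init__(self, p, seed=None):
        self.p = p
        self._rng = random.Random(seed)

    def sample(self, items):
        return _draw(self._rng, items, self.p)


class CallProbabilitySampler:
    """Hypothetical 'after' shape: the probability is passed with each call."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def sample(self, items, p):
        return _draw(self._rng, items, p)


if __name__ == "__main__":
    population = list(range(8))

    # Before: one instance per probability.
    half = ConstructorProbabilitySampler(0.5, seed=1)
    quarter = ConstructorProbabilitySampler(0.25, seed=1)
    print(len(half.sample(population)), len(quarter.sample(population)))

    # After: a single instance serves both probabilities.
    sampler = CallProbabilitySampler(seed=1)
    print(len(sampler.sample(population, 0.5)), len(sampler.sample(population, 0.25)))
```

With the probability moved into the call, one sampler object covers every
probability a script needs, which is the convenience the description mentions.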



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
