[ https://issues.apache.org/jira/browse/DATAFU-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877732#comment-13877732 ]
Xiangrui Meng commented on DATAFU-5:
------------------------------------

Will make it backwards compatible and update the patch this weekend.

> Update SimpleRandomSample (SRS) to be consistent with SimpleRandomSampleWithReplacement (SRSWR)
> -----------------------------------------------------------------------------------------------
>
>                 Key: DATAFU-5
>                 URL: https://issues.apache.org/jira/browse/DATAFU-5
>             Project: DataFu
>          Issue Type: Improvement
>            Reporter: Xiangrui Meng
>         Attachments: DATAFU-5.patch, DATAFU-5.patch
>
>
> In the current implementation, SRS takes the sampling probability in the
> constructor of the UDF, while SRSWR takes the sample size in the function
> call. The attached patch updates SRS to make it consistent with SRSWR.
> After the patch, SRS takes as input a bag of items, a desired sampling
> probability, and, optionally, a lower bound on the population size, while
> SRSWR takes a bag of items, a desired sample size, and, optionally, a
> lower bound on the population size.
> Another benefit of the patch is that the user doesn't have to create multiple
> instances of the UDF to sample with different probabilities.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
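To illustrate the call-time interface the issue describes, here is a minimal in-memory sketch. It is not DataFu's implementation or API: the class `SamplingSketch` and both method names are hypothetical. It shows only the shape of the change, with each parameter supplied per call rather than per constructor. The sketch assumes SRS returns ceil(p * n) distinct items (drawn with a partial Fisher-Yates shuffle) and that SRSWR makes k independent uniform draws. Because the whole bag sits in memory, n is known exactly, so the optional lower bound on the population size is not modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch, not DataFu code: both samplers receive their
 * parameters at call time, so one object can serve many probabilities/sizes.
 */
public final class SamplingSketch {

  private SamplingSketch() {}

  /** Without replacement: ceil(p * n) distinct items chosen uniformly (assumed size rule). */
  public static <T> List<T> simpleRandomSample(List<T> bag, double p, Random rng) {
    if (p < 0.0 || p > 1.0) {
      throw new IllegalArgumentException("p must be in [0, 1]");
    }
    int k = (int) Math.ceil(p * bag.size());
    List<T> copy = new ArrayList<>(bag);
    // Partial Fisher-Yates shuffle: after i swaps, positions 0..i-1 hold a uniform sample.
    for (int i = 0; i < k; i++) {
      int j = i + rng.nextInt(copy.size() - i);
      Collections.swap(copy, i, j);
    }
    return new ArrayList<>(copy.subList(0, k));
  }

  /** With replacement: k independent uniform draws from the bag. */
  public static <T> List<T> simpleRandomSampleWithReplacement(List<T> bag, int k, Random rng) {
    if (k < 0) {
      throw new IllegalArgumentException("k must be non-negative");
    }
    if (bag.isEmpty() && k > 0) {
      throw new IllegalArgumentException("cannot draw from an empty bag");
    }
    List<T> out = new ArrayList<>(k);
    for (int i = 0; i < k; i++) {
      out.add(bag.get(rng.nextInt(bag.size())));
    }
    return out;
  }

  public static void main(String[] args) {
    List<Integer> population = new ArrayList<>();
    for (int i = 0; i < 100; i++) {
      population.add(i);
    }
    Random rng = new Random(42);
    // Different probabilities, no new instances: the parameter travels with the call.
    System.out.println(simpleRandomSample(population, 0.05, rng));
    System.out.println(simpleRandomSample(population, 0.20, rng));
    System.out.println(simpleRandomSampleWithReplacement(population, 10, rng));
  }
}
```

One design caveat: computing the size with `Math.ceil` on a floating-point product can overshoot by one for some inputs (for example, `0.07 * 100` evaluates slightly above 7). A production implementation would need to pin down that rounding rule explicitly.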