[ https://issues.apache.org/jira/browse/DATAFU-5?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew Hayes updated DATAFU-5:
-------------------------------
    Fix Version/s: 1.3.0

> Update SimpleRandomSample (SRS) to be consistent with SimpleRandomSampleWithReplacement (SRSWR)
> -----------------------------------------------------------------------------------------------
>
>                 Key: DATAFU-5
>                 URL: https://issues.apache.org/jira/browse/DATAFU-5
>             Project: DataFu
>          Issue Type: Improvement
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>             Fix For: 1.3.0
>
>         Attachments: 0001-update-SimpleRandomSample-to-be-consistent-with-Simp.patch, DATAFU-5.patch, DATAFU-5.patch, DATAFU-5.patch
>
>
> In the current implementation, SRS takes the sampling probability in the
> constructor of the UDF, while SRSWR takes the sample size in the function
> call. The attached patch updates SRS to make it consistent with SRSWR.
> After the patch, SRS takes a bag of items, a desired sampling probability,
> and optionally a lower bound on the size of the population as inputs,
> while SRSWR takes a bag of items, a desired sample size, and optionally a
> lower bound on the size of the population as inputs.
> Another benefit of the patch is that the user doesn't have to create
> multiple instances of the UDF to sample with different probabilities.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
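
The calling convention described in the issue can be illustrated with a rough Python sketch. This is not DataFu code and not the algorithm used by the UDFs. The function names `srs` and `srswr`, the parameter `n_lower_bound`, the choice of `ceil(p * n)` as the sample size, and the use of the lower bound only as a sanity check are all assumptions made for illustration. The only point taken from the message is the shape of the interface: both functions receive their sampling parameter per call rather than at construction time.

```python
import math
import random


def _check_lower_bound(n, n_lower_bound):
    # Assumption: the optional lower bound is only validated here.
    # The message does not say how the real UDFs use it.
    if n_lower_bound is not None and n < n_lower_bound:
        raise ValueError("population is smaller than the stated lower bound")


def srs(items, p, n_lower_bound=None, rng=random):
    """Hypothetical sampling without replacement; p is given per call."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    items = list(items)
    _check_lower_bound(len(items), n_lower_bound)
    # Assumption: target sample size is ceil(p * n).
    k = math.ceil(p * len(items))
    return rng.sample(items, k)


def srswr(items, k, n_lower_bound=None, rng=random):
    """Hypothetical sampling with replacement; sample size k is given per call."""
    if k < 0:
        raise ValueError("k must be non-negative")
    items = list(items)
    _check_lower_bound(len(items), n_lower_bound)
    if not items:
        return []
    return rng.choices(items, k=k)
```

Because the probability is an argument rather than constructor state, a single definition serves several sampling rates, for example `srs(data, 0.5)` and `srs(data, 0.25)` on the same `data`, which mirrors the benefit mentioned at the end of the issue description.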