[
https://issues.apache.org/jira/browse/DATAFU-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872904#comment-13872904
]
Matthew Hayes commented on DATAFU-5:
------------------------------------
I'm starting to look through this now. One thing I wanted to call out, though,
is that because this is an incompatible API change, if we are following
semantic versioning, then we should bump the major version to 2.0. But if we
are going to rename all our packages from datafu.* to org.apache.datafu.*,
then we will be bumping the major version anyway :) We should probably open
a discussion on this latter point to confirm others feel the same.
> Update SimpleRandomSample (SRS) to be consistent with
> SimpleRandomSampleWithReplacement (SRSWR)
> -----------------------------------------------------------------------------------------------
>
> Key: DATAFU-5
> URL: https://issues.apache.org/jira/browse/DATAFU-5
> Project: DataFu
> Issue Type: Improvement
> Reporter: Xiangrui Meng
> Attachments: DATAFU-5.patch
>
>
> In the current implementation, SRS takes the sampling probability in the
> constructor of the UDF, while SRSWR takes the sample size in the function
> call. The attached patch updates SRS to make it consistent with SRSWR.
> After the patch, SRS takes a bag of items, a desired sampling probability,
> and optionally a lower bound of the size of the population as the inputs,
> while SRSWR takes a bag of items, a desired sample size, and optionally a
> lower bound of the size of the population as the inputs.
> Another benefit of the patch is that users don't have to create multiple
> instances of the UDF to sample with different probabilities.
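
To illustrate the calling convention described above, here is a minimal Python sketch. It is not the DataFu implementation or the contents of the attached patch: the function names `srs` and `srswr` are hypothetical, the optional lower bound on the population size is left out, and the sampling itself is delegated to Python's standard `random` module. It only shows the two functions taking their parameter (probability or sample size) at call time.

```python
import math
import random


def srs(bag, p, rng=random):
    """Simple random sample WITHOUT replacement (hypothetical sketch).

    The sampling probability p is passed per call, so one function
    serves any number of different probabilities.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    items = list(bag)
    # Assumption: the sample size is ceil(p * n) for a bag of n items.
    k = math.ceil(p * len(items))
    return rng.sample(items, k)


def srswr(bag, s, rng=random):
    """Simple random sample WITH replacement (hypothetical sketch).

    The sample size s is passed per call; items may repeat.
    """
    if s < 0:
        raise ValueError("s must be non-negative")
    items = list(bag)
    if s == 0:
        return []
    if not items:
        raise ValueError("cannot sample from an empty bag")
    return rng.choices(items, k=s)


# Usage: the same functions are reused with different parameters,
# with no need to construct one instance per probability.
rng = random.Random(42)
bag = range(100)
half = srs(bag, 0.5, rng)
quarter = srs(bag, 0.25, rng)
ten = srswr(bag, 10, rng)
```

With the probability fixed in a constructor instead, sampling at both 0.5 and 0.25 would require two separately configured instances, which is the inconvenience the issue description mentions.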