Alex Herbert created RNG-185:
--------------------------------

             Summary: ArraySampler to have factory methods to sample from arrays
                 Key: RNG-185
                 URL: https://issues.apache.org/jira/browse/RNG-185
             Project: Commons RNG
          Issue Type: Wish
          Components: sample
    Affects Versions: 1.6
            Reporter: Alex Herbert

The ArraySampler currently offers shuffle support for arrays, similar to the ListSampler, which shuffles a List. It does not offer an equivalent method to sample a subset from an array. The ListSampler API is:
{code:java}
// Sample a List of size k from the input list
public static <T> List<T> sample(UniformRandomProvider rng, List<T> collection, int k){code}
The subset is chosen using a permutation from the PermutationSampler. This method is static and each invocation creates a new PermutationSampler. That class maintains an array of indices for all elements of the list, so repeated invocations must recreate this array.

An improvement would be:
 * Return a Sampler<double[]>
 * Allow a choice between a permutation (the order of the sample does matter) or a combination (the order of the sample does not matter)

A suggested API would be:
{code:java}
public static ObjectSampler<double[]> permutationSampler(UniformRandomProvider rng, double[] array, int k)

public static ObjectSampler<double[]> combinationSampler(UniformRandomProvider rng, double[] array, int k)
{code}
Implementing this for all array types requires a lot of repeated boilerplate code, and there is currently no use case to merit its inclusion.

Note that sampling of this type for any array can already be performed using e.g.:
{code:java}
// Sample k indices from [0, array.length)
final PermutationSampler s = new PermutationSampler(rng, array.length, k);
ObjectSampler<double[]> sampler = () -> {
    final int[] indices = s.sample();
    // Map the sampled indices to elements of the input array
    final double[] sample = new double[indices.length];
    for (int i = 0; i < sample.length; i++) {
        sample[i] = array[indices[i]];
    }
    return sample;
};{code}
Note that one advantage of a direct implementation is that the sample can be created directly as a subset of the input array, using the same method that the PermutationSampler uses to create its indices array. This removes the generation of an intermediate int[] for each sample.

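As a starting point for discussion, here is a minimal sketch of what such a direct implementation might look like for double[]. It is not code from Commons RNG. To keep it self-contained it uses java.util.Random in place of UniformRandomProvider and Supplier<double[]> in place of ObjectSampler<double[]>. The class name ArraySubsetSketch and the double[] variant of partialSample are hypothetical; the latter only mirrors the signature of the existing int[] method and is written from scratch here.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.Supplier;

/** Hypothetical sketch only; not part of Commons RNG. */
final class ArraySubsetSketch {
    private ArraySubsetSketch() {}

    /** Samples k elements where the order of the sample matters. */
    static Supplier<double[]> permutationSampler(Random rng, double[] array, int k) {
        checkSize(array.length, k);
        // Private working copy: the caller's array is never modified.
        final double[] domain = array.clone();
        return () -> partialSample(domain, k, rng, true);
    }

    /** Samples k elements where the order of the sample does not matter. */
    static Supplier<double[]> combinationSampler(Random rng, double[] array, int k) {
        checkSize(array.length, k);
        final double[] domain = array.clone();
        // Only min(k, n - k) positions need shuffling: return either the
        // shuffled upper part or its complement, the untouched lower part.
        final boolean upper = k <= domain.length / 2;
        final int steps = upper ? k : domain.length - k;
        return () -> partialSample(domain, steps, rng, upper);
    }

    /**
     * Partial Fisher-Yates shuffle of the last 'steps' positions of the domain.
     * Returns a copy of those positions (upper), or of the remaining
     * n - steps positions (lower).
     */
    static double[] partialSample(double[] domain, int steps, Random rng, boolean upper) {
        final int n = domain.length;
        for (int i = n - 1; i >= n - steps; i--) {
            // Swap position i with any position down to 0 (including itself).
            final int j = rng.nextInt(i + 1);
            final double tmp = domain[i];
            domain[i] = domain[j];
            domain[j] = tmp;
        }
        return upper
            ? Arrays.copyOfRange(domain, n - steps, n)
            : Arrays.copyOfRange(domain, 0, n - steps);
    }

    private static void checkSize(int n, int k) {
        if (k <= 0 || k > n) {
            throw new IllegalArgumentException("Require 0 < k <= n: k=" + k + ", n=" + n);
        }
    }

    public static void main(String[] args) {
        final double[] array = {1, 2, 3, 4, 5, 6};
        final Random rng = new Random();
        final Supplier<double[]> sampler = permutationSampler(rng, array, 3);
        System.out.println(Arrays.toString(sampler.get()));
        System.out.println(Arrays.toString(combinationSampler(rng, array, 4).get()));
    }
}
```

In this sketch each sampler keeps one working copy of the input and reshuffles it in place on every call, so only the returned sample is allocated per invocation and no int[] of indices is needed. The trade-off is that a sampler instance holds mutable state and is not safe to share between threads.
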
A direct implementation would effectively extend the package-private method in SubsetSamplerUtils, which performs a partial shuffle of an int[], to all array types:
{code:java}
static int[] partialSample(int[] domain, int steps, UniformRandomProvider rng, boolean upper){code}
That method is used by both the PermutationSampler and the CombinationSampler to partially shuffle the indices. The choice to return the upper or lower part of the partially shuffled array is an optimisation for the CombinationSampler.

This ticket is a placeholder for discussion on this type of functionality and possible use cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)