Alex Herbert created RNG-185:
--------------------------------

             Summary: ArraySampler to have factory methods to sample from arrays
                 Key: RNG-185
                 URL: https://issues.apache.org/jira/browse/RNG-185
             Project: Commons RNG
          Issue Type: Wish
          Components: sample
    Affects Versions: 1.6
            Reporter: Alex Herbert


The ArraySampler currently offers shuffle support for arrays, similar to the 
ListSampler, which shuffles a List.

It does not offer an equivalent of the ListSampler method that samples a 
subset from a List. The ListSampler API is:

{code:java}
// Sample a List of size k from the input list
public static <T> List<T> sample(UniformRandomProvider rng,
                                 List<T> collection,
                                 int k){code}
The subset is chosen using a permutation from the PermutationSampler. This 
method is static, and each invocation creates a new PermutationSampler. That 
class maintains an array of indices for all elements of the list, so every 
repeat invocation must recreate this index array, even when only k elements 
are sampled.


An improvement would be:
 * Return an ObjectSampler<double[]> that can be created once and reused for 
repeat sampling
 * Allow a choice between a permutation (the order of the sample matters) and 
a combination (the order of the sample does not matter)
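To make the permutation/combination distinction concrete, the standalone 
sketch below counts the equally likely outcomes each kind of sampler chooses 
between: P(n, k) = n!/(n - k)! ordered samples versus C(n, k) = P(n, k)/k! 
unordered ones. The class and method names are hypothetical and not part of 
Commons RNG. Because a combination sampler need not randomise the order of its 
output, it has more freedom in how it produces a sample.

{code:java}
/** Hypothetical helper: counts ordered vs unordered samples of k from n. */
public class PermutationVsCombination {
    /** Number of ordered samples (permutations) of k from n: n! / (n - k)!. */
    static long permutations(int n, int k) {
        long p = 1;
        for (int i = 0; i < k; i++) {
            p *= n - i;
        }
        return p;
    }

    /** Number of unordered samples (combinations) of k from n: P(n, k) / k!. */
    static long combinations(int n, int k) {
        long c = 1;
        for (int i = 1; i <= k; i++) {
            // After this step c == C(n - k + i, i), so the division is exact
            c = c * (n - k + i) / i;
        }
        return c;
    }

    public static void main(String[] args) {
        // Choosing 2 from {a, b, c, d}: (a, b) and (b, a) are distinct
        // permutations but the same combination.
        System.out.println(permutations(4, 2)); // prints 12
        System.out.println(combinations(4, 2)); // prints 6
    }
}
{code}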

A suggested API would be:

{code:java}
public static ObjectSampler<double[]> 
    permutationSampler(UniformRandomProvider rng,
                       double[] array,
                       int k)
public static ObjectSampler<double[]>
    combinationSampler(UniformRandomProvider rng,
                       double[] array,
                       int k) {code}
Implementing this for all array types would require a lot of repetitive 
boilerplate code, and there is currently no use case that merits its 
inclusion. Note that sampling of this type can already be performed for any 
array using, e.g.:

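{code:java}
import org.apache.commons.rng.sampling.ObjectSampler;
import org.apache.commons.rng.sampling.PermutationSampler;

// Created once and reused; a CombinationSampler can be used instead
// if the order of the sample does not matter
final PermutationSampler s = new PermutationSampler(rng, array.length, k);

ObjectSampler<double[]> sampler = () -> {
    // k distinct indices into the array, in random order
    final int[] indices = s.sample();
    final double[] sample = new double[indices.length];
    for (int i = 0; i < sample.length; i++) {
        sample[i] = array[indices[i]];
    }
    return sample;
};{code}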
One advantage of a direct implementation is that the sample can be created 
from (a copy of) the input array using the same partial shuffle that the 
PermutationSampler applies to its array of indices. This removes the 
generation of an intermediate int[] of indices for each sample. This would 
effectively extend to all array types the package-private method in 
SubsetSamplerUtils that performs a partial shuffle of an array:
{code:java}
static int[] partialSample(int[] domain,
                           int steps,
                           UniformRandomProvider rng,
                           boolean upper){code}
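That method is used by both the PermutationSampler and the CombinationSampler 
to partially shuffle the indices. The choice to return the upper or lower part 
of the part-shuffled array is an optimisation for the CombinationSampler: when 
k > n/2, only n - k shuffle steps are needed and the lower part is returned, 
since the order of a combination does not matter.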
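A minimal sketch of what a direct implementation could look like for 
double[]. It assumes java.util.SplittableRandom as a stand-in for 
UniformRandomProvider and a local ObjectSampler interface as a stand-in for the 
Commons RNG one, so the example is self-contained; the class name 
DoubleArraySampler is hypothetical. partialSample mirrors the described 
SubsetSamplerUtils method but swaps double values instead of int indices, so 
the sample is taken directly from a working copy of the input array.

{code:java}
import java.util.Arrays;
import java.util.SplittableRandom;

/** Hypothetical sketch; SplittableRandom stands in for UniformRandomProvider. */
public final class DoubleArraySampler {
    /** Stand-in for the Commons RNG ObjectSampler interface. */
    @FunctionalInterface
    public interface ObjectSampler<T> {
        T sample();
    }

    private DoubleArraySampler() {}

    /**
     * Partial Fisher-Yates shuffle of the last {@code steps} positions of
     * {@code domain}, in place. The upper part is a uniform random ordered
     * sample; the lower part holds the remaining elements (random as a set,
     * but not in random order).
     */
    static double[] partialSample(double[] domain, int steps,
                                  SplittableRandom rng, boolean upper) {
        final int n = domain.length;
        final int limit = n - steps;
        for (int i = n - 1; i >= limit; i--) {
            // Swap position i with a uniformly chosen position in [0, i]
            final int j = rng.nextInt(i + 1);
            final double tmp = domain[i];
            domain[i] = domain[j];
            domain[j] = tmp;
        }
        return upper ? Arrays.copyOfRange(domain, limit, n)
                     : Arrays.copyOfRange(domain, 0, limit);
    }

    /** Samples k elements; the order of the sample matters. */
    public static ObjectSampler<double[]> permutationSampler(SplittableRandom rng,
                                                             double[] array, int k) {
        checkSubset(array.length, k);
        // Copy once; the input array is never modified
        final double[] domain = array.clone();
        return () -> partialSample(domain, k, rng, true);
    }

    /** Samples k elements; the order of the sample does not matter. */
    public static ObjectSampler<double[]> combinationSampler(SplittableRandom rng,
                                                             double[] array, int k) {
        checkSubset(array.length, k);
        final double[] domain = array.clone();
        final int n = domain.length;
        if (k <= n / 2) {
            // k steps, return the shuffled upper part
            return () -> partialSample(domain, k, rng, true);
        }
        // Fewer steps: move n - k excluded elements up, return the lower part
        return () -> partialSample(domain, n - k, rng, false);
    }

    private static void checkSubset(int n, int k) {
        if (k < 0 || k > n) {
            throw new IllegalArgumentException("k must be in [0, " + n + "]: " + k);
        }
    }
}
{code}

Each sampler clones the input once and re-shuffles that working copy in place 
on every call. This is valid because a partial Fisher-Yates shuffle yields a 
uniform sample from any starting order, but it means a sampler instance is not 
thread-safe. The same code would need to be repeated for each primitive array 
type, which is the boilerplate cost noted in this ticket.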

This ticket is a placeholder for discussion on this type of functionality and 
possible use cases.


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
