+1 to updating RNG to Java 8 Gary
On Fri, Jun 14, 2019 at 10:45 AM Alex Herbert <alex.d.herb...@gmail.com> wrote: > > On 14/06/2019 13:29, sebb wrote: > > On Fri, 14 Jun 2019 at 12:57, Alex Herbert <alex.d.herb...@gmail.com> > wrote: > >> > >> On 14/06/2019 12:01, sebb wrote: > >>> I meant that the iterator would use the shuffled and/or selected > >>> indices to return the appropriate entry from the original array. > >>> > >>> No need to modify the original array. > >> To shuffle an array requires storage as elements are swapped. Either > >> store an index or store the data. So for primitive types with 4-bytes or > >> less storage it is more efficient (memory wise) to copy the array and > >> shuffle in place. For those with size of 8-bytes then you can create an > >> int array of indices and shuffle that. The index can be used to take the > >> sample from the original array. > >> > >> In terms of the RNG library the functionality to shuffle an entire array > >> once (as per ArrayUtils) would be the same as the functions from [lang] > >> ArrayUtils. So a decision should be made on whether to copy that > >> functionality from [lang] and mark it as deprecated there. > >> > >> The idea of unlimited shuffling would be done as a sampler so this would > >> go into the rng-sampling module. A single shuffle creates a permutation > >> of the sequence. The sample would do this dynamically and repeatedly > >> pass over the array. > >> > >> The sort of code would be: > >> > >> double[] array = { 1.1, 2.2, 3.3 }; > >> UniformRandomProvider rng = ...; > >> DoubleContinuousPermutationSampler sampler = new > >> DoubleContinuousPermutationSampler(rng, array); > >> for (int i = 0; i < 100; i++) { > >> double sample = sampler.next(); > >> } > >> > >> The output would be the n values of the array in a random order without > >> replacement. Once the entire set n is produced then the output would > >> repeat in a new random order, and so on: > >> > >> 2.2, 1.1, 3.3, 3.3, 2.2, 1.1, 2.2, 3.3, 1.1, ... > >> > >> Currently the library does not specialise for primitives so a more > >> generic sampler would only output a stream of int indices: > >> > >> double[] array = { 1.1, 2.2, 3.3 }; > >> UniformRandomProvider rng = ...; > >> ContinuousPermutationSampler sampler = new > >> ContinuousPermutationSampler(rng, array.length); > >> for (int i = 0; i < 100; i++) { > >> double sample = array[sampler.next()]; > >> } > >> > >> If you are only interested in a complete single pass then creating a > >> shuffled int[] of indices can be done using the existing > >> PermutationSampler. This returns an int[] permutation from its sample > >> method. The array can be used in place of an iterator: > >> > >> double[] array = { 1.1, 2.2, 3.3 }; > >> UniformRandomProvider rng = ...; > >> for (int index : new PermutationSampler(rng, array.length, > >> array.length).sample()) { > >> double sample = array[index]; > >> } > > My proposal was basically to implement the above in wrapper methods > > using an interface such as List or Iterator. > You then have boxing of primitives. Commons RNG is not on JDK 8 so we > cannot use the primitive specialisations of iterator for int, long and > double. > > The basic Iterator does not support more functionality than a for loop > over a primitive array. The extra from iterator is remove() which is not > functionality we are discussing. > > This would be nice: > > IntStream PermutationSampler.stream(UniformRandomProvider rng, int n) > > The stream would be shuffled indices. Without JDK 8 you use the 1 liner: > > IntStream.of(new PermutationSampler(rng, array.length, > array.length).sample()); > > But with the lack of support for Stream materialisation when a > terminating operation is called (i.e. the shuffle will be done even when > the stream is not consumed). Ensuring lazy initialisation would require > some atomic state in a PrimitiveIterator.OfInt implementation used to > construct the stream. Any speed advantage of parallel streams would not > help the shuffle as the full shuffle must be done before sub sections of > the shuffled array can be used. > > I'd put all this on the maybe pile. > > > > >> I'm not opposed to the continuous permutation sampler idea. I just can't > >> think of a use case not already satisfied by the existing > >> PermutationSampler, i.e. when you want to sample permutations using each > >> index one at a time and not in chunks of indices. This type of idea: > >> > >> double[] array = { 1.1, 2.2, 3.3 }; > >> UniformRandomProvider rng; > >> // The PermutationSampler will validate the length is >0 > >> final PermutationSampler sampler = new PermutationSampler(rng, > >> array.length, array.length); > >> IntSupplier supplier = new IntSupplier() { > >> int i; > >> int[] sample; > >> @Override > >> public int getAsInt() { > >> if (i == 0) { > >> sample = sampler.sample(); > >> i = sample.length; > >> } > >> return sample[--i]; > >> } > >> }; > >> for (;;) { > >> double sample = array[supplier.getAsInt()]; > >> } > >> > >> > >>> On Thu, 13 Jun 2019 at 18:15, Eric Barnhill <ericbarnh...@gmail.com> > wrote: > >>>> An iterator that dynamically shuffles as you go along. That's really > nice, > >>>> I had never even thought of that. Thanks. > >>>> > >>>> On Thu, Jun 13, 2019 at 10:11 AM Alex Herbert < > alex.d.herb...@gmail.com> > >>>> wrote: > >>>> > >>>>> On 13/06/2019 17:56, Eric Barnhill wrote: > >>>>>> On Thu, Jun 13, 2019 at 9:36 AM sebb <seb...@gmail.com> wrote: > >>>>>> > >>>>>>> Rather than shuffle etc in place, how about various > >>>>>>> iterators/selectors to return entries in randomised order? > >>>>>>> [Or does that already exist?] > >>>>>>> > >>>>>> I am pretty sure random draws, and shuffling, are implemented with > >>>>>> different algorithms. Though sampling without replacement the full > length > >>>>>> of the set would yield a shuffled set, I think there are more > efficient > >>>>>> ways to shuffle a set. > >>>>> Iterators to return a random draw *without* replacement over the full > >>>>> length of the array? The iterator would dynamically shuffle the > array on > >>>>> each call to next() so could be stopped early or can be called > >>>>> infinitely as if a continuous stream. Is that your idea? > >>>>> > >>>>> UniformRandomProvider rng = ...; > >>>>> int[] big = new int[1000000]; > >>>>> // > >>>>> // Fill big with lots of data > >>>>> // > >>>>> IntIterator iter = ShuffleIterators.create(rng, big); > >>>>> int x = iter.next(); > >>>>> int y = iter.next(); > >>>>> int z = iter.next(); > >>>>> > >>>>> This doesn't exist but it is easy to do. Memory requirements would > >>>>> require a copy of the data, or it could be marked as destructive to > the > >>>>> input array order and shuffle in place. > >>>>> > >>>>> If you want a random draw *with* replacement then you can just call > >>>>> nextInt(int) with the size of the array to pick something. > >>>>> > >>>>> > >>>>> --------------------------------------------------------------------- > >>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > >>>>> For additional commands, e-mail: dev-h...@commons.apache.org > >>>>> > >>>>> > >>> --------------------------------------------------------------------- > >>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > >>> For additional commands, e-mail: dev-h...@commons.apache.org > >>> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > >> For additional commands, e-mail: dev-h...@commons.apache.org > >> > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > > For additional commands, e-mail: dev-h...@commons.apache.org > > >