On Thu, 9 May 2019 at 17:00, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>
>
> On 09/05/2019 15:39, Gilles Sadowski wrote:
> > On Thu, 9 May 2019 at 15:41, Alex Herbert <alex.d.herb...@gmail.com> wrote:
> >> On Sat, 4 May 2019 at 23:52, Alex Herbert <alex.d.herb...@gmail.com> wrote:
> >>
> >>>
> >>>> On 4 May 2019, at 22:34, Gilles Sadowski <gillese...@gmail.com> wrote:
> >>>>
> >>>> Hi.
> >>>>
> >>>> On Sat, 4 May 2019 at 21:31, Alex Herbert <alex.d.herb...@gmail.com> wrote:
> >>>>>
> >>>>>
> >>>>>> On 4 May 2019, at 14:46, Gilles Sadowski <gillese...@gmail.com> wrote:
> >>>>>>
> >>>>>> Hello.
> >>>>>>
> >>>>>> On Fri, 3 May 2019 at 16:57, Alex Herbert <alex.d.herb...@gmail.com> wrote:
> >>>>>>> Most of the samplers in the library have very small states that are
> >>>>>>> easy to compute. Some have computations that are more expensive,
> >>>>>>> such as the LargeMeanPoissonSampler or the
> >>>>>>> DiscreteProbabilityCollectionSampler.
> >>>>>>>
> >>>>>>> However once the state is computed the only part of the state that
> >>>>>>> changes is the RNG. I would like to suggest a way to copy samplers as
> >>>>>>> something like:
> >>>>>>>
> >>>>>>> DiscreteSampler newInstance(UniformRandomProvider)
> >>>>>>>
> >>>>>>> The new instance would share all the private state of the first
> >>>>>>> sampler except the RNG. This can be used for multi-threaded
> >>>>>>> applications which require a new sampler per thread but sample from
> >>>>>>> the same distribution.
> >>>>>>> A particular case in point is the as yet not integrated
> >>>>>>> MarsagliaTsangWangSmallMeanPoissonSampler (see RNG-91 [1]) which has a
> >>>>>>> "large" state [2] that takes a "long" time [3] to compute but is
> >>>>>>> effectively immutable. This could be shared across instances saving
> >>>>>>> memory for parallel applications.
> >>>>>>>
> >>>>>>> A copy instance would be almost zero set-up time and provide
> >>>>>>> opportunity for caching of commonly used samplers.
> >>>>>> The goal is sharing (immutable) state so it seems that the semantics is
> >>>>>> not "copy".
> >>>>>>
> >>>>>> Isn't it a "factory" that we are after? E.g. something like:
> >>>>>> public final class CachedSamplingFactory {
> >>>>>>     private static PoissonSamplerCache poisson = new PoissonSamplerCache();
> >>>>>>     public PoissonSampler createPoissonSampler(UniformRandomProvider rng,
> >>>>>>                                                double mean) {
> >>>>>>         if (!poisson.isCached(mean)) {
> >>>>>>             // Initialize (requires synchronization) ...
> >>>>>>             poisson.createCache(mean);
> >>>>>>         }
> >>>>>>         // Construct using pre-built state.
> >>>>>>         return new PoissonSampler(poisson.getCache(mean), rng);
> >>>>>>     }
> >>>>>> }
> >>>>>> [It may be overkill, more work, and less performant…]
> >>>>> But you need a factory for every class you want to share state for. And
> >>>>> the factory actually has to look in a cache. If you operate on an
> >>>>> instance then you get what you want: another working version of the
> >>>>> same sampler. It would also be thread safe without synchronisation as
> >>>>> long as the state is immutable. The only mutable state is the passed in
> >>>>> RNG.
> >>>> Agreed. It was what I meant by the last sentence.
> >>>>
> >>>>>> IIUC, you suggest to add "newInstance" in the "DiscreteSampler"
> >>>>>> interface (?).
> >>>>> I did think of extending DiscreteSampler with this functionality. Not
> >>>>> adding to the interface as it currently is ‘functional’ as it has only
> >>>>> one method. I think that should not change. Having thought about it a
> >>>>> bit more I like the idea of a new functional interface.
> >>>>> Perhaps:
> >>>>> interface DiscreteSamplerProvider {
> >>>>>     DiscreteSampler create(UniformRandomProvider rng);
> >>>>> }
> >>>>>
> >>>>> Substitute ‘Provider’ for:
> >>>>>
> >>>>> - Generator
> >>>>> - Supplier (possible clash or alignment with Java 8 depending on the
> >>>>>   way it is done)
> >>>>> - Factory (though the method is not static so I do not like this)
> >>>>> - etc
> >>>>>
> >>>>> So this then becomes a functional interface that can be used by
> >>>>> anything. However instances of a sampler would be expected to return a
> >>>>> sampler matching their own functionality.
> >>>>> Note there are some samplers not implementing an interface that also
> >>>>> could benefit from this. Namely CollectionSampler and
> >>>>> DiscreteProbabilityCollectionSampler. So does this need a generic
> >>>>> interface:
> >>>>> Sampler<T> {
> >>>>>     T sample();
> >>>>> }
> >>>>>
> >>>>> To be complemented with:
> >>>>>
> >>>>> SamplerProvider<T> {
> >>>>>     Sampler<T> create(UniformRandomProvider rng);
> >>>>> }
> >>>>>
> >>>>> So the library would require:
> >>>>>
> >>>>> SamplerProvider<T>
> >>>>> DiscreteSamplerProvider
> >>>>> ContinuousSamplerProvider
> >>>>>
> >>>>> Any sampler can choose to implement being a Provider. There are some
> >>>>> cases where it is moot. For example a ZigguratNormalizedGaussianSampler
> >>>>> just stores the rng in the constructor. However it could still be a
> >>>>> Provider; the method would just call the constructor. It would allow
> >>>>> writing a generic multi-threaded application that just uses e.g. a
> >>>>> DiscreteSamplerProvider to create samplers for each thread. You can then
> >>>>> drop in the actual implementation you require. For example you could
> >>>>> swap the type of PoissonSampler in your simulation by swapping the
> >>>>> provider for the Poisson distribution.
> >>>>> How does that sound?
> >>>> Fine to have
> >>>>   DiscreteSamplerProvider
> >>>>   ContinuousSamplerProvider
> >>>> [Perhaps the "Supplier" suffix would be better to avoid confusion with
> >>>> "UniformRandomProvider".]
> >>>>
> >>>> At first sight, I don't think that the generic interface would have
> >>>> any actual use since, ultimately, the return value of "sample()"
> >>>> will be either "int" or "double" (no polymorphism).
> >>>>
> >>> The generic interface is for the samplers that are typed for collections
> >>> and currently return a sample T, or those that return arrays. It would
> >>> not be for Integer or Double from the probability distribution samplers.
> >>> Here is what could use it:
> >>>
> >>> CombinationSampler implements Sampler<int[]>
> >>> PermutationSampler implements Sampler<int[]>
> >>> CollectionSampler implements Sampler<T>
> >>> DiscreteProbabilityCollectionSampler implements Sampler<T>
> >>>
> >>> All are in the package org.apache.commons.rng.sampling.
> >>>
> >>> Each could also implement SamplerSupplier<T>.
> >>>
> >>> The set-up cost for the CombinationSampler/PermutationSampler would not
> >>> be much different from the constructor and no state can be shared. No
> >>> real benefit here other than convenience. But the two CollectionSamplers
> >>> could share the final collection that is created and stored from the
> >>> constructor input data. For the case of a large discrete probability
> >>> collection sampler this could be a noticeable memory footprint as it
> >>> also stores the cumulative distribution table. This would also save on
> >>> the construction cost by not having to recompute it.
> >>>
> >>> Alex
> >>>
> >> Any further thoughts on this? I think that Supplier is perhaps the wrong
> >> term. A Java 8 Supplier has a get() functional method with no parameters.
> >> These interfaces would require a UniformRandomProvider as the argument.
> >> However the Java 8 Function<T, R> apply method, which is applicable
> >> here, is a poorer name. So:
> >>
> >> DiscreteSampler
> >> ContinuousSampler
> >> Sampler<T>
> >>
> >> and trying a few options out:
> >>
> >> DiscreteSamplerFactory createDiscreteSampler(UniformRandomProvider)
> >> ContinuousSamplerFactory createContinuousSampler(UniformRandomProvider)
> >> SamplerFactory<T> createSampler(UniformRandomProvider)
> >>
> >> vs.
> >>
> >> DiscreteSamplerFactory newDiscreteSampler(UniformRandomProvider)
> >> ContinuousSamplerFactory newContinuousSampler(UniformRandomProvider)
> >> SamplerFactory<T> newSampler(UniformRandomProvider)
> >>
> >> vs.
> >>
> >> DiscreteSamplerSupplier getDiscreteSampler(UniformRandomProvider)
> >> ContinuousSamplerSupplier getContinuousSampler(UniformRandomProvider)
> >> SamplerSupplier<T> getSampler(UniformRandomProvider)
> >>
> >> vs.
> >>
> >> DiscreteSamplerGenerator newDiscreteSampler(UniformRandomProvider)
> >> ContinuousSamplerGenerator newContinuousSampler(UniformRandomProvider)
> >> SamplerGenerator<T> newSampler(UniformRandomProvider)
> >>
> >> The 'create/new' nomenclature does convey that a new instance is expected,
> >> so I prefer that over get. I'm undecided on which is the most appropriate
> >> noun for the interface name.
> > How about making clearer that the purpose is to share state, and
> > use the "fluent API":
> >
> > interface SharedStateSampler<R> {
> >     R withUniformRandomProvider(UniformRandomProvider rng);
> > }
> >
> > E.g.
> >
> > public class CollectionSampler<T>
> >     implements SharedStateSampler<CollectionSampler<T>> {
> >     // ...
> >     public CollectionSampler<T>
> >         withUniformRandomProvider(UniformRandomProvider rng) {
> >         return /* new instance that shares the immutable state */;
> >     }
> > }
> >
> > Gilles
>
> Well that is much nicer. I am fine with that.
>
> I note that this idea can be applied to any sampler even with a very
> small state.
> Should we aim for that or only pick the low hanging fruit
> of those samplers that have a relatively large construction cost or
> internal state?
>
> I would favour doing it for all samplers that have a state just to be
> consistent. It just needs a bit more work to put into the library.

+1 for consistency.

Gilles

>
> >
> >>>>>
> >>>>>
> >>>>>> I'm a bit wary that this would compound two different functionalities:
> >>>>>>  * data generator (method "sample"),
> >>>>>>  * generator generator (method "newInstance").
> >>>>>> [But I currently don't have an example where this would be a problem.]
> >>>>>>
> >>>>>> Regards,
> >>>>>> Gilles
> >>>>>>
> >>>>>>> Alex
> >>>>>>>
> >>>>>>> [1] https://issues.apache.org/jira/browse/RNG-91
> >>>>>>> [2] kB, or possibly MB, of tabulated data
> >>>>>>>
> >>>>>>> [3] Set-up cost for a Poisson sampler is in the order of 30 to 165
> >>>>>>> times as long as a SmallMeanPoissonSampler for a mean of 2 and 32.
> >>>>>>> Note however that construction still takes only 1.1 and 4.5
> >>>>>>> microseconds for the "long" time.
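
For illustration, here is a minimal sketch of the shared-state idea discussed in this thread. It is not the Commons RNG implementation. UniformRandomProvider below is a local one-method stand-in for the library interface, and CumulativeTableSampler and sharesStateWith are hypothetical names invented for the example. Only SharedStateSampler and withUniformRandomProvider follow the proposal above.

```java
import java.util.Random;

/** Local stand-in for the library's RNG interface (assumption: one method is enough here). */
interface UniformRandomProvider {
    double nextDouble();
}

/** The interface proposed in the thread: same sampler, same state, different RNG. */
interface SharedStateSampler<R> {
    R withUniformRandomProvider(UniformRandomProvider rng);
}

/** Hypothetical discrete sampler whose only expensive state is a cumulative table. */
final class CumulativeTableSampler implements SharedStateSampler<CumulativeTableSampler> {
    /** Computed once, never modified afterwards, so it can be shared between instances. */
    private final double[] cumulative;
    /** The only mutable state: each instance owns its RNG. */
    private final UniformRandomProvider rng;

    CumulativeTableSampler(UniformRandomProvider rng, double[] probabilities) {
        this.rng = rng;
        double[] table = new double[probabilities.length];
        double sum = 0;
        for (int i = 0; i < table.length; i++) {
            sum += probabilities[i];
            table[i] = sum;
        }
        for (int i = 0; i < table.length; i++) {
            table[i] /= sum; // normalise so the last entry is 1
        }
        this.cumulative = table;
    }

    /** Private constructor used for sharing: no recomputation, no copy of the table. */
    private CumulativeTableSampler(UniformRandomProvider rng, CumulativeTableSampler source) {
        this.rng = rng;
        this.cumulative = source.cumulative;
    }

    /** Returns the first index whose cumulative probability exceeds a uniform deviate. */
    int sample() {
        double u = rng.nextDouble();
        for (int i = 0; i < cumulative.length; i++) {
            if (u < cumulative[i]) {
                return i;
            }
        }
        return cumulative.length - 1;
    }

    @Override
    public CumulativeTableSampler withUniformRandomProvider(UniformRandomProvider rng) {
        return new CumulativeTableSampler(rng, this);
    }

    /** Helper for the example only: true if both instances use the same table object. */
    boolean sharesStateWith(CumulativeTableSampler other) {
        return cumulative == other.cumulative;
    }
}

class SharedStateDemo {
    public static void main(String[] args) {
        double[] p = {0.2, 0.3, 0.5};
        CumulativeTableSampler base = new CumulativeTableSampler(new Random(1)::nextDouble, p);
        // One sampler per worker, each with its own RNG but the same table.
        CumulativeTableSampler worker = base.withUniformRandomProvider(new Random(2)::nextDouble);
        System.out.println("shared table: " + base.sharesStateWith(worker));
        System.out.println("sample from worker: " + worker.sample());
    }
}
```

In a multi-threaded application each thread would call withUniformRandomProvider on one pre-built sampler with its own RNG, so the table is computed once and no synchronisation is needed as long as the table is never modified.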