On Thu, 9 May 2019 at 17:00, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>
>
> On 09/05/2019 15:39, Gilles Sadowski wrote:
> > On Thu, 9 May 2019 at 15:41, Alex Herbert <alex.d.herb...@gmail.com> wrote:
> >> On Sat, 4 May 2019 at 23:52, Alex Herbert <alex.d.herb...@gmail.com> wrote:
> >>
> >>>
> >>>> On 4 May 2019, at 22:34, Gilles Sadowski <gillese...@gmail.com> wrote:
> >>>>
> >>>> Hi.
> >>>>
> >>>> On Sat, 4 May 2019 at 21:31, Alex Herbert <alex.d.herb...@gmail.com> wrote:
> >>>>>
> >>>>>
> >>>>>> On 4 May 2019, at 14:46, Gilles Sadowski <gillese...@gmail.com> wrote:
> >>>>>>
> >>>>>> Hello.
> >>>>>>
> >>>>>> On Fri, 3 May 2019 at 16:57, Alex Herbert <alex.d.herb...@gmail.com> wrote:
> >>>>>>> Most of the samplers in the library have very small states that are
> >>>>>>> easy to compute. Some have computations that are more expensive, such
> >>>>>>> as the LargeMeanPoissonSampler or the DiscreteProbabilityCollectionSampler.
> >>>>>>>
> >>>>>>> However, once the state is computed, the only part of the state that
> >>>>>>> changes is the RNG. I would like to suggest a way to copy samplers,
> >>>>>>> with something like:
> >>>>>>>
> >>>>>>> DiscreteSampler newInstance(UniformRandomProvider)
> >>>>>>>
> >>>>>>> The new instance would share all the private state of the first
> >>>>>>> sampler except the RNG. This can be used for multi-threaded
> >>>>>>> applications which require a new sampler per thread but sample from
> >>>>>>> the same distribution.
> >>>>>>> A particular case in point is the not yet integrated
> >>>>>>> MarsagliaTsangWangSmallMeanPoissonSampler (see RNG-91 [1]), which has a
> >>>>>>> "large" state [2] that takes a "long" time [3] to compute but is
> >>>>>>> effectively immutable. This could be shared across instances, saving
> >>>>>>> memory for parallel applications.
> >>>>>>>
> >>>>>>> A copy instance would have almost zero set-up time and would provide
> >>>>>>> an opportunity for caching commonly used samplers.
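A minimal sketch of the proposed `newInstance`, assuming a hypothetical `TableDiscreteSampler` whose costly state is a cumulative probability table built once in the constructor. `UniformRandomProvider` and `DiscreteSampler` here are simplified local stand-ins for the library interfaces so that the snippet compiles on its own; the private constructor hands the same array to the new instance, so only the generator differs.

```java
/** Simplified stand-in for org.apache.commons.rng.UniformRandomProvider (only nextDouble() is needed). */
interface UniformRandomProvider {
    double nextDouble();
}

/** Simplified stand-in for the library's DiscreteSampler interface. */
interface DiscreteSampler {
    int sample();
}

/** Hypothetical sampler: returns index i with probability proportional to weights[i]. */
final class TableDiscreteSampler implements DiscreteSampler {
    private final UniformRandomProvider rng;
    /** Cumulative probabilities: the "expensive" state, never modified after construction. */
    private final double[] cumulative;

    TableDiscreteSampler(UniformRandomProvider rng, double[] weights) {
        this.rng = rng;
        this.cumulative = new double[weights.length];
        double sum = 0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i];
            cumulative[i] = sum;
        }
        for (int i = 0; i < cumulative.length; i++) {
            cumulative[i] /= sum; // normalise so that the last entry is 1
        }
    }

    /** Copy constructor: reuses the source table instead of recomputing it. */
    private TableDiscreteSampler(TableDiscreteSampler source, UniformRandomProvider rng) {
        this.rng = rng;
        this.cumulative = source.cumulative;
    }

    /** The proposed method: shares all state except the RNG. */
    DiscreteSampler newInstance(UniformRandomProvider rng) {
        return new TableDiscreteSampler(this, rng);
    }

    @Override
    public int sample() {
        final double u = rng.nextDouble();
        // Inversion: return the first index whose cumulative probability exceeds u.
        for (int i = 0; i < cumulative.length - 1; i++) {
            if (u < cumulative[i]) {
                return i;
            }
        }
        return cumulative.length - 1;
    }
}
```

Because the table is never written after construction, instances created this way can be used from different threads without locking, provided each one has its own generator.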
> >>>>>> The goal is sharing (immutable) state so it seems that the semantics is
> >>>>>> not "copy".
> >>>>>>
> >>>>>> Isn't it a "factory" that we are after?  E.g. something like:
> >>>>>> public final class CachedSamplingFactory {
> >>>>>>     private static PoissonSamplerCache poisson = new PoissonSamplerCache();
> >>>>>>     public PoissonSampler createPoissonSampler(UniformRandomProvider rng, double mean) {
> >>>>>>         if (!poisson.isCached(mean)) {
> >>>>>>             poisson.createCache(mean); // Initialize (requires synchronization) ...
> >>>>>>         }
> >>>>>>         return new PoissonSampler(poisson.getCache(mean), rng); // Construct using pre-built state.
> >>>>>>     }
> >>>>>> }
> >>>>>> [It may be overkill, more work, and less performant…]
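One way to flesh out this factory, sketched under assumptions: `PoissonTable` and `PoissonTableSampler` are hypothetical (a plain cumulative-table inversion suited to small means, not the Marsaglia-Tsang-Wang method of RNG-91), the cache is an instance field rather than `static`, and `UniformRandomProvider` is a simplified stand-in. `ConcurrentHashMap.computeIfAbsent` performs the check-and-create step atomically, which covers the synchronization noted in the comment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Simplified stand-in for org.apache.commons.rng.UniformRandomProvider. */
interface UniformRandomProvider {
    double nextDouble();
}

/** Hypothetical immutable state: cumulative Poisson probabilities for one small mean. */
final class PoissonTable {
    final double[] cumulative;

    PoissonTable(double mean) {
        // Tabulate P(X <= k) well into the tail; exp(-mean) limits this to small means.
        final int n = (int) Math.ceil(mean + 10 * Math.sqrt(mean) + 10);
        cumulative = new double[n + 1];
        double p = Math.exp(-mean); // P(X = 0)
        double sum = p;
        cumulative[0] = sum;
        for (int k = 1; k <= n; k++) {
            p *= mean / k; // P(X = k) = P(X = k - 1) * mean / k
            sum += p;
            cumulative[k] = sum;
        }
    }
}

/** Hypothetical sampler constructed from a pre-built (shared) table. */
final class PoissonTableSampler {
    private final PoissonTable table;
    private final UniformRandomProvider rng;

    PoissonTableSampler(PoissonTable table, UniformRandomProvider rng) {
        this.table = table;
        this.rng = rng;
    }

    int sample() {
        final double u = rng.nextDouble();
        final double[] c = table.cumulative;
        for (int k = 0; k < c.length - 1; k++) {
            if (u < c[k]) {
                return k;
            }
        }
        return c.length - 1; // truncated tail
    }
}

final class CachedSamplingFactory {
    private final Map<Double, PoissonTable> poisson = new ConcurrentHashMap<>();

    PoissonTableSampler createPoissonSampler(UniformRandomProvider rng, double mean) {
        // The table for a given mean is built at most once, even under concurrent calls.
        final PoissonTable table = poisson.computeIfAbsent(mean, PoissonTable::new);
        return new PoissonTableSampler(table, rng);
    }

    int cachedMeans() {
        return poisson.size();
    }
}
```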
> >>>>> But you need a factory for every class you want to share state for, and
> >>>>> the factory actually has to look in a cache. If you operate on an
> >>>>> instance, then you get what you want: another working version of the
> >>>>> same sampler. It would also be thread safe without synchronisation as
> >>>>> long as the state is immutable. The only mutable state is the passed-in
> >>>>> RNG.
> >>>> Agreed.  It was what I meant by the last sentence.
> >>>>
> >>>>>> IIUC, you suggest adding "newInstance" to the "DiscreteSampler"
> >>>>>> interface (?).
> >>>>> I did think of extending DiscreteSampler with this functionality, but
> >>>>> not by adding to the interface, as it is currently ‘functional’ (it has
> >>>>> only one method). I think that should not change. Having thought about
> >>>>> it a bit more, I like the idea of a new functional interface. Perhaps:
> >>>>> interface DiscreteSamplerProvider {
> >>>>>     DiscreteSampler create(UniformRandomProvider rng);
> >>>>> }
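Because the proposed interface has a single method, a provider can be either a sampler that creates siblings of itself or simply a constructor reference. A sketch, assuming a hypothetical `DieSampler` (a fair six-sided die) and simplified stand-ins for `UniformRandomProvider` and `DiscreteSampler`:

```java
/** Simplified stand-in for org.apache.commons.rng.UniformRandomProvider. */
interface UniformRandomProvider {
    double nextDouble();
}

/** Simplified stand-in for the library's DiscreteSampler. */
interface DiscreteSampler {
    int sample();
}

/** The interface proposed above (the name is still under discussion). */
@FunctionalInterface
interface DiscreteSamplerProvider {
    DiscreteSampler create(UniformRandomProvider rng);
}

/** Hypothetical sampler of a fair die (faces 1 to 6) that is also its own provider. */
final class DieSampler implements DiscreteSampler, DiscreteSamplerProvider {
    private final UniformRandomProvider rng;

    DieSampler(UniformRandomProvider rng) {
        this.rng = rng;
    }

    @Override
    public int sample() {
        return 1 + (int) (6 * rng.nextDouble());
    }

    @Override
    public DiscreteSampler create(UniformRandomProvider rng) {
        return new DieSampler(rng); // nothing to share: just call the constructor
    }
}
```

Client code typed against `DiscreteSamplerProvider` cannot tell the two forms apart, e.g. `new DieSampler(rng)` and `DieSampler::new` are interchangeable arguments.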
> >>>>>
> >>>>> Substitute ‘Provider’ for:
> >>>>>
> >>>>> - Generator
> >>>>> - Supplier (possible clash or alignment with Java 8 depending on the
> >>>>>   way it is done)
> >>>>> - Factory (though the method is not static so I do not like this)
> >>>>> - etc
> >>>>>
> >>>>> So this then becomes a functional interface that can be used by
> >>>>> anything. However, instances of a sampler would be expected to return a
> >>>>> sampler matching their own functionality.
> >>>>> Note there are some samplers not implementing an interface that could
> >>>>> also benefit from this, namely CollectionSampler and
> >>>>> DiscreteProbabilityCollectionSampler. So does this need a generic
> >>>>> interface:
> >>>>> Sampler<T> {
> >>>>>     T sample();
> >>>>> }
> >>>>>
> >>>>> To be complemented with:
> >>>>>
> >>>>> SamplerProvider<T> {
> >>>>>     Sampler<T> create(UniformRandomProvider rng);
> >>>>> }
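A sketch of how the generic pair might be used, assuming a hypothetical `UniformItemSampler<T>` that draws uniformly from a defensive copy of its input and shares that copy with every sampler it creates. `UniformRandomProvider` is a simplified stand-in; the two generic interfaces follow the names in the message.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Simplified stand-in for org.apache.commons.rng.UniformRandomProvider. */
interface UniformRandomProvider {
    double nextDouble();
}

/** Generic sampler, as proposed above. */
interface Sampler<T> {
    T sample();
}

/** Generic provider of samplers, as proposed above. */
interface SamplerProvider<T> {
    Sampler<T> create(UniformRandomProvider rng);
}

/** Hypothetical sampler returning each item of a collection with equal probability. */
final class UniformItemSampler<T> implements Sampler<T>, SamplerProvider<T> {
    private final UniformRandomProvider rng;
    /** Read-only copy of the input, made once and shared by all created samplers. */
    private final List<T> items;

    UniformItemSampler(UniformRandomProvider rng, Collection<T> collection) {
        this.rng = rng;
        this.items = Collections.unmodifiableList(new ArrayList<>(collection));
    }

    /** Sharing constructor: reuses the source list, only the generator differs. */
    private UniformItemSampler(UniformItemSampler<T> source, UniformRandomProvider rng) {
        this.rng = rng;
        this.items = source.items;
    }

    @Override
    public T sample() {
        final int n = items.size();
        // Math.min guards against n * u rounding up to n.
        return items.get(Math.min(n - 1, (int) (n * rng.nextDouble())));
    }

    @Override
    public Sampler<T> create(UniformRandomProvider rng) {
        return new UniformItemSampler<>(this, rng);
    }
}
```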
> >>>>>
> >>>>> So the library would require:
> >>>>>
> >>>>> SamplerProvider<T>
> >>>>> DiscreteSamplerProvider
> >>>>> ContinuousSamplerProvider
> >>>>>
> >>>>> Any sampler can choose to implement being a Provider. There are some
> >>>>> cases where it is moot. For example, a ZigguratNormalizedGaussianSampler
> >>>>> just stores the rng in the constructor. However, it could still be a
> >>>>> Provider; the method would just call the constructor. It would allow
> >>>>> writing a generic multi-threaded application that just uses e.g. a
> >>>>> DiscreteSamplerProvider to create samplers for each thread. You can then
> >>>>> drop in the actual implementation you require. For example, you could
> >>>>> swap the type of PoissonSampler in your simulation by swapping the
> >>>>> provider for the Poisson distribution.
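The generic multi-threaded application could look like the sketch below. The simulation only sees a `DiscreteSamplerProvider`, so the distribution is chosen by the caller. Each task gets its own generator from `java.util.SplittableRandom.split()`, adapted to a simplified `UniformRandomProvider` stand-in; `ParallelSimulation` and the interfaces are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SplittableRandom;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Simplified stand-in for org.apache.commons.rng.UniformRandomProvider. */
interface UniformRandomProvider {
    double nextDouble();
}

/** Simplified stand-in for the library's DiscreteSampler. */
interface DiscreteSampler {
    int sample();
}

/** The provider interface proposed above. */
@FunctionalInterface
interface DiscreteSamplerProvider {
    DiscreteSampler create(UniformRandomProvider rng);
}

final class ParallelSimulation {
    /** Sums samplesPerTask draws in each of the tasks, using one sampler per task. */
    static long run(DiscreteSamplerProvider provider, int tasks, int samplesPerTask, long seed)
            throws Exception {
        final SplittableRandom root = new SplittableRandom(seed);
        final ExecutorService pool = Executors.newFixedThreadPool(tasks);
        try {
            final List<Future<Long>> results = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                // split() runs on the calling thread; each task owns an independent generator.
                final SplittableRandom own = root.split();
                final DiscreteSampler sampler = provider.create(own::nextDouble);
                results.add(pool.submit(() -> {
                    long sum = 0;
                    for (int i = 0; i < samplesPerTask; i++) {
                        sum += sampler.sample();
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Swapping the distribution then means passing a different provider, for example a constructor reference, without touching `run`.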
> >>>>> How does that sound?
> >>>> Fine to have
> >>>>   DiscreteSamplerProvider
> >>>>   ContinuousSamplerProvider
> >>>> [Perhaps the "Supplier" suffix would be better to avoid confusion with
> >>>> "UniformRandomProvider".]
> >>>>
> >>>> At first sight, I don't think that the generic interface would have
> >>>> any actual use since, ultimately, the return value of "sample()"
> >>>> will be either "int" or "double" (no polymorphism).
> >>>>
> >>> The generic interface is for the samplers that are typed for collections
> >>> and currently return a sample T, or those that return arrays. It would not
> >>> be for Integer or Double from the probability distribution samplers. Here
> >>> are the ones that could use it:
> >>>
> >>> CombinationSampler implements Sampler<int[]>
> >>> PermutationSampler implements Sampler<int[]>
> >>> CollectionSampler implements Sampler<T>
> >>> DiscreteProbabilityCollectionSampler implements Sampler<T>
> >>>
> >>> All are in the package org.apache.commons.rng.sampling.
> >>>
> >>> Each could also implement SamplerSupplier<T>.
> >>>
> >>> The set-up cost for the CombinationSampler/PermutationSampler would not be
> >>> much different from calling the constructor, and no state can be shared.
> >>> No real benefit here other than convenience. But the two
> >>> CollectionSamplers could share the final collection that is created and
> >>> stored from the constructor input data. For the case of a large discrete
> >>> probability collection sampler this could be a noticeable memory
> >>> footprint, as it also stores the cumulative distribution table. This would
> >>> also save on the construction cost by not having to recompute it.
> >>>
> >>> Alex
> >>>
> >> Any further thoughts on this? I think that Supplier is perhaps the wrong
> >> term. A Java 8 Supplier has a get() functional method with no parameters.
> >> These interfaces would require a UniformRandomProvider as the argument.
> >> However, Java 8 Function<T, R>, which is applicable here, has the poorer
> >> method name apply. So:
> >>
> >> DiscreteSampler
> >> ContinuousSampler
> >> Sampler<T>
> >>
> >> and trying a few options out:
> >>
> >> DiscreteSamplerFactory createDiscreteSampler(UniformRandomProvider)
> >> ContinuousSamplerFactory createContinuousSampler(UniformRandomProvider)
> >> SamplerFactory<T> createSampler(UniformRandomProvider)
> >>
> >> vs.
> >>
> >> DiscreteSamplerFactory newDiscreteSampler(UniformRandomProvider)
> >> ContinuousSamplerFactory newContinuousSampler(UniformRandomProvider)
> >> SamplerFactory<T> newSampler(UniformRandomProvider)
> >>
> >> vs.
> >>
> >> DiscreteSamplerSupplier getDiscreteSampler(UniformRandomProvider)
> >> ContinuousSamplerSupplier getContinuousSampler(UniformRandomProvider)
> >> SamplerSupplier<T> getSampler(UniformRandomProvider)
> >>
> >> vs.
> >>
> >> DiscreteSamplerGenerator newDiscreteSampler(UniformRandomProvider)
> >> ContinuousSamplerGenerator newContinuousSampler(UniformRandomProvider)
> >> SamplerGenerator<T> newSampler(UniformRandomProvider)
> >>
> >> The 'create/new' nomenclature does convey that a new instance is expected,
> >> so I prefer that over get. I'm undecided on which is the most appropriate
> >> noun for the interface name.
> > How about making clearer that the purpose is to share state, and
> > use the "fluent API":
> >
> > interface SharedStateSampler<R> {
> >      R withUniformRandomProvider(UniformRandomProvider rng);
> > }
> >
> > E.g.
> >
> > public class CollectionSampler<T>
> >      implements SharedStateSampler<CollectionSampler<T>> {
> >      // ...
> >      public CollectionSampler<T> withUniformRandomProvider(UniformRandomProvider rng) {
> >          return /* new instance that shares the immutable state */;
> >      }
> > }
> >
> > Gilles
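Filling in the placeholder gives something like the sketch below, assuming a simplified `UniformRandomProvider` stand-in; this `CollectionSampler` is a cut-down illustration, not the library's class, and `SharedStateSamplers.perThread` is a hypothetical helper. The self-referential type parameter lets generic code receive correctly typed copies without casts.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Simplified stand-in for org.apache.commons.rng.UniformRandomProvider. */
interface UniformRandomProvider {
    double nextDouble();
}

/** The interface proposed above: R is the type of sampler that is returned. */
interface SharedStateSampler<R> {
    R withUniformRandomProvider(UniformRandomProvider rng);
}

/** Cut-down illustration, not the library's CollectionSampler. */
final class CollectionSampler<T> implements SharedStateSampler<CollectionSampler<T>> {
    private final UniformRandomProvider rng;
    /** Immutable copy of the input, shared by every instance derived from this one. */
    private final List<T> items;

    CollectionSampler(UniformRandomProvider rng, Collection<T> collection) {
        this.rng = rng;
        this.items = Collections.unmodifiableList(new ArrayList<>(collection));
    }

    private CollectionSampler(CollectionSampler<T> source, UniformRandomProvider rng) {
        this.rng = rng;
        this.items = source.items;
    }

    public T sample() {
        final int n = items.size();
        // Math.min guards against n * u rounding up to n.
        return items.get(Math.min(n - 1, (int) (n * rng.nextDouble())));
    }

    @Override
    public CollectionSampler<T> withUniformRandomProvider(UniformRandomProvider rng) {
        return new CollectionSampler<>(this, rng); // new instance sharing the immutable state
    }
}

final class SharedStateSamplers {
    /** Hypothetical helper: one copy per generator, each typed as R. */
    static <R extends SharedStateSampler<R>> List<R> perThread(R prototype,
                                                               List<UniformRandomProvider> rngs) {
        final List<R> copies = new ArrayList<>();
        for (UniformRandomProvider rng : rngs) {
            copies.add(prototype.withUniformRandomProvider(rng));
        }
        return copies;
    }
}
```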
>
> Well that is much nicer. I am fine with that.
>
> I note that this idea can be applied to any sampler even with a very
> small state. Should we aim for that, or only pick the low-hanging fruit
> of those samplers that have a relatively large construction cost or
> internal state?
>
> I would favour doing it for all samplers that have a state just to be
> consistent. It just needs a bit more work to put into the library.

+1 for consistency.

Gilles

>
> >
> >>>>>
> >>>>>
> >>>>>> I'm a bit wary that this would compound two different functionalities:
> >>>>>> * data generator (method "sample"),
> >>>>>> * generator generator (method "newInstance").
> >>>>>> [But I currently don't have an example where this would be a problem.]
> >>>>>>
> >>>>>> Regards,
> >>>>>> Gilles
> >>>>>>
> >>>>>>> Alex
> >>>>>>>
> >>>>>>> [1] https://issues.apache.org/jira/browse/RNG-91
> >>>>>>> [2] kB, or possibly MB, of tabulated data
> >>>>>>>
> >>>>>>> [3] Set-up cost for a Poisson sampler is in the order of 30 to 165
> >>>>>>> times as long as for a SmallMeanPoissonSampler, for means of 2 and 32.
> >>>>>>> Note, however, that construction still takes only 1.1 and 4.5
> >>>>>>> microseconds for the "long" time.
