Hi.

On Sat, 4 May 2019 at 21:31, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>
>
>
> > On 4 May 2019, at 14:46, Gilles Sadowski <gillese...@gmail.com> wrote:
> >
> > Hello.
> >
> > On Fri, 3 May 2019 at 16:57, Alex Herbert <alex.d.herb...@gmail.com> wrote:
> >>
> >> Most of the samplers in the library have very small states that are easy
> >> to compute. Some have computations that are more expensive, such as the
> >> LargeMeanPoissonSampler or the DiscreteProbabilityCollectionSampler.
> >>
> >> However once the state is computed the only part of the state that
> >> changes is the RNG. I would like to suggest a way to copy samplers as
> >> something like:
> >>
> >> DiscreteSampler newInstance(UniformRandomProvider)
> >>
> >> The new instance would share all the private state of the first sampler
> >> except the RNG. This can be used for multi-threaded applications which
> >> require a new sampler per thread but sample from the same distribution.
> >>
> >> A particular case in point is the not yet integrated
> >> MarsagliaTsangWangSmallMeanPoissonSampler (see RNG-91 [1]) which has a
> >> "large" state [2] that takes a "long" time [3] to compute but is
> >> effectively immutable. This could be shared across instances, saving
> >> memory in parallel applications.
> >>
> >> Creating a copy would have almost zero set-up time and would provide an
> >> opportunity for caching commonly used samplers.
> >
> > The goal is sharing (immutable) state so it seems that the semantics is
> > not "copy".
> >
> > Isn't it a "factory" that we are after?  E.g. something like:
> > public final class CachedSamplingFactory {
> >    private static PoissonSamplerCache poisson = new PoissonSamplerCache();
> >
> >    public PoissonSampler createPoissonSampler(UniformRandomProvider rng,
> >                                               double mean) {
> >        if (!poisson.isCached(mean)) {
> >            // Initialize (requires synchronization) ...
> >            poisson.createCache(mean);
> >        }
> >        // Construct using pre-built state.
> >        return new PoissonSampler(poisson.getCache(mean), rng);
> >    }
> > }
> > [It may be overkill, more work, and less performant…]
>
> But you need a factory for every class you want to share state for. And the
> factory actually has to look in a cache. If you operate on an instance then
> you get what you want: another working version of the same sampler. It would
> also be thread safe without synchronisation as long as the state is
> immutable. The only mutable state is the passed-in RNG.

Agreed.  It was what I meant by the last sentence.
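
To check that we mean the same thing, here is a minimal sketch of the
instance-based approach. It is not library code: "UniformSource" and
"DiscreteSampler" are cut-down local stand-ins for the real interfaces, and
"TabulatedPoissonSampler" is a hypothetical sampler (plain inverse-CDF lookup,
small means only) that exists just to have some precomputed state to share.

```java
import java.util.Arrays;
import java.util.SplittableRandom;

final class SharedStateSketch {
    /** Stand-in for UniformRandomProvider (only the method needed here). */
    interface UniformSource {
        double nextDouble();
    }

    /** Stand-in for DiscreteSampler. */
    interface DiscreteSampler {
        int sample();
    }

    /** Hypothetical sampler backed by a precomputed Poisson CDF table. */
    static final class TabulatedPoissonSampler implements DiscreteSampler {
        private final double[] cdf;      // never modified after construction
        private final UniformSource rng; // the only per-instance state

        TabulatedPoissonSampler(UniformSource rng, double mean) {
            this(rng, computeCdf(mean)); // the "expensive" set-up happens once
        }

        private TabulatedPoissonSampler(UniformSource rng, double[] sharedCdf) {
            this.rng = rng;
            this.cdf = sharedCdf;
        }

        /** New sampler for the same distribution; reuses the existing table. */
        TabulatedPoissonSampler newInstance(UniformSource newRng) {
            return new TabulatedPoissonSampler(newRng, cdf);
        }

        @Override
        public int sample() {
            final double u = rng.nextDouble();
            int k = 0;
            // Linear search for the first k with u <= P(X <= k).
            while (k < cdf.length - 1 && u > cdf[k]) {
                k++;
            }
            return k;
        }

        // Tabulates P(X <= k); assumes a small mean so exp(-mean) does not underflow.
        private static double[] computeCdf(double mean) {
            final double[] tmp = new double[1024];
            double p = Math.exp(-mean);
            double sum = p;
            int n = 0;
            tmp[n++] = sum;
            while (sum < 1 - 1e-12 && n < tmp.length) {
                p *= mean / n; // P(X = n) from P(X = n - 1)
                sum += p;
                tmp[n++] = sum;
            }
            return Arrays.copyOf(tmp, n);
        }
    }

    public static void main(String[] args) {
        final SplittableRandom r1 = new SplittableRandom(1L);
        final SplittableRandom r2 = new SplittableRandom(2L);
        final TabulatedPoissonSampler first = new TabulatedPoissonSampler(r1::nextDouble, 2.0);
        final TabulatedPoissonSampler second = first.newInstance(r2::nextDouble);
        System.out.println(first.sample() + " " + second.sample());
    }
}
```

The table is only ever read after construction, so the two instances can be
used from different threads without synchronisation, provided each thread has
its own RNG.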

> >
> > IIUC, you suggest to add "newInstance" in the "DiscreteSampler" interface 
> > (?).
>
> I did think of extending DiscreteSampler with this functionality, rather 
> than adding to the interface: it is currently ‘functional’ since it has only 
> one method, and I think that should not change. Having thought about it a bit 
> more, I like the idea of a new functional interface. Perhaps:
>
> interface DiscreteSamplerProvider {
>     DiscreteSampler create(UniformRandomProvider rng);
> }
>
> Substitute ‘Provider’ for:
>
> - Generator
> - Supplier (possible clash or alignment with Java 8 depending on the way it 
> is done)
> - Factory (though the method is not static so I do not like this)
> - etc
>
> So this then becomes a functional interface that can be used by anything. 
> However instances of a sampler would be expected to return a sampler matching 
> their own functionality.
>
> Note there are some samplers not implementing an interface that also could 
> benefit from this. Namely CollectionSampler and 
> DiscreteProbabilityCollectionSampler. So does this need a generic interface:
>
> interface Sampler<T> {
>     T sample();
> }
>
> To be complemented with:
>
> interface SamplerProvider<T> {
>     Sampler<T> create(UniformRandomProvider rng);
> }
>
> So the library would require:
>
> SamplerProvider<T>
> DiscreteSamplerProvider
> ContinuousSamplerProvider
>
> Any sampler can choose to be a Provider. There are some cases where it is 
> moot. For example, a ZigguratNormalizedGaussianSampler just stores the rng 
> in the constructor. However, it could still be a Provider; the method would 
> simply call the constructor. It would allow writing a generic 
> multi-threaded application that just uses e.g. a DiscreteSamplerProvider to 
> create samplers for each thread. You can then drop in the actual 
> implementation you require. For example you could swap the type of 
> PoissonSampler in your simulation by swapping the provider for the Poisson 
> distribution.
>
> How does that sound?

Fine to have
  DiscreteSamplerProvider
  ContinuousSamplerProvider
[Perhaps the "Supplier" suffix would be better to avoid confusion with
"UniformRandomProvider".]

At first sight, I don't think that the generic interface would have
any actual use since, ultimately, the return value of "sample()"
will be either "int" or "double" (no polymorphism).
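
For what it is worth, a rough sketch of how a primitive-specialised supplier
could drive a generic multi-threaded application. Everything here is
illustrative: "UniformSource" and "DiscreteSampler" are cut-down local
stand-ins for the real interfaces, "DiscreteSamplerSupplier" is the proposed
interface under the "Supplier" name, and "sumPerThread" is a hypothetical
worker that knows nothing about the distribution it samples from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SplittableRandom;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SamplerSupplierSketch {
    /** Stand-in for UniformRandomProvider (only the method needed here). */
    interface UniformSource {
        double nextDouble();
    }

    /** Stand-in for DiscreteSampler. */
    interface DiscreteSampler {
        int sample();
    }

    /** The proposed functional interface, using the "Supplier" suffix. */
    @FunctionalInterface
    interface DiscreteSamplerSupplier {
        DiscreteSampler create(UniformSource rng);
    }

    /** Draws samplesPerThread values in each task and returns the per-task sums. */
    static long[] sumPerThread(DiscreteSamplerSupplier supplier,
                               int threads, int samplesPerThread) {
        final ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            final List<Future<Long>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                // One RNG and one sampler per task; no mutable state is shared.
                final SplittableRandom rng = new SplittableRandom(t);
                final DiscreteSampler sampler = supplier.create(rng::nextDouble);
                final Callable<Long> task = () -> {
                    long sum = 0;
                    for (int i = 0; i < samplesPerThread; i++) {
                        sum += sampler.sample();
                    }
                    return sum;
                };
                results.add(pool.submit(task));
            }
            final long[] sums = new long[threads];
            for (int t = 0; t < threads; t++) {
                try {
                    sums[t] = results.get(t).get();
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
            return sums;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        final double p = 0.25;
        // A trivial Bernoulli(p) sampler; swapping the distribution means swapping this line.
        final DiscreteSamplerSupplier bernoulli = rng -> () -> rng.nextDouble() < p ? 1 : 0;
        for (long sum : sumPerThread(bernoulli, 4, 1000)) {
            System.out.println(sum);
        }
    }
}
```

A ContinuousSamplerSupplier would look the same with "double sample()", which
is why I do not see what a generic variant would add for these two cases.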

Gilles

>
> Alex
>
>
>
> > I'm a bit wary that this would compound two different functionalities:
> >  * data generator (method "sample"),
> >  * generator generator (method "newInstance").
> > [But I currently don't have an example where this would be a problem.]
> >
> > Regards,
> > Gilles
> >
> >> Alex
> >>
> >> [1] https://issues.apache.org/jira/browse/RNG-91
> >>
> >> [2] kB, or possibly MB, of tabulated data
> >>
> >> [3] Set-up cost for a Poisson sampler is on the order of 30 to 165 times
> >> that of a SmallMeanPoissonSampler for means of 2 and 32. Note, however,
> >> that construction still takes only 1.1 and 4.5 microseconds for the
> >> "long" time.
