[ https://issues.apache.org/jira/browse/RNG-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gilles resolved RNG-50. ----------------------- Resolution: Implemented Fix Version/s: 1.1 Improvement implemented as of commit edb3eed76e5a50ddce94dd5510f0c9d2f54be35a ("master"); discussion and further changes postponed to after the release of version 1.1. > PoissonSampler single use speed improvements > -------------------------------------------- > > Key: RNG-50 > URL: https://issues.apache.org/jira/browse/RNG-50 > Project: Commons RNG > Issue Type: Improvement > Affects Versions: 1.0 > Reporter: Alex D Herbert > Priority: Minor > Fix For: 1.1 > > Attachments: PoissonSamplerTest.java, jmh-result.csv > > > The Sampler architecture of {{org.apache.commons.rng.sampling.distribution}} > is nicely written for fast sampling of small dataset sizes. The constructors > for the samplers do not check the input parameters are valid for the > respective distributions (in contrast to the old > {{org.apache.commons.math3.random.distribution}} classes). I assume this is a > design choice for speed. Thus most of the samplers can be used within a loop > to sample just one value with very little overhead. > The {{PoissonSampler}} precomputes log factorial numbers upon construction if > the mean is above 40. This is done using the {{InternalUtils.FactorialLog}} > class. As of version 1.0 this internal class is currently only used in the > {{PoissonSampler}}. > The cache size is limited to 2*PIVOT (where PIVOT=40). But it creates and > precomputes the cache every time a PoissonSampler is constructed if the mean > is above the PIVOT value. > Why not create this once in a static block for the PoissonSampler? > {code:java} > /** {@code log(n!)}. */ > private static final FactorialLog factorialLog; > > static > { > factorialLog = FactorialLog.create().withCache((int) (2 * > PoissonSampler.PIVOT)); > } > {code} > This will make the construction cost of a new {{PoissonSampler}} negligible. > If the table is computed dynamically as a static construction method then the > overhead will be in the first use. Thus the following call will be much > faster: > {code:java} > UniformRandomProvider rng = ...; > int value = new PoissonSampler(rng, 50).sample(); > {code} > I have tested this modification (see attached file) and the results are: > {noformat} > Mean 40 Single construction ( 7330792) vs Loop construction > (24334724) (3.319522.2x faster) > Mean 40 Single construction ( 7330792) vs Loop construction with static > FactorialLog ( 7990656) (1.090013.2x faster) > Mean 50 Single construction ( 6390303) vs Loop construction > (19389026) (3.034132.2x faster) > Mean 50 Single construction ( 6390303) vs Loop construction with static > FactorialLog ( 6146556) (0.961857.2x faster) > Mean 60 Single construction ( 6041165) vs Loop construction > (21337678) (3.532047.2x faster) > Mean 60 Single construction ( 6041165) vs Loop construction with static > FactorialLog ( 5329129) (0.882136.2x faster) > Mean 70 Single construction ( 6064003) vs Loop construction > (23963516) (3.951765.2x faster) > Mean 70 Single construction ( 6064003) vs Loop construction with static > FactorialLog ( 5306081) (0.875013.2x faster) > Mean 80 Single construction ( 6064772) vs Loop construction > (26381365) (4.349935.2x faster) > Mean 80 Single construction ( 6064772) vs Loop construction with static > FactorialLog ( 6341274) (1.045591.2x faster) > {noformat} > Thus the speed improvements would be approximately 3-4 fold for single use > Poisson sampling. -- This message was sent by Atlassian JIRA (v7.6.3#76005)