[jira] [Commented] (RNG-179) The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions
[ https://issues.apache.org/jira/browse/RNG-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17544623#comment-17544623 ] Alex Herbert commented on RNG-179: -- I updated the code to remove the use of BigInteger to store the relative weights. The integers are now stored as a 53-bit long integer (created from the mantissa of the input double weight) and an exponent offset. The weight is thus: {code:java} weight = value * 2^exponent BigInteger.valueOf(value).shiftLeft(exponent) {code} Each bit of the integer can now be tested by determining if the bit is within the integer using: {code:java} // Old (the BigInteger value was cached) BigInteger.valueOf(value).shiftLeft(offset).testBit(n) // New private static boolean testBit(long value, int offset, int n) { if (n < offset) { // All logical trailing bits are zero return false; } final int bit = n - offset; return bit <= 53 && (value & (1L << bit)) != 0; }{code} The new code is marginally faster at testing the bits. Looking at the source for BigInteger there are some branch statements that are avoided and the number of instructions is lower (since BigInteger does some extra logic to handle zero and negatives). The construction performance can be improved further by ignoring very small weights. I added an alpha parameter that will ignore any weights that are at least 2^alpha smaller in magnitude than the largest weight. The comparison uses only the exponent offsets. Any filtered weights are set to zero and ignored from further processing. This does not reduce the in-memory size of the discrete distribution generating tree. The following table shows the previous BigInteger performance against the new version. A version with alpha=53 is shown. This only affects the Binomial distribution which reduces from 68 weights down to 55 with a noticeable improvement in construction speed. The speed is still much slower than just approximately mapping to long[]. ||Distribution||double[] BigInteger||double[]||double[] alpha=53||long[]|| |4SidedLoadedDie|842.9|703.2|738.1|666.6| |Binomial_N67_P0.7|55466.3|34948.1|23003.8|6833.3| |Geometric_P0.2|48722.1|37459.9|36418.5|10702.3| |Poisson_Mean10_Mean20|19013.6|15893.4|13786.8|5862.4| |Poisson_Mean3.22|7434.5|6600.3|6395.6|2417.2| |Random 3072|2184490.83|1551266.0|1560548.4|1009663.4| The slowest to construct is the Random 3072 data. This has 3072 weights. In this case alpha=53 makes no difference as the weights are sampled from the 2^53 dyadic rationals and no weights can be excluded. The speed of the new version is approximately 0.71x the old version. For the Binomial distribution the affect of alpha on sampling is negligible (data not shown). This is due to the fact that the weights that were excluded will be sampled at most every 2^53 times. Thus the bulk of the distribution targeted by the sampler is essentially unchanged. h2. Conclusion The sampler construction is slow. To maintain the property of exact sampling from the input distribution requires exact math to be performed on the weights. This can be changed to ignore very small relative weights, or the weights mapped to long. Both improve construction performance but result is the sampling no longer being exact. The performance of sampling is not affected (if only very small weights are ignored). > The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete > Probability Distributions > > > Key: RNG-179 > URL: https://issues.apache.org/jira/browse/RNG-179 > Project: Commons RNG > Issue Type: New Feature > Components: sampling >Reporter: Vladimir Sitnikov >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > It might make sense to implement Fast Loaded Dice Roller sampler. > See https://arxiv.org/pdf/2003.03830v2.pdf, > https://github.com/probcomp/fast-loaded-dice-roller > The authors claim that Fast Loaded Dice Roller is faster than the Alias > Method. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (RNG-179) The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions
[ https://issues.apache.org/jira/browse/RNG-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542388#comment-17542388 ] Alex Herbert commented on RNG-179: -- I tested the speed of construction with long[] vs double[]. The frequencies summing to approximately 2^62 can be created using: {code:java} final double sum = Arrays.stream(probabilities).sum(); final long[] frequencies = Arrays.stream(probabilities) .mapToLong(x -> Math.round(0x1.0p62 * x / sum)) .toArray(); {code} Here are the times for the fixed size distributions: ||Distribution||n||double[]||long[]|| |4SidedLoadedDie|4|842.9|761.0| |Binomial_N67_P0.7|68|55466.3|8491.8| |Geometric_P0.2|93|48722.1|12566.4| |Poisson_Mean10_Mean20|53|19013.6|6807.2| |Poisson_Mean3.22|20|7434.5|2984.9| The longer length arrays are significantly slower when using the double[] constructor. Here are the times for random Dirichlet distributions: ||k||alpha||double[]||long[]|| |4|0.5|1065.4|1048.7| | |1|930.8|858.6| | |2|900.7|835.8| |8|0.5|2080.8|1328.0| | |1|1604.0|1478.3| | |2|1582.2|1428.0| |16|0.5|3539.6|2554.2| | |1|3146.1|2480.8| | |2|2786.4|2429.9| Again, longer length double[] see a more significant slowdown. Note that I used BigInteger for convenience. The BigIntegers have the value of the mantissa with a shift created from the exponent offset. The offset is the (exponent - min exponent). This may merit replacing the use of BigInteger with a custom object to store the integers. This is because at most only 53 bits of the BigInteger are non-zero but the length may be hundreds or thousands of bits if there is a large range of magnitude in the input weights. I am reluctant to implement a custom routine to sum the bit representation of the integers (this is done in the reference c code). However once the sum is performed using BigInteger (O(n)) then the O(nm) for-loop could use a custom representation. This avoids costly repetition in the use of: {code:java} BigInteger.testBit(int) {code} Since most of the integer is zeros (due to left shift of the mantissa), then a custom representation should be able to speed this up significantly. I will look at the code and see if it can be optimised. > The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete > Probability Distributions > > > Key: RNG-179 > URL: https://issues.apache.org/jira/browse/RNG-179 > Project: Commons RNG > Issue Type: New Feature > Components: sampling >Reporter: Vladimir Sitnikov >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > It might make sense to implement Fast Loaded Dice Roller sampler. > See https://arxiv.org/pdf/2003.03830v2.pdf, > https://github.com/probcomp/fast-loaded-dice-roller > The authors claim that Fast Loaded Dice Roller is faster than the Alias > Method. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (RNG-179) The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions
[ https://issues.apache.org/jira/browse/RNG-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542000#comment-17542000 ] Gilles Sadowski commented on RNG-179: - bq. So perhaps the name should change to [2]. +1 [IMHO, the slightly lower verbosity is not worth giving away the naming consistency.] > The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete > Probability Distributions > > > Key: RNG-179 > URL: https://issues.apache.org/jira/browse/RNG-179 > Project: Commons RNG > Issue Type: New Feature > Components: sampling >Reporter: Vladimir Sitnikov >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > It might make sense to implement Fast Loaded Dice Roller sampler. > See https://arxiv.org/pdf/2003.03830v2.pdf, > https://github.com/probcomp/fast-loaded-dice-roller > The authors claim that Fast Loaded Dice Roller is faster than the Alias > Method. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (RNG-179) The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions
[ https://issues.apache.org/jira/browse/RNG-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541982#comment-17541982 ] Alex Herbert commented on RNG-179: -- API decision on the sampler name: # FastLoadedDiceRollerSampler # FastLoadedDiceRollerDiscreteSampler I used [1], just to reduce verbosity. However all the other enumerated samplers that can be constructed from weights have DiscreteSampler in their name: * AliasMethodDiscreteSampler * GuideTableDiscreteSampler * MarsagliaTsangWangDiscreteSampler.Enumerated So perhaps the name should change to [2]. > The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete > Probability Distributions > > > Key: RNG-179 > URL: https://issues.apache.org/jira/browse/RNG-179 > Project: Commons RNG > Issue Type: New Feature > Components: sampling >Reporter: Vladimir Sitnikov >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > It might make sense to implement Fast Loaded Dice Roller sampler. > See https://arxiv.org/pdf/2003.03830v2.pdf, > https://github.com/probcomp/fast-loaded-dice-roller > The authors claim that Fast Loaded Dice Roller is faster than the Alias > Method. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (RNG-179) The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions
[ https://issues.apache.org/jira/browse/RNG-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541972#comment-17541972 ] Alex Herbert commented on RNG-179: -- {quote}I thought integer sampling (the case when all weights are integer) does not require going for floats, doubles, etc. {quote} The API allows for construction using long[] as frequencies. This corresponds to {{{}fldr.c{}}}. The construction using double[] as weights corresponds to {{{}fldrf.c{}}}. However to plug the sampler into the existing performance benchmark requires a double[] for the distribution. This only affects the performance measure on the construction cost (single sample). Once the sampler is built then the sampling algorithm is the same. It uses only integer look-up tables and a source of boolean bits. For reference I have computed the Shannon entropy for the distributions in the performance benchmark. ||Distribution||n||Shannon entropy (bits)|| |4 Sided Loaded Die|4|1.626| |Binomial(n=67, p=0.7)|68|3.953| |Geometric(p=0.2)|93|3.610| |Poisson (mean=3.22)|20|2.842| |Poisson (mean1=10, mean2=20)|53|7.236| So as mentioned above the 4 sided loaded die has the lowest entropy. Note that for the probability distributions these are truncated with a CDF of 0.9 (1 - 1e-9). The benchmark should perhaps be updated to use a [Dirichlet distribution |https://en.wikipedia.org/wiki/Dirichlet_distribution]. This sampler was added in RNG 1.4 after the benchmark was created. A quick test shows that the size and alpha parameter can be used to create distributions of various entropy: {code:java} ObjectSampler s = DirichletSampler.symmetric(rng, k, alpha) {code} ||k||alpha||entropy mean||sd|| |4|0.500|1.299|0.374| |4|1.000|1.531|0.294| |4|2.000|1.754|0.172| |8|0.500|2.087|0.348| |8|1.000|2.490|0.266| |8|2.000|2.707|0.142| |16|0.500|3.023|0.287| |16|1.000|3.454|0.166| |16|2.000|3.693|0.095| |32|0.500|4.008|0.182| |32|1.000|4.406|0.125| |32|2.000|4.692|0.075| |64|0.500|4.986|0.151| |64|1.000|5.392|0.115| |64|2.000|5.680|0.048| {quote}Given that you've already done all the work, and that it is a documented algorithm, I'd think that it should be included {quote} OK. I will update the code with some more information about performance and the benefit of exact sampling. I have already added some notes to the factory constructor javadoc using the double[] weights to indicate when the long[] method can be used. Note that exact sampling is a strange advantage. In order to notice this would require that a lot of samples are obtained from the sampler, for example 2^53 samples. This is likely to be generated with days of continuous sampling. It also requires that the source of bits is unbiased, which is never ensured with a RNG unless the full period of an unbiased sampler is used, for example the entire 2^64 longs output from a SplitMix generator. However it may be that a user wishes to construct a distribution such as: {code:java} double[] weights = {1, 1, Math.scalb(1.0, -53), Math.scalb(1.0, -100)}; {code} There is no other sampler in the library that can perform this and ensure that category 3 is observed with a probability of approximately 2^-99 (i.e. 2^-100 / sum(weights)). All other samplers would ignore 2^-100 and some may ignore 2^-53 when converting the weights to probabilities. You would have to run the sampler a long time to obtain category 2 or 3, but at least it would be possible. > The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete > Probability Distributions > > > Key: RNG-179 > URL: https://issues.apache.org/jira/browse/RNG-179 > Project: Commons RNG > Issue Type: New Feature > Components: sampling >Reporter: Vladimir Sitnikov >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > It might make sense to implement Fast Loaded Dice Roller sampler. > See https://arxiv.org/pdf/2003.03830v2.pdf, > https://github.com/probcomp/fast-loaded-dice-roller > The authors claim that Fast Loaded Dice Roller is faster than the Alias > Method. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (RNG-179) The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions
[ https://issues.apache.org/jira/browse/RNG-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541936#comment-17541936 ] Gilles Sadowski commented on RNG-179: - bq. Based solely on performance this method offers no benefit to the library. The question is whether the advantage of being able to exactly represent a distribution merits inclusion in the library. Given that you've already done all the work, and that it is a documented algorithm, I'd think that it should be included (there are other instances of an implementation that coexists with a more performant alternative). It would be nice if a use case exists where the "exactness" property make an actual difference... > The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete > Probability Distributions > > > Key: RNG-179 > URL: https://issues.apache.org/jira/browse/RNG-179 > Project: Commons RNG > Issue Type: New Feature > Components: sampling >Reporter: Vladimir Sitnikov >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > It might make sense to implement Fast Loaded Dice Roller sampler. > See https://arxiv.org/pdf/2003.03830v2.pdf, > https://github.com/probcomp/fast-loaded-dice-roller > The authors claim that Fast Loaded Dice Roller is faster than the Alias > Method. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (RNG-179) The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions
[ https://issues.apache.org/jira/browse/RNG-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541921#comment-17541921 ] Vladimir Sitnikov commented on RNG-179: --- WOW. Thanks for the analysis! {quote} Here the sampler is relatively slow to construct. This is due to the construction using exact representation of the input probabilities. All probability values are converted to rational numbers (p / q) where q is a power of 2.{quote} I thought integer sampling (the case when all weights are integer) does not require going for floats, doubles, etc. This looks like a pure int/long/long long math: https://github.com/probcomp/fast-loaded-dice-roller/blob/master/src/c/fldr.c > The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete > Probability Distributions > > > Key: RNG-179 > URL: https://issues.apache.org/jira/browse/RNG-179 > Project: Commons RNG > Issue Type: New Feature > Components: sampling >Reporter: Vladimir Sitnikov >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > It might make sense to implement Fast Loaded Dice Roller sampler. > See https://arxiv.org/pdf/2003.03830v2.pdf, > https://github.com/probcomp/fast-loaded-dice-roller > The authors claim that Fast Loaded Dice Roller is faster than the Alias > Method. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (RNG-179) The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions
[ https://issues.apache.org/jira/browse/RNG-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541883#comment-17541883 ] Alex Herbert commented on RNG-179: -- For reference the API is: {code:java} UniformRandomProvider rng = ...; // Represented exactly long[] frequencies = {1, 1, 2, 3, 1}; DiscreteSampler sampler1 = FastLoadedDiceRollerSampler.of(rng, frequencies); // Represented exactly using relative weights double[] weights = {1, 0.5, 0.125, 3.5, 1}; DiscreteSampler sampler2 = FastLoadedDiceRollerSampler.of(rng, weights); {code} When using frequencies the maximum number of categories is approximately ((2^31 - 1) / 63) = 34,087,042 and the sum of frequencies must be below Long.MAX_VALUE. When using weights the maximum number of categories is approximately 1,000,000 if all weights are Double.MAX_VALUE with the exception of 1 weight of Double.MIN_VALUE. In practice this range of weights is unrealistic and more categories should be possible. The construction algorithm does not require the weights to sum to a finite double value. > The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete > Probability Distributions > > > Key: RNG-179 > URL: https://issues.apache.org/jira/browse/RNG-179 > Project: Commons RNG > Issue Type: New Feature > Components: sampling >Reporter: Vladimir Sitnikov >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > It might make sense to implement Fast Loaded Dice Roller sampler. > See https://arxiv.org/pdf/2003.03830v2.pdf, > https://github.com/probcomp/fast-loaded-dice-roller > The authors claim that Fast Loaded Dice Roller is faster than the Alias > Method. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (RNG-179) The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions
[ https://issues.apache.org/jira/browse/RNG-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541878#comment-17541878 ] Alex Herbert commented on RNG-179: -- I have created an implementation of the Fast Loaded Dice Roller (FDLR). For review the code is in [PR 112|https://github.com/apache/commons-rng/pull/112]. The adaption is very close to the reference c code. The sample algorithm is the same. A copy-paste between c and java for this code requires no changes. The algorithm uses a modified version of the boolean bit cache functionality in IntProvider to provide the bit source for boolean bits (\{0, 1}). This returns an int rather than a boolean. Creation of the sampler from frequencies (long[]) is identical to the reference c code. Creation of the sampler from weights (double[]) uses some BigInteger arithmetic. This avoids the custom code for unlimited precision math used in the original c code. It affects only construction and not the sampling routine. h2. Performance I added it to the existing JMH benchmark for enumerated distributions. This tests samples from: * 4-sided loaded die: p = 1/2, 1/3, 1/12, 1/12 * Binomial(n=67, p=0.7) * Geometric(p=0.2) * Poisson(mean=3.22) * Bimodal distribution of Poisson(mean=10) + Poisson(mean=20) * Random probability arrays with different sizes The benchmark was performed using XO_RO_SHI_RO_128_PLUS generator using: {noformat} Java version: 11.0.12, vendor: Eclipse Foundation, runtime: /Library/Java/JavaVirtualMachines/temurin-11.jdk/Contents/Home OS name: "mac os x", version: "11.3", arch: "x86_64", family: "mac" {noformat} *Sample* This tests the time for a sample from an already created sampler. ||Distribution||BinarySearch||FLDR||GuideTable||AliasMethod||MarsagliaTsangWang|| |4 Sided Loaded Die|16.67|18.45|19.55|13.52|5.55| |Binomial(n=67, p=0.7)|32.57|49.66|12.44|14.51|8.49| |Geometric(p=0.2)|31.03|28.21|15.72|11.74|7.38| |Poisson (mean=3.22)|23.47|23.41|14.21|14.21|6.56| |Poisson (mean1=10, mean2=20)|37.58|25.92|14.45|12.87|9.32| |Random 6|20.36|31.68|17.73|12.47|6.08| |Random 96|44.74|38.00|17.66|14.34|11.45| |Random 3072|89.40|51.50|18.17|15.38|11.00| | |295.83|266.83|129.92|109.04|65.83| Here the sampler is better than binary search. But not as fast as other more optimised samplers. The original paper states the sampler is fast on low entropy distributions. Entropy is defined as: {noformat} H = sum [ p * log2(1 / p) ]{noformat} where p are the probabilities for each category. The entropy increases with the number of categories. It is highest when the distribution is uniform. As the distribution becomes more skewed (i.e. weighted to a few categories) the entropy decreases. So the sampler should do best on small, skewed distributions. This is observed for the 4 sided loaded die. *Single Sample* This tests the time to create a sampler then perform a single sample. ||Distribution||BinarySearch||FLDR||GuideTable||AliasMethod||MarsagliaTsangWang|| |4 Sided Loaded Die|40.09|905.11|55.51|72.10|215.23| |Binomial(n=67, p=0.7)|277.57|49098.62|360.37|818.00|2209.45| |Geometric(p=0.2)|357.48|44719.90|491.57|1032.77|3146.81| |Poisson (mean=3.22)|105.14|7506.44|137.38|260.71|893.37| |Poisson (mean1=10, mean2=20)|226.58|19303.88|297.53|561.43|2354.08| |Random 3072|11440.83|2184490.83|15225.15|56482.53|355280.61| |Random 6|51.95|1185.35|64.31|100.88|383.89| |Random 96|390.91|15681.71|539.40|1155.64|5326.40| | |12890.56|2322891.84|17171.22|60484.06|369809.84| Here the sampler is relatively slow to construct. This is due to the construction using exact representation of the input probabilities. All probability values are converted to rational numbers (p / q) where q is a power of 2. These can then be summed exactly to unity when adjusted to use a common denominator. This process extracts the mantissa and exponent from each weight. It uses BigInteger if the range of magnitude of the probabilities is large, otherwise can use long arithmetic. The Discrete Distribution Generating (DDG) tree construction also requires a nested for loop over the rational number numerators. This may also be slow compared to some of the single loop construction routines in other samplers. h2. Conclusion The sampler is not as fast as other implementations in the library. It has a big construction cost. The sampler does have the advantage of offering _exact_ samples from the input weighted distribution. In a typical inexact implementation the input weights are summed and probabilities computed using weight / sum. This can alter the relative weights due to rounding. The sampler may also discard small probabilities that cannot be represented. For example the Alias sampler implementation converts probabilities to integers that sum to 2^53. This will discard any probability below 2^-53. Similarly the MarsagliaTsangWang represents probabilities as fractions with a denominator of 2^30. As the
[jira] [Commented] (RNG-179) The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions
[ https://issues.apache.org/jira/browse/RNG-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17540619#comment-17540619 ] Alex Herbert commented on RNG-179: -- Thanks for the suggestion. The provided paper shows it may be faster with a lower entropy distribution. It certainly uses less bits for each sample. The memory usage seems to be higher although initialisation cost is lower. I can have a look at implementing this for the distributions specified using integer ratios: {code:java} UniformRandomProvider rng = ...; int[] distribution = {1, 1, 2, 3, 1}; DiscreteSampler sampler = FastLoadedDiceRoller.of(rng, distribution); {code} > The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete > Probability Distributions > > > Key: RNG-179 > URL: https://issues.apache.org/jira/browse/RNG-179 > Project: Commons RNG > Issue Type: New Feature > Components: sampling >Reporter: Vladimir Sitnikov >Priority: Major > > It might make sense to implement Fast Loaded Dice Roller sampler. > See https://arxiv.org/pdf/2003.03830v2.pdf, > https://github.com/probcomp/fast-loaded-dice-roller > The authors claim that Fast Loaded Dice Roller is faster than the Alias > Method. -- This message was sent by Atlassian Jira (v8.20.7#820007)