[jira] [Commented] (RNG-179) The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions

2022-05-31 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/RNG-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17544623#comment-17544623
 ] 

Alex Herbert commented on RNG-179:
--

I updated the code to remove the use of BigInteger to store the relative 
weights. The integers are now stored as a 53-bit long integer (created from the 
mantissa of the input double weight) and an exponent offset. The weight is thus:
{code:java}
weight = value * 2^exponent

BigInteger.valueOf(value).shiftLeft(exponent) {code}
Each bit of the integer can now be tested by determining if the bit is within 
the integer using:
{code:java}
// Old (the BigInteger value was cached)
BigInteger.valueOf(value).shiftLeft(offset).testBit(n)

// New
private static boolean testBit(long value, int offset, int n) {
    if (n < offset) {
        // All logical trailing bits are zero
        return false;
    }
    final int bit = n - offset;
    return bit <= 53 && (value & (1L << bit)) != 0;
}{code}
The new code is marginally faster at testing the bits. Looking at the source 
for BigInteger there are some branch statements that are avoided and the number 
of instructions is lower (since BigInteger does some extra logic to handle zero 
and negatives).

The construction performance can be improved further by ignoring very small 
weights. I added an alpha parameter that will ignore any weights that are at 
least 2^alpha smaller in magnitude than the largest weight. The comparison uses 
only the exponent offsets. Any filtered weights are set to zero and ignored 
from further processing. This does not reduce the in-memory size of the 
discrete distribution generating tree.

The following table shows the previous BigInteger performance against the new 
version. A version with alpha=53 is shown. This only affects the Binomial 
distribution which reduces from 68 weights down to 55 with a noticeable 
improvement in construction speed. The speed is still much slower than just 
approximately mapping to long[].
||Distribution||double[] BigInteger||double[]||double[] alpha=53||long[]||
|4SidedLoadedDie|842.9|703.2|738.1|666.6|
|Binomial_N67_P0.7|55466.3|34948.1|23003.8|6833.3|
|Geometric_P0.2|48722.1|37459.9|36418.5|10702.3|
|Poisson_Mean10_Mean20|19013.6|15893.4|13786.8|5862.4|
|Poisson_Mean3.22|7434.5|6600.3|6395.6|2417.2|
|Random 3072|2184490.83|1551266.0|1560548.4|1009663.4|

The slowest to construct is the Random 3072 data. This has 3072 weights. In 
this case alpha=53 makes no difference as the weights are sampled from the 2^53 
dyadic rationals and no weights can be excluded. The speed of the new version 
is approximately 0.71x the old version.

For the Binomial distribution the affect of alpha on sampling is negligible 
(data not shown). This is due to the fact that the weights that were excluded 
will be sampled at most every 2^53 times. Thus the bulk of the distribution 
targeted by the sampler is essentially unchanged.
h2. Conclusion

The sampler construction is slow. To maintain the property of exact sampling 
from the input distribution requires exact math to be performed on the weights. 
This can be changed to ignore very small relative weights, or the weights 
mapped to long. Both improve construction performance but result is the 
sampling no longer being exact. The performance of sampling is not affected (if 
only very small weights are ignored).

 

> The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete 
> Probability Distributions
> 
>
> Key: RNG-179
> URL: https://issues.apache.org/jira/browse/RNG-179
> Project: Commons RNG
>  Issue Type: New Feature
>  Components: sampling
>Reporter: Vladimir Sitnikov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It might make sense to implement Fast Loaded Dice Roller sampler.
> See https://arxiv.org/pdf/2003.03830v2.pdf, 
> https://github.com/probcomp/fast-loaded-dice-roller
> The authors claim that Fast Loaded Dice Roller is faster than the Alias 
> Method.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (RNG-179) The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions

2022-05-26 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/RNG-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542388#comment-17542388
 ] 

Alex Herbert commented on RNG-179:
--

I tested the speed of construction with long[] vs double[]. The frequencies 
summing to approximately 2^62 can be created using:
{code:java}
final double sum = Arrays.stream(probabilities).sum();
final long[] frequencies = Arrays.stream(probabilities)
 .mapToLong(x -> Math.round(0x1.0p62 * x / sum))
 .toArray();
{code}
Here are the times for the fixed size distributions:
||Distribution||n||double[]||long[]||
|4SidedLoadedDie|4|842.9|761.0|
|Binomial_N67_P0.7|68|55466.3|8491.8|
|Geometric_P0.2|93|48722.1|12566.4|
|Poisson_Mean10_Mean20|53|19013.6|6807.2|
|Poisson_Mean3.22|20|7434.5|2984.9|

The longer length arrays are significantly slower when using the double[] 
constructor. 

Here are the times for random Dirichlet distributions:
||k||alpha||double[]||long[]||
|4|0.5|1065.4|1048.7|
| |1|930.8|858.6|
| |2|900.7|835.8|
|8|0.5|2080.8|1328.0|
| |1|1604.0|1478.3|
| |2|1582.2|1428.0|
|16|0.5|3539.6|2554.2|
| |1|3146.1|2480.8|
| |2|2786.4|2429.9|

Again, longer length double[] see a more significant slowdown.

Note that I used BigInteger for convenience. The BigIntegers have the value of 
the mantissa with a shift created from the exponent offset. The offset is the 
(exponent - min exponent). This may merit replacing the use of BigInteger with 
a custom object to store the integers. This is because at most only 53 bits of 
the BigInteger are non-zero but the length may be hundreds or thousands of bits 
if there is a large range of magnitude in the input weights.

I am reluctant to implement a custom routine to sum the bit representation of 
the integers (this is done in the reference c code). However once the sum is 
performed using BigInteger (O(n)) then the O(nm) for-loop could use a custom 
representation. This avoids costly repetition in the use of:
{code:java}
BigInteger.testBit(int)
{code}
Since most of the integer is zeros (due to left shift of the mantissa), then a 
custom representation should be able to speed this up significantly. I will 
look at the code and see if it can be optimised.

> The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete 
> Probability Distributions
> 
>
> Key: RNG-179
> URL: https://issues.apache.org/jira/browse/RNG-179
> Project: Commons RNG
>  Issue Type: New Feature
>  Components: sampling
>Reporter: Vladimir Sitnikov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It might make sense to implement Fast Loaded Dice Roller sampler.
> See https://arxiv.org/pdf/2003.03830v2.pdf, 
> https://github.com/probcomp/fast-loaded-dice-roller
> The authors claim that Fast Loaded Dice Roller is faster than the Alias 
> Method.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (RNG-179) The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions

2022-05-25 Thread Gilles Sadowski (Jira)


[ 
https://issues.apache.org/jira/browse/RNG-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542000#comment-17542000
 ] 

Gilles Sadowski commented on RNG-179:
-

bq. So perhaps the name should change to [2].

+1
[IMHO, the slightly lower verbosity is not worth giving away the naming 
consistency.]


> The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete 
> Probability Distributions
> 
>
> Key: RNG-179
> URL: https://issues.apache.org/jira/browse/RNG-179
> Project: Commons RNG
>  Issue Type: New Feature
>  Components: sampling
>Reporter: Vladimir Sitnikov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It might make sense to implement Fast Loaded Dice Roller sampler.
> See https://arxiv.org/pdf/2003.03830v2.pdf, 
> https://github.com/probcomp/fast-loaded-dice-roller
> The authors claim that Fast Loaded Dice Roller is faster than the Alias 
> Method.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (RNG-179) The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions

2022-05-25 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/RNG-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541982#comment-17541982
 ] 

Alex Herbert commented on RNG-179:
--

API decision on the sampler name:
 # FastLoadedDiceRollerSampler
 # FastLoadedDiceRollerDiscreteSampler

I used [1], just to reduce verbosity.

However all the other enumerated samplers that can be constructed from weights 
have DiscreteSampler in their name:
 * AliasMethodDiscreteSampler
 * GuideTableDiscreteSampler
 * MarsagliaTsangWangDiscreteSampler.Enumerated

So perhaps the name should change to [2].

 

> The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete 
> Probability Distributions
> 
>
> Key: RNG-179
> URL: https://issues.apache.org/jira/browse/RNG-179
> Project: Commons RNG
>  Issue Type: New Feature
>  Components: sampling
>Reporter: Vladimir Sitnikov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It might make sense to implement Fast Loaded Dice Roller sampler.
> See https://arxiv.org/pdf/2003.03830v2.pdf, 
> https://github.com/probcomp/fast-loaded-dice-roller
> The authors claim that Fast Loaded Dice Roller is faster than the Alias 
> Method.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (RNG-179) The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions

2022-05-25 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/RNG-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541972#comment-17541972
 ] 

Alex Herbert commented on RNG-179:
--

{quote}I thought integer sampling (the case when all weights are integer) does 
not require going for floats, doubles, etc.
{quote}
The API allows for construction using long[] as frequencies. This corresponds 
to {{{}fldr.c{}}}.

The construction using double[] as weights corresponds to {{{}fldrf.c{}}}.

However to plug the sampler into the existing performance benchmark requires a 
double[] for the distribution. This only affects the performance measure on the 
construction cost (single sample).

Once the sampler is built then the sampling algorithm is the same. It uses only 
integer look-up tables and a source of boolean bits.

For reference I have computed the Shannon entropy for the distributions in the 
performance benchmark.
||Distribution||n||Shannon entropy (bits)||
|4 Sided Loaded Die|4|1.626|
|Binomial(n=67, p=0.7)|68|3.953|
|Geometric(p=0.2)|93|3.610|
|Poisson (mean=3.22)|20|2.842|
|Poisson (mean1=10, mean2=20)|53|7.236|

So as mentioned above the 4 sided loaded die has the lowest entropy. Note that 
for the probability distributions these are truncated with a CDF of 0.9 
(1 - 1e-9).

The benchmark should perhaps be updated to use a [Dirichlet distribution 
|https://en.wikipedia.org/wiki/Dirichlet_distribution]. This sampler was added 
in RNG 1.4 after the benchmark was created. A quick test shows that the size 
and alpha parameter can be used to create distributions of various entropy:
{code:java}
ObjectSampler s = DirichletSampler.symmetric(rng, k, alpha)
{code}
||k||alpha||entropy mean||sd||
|4|0.500|1.299|0.374|
|4|1.000|1.531|0.294|
|4|2.000|1.754|0.172|
|8|0.500|2.087|0.348|
|8|1.000|2.490|0.266|
|8|2.000|2.707|0.142|
|16|0.500|3.023|0.287|
|16|1.000|3.454|0.166|
|16|2.000|3.693|0.095|
|32|0.500|4.008|0.182|
|32|1.000|4.406|0.125|
|32|2.000|4.692|0.075|
|64|0.500|4.986|0.151|
|64|1.000|5.392|0.115|
|64|2.000|5.680|0.048|

 
{quote}Given that you've already done all the work, and that it is a documented 
algorithm, I'd think that it should be included 
{quote}
OK. I will update the code with some more information about performance and the 
benefit of exact sampling. I have already added some notes to the factory 
constructor javadoc using the double[] weights to indicate when the long[] 
method can be used.

Note that exact sampling is a strange advantage. In order to notice this would 
require that a lot of samples are obtained from the sampler, for example 2^53 
samples. This is likely to be generated with days of continuous sampling. It 
also requires that the source of bits is unbiased, which is never ensured with 
a RNG unless the full period of an unbiased sampler is used, for example the 
entire 2^64 longs output from a SplitMix generator.

However it may be that a user wishes to construct a distribution such as:
{code:java}
double[] weights = {1, 1, Math.scalb(1.0, -53), Math.scalb(1.0, -100)};
{code}
There is no other sampler in the library that can perform this and ensure that 
category 3 is observed with a probability of approximately 2^-99 (i.e. 2^-100 / 
sum(weights)). All other samplers would ignore 2^-100 and some may ignore 2^-53 
when converting the weights to probabilities.

You would have to run the sampler a long time to obtain category 2 or 3, but at 
least it would be possible.

 

> The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete 
> Probability Distributions
> 
>
> Key: RNG-179
> URL: https://issues.apache.org/jira/browse/RNG-179
> Project: Commons RNG
>  Issue Type: New Feature
>  Components: sampling
>Reporter: Vladimir Sitnikov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It might make sense to implement Fast Loaded Dice Roller sampler.
> See https://arxiv.org/pdf/2003.03830v2.pdf, 
> https://github.com/probcomp/fast-loaded-dice-roller
> The authors claim that Fast Loaded Dice Roller is faster than the Alias 
> Method.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (RNG-179) The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions

2022-05-25 Thread Gilles Sadowski (Jira)


[ 
https://issues.apache.org/jira/browse/RNG-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541936#comment-17541936
 ] 

Gilles Sadowski commented on RNG-179:
-

bq. Based solely on performance this method offers no benefit to the library. 
The question is whether the advantage of being able to exactly represent a 
distribution merits inclusion in the library.

Given that you've already done all the work, and that it is a documented 
algorithm, I'd think that it should be included (there are other instances of 
an implementation that coexists with a more performant alternative).
It would be nice if a use case exists where the "exactness" property make an 
actual difference...

> The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete 
> Probability Distributions
> 
>
> Key: RNG-179
> URL: https://issues.apache.org/jira/browse/RNG-179
> Project: Commons RNG
>  Issue Type: New Feature
>  Components: sampling
>Reporter: Vladimir Sitnikov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It might make sense to implement Fast Loaded Dice Roller sampler.
> See https://arxiv.org/pdf/2003.03830v2.pdf, 
> https://github.com/probcomp/fast-loaded-dice-roller
> The authors claim that Fast Loaded Dice Roller is faster than the Alias 
> Method.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (RNG-179) The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions

2022-05-25 Thread Vladimir Sitnikov (Jira)


[ 
https://issues.apache.org/jira/browse/RNG-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541921#comment-17541921
 ] 

Vladimir Sitnikov commented on RNG-179:
---

WOW. Thanks for the analysis!

{quote} Here the sampler is relatively slow to construct. This is due to the 
construction using exact representation of the input probabilities. All 
probability values are converted to rational numbers (p / q) where q is a power 
of 2.{quote}
I thought integer sampling (the case when all weights are integer) does not 
require going for floats, doubles, etc.

This looks like a pure int/long/long long math: 
https://github.com/probcomp/fast-loaded-dice-roller/blob/master/src/c/fldr.c

> The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete 
> Probability Distributions
> 
>
> Key: RNG-179
> URL: https://issues.apache.org/jira/browse/RNG-179
> Project: Commons RNG
>  Issue Type: New Feature
>  Components: sampling
>Reporter: Vladimir Sitnikov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It might make sense to implement Fast Loaded Dice Roller sampler.
> See https://arxiv.org/pdf/2003.03830v2.pdf, 
> https://github.com/probcomp/fast-loaded-dice-roller
> The authors claim that Fast Loaded Dice Roller is faster than the Alias 
> Method.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (RNG-179) The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions

2022-05-25 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/RNG-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541883#comment-17541883
 ] 

Alex Herbert commented on RNG-179:
--

For reference the API is:
{code:java}
UniformRandomProvider rng = ...;

// Represented exactly
long[] frequencies = {1, 1, 2, 3, 1};
DiscreteSampler sampler1 = FastLoadedDiceRollerSampler.of(rng, frequencies);

// Represented exactly using relative weights
double[] weights = {1, 0.5, 0.125, 3.5, 1};
DiscreteSampler sampler2 = FastLoadedDiceRollerSampler.of(rng, weights);
{code}
When using frequencies the maximum number of categories is approximately ((2^31 
- 1) / 63) = 34,087,042 and the sum of frequencies must be below Long.MAX_VALUE.

When using weights the maximum number of categories is approximately 1,000,000 
if all weights are Double.MAX_VALUE with the exception of 1 weight of 
Double.MIN_VALUE. In practice this range of weights is unrealistic and more 
categories should be possible. The construction algorithm does not require the 
weights to sum to a finite double value.

> The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete 
> Probability Distributions
> 
>
> Key: RNG-179
> URL: https://issues.apache.org/jira/browse/RNG-179
> Project: Commons RNG
>  Issue Type: New Feature
>  Components: sampling
>Reporter: Vladimir Sitnikov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It might make sense to implement Fast Loaded Dice Roller sampler.
> See https://arxiv.org/pdf/2003.03830v2.pdf, 
> https://github.com/probcomp/fast-loaded-dice-roller
> The authors claim that Fast Loaded Dice Roller is faster than the Alias 
> Method.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (RNG-179) The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions

2022-05-25 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/RNG-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541878#comment-17541878
 ] 

Alex Herbert commented on RNG-179:
--

I have created an implementation of the Fast Loaded Dice Roller (FDLR). For 
review the code is in [PR 112|https://github.com/apache/commons-rng/pull/112].

The adaption is very close to the reference c code. The sample algorithm is the 
same. A copy-paste between c and java for this code requires no changes. The 
algorithm uses a modified version of the boolean bit cache functionality in 
IntProvider to provide the bit source for boolean bits (\{0, 1}). This returns 
an int rather than a boolean. Creation of the sampler from frequencies (long[]) 
is identical to the reference c code. Creation of the sampler from weights 
(double[]) uses some BigInteger arithmetic. This avoids the custom code for 
unlimited precision math used in the original c code. It affects only 
construction and not the sampling routine.
h2. Performance

I added it to the existing JMH benchmark for enumerated distributions. This 
tests samples from:
 * 4-sided loaded die: p = 1/2, 1/3, 1/12, 1/12
 * Binomial(n=67, p=0.7)
 * Geometric(p=0.2)
 * Poisson(mean=3.22)
 * Bimodal distribution of Poisson(mean=10) + Poisson(mean=20)
 * Random probability arrays with different sizes

The benchmark was performed using XO_RO_SHI_RO_128_PLUS generator using:
{noformat}
Java version: 11.0.12, vendor: Eclipse Foundation, runtime: 
/Library/Java/JavaVirtualMachines/temurin-11.jdk/Contents/Home
OS name: "mac os x", version: "11.3", arch: "x86_64", family: "mac"
{noformat}
*Sample*
This tests the time for a sample from an already created sampler.
||Distribution||BinarySearch||FLDR||GuideTable||AliasMethod||MarsagliaTsangWang||
|4 Sided Loaded Die|16.67|18.45|19.55|13.52|5.55|
|Binomial(n=67, p=0.7)|32.57|49.66|12.44|14.51|8.49|
|Geometric(p=0.2)|31.03|28.21|15.72|11.74|7.38|
|Poisson (mean=3.22)|23.47|23.41|14.21|14.21|6.56|
|Poisson (mean1=10, mean2=20)|37.58|25.92|14.45|12.87|9.32|
|Random 6|20.36|31.68|17.73|12.47|6.08|
|Random 96|44.74|38.00|17.66|14.34|11.45|
|Random 3072|89.40|51.50|18.17|15.38|11.00|
| |295.83|266.83|129.92|109.04|65.83|

Here the sampler is better than binary search. But not as fast as other more 
optimised samplers.

The original paper states the sampler is fast on low entropy distributions. 
Entropy is defined as:
{noformat}
H = sum [ p * log2(1 / p) ]{noformat}
where p are the probabilities for each category. The entropy increases with the 
number of categories. It is highest when the distribution is uniform. As the 
distribution becomes more skewed (i.e. weighted to a few categories) the 
entropy decreases. So the sampler should do best on small, skewed 
distributions. This is observed for the 4 sided loaded die.

*Single Sample*
This tests the time to create a sampler then perform a single sample.
||Distribution||BinarySearch||FLDR||GuideTable||AliasMethod||MarsagliaTsangWang||
|4 Sided Loaded Die|40.09|905.11|55.51|72.10|215.23|
|Binomial(n=67, p=0.7)|277.57|49098.62|360.37|818.00|2209.45|
|Geometric(p=0.2)|357.48|44719.90|491.57|1032.77|3146.81|
|Poisson (mean=3.22)|105.14|7506.44|137.38|260.71|893.37|
|Poisson (mean1=10, mean2=20)|226.58|19303.88|297.53|561.43|2354.08|
|Random 3072|11440.83|2184490.83|15225.15|56482.53|355280.61|
|Random 6|51.95|1185.35|64.31|100.88|383.89|
|Random 96|390.91|15681.71|539.40|1155.64|5326.40|
| |12890.56|2322891.84|17171.22|60484.06|369809.84|

Here the sampler is relatively slow to construct. This is due to the 
construction using exact representation of the input probabilities. All 
probability values are converted to rational numbers (p / q) where q is a power 
of 2. These can then be summed exactly to unity when adjusted to use a common 
denominator. This process extracts the mantissa and exponent from each weight. 
It uses BigInteger if the range of magnitude of the probabilities is large, 
otherwise can use long arithmetic. The Discrete Distribution Generating (DDG) 
tree construction also requires a nested for loop over the rational number 
numerators. This may also be slow compared to some of the single loop 
construction routines in other samplers.
h2. Conclusion

The sampler is not as fast as other implementations in the library. It has a 
big construction cost.

The sampler does have the advantage of offering _exact_ samples from the input 
weighted distribution. In a typical inexact implementation the input weights 
are summed and probabilities computed using weight / sum. This can alter the 
relative weights due to rounding. The sampler may also discard small 
probabilities that cannot be represented. For example the Alias sampler 
implementation converts probabilities to integers that sum to 2^53. This will 
discard any probability below 2^-53. Similarly the MarsagliaTsangWang 
represents probabilities as fractions with a denominator of 2^30. As the 

[jira] [Commented] (RNG-179) The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions

2022-05-22 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/RNG-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17540619#comment-17540619
 ] 

Alex Herbert commented on RNG-179:
--

Thanks for the suggestion. The provided paper shows it may be faster with a 
lower entropy distribution. It certainly uses less bits for each sample. The 
memory usage seems to be higher although initialisation cost is lower.

I can have a look at implementing this for the distributions specified using 
integer ratios:
{code:java}
UniformRandomProvider rng = ...;
int[] distribution = {1, 1, 2, 3, 1};
DiscreteSampler sampler = FastLoadedDiceRoller.of(rng, distribution);
{code}

> The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete 
> Probability Distributions
> 
>
> Key: RNG-179
> URL: https://issues.apache.org/jira/browse/RNG-179
> Project: Commons RNG
>  Issue Type: New Feature
>  Components: sampling
>Reporter: Vladimir Sitnikov
>Priority: Major
>
> It might make sense to implement Fast Loaded Dice Roller sampler.
> See https://arxiv.org/pdf/2003.03830v2.pdf, 
> https://github.com/probcomp/fast-loaded-dice-roller
> The authors claim that Fast Loaded Dice Roller is faster than the Alias 
> Method.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)