[ https://issues.apache.org/jira/browse/RNG-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641910#comment-16641910 ]
Alex D Herbert commented on RNG-57: ----------------------------------- Apologies in advance for the long post. This stems from the desire to understand why my simple changes for RNG-57 have broken the test suite for commons-rng-core. I ran the {{ProvidersCommonParametricTest}} test 1000 times with a random seed generated by SecureRandom as a 1024 byte array. Code snippet to do this is posted in [#comment-16640722]. The raw data were large (>40MB) so I extracted the number of failures for each test and have uploaded this to my current PC. Unfortunately I produced the raw data incorrectly as the test stops at the first failure and later tests will be missing repeats. So for example this test for nextInteger has incomplete data because early tests fail: {code:java} @Test public void testUniformNextIntegerInRange() { checkNextIntegerInRange(4, 1000); checkNextIntegerInRange(10, 1000); checkNextIntegerInRange(12, 1000); // If this fails ... checkNextIntegerInRange(31, 1000); // Data for the remaining will be missing !!! checkNextIntegerInRange(32, 1000); checkNextIntegerInRange(2016128993, 1000); checkNextIntegerInRange(1834691456, 1000); checkNextIntegerInRange(869657561, 1000); checkNextIntegerInRange(1570504788, 1000); } {code} However the number of tests is always above 900 and so I did a Chi squared test to determine if the number of failures fits a Binomial distribution. Here are some demo results for the MWC_256 method. The table shows the method under test, the mean number of failures, the p-value from a chi squared test matching the histogram of failures to a Binomial and then a cumulative histogram of the number of failures. The final column value is the number of tests (sometimes less than 1000). |Method|Mean fails|p(Binomial(n,p=0.01)|Cumul Fails (X<=x\|x=0)|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15| |nextInteger(1834691456)|5.027|0.999|6|49|130|256|406|565|710|812|892|923|937|947|950|950|951| | |nextLong(6917529027641081853)|4.904|1|9|41|127|252|418|580|719|828|867|898|920|926|928| | | | |nextDouble()|4.928|0.989|8|42|131|265|446|633|772|876|940|973|989|998|999|1000| | | |nextInteger(32)|4.889|0.997|6|38|130|268|451|619|761|848|909|941|958|964|967|969| | | |nextInteger(10)|4.914|0.991|9|36|121|270|460|636|776|872|940|976|994|997|999|1000| | | |nextInteger(1570504788)|4.902|1|4|38|118|256|429|594|725|809|867|899|918|921|923|925|925|926| |nextInteger(2016128993)|4.98|1|6|39|110|230|428|585|749|839|900|935|951|954|957|958| | | |nextInteger(31)|4.969|0.996|9|40|132|265|450|606|753|839|901|952|969|975|979|980| | | |nextLong(31)|4.99|0.999|7|32|106|253|429|601|747|845|911|947|964|968|969|970|971| | |nextLong(4611686018427387902)|4.989|1|7|44|122|254|404|572|721|819|876|912|928|940|940|940|941| | |nextLong(32)|5.053|1|7|36|113|240|408|585|728|845|897|929|949|961|963|964| | | |nextInteger(12)|5.016|0.994|6|44|129|261|436|610|752|856|925|961|980|985|992|993|994| | |nextFloat()|4.953|0.991|8|42|129|261|428|625|777|878|949|974|984|994|998|1000| | | |nextLong(11)|4.922|0.991|7|36|135|272|440|631|776|890|940|973|986|995|997|1000| | | |nextLong(2305843009213693951)|5.046|1|1|33|109|241|401|582|704|826|894|924|941|945|947|949| | | |nextInteger(4)|0|0|1000| | | | | | | | | | | | | | | | |nextLong(4)|0|0|1000| | | | | | | | | | | | | | | | |nextLong(19)|4.977|0.997|5|37|130|271|448|620|744|855|915|948|971|983|984|986| | | |nextInteger(869657561)|4.918|1|5|33|98|252|409|599|749|831|886|915|926|934|936|937| | | Here are some conclusions: - Repeating the Chi squared test for uniformity 500 times with a critical p-value of 0.01 produces counts that can be modelled as a binomial distribution. - The mean is 5 (n*p = 500*0.01). - The test for nextInteger/Long(4) always passes. The test is performed using 10 bins and there only 4 discrete values. Basically the test is invalid and the Chi square test should be done using 3 degrees of freedom. - The test fail threshold of 10 allowed failures corresponds to a cumulative probability of 0.9868 (see [#comment-16640891]). This means it will fail 1.32% of the time. Note that currently there are 19 tests using this approach. We can discount 2 (nextInteger(4) and nextLong(4)) leaving 17. There are also 3 random walk tests with a 1% critical value on the range allowed and 2 tests for uniform next bytes using a single chi squared p-value of 1%. So for a single RNG the probability that no tests will fail is: {noformat} (1 - 0.0132)^17 * (1 - 0.01)^3 * (1 - 0.01)^2 = 0.7587 {noformat} The test is performed on 16 variants of the RNG. So the probability no tests will fail is: {noformat} 0.7587^16 = 0.0121 {noformat} This seems very low. I don't have data for the random walk and nextBytes tests. But the data for the 19 variants of the nextUniform tests show the following histogram of the highest fail count recorded for a single RNG: |RNG|0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18| |ISAACRandom|0|0|0|0|0|0|13|85|206|279|217|130|43|19|6|1|0|0|1| |JDKRandom|0|0|0|0|0|0|9|76|206|275|225|120|60|17|10|0|2| | | |KISSRandom|0|0|0|0|0|0|18|87|221|272|203|122|46|20|7|3|0|1| | |MersenneTwister|0|0|0|0|0|0|9|73|209|300|187|141|47|21|6|5|1|1| | |MersenneTwister64|0|0|0|0|0|0|7|73|204|312|197|120|48|28|9|1|1| | | |MultiplyWithCarry256|0|0|0|0|0|0|12|82|233|288|206|113|40|21|4|1|0| | | |SplitMix64|0|0|0|0|0|0|9|67|236|299|200|114|47|21|6|1| | | | |TwoCmres|0|0|0|0|0|1|10|83|225|287|189|122|52|25|5|1| | | | |TwoCmres2|0|0|0|0|0|0|12|93|223|276|194|129|43|24|6| | | | | |Well1024a|0|0|0|0|0|0|8|67|223|283|207|130|47|23|9|3| | | | |Well19937a|0|0|0|0|0|0|4|82|226|266|210|135|49|16|9|2|1| | | |Well19937c|0|0|0|0|0|0|11|83|192|273|239|123|45|21|10|2|1| | | |Well44497a|0|0|0|0|0|0|9|90|242|276|198|108|37|28|9|3| | | | |Well44497b|0|0|0|0|0|0|11|68|232|271|234|111|41|18|8|5|1|0| | |Well512a|0|0|0|0|0|0|13|87|214|263|211|121|54|27|4|4|1|1| | |XorShift1024Star|0|0|0|0|0|0|14|63|210|283|240|102|60|18|4|6| | | | |Total|0|0|0|0|0|1|169|1259|3502|4503|3357|1941|759|347|112|38|8|3|1| |Cumulative| |0|0|0|0|1|170|1429|4931|9434|12791|14732|15491|15838|15950|15988|15996|15999|16000| |Mean pass rate| |0|0|0|0|0.0000625|0.010625|0.0893125|0.3081875|0.589625|0.7994375|0.92075|0.9681875|0.989875|0.996875|0.99925|0.99975|0.9999375|1| The mean pass rate for a single RNG is 78.9% if the allowed fail count is 10. This matches the theory of {noformat} (1 - 0.0132)^17 = 0.7978 {noformat} Here is the histogram for the highest fail count across the 16 RNGs: |Max Fail (x)|Count|Frequency|Cumulative|P(X<=x)| |10|24|0.024|24|0.024| |11|252|0.252|276|0.276| |12|327|0.327|603|0.603| |13|254|0.254|857|0.857| |14|95|0.095|952|0.952| |15|36|0.036|988|0.988| |16|8|0.008|996|0.996| |17|3|0.003|999|0.999| |18|1|0.001|1000|1| So the entire test suite of nextUniform tests only passes 2.4% of the time from 1000 repeats. This matches the theory of: {noformat} 0.7978^16 = 0.0269 {noformat} I.e. 97.3% of the time the test suite will fail. I think it is fair to say that the test suite is operating as expected. All the RNGs can be modelled as perfect uniform providers in that they pass the tests with the expected probabilities. However due to the number of tests being run the test suite is likely to fail if random seeds are used. Q. What to do about it? A simple solution is to increase the cut-off for failures in the repeat Chi-square test to 15. Then the suite will pass a lot more. This is equivalent to setting the p-value for the test from 0.0132 to <0.0001. So actually makes the test very weak as a standalone test. A [Bonferroni correction|https://en.wikipedia.org/wiki/Bonferroni_correction] could be used to get a different p-value other than 1%. This also makes each individual test very weak and relies on the suite having a lot of tests. An alternative approach would be to reduce the number of tests per RNG. These could be limited to nextBytes and either nextInt or nextLong depending on the type of provider. The later may be redundant and the abstract {{IntProvider/LongProvider}} implementation of nextInt(int)/nextLong(long) tested using an external source of randomness, i.e. something not in the core library. SecureRandom can be configured (at least in JDK 8) to use the underlying operating systems source of entropy for seeding and should be a strong source of randomness. So a test could be written for the abstract providers using e.g. {code:java} public class SecureRandom32 extends IntProvider { /** Delegate. */ private SecureRandom delegate; /** * Creates an instance. */ public SecureRandom32() { delegate = new SecureRandom(); } @Override public int next() { return delegate.nextInt(); } } {code} The tests for preconditions and save/restore state can be in one class. A test for randomness could be moved to a dedicated class for the provider depending on the type of random source. If future providers are developed that do not produce a source of int/long randomness, e.g. directly a source of float randomness, then those can be tested differently. If the number of tests is limited to just 1 (i.e. nextBytes) then a p-value of 1% can be used and the suite will pass at the following rate for 16 providers: {noformat} 0.99^16 = 0.85 {noformat} Obviously this will have to be revisited if future providers are added. However it will allow the suite to run most of the time with a random seed and Travis can handle repeats. It is not very convenient for local development though as builds will fail a fair amount. Possibly fixed using fixed seeds for local builds and a random seed configured via a Maven profile for Travis. > CachedUniformRandomProvider for nextBoolean() and nextInt() > ----------------------------------------------------------- > > Key: RNG-57 > URL: https://issues.apache.org/jira/browse/RNG-57 > Project: Commons RNG > Issue Type: Improvement > Components: sampling > Affects Versions: 1.2 > Reporter: Alex D Herbert > Priority: Minor > Labels: performance > > Implement a wrapper around a {{UniformRandomProvider}} that can cache the > underlying source of random bytes for use in the methods {{nextBoolean()}} > and {{nextInt()}} (in the case of {{LongProvider}}). E.g. > {code:java} > LongProvider provider = RandomSource.create(RandomSource.SPLIT_MIX_64); > CachedLongProvider rng = new CachedLongProvider(provider); > // Uses cached nextLong() 64 times > rng.nextBoolean(); > // Uses cached nextLong() twice > rng.nextInt(); > IntProvider provider = RandomSource.create(RandomSource.KISS); > CachedIntProvider rng2 = new CachedIntProvider(provider); > // Uses cached nextInt() 32 times > rng2.nextBoolean(); > // This could be wrapped by a factory method: > UniformRandomProvider rng = CachedUniformRandomProviderFactory.wrap( > // Any supported source: IntProvider or LongProvider > RandomSource.create(RandomSource...)); > {code} > The implementation should be speed tested to determine the benefit for > {{nextBoolean()}} and if {{nextInt()}} can be improved for {{LongProviders}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)