> On 18 Mar 2019, at 10:20, Gilles Sadowski <gillese...@gmail.com> wrote:
> 
> On Sun, 17 Mar 2019 at 01:01, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>> 
>> 
>> 
>>> On 16 Mar 2019, at 23:10, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>>> 
>>> 
>>> 
>>>> On 16 Mar 2019, at 02:54, Gilles Sadowski <gillese...@gmail.com> wrote:
>>>>> This is read by dieharder which directly reads from stdin. This worked to 
>>>>> collect all the generated bits and the serial and xor composites failed 
>>>>> the test suite.
>>>>> 
>>>>> It is also read by the stdin2testu01.c program to pass to TestU01.
>>>>> 
>>>>> What is happening is that the stdin2testu01.c is reading 64-bits using an 
>>>>> unsigned long.
>>>> 
>>>> I don't remember why I wrote that, but as you pointed out, it now looks
>>>> like a plain bug.
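A minimal sketch of reading the stream in fixed 32-bit units instead of `unsigned long` (which is 64 bits on LP64 platforms such as Linux and macOS). The helper `read_words` is hypothetical, not the actual code in stdin2testu01.c; it takes a `FILE *` so it can also be exercised on a temporary file.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read up to `max` 32-bit words from `in` into `buf`.
 * uint32_t is exactly 32 bits on every platform, whereas unsigned long is
 * 64 bits on LP64 systems, so reading into it consumes two Java ints at once.
 * Returns the number of complete words read; a trailing partial word is
 * not counted by fread. */
size_t read_words(FILE *in, uint32_t *buf, size_t max)
{
    assert(in != NULL && buf != NULL);
    return fread(buf, sizeof(uint32_t), max, in);
}
```

In the bridge this would be called as `read_words(stdin, buffer, count)`, where `buffer` and `count` are illustrative names.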
>>> 
>>> It may be more complicated again...
>>> 
>>> I’ve had a play around with the data being pushed through to the testU01 
>>> library using the c bridge. I wanted to check that the int value that is 
>>> generated by the RNG is passed through to the c program. So I wrote a 
>>> simple BridgeTester class to do this. It writes all the int values to a 
>>> data file (for reference) then passes them to the c executable with the 
>>> same method as the RandomStressTester. I then modified the stdin2testu01.c 
>>> program to have an extra hidden debug mode where all the data is just 
>>> written to stdout.
>>> 
>>> I found the data file written from Java did not match the data that the c
>>> program had. A bit more digging found that the problem was that Java writes
>>> the bytes in big-endian order while the c program reads them as
>>> little-endian. This is true on both my Linux and Mac OS X platforms, so the
>>> raw bytes read from stdin are in the wrong order.
>>> 
>>> When I updated the program to self-detect endianness and swap the byte
>>> order of each 4-byte block read from stdin, the data in the c program
>>> matched the original.
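The detect-and-swap step could look roughly like this. It is a sketch under the assumption that the words were read verbatim from a big-endian (Java) stream; the function names are illustrative and not taken from stdin2testu01.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte first
 * (e.g. x86-64), otherwise 0. */
int host_is_little_endian(void)
{
    const uint32_t probe = 1u;
    unsigned char first;
    memcpy(&first, &probe, 1); /* inspect the lowest-addressed byte */
    return first == 1;
}

/* Reverse the byte order of a 32-bit value: 0xAABBCCDD -> 0xDDCCBBAA. */
uint32_t swap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

/* Convert words copied verbatim from a big-endian stream into host order.
 * On a big-endian host they are already correct and are left unchanged. */
void big_endian_words_to_host(uint32_t *buf, size_t n)
{
    assert(n == 0 || buf != NULL);
    if (!host_is_little_endian()) {
        return;
    }
    for (size_t i = 0; i < n; i++) {
        buf[i] = swap32(buf[i]);
    }
}
```

On POSIX systems `ntohl` from `<arpa/inet.h>` performs the same big-endian-to-host conversion per word; the explicit check is shown here only to mirror the self-detection described above.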
>>> 
>>> Since it was non-destructive to the module, I added all this to master. You
>>> can see it working by rebuilding the c bridge and running the new profile
>>> to test it:
>>> 
>>>> cd commons-rng-examples/examples-stress
>>>> gcc src/main/c/stdin2testu01.c -o stdin2testu01 -ltestu01 -ltestu01probdist -ltestu01mylib -lm
>>>> mvn test -P bridge
>>> 
>>> You should see two files:
>>> 
>>> target/bridge.data
>>> target/bridge.out
>>> 
>>> These should have the same contents. The .data file is written by the java 
>>> program, and the .out file is the stdout captured from the c program with 
>>> its view of the data.
>>> 
>>> This should fix running TestU01.
>>> 
>>> BUT I’ve not had time to determine how Dieharder reads stdin. Given it is
>>> a c library, it may also be interpreting the bytes as little-endian.
>>> I’ll look into that next.
>>> 
>>> Composite update:
>>> 
>>> For some reason all my BigCrush simulations crashed. It could be a RAM 
>>> issue. The runs did take longer than expected but I did not monitor memory 
>>> usage. I’ve started them again but using only the serial composite. I think 
>>> the xor one is really broken.
>>> 
>>> FYI. Using the new bridge code, 3 runs of SmallCrush find [6, 6, 6] / 15
>>> failed tests for the serial composite and [9, 9, 10] / 15 for the xor
>>> composite.
>>> 
>>> I’m expecting BigCrush to fail a lot. I’m now more interested in seeing if 
>>> it will complete.
>>> 
>>> Alex
>>> 
>> 
>> 
>> PS. Thinking about it, the endianness might not matter. The test suite
>> should ideally be able to detect if the bits are not random in either the
>> least or the most significant byte of the 32 bits, i.e. it should always
>> find a problem. I am not sure this is the case. I have read that some
>> generators can pass BigCrush but fail if the bits are reversed (not the
>> bytes but the bits). I’m happy to think that endianness is not an issue.
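To make the bytes-versus-bits distinction concrete, here is a sketch of a full 32-bit bit reversal that a test feed could apply before handing words to the suite. The feed step (`reverse_bits_in_place`) is a hypothetical illustration, not something the bridge currently does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the order of all 32 bits (bit 0 <-> bit 31). Unlike a byte swap,
 * this also reverses the bit order inside each byte:
 * 0x00000001 -> 0x80000000 (a byte swap would give 0x01000000). */
uint32_t reverse_bits32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1); /* adjacent bits */
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2); /* bit pairs */
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4); /* nibbles */
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8); /* bytes */
    return (x >> 16) | (x << 16);                            /* 16-bit halves */
}

/* Hypothetical feed step: bit-reverse every word in a buffer so the suite
 * sees the generator's low bits in the high positions and vice versa. */
void reverse_bits_in_place(uint32_t *buf, size_t n)
{
    assert(n == 0 || buf != NULL);
    for (size_t i = 0; i < n; i++) {
        buf[i] = reverse_bits32(buf[i]);
    }
}
```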
>> 
>> It was a good exercise in debugging if the bridge was working though.
>> 
>> One actual issue is that we are testing long providers using the long to 
>> create 2 int values. Should we test using a series of the upper 32 bits and 
>> then a series of the lower 32 bits?
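The two orderings can be modelled with plain bit manipulation, shown in C to match the bridge. The assumption that the upper half is emitted first in the current scheme is labelled in the comments; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Current scheme (ASSUMED order: upper half first): each 64-bit value
 * becomes two consecutive 32-bit words: hi0, lo0, hi1, lo1, ...
 * `out` must hold 2 * n words. */
void split_interleaved(const uint64_t *in, size_t n, uint32_t *out)
{
    assert(n == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = (uint32_t) (in[i] >> 32);
        out[2 * i + 1] = (uint32_t) in[i];
    }
}

/* Proposed alternative: one stream of only the upper halves and a separate
 * stream of only the lower halves, each to be tested on its own. */
void split_upper_lower(const uint64_t *in, size_t n,
                       uint32_t *upper, uint32_t *lower)
{
    assert(n == 0 || (in != NULL && upper != NULL && lower != NULL));
    for (size_t i = 0; i < n; i++) {
        upper[i] = (uint32_t) (in[i] >> 32);
        lower[i] = (uint32_t) in[i];
    }
}
```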
> 
> Is that useful, since the test now sees the integers as they are produced
> (i.e. 2 values per long)?
> 

It is not relevant if you are concerned about int quality. But if you are 
concerned about long quality then it is relevant. The long quality is important 
for the quality of nextDouble(), although in that case only the upper 53 bits 
of the long matter. By the same reasoning, the quality of a long from an int 
provider is also not covered by the benchmark, as that would require testing 
the alternating ints twice, using the series 1, 3, 5, …, 2n+1 and 2, 4, 6, …, 2n.
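A sketch of the points above, in C for consistency with the bridge: the usual upper-53-bit conversion to a double in [0, 1), a long assembled from two consecutive ints, and the split of an int stream into the two series. The order in which the two ints form a long is an assumption, and these are illustrative functions, not the library's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map the upper 53 bits of a 64-bit value to a double in [0, 1).
 * The lower 11 bits are discarded, so only the upper bits' quality matters. */
double long_to_double(uint64_t v)
{
    return (double) (v >> 11) * 0x1.0p-53;
}

/* Model of a long built from two consecutive ints of an int provider
 * (ASSUMPTION: the first int forms the upper half). */
uint64_t ints_to_long(uint32_t first, uint32_t second)
{
    return ((uint64_t) first << 32) | second;
}

/* Split an int stream of even length n into the two series that would each
 * need testing: 1-based positions 1, 3, 5, ... and 2, 4, 6, ... */
void deinterleave(const uint32_t *in, size_t n, uint32_t *odd, uint32_t *even)
{
    assert(n % 2 == 0);
    assert(n == 0 || (in != NULL && odd != NULL && even != NULL));
    for (size_t i = 0; i < n / 2; i++) {
        odd[i]  = in[2 * i];     /* 1-based positions 1, 3, 5, ... */
        even[i] = in[2 * i + 1]; /* 1-based positions 2, 4, 6, ... */
    }
}
```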

Given that half of the int values were previously discarded from the BigCrush 
analysis, the current results on the user guide page actually represent 
BigCrush running on the upper 32 bits of the long, byte-reversed due to the 
big-endian (Java) versus little-endian (Linux) interpretation of the bytes. 

So maybe an update to the RandomStressTester to support analysis of either int 
or long quality is needed. For now, the quality section on the website should 
just state that the quality is for the ‘nextInt()’ method of the RNG.

I have the results of BigCrush using the new bridge c program:

XorShiftSerialComposite : 40, 39, 39 : 608.2 +/- 3.9

So it fails.

The XorShiftXorComposite run crashed after 2 hours with about 1/4 of the results 
file complete. I am running it again so I can monitor its memory usage. Something 
in the BigCrush suite just cannot handle this generator's output.

Alex


> Gilles
> 
>> I may set an unused workstation on this task to see what happens.
>> 
>> Alex
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
