On 20/03/2019 13:08, Gilles Sadowski wrote:

- Perhaps there is another place for them?

I am happy not to clutter the source repo and to keep them somewhere for
devs. But where? You tell me. Maybe they are OK in the examples-stress
module, as that one is not officially part of the library.
Unless a script is robust and readily usable by a newbie (i.e. anyone
but the author), there is the risk that it becomes cruft.
What you did is great work, but I doubt that many would be interested
in detailed tables of the failures once one can easily compare the number
of failures.
If more is needed, there is no shortcut to reading the doc of the test suites
themselves and other resources on the web...

So, to summarize, what I think is interesting is to make it easy to rerun
the test suites and update the site (e.g. for the "current" JDK):
  * a document that mentions the external tools requirement
  * a standalone application for running "RandomStressTester"
  * a script that collects results from the above run, formats them into the
    "quality" table ready to be pasted into the "rng.apt" file, and copies the
    output files to their appropriate place (*not* assuming a git repository)

It would also be great to be able to easily start the many benchmarks
and similarly collect the results into the user guide tables.

I will write a script that will collect the results from the benchmark, build the apt table and copy the output files to a directory.

Note that the benchmark files only contain the RNG class name:

# RNG: org.apache.commons.rng.core.source32.JDKRandom

This means the script has to be kept up to date with the corresponding enum name and the desired output order for the apt table. I can add these to a simple array at the top, which any newbie should be able to extend.
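Something along these lines is what I have in mind (a rough Python sketch for illustration; the real script may well end up in Perl, and only a few of the generators are listed):

import re

# Maps the class name found in the "# RNG: ..." header of a results file to
# the RandomSource enum name, in the desired order for the apt table.
# (Only a small subset shown; extend as generators are added.)
GENERATORS = [
    # (core class name, RandomSource enum name)
    ("JDKRandom",       "JDK"),
    ("MersenneTwister", "MT"),
    ("Well512a",        "WELL_512_A"),
]

def rng_class(path):
    """Return the simple RNG class name from the '# RNG:' header line."""
    with open(path) as f:
        for line in f:
            m = re.match(r"#\s*RNG:\s*(\S+)", line)
            if m:
                return m.group(1).rsplit(".", 1)[-1]   # e.g. 'JDKRandom'
    return None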

It would be desirable for the script to search for results files in a set of directories, identify them as dieharder (dh) or TestU01 (tu) results, and then output the files for each run N of a unique generator M to the existing directory structure, e.g.:

> collate-benchmark-results.pl output dir1 dir2 ...

output/(dh|tu)/run_N/(dh|tu)_M

This should be done robustly such that the script can be pointed at the git source tree for the site and it will figure out that the files are already in the correct place. I think this would just involve sorting the paths to all the results numerically, not alphabetically (e.g. dh_1 < dh_2 < dh_10), before writing them to the output.
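A rough sketch of the numeric sort and the "already in place" check (Python here for illustration; the helper and variable names are made up):

import os
import re
import shutil

def numeric_key(path):
    """Sort key that compares embedded integers numerically so that
    dh_1 < dh_2 < dh_10 (a plain lexical sort gives dh_1 < dh_10 < dh_2)."""
    return [int(tok) if tok.isdigit() else tok
            for tok in re.split(r"(\d+)", path)]

def place(results_file, suite, run, gen, out_root="output"):
    """Copy a results file to output/(dh|tu)/run_N/(dh|tu)_M.
    If the file is already at that path (e.g. the script was pointed at the
    site source tree), leave it where it is."""
    dest = os.path.join(out_root, suite, f"run_{run}", f"{suite}_{gen}")
    if os.path.abspath(results_file) != os.path.abspath(dest):
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copyfile(results_file, dest)

# Process the results in a stable, numeric order:
# for path in sorted(found_files, key=numeric_key):
#     ...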


For the JMH benchmarks I tend to run, e.g.:

> mvn test -P benchmark -Dbenchmark=NextDoublePerformance

I then have a script to parse the JSON in target/jmh-result.NextDoublePerformance.json to a Jira-format or CSV-format table. Usually I add some relative score columns in a spreadsheet, then paste into Jira (which nicely accepts the tables).
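The gist of it is something like this (a Python sketch, not the actual script; the field names are the standard ones from JMH's "-rf json" output):

import csv
import json
import sys

def jmh_to_csv(json_file, csv_file):
    """Flatten a JMH '-rf json' result file into one CSV row per result."""
    with open(json_file) as f:
        results = json.load(f)
    # Collect all benchmark parameter names used across the results.
    params = sorted({k for r in results for k in r.get("params", {})})
    with open(csv_file, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["benchmark"] + params + ["score", "error", "unit"])
        for r in results:
            metric = r["primaryMetric"]
            writer.writerow([r["benchmark"].rsplit(".", 1)[-1]]
                            + [r.get("params", {}).get(p, "") for p in params]
                            + [metric["score"], metric["scoreError"],
                               metric["scoreUnit"]])

if __name__ == "__main__":
    jmh_to_csv(sys.argv[1], sys.argv[2])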

It should not be too hard to generate the existing apt tables for performance. I can look into this. I am assuming that the tables are based on the highest 'number of samples' from each generator.
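If that assumption holds, selecting the rows should be simple enough, e.g. (again only a sketch; the parameter names here are placeholders and would need to match the actual @Param fields of the benchmark):

def max_sample_rows(results, size_param="numValues", gen_param="randomSourceName"):
    """Keep only the result with the largest 'number of samples' per generator.
    'results' is the parsed JMH JSON list; the parameter names are placeholders."""
    best = {}
    for r in results:
        p = r.get("params", {})
        gen = p.get(gen_param)
        if gen is None or size_param not in p:
            continue
        if gen not in best or int(p[size_param]) > int(best[gen]["params"][size_param]):
            best[gen] = r
    return best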



[...]
You then have a utility for dumping output of any random source to file
in a variety of formats.

Although long output is not needed for the test suites, it is useful for
native long generators.

WDYT?
Looks good!
OK. I will work on the raw data dumper as a Jira ticket. It is
encapsulated work that does not really affect anything else.


DieHarder has finished!

I think my stupidity is what caused the previous crashes. I was running the
stress test within the source tree, and possibly a git checkout of
another branch makes some of the directory paths stale, killing any
processes linked to those paths. I'll not do that again.
Hence, the "standalone" application is the right choice it seems.

FYI: Here are the old results with incorrect byte order:

XorShiftSerialComposite : 24, 25, 23 : 134.1 +/- 16.1
XorShiftXorComposite : 88, 105, 89 : 396.2 +/- 9.9
SplitXorComposite : 0, 0, 0 : 90.8 +/- 21.9

Here are the new results with correct byte order:

XorShiftSerialComposite : 13, 15, 10 : 105.5 +/- 1.8
XorShiftXorComposite : 57, 57, 57 : 102.9 +/- 1.5
SplitXorComposite : 0, 0, 0 : 99.9 +/- 3.2

So, interestingly, passing the correct byte order lowers the number of
failures, although there are still a lot.

And BigCrush (with the fix for passing the correct byte order):

XorShiftSerialComposite : 40, 39, 39 : 608.2 +/- 3.9
XorShiftXorComposite : 54, 53, 53 : 646.8 +/- 10.9
SplitXorComposite : 0, 0, 0 : 625.8 +/- 0.2
Curious to know whether it is also affected by the byte ordering.

I'll re-run BigCrush with the wrong byte ordering when I have updated the stress test code. I finished it yesterday but will look it over with fresh eyes before committing it.

Alex


