Re: [Math] About the refactoring of RNGs

Gilles Tue, 29 Dec 2015 07:42:06 -0800

On Mon, 28 Dec 2015 20:33:24 -0700, Phil Steitz wrote:

On 12/28/15 8:08 PM, Gilles wrote:

On Mon, 28 Dec 2015 11:08:56 -0700, Phil Steitz wrote:

The significant refactoring to eliminate the (standard) next(int)
included in these changes has the possibility of introducing subtle
bugs or performance issues.  Please run some tests to verify that
the same sequences are generated by the 3_X code


IIUC our unit tests of the RNGs, this is covered.


No.  Not sufficient.  What you have done is changed the internal
implementation of all of the Bitstream generators.  I am not
convinced that you have not broken anything.  I will have to do the
testing myself.  I see no point in fiddling with the internals of
this code that has had a lot of eyeballs and testing on it.  I was
not personally looking forward to researching the algorithms to make
sure any invariants may be broken by these changes; but I am now
going to have to do this.  I have to ask why.  Please at some point
read [1], especially the sections on "Avoid Flexibility Syndrome" and
"Value Laziness as a Virtue."  Gratuitous refactoring drains
community energy.


It seems that again you don't read what I write.[1]
Hence the above paragraph is spreading
 F -> "I am not convinced that you have not broken anything."
 U -> "I will have to do the testing myself."
 D -> "I see no point in fiddling [...]"
.

Or maybe I have to rant about email communication.
Please reread the thread to fully appreciate that you could have shared
your doubts among the opinions which you gave about some of my questions.

When the reply answers to only some of the direct questions, the OP is
legitimately tempted to assume that the non-comment is akin to "I don't
care" (as in "left to the judgment of the one who does the job").

As is often the case, you can dislike my ideas of improvements (ideas
that come on the basis of the information provided by what the code
does) but I don't appreciate the use of the word "gratuitous", given
that I clearly stated the purpose of making explicit what the code
actually does (that is, *generating* 32 random bits and not the number
of bits passed as a parameter to "next(int)").
So, the code is now self-documenting; it is a small change, to be sure,
but hardly gratuitous.

I actually did go some way towards "Avoiding [a] Flexibility Syndrome"
that *was* present in a seeming flexibility of "next(int)".
This method was indeed letting users assume that it is able to generate
_less than 32 bits_ whereas the randomness generators implemented in CM
*always* generate exactly 32 (hopefully random) bits.

I stand guilty on the last count as I do indeed not always "value
laziness as a virtue": I indeed attribute more value to design (and code)
aesthetics than to laziness.
IMO, ugly code is often an early hint that the design is broken, even
if the functionality may not be (yet?).
[But note again that this change was not "just because of aesthetic
reasons"; in the wording of your reference, I think that "just because"
is important.]

In a real community,[2] you'd value that some people are willing to
tackle different tasks than you do.
Rather than stifling any change on the sole ground that it is a change,
it would be less "draining" on the community if reviewers would only
voice concrete concerns about the resulting code, and not just assume
that the coder's motivation is pointless.[3]

I understand that something can more likely become broken when it is
being touched rather than when left alone.  Of course, I do!
But with the help of an extensive and sensitive test suite, I felt I
could give this small refactoring a try, being fairly confident that
mistakes would not go unnoticed.
Your doubting that the test suite could let this happen should question
our assuming that it could assess the correct behaviour of the previous
code. Alternately, all of the numerous tests passing should mean that
the new code is not buggier; and visual inspection can assess that it
cannot be slower.[4]

My last thought about how standard the method "next(int)" is, I let it
be conveyed by what is not mentioned in the following unquestionably
standard source:

http://docs.oracle.com/javase/8/docs/api/java/util/SplittableRandom.html


[I could have made additional comments on how various suggestions in
  http://www.apachecon.com/eu2007/materials/ac2006.2.pdf
are either not applied in the CM project, or not taken "with a grain of
salt" whenever it suits you.  But this post has already drained me far
too much.]

Gilles

[1] I can make mistakes (and I did, as told previously) in "fiddling"
    with the code, but that can be spotted relatively easily by inspecting
    three one-line changes commits and their few lines consequences on the
    generic methods where "nextInt()" replaced "next(int)", mutatis
    mutandis, in a single and quite small class.
    That is a far cry from "researching the algorithms".

[2] To be contrasted with the "common good" for which your stand against
    changes may be the right one for many (conservative) CM users.

[3] I can assure you that it is quite draining to have to defend any and
    every change, especially when they are obviously towards greater
    "standardness", even when they would occur in a new major version,
    against an opposition that is only based on a conservatism argument:
    When the code can be made more standard or more elegant or more
    flexible or more state-of-the-art, it is deemed not necessary because
    "we've always done it that way".

    This is a general remark about other discussions. Here I totally
    agree that I favoured non-standard Java in trashing "next(int bits)".
    In other languages, that signature is _not_ "standard".

[4] In order to defend a small and technically innocuous modification,
    I have to myself do some "researching". Mind you, it's "just" reading
    from web pages edited by people who AFAICT seem to know what they are
    talking about (but I may be missing something, again).

    Among other things which I've shared in this post, it appears that a
    benchmark including the WELL (1024a) RNG indicates that it is 5 to
    10 times slower than alternative RNGs.

    In this light, if you are so worried about performance issues
    induced by a functionally no-op change, then you probably should not
    use CM for RNG.

and the refactored
code and benchmarks to show there is no loss in performance.


Given that there are exactly two operations _less_ (a subtraction
and a shift), it would be surprising.

It
would also be good to have some additional review of this code by
PRNG experts.


The "nextInt()" code is exactly the same as the "next(int)" modulo
the little change above (in the last line of the "nextInt/next"
code).

That change in "nextInt/next" implied similarly tiny recodings in
the generic methods "nextDouble()", "nextBoolean()", ... which, apart
from that, were copied from "BitsStreamGenerator".
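
To make the size of those recodings concrete, here is a minimal sketch (not
the committed CM code). The class name "GenericMethodsSketch", the helper
"next" and the use of "IntSupplier" as the source of randomness are all
assumptions made for illustration; the 26 + 26 bit construction of a double
is likewise assumed. Each "new" method only replaces a call to "next(n)" by
an explicit shift of a full 32-bit value.

```java
import java.util.Random;
import java.util.function.IntSupplier;

/** Illustrative only: old-style vs. new-style derivation of generic methods. */
class GenericMethodsSketch {
    /** Old-style helper: draw 32 bits, keep the "bits" most significant ones. */
    static int next(IntSupplier source, int bits) {
        return source.getAsInt() >>> (32 - bits);
    }

    static boolean nextBooleanOld(IntSupplier source) {
        return next(source, 1) != 0;
    }

    /** Same result, with the truncation written at the call site. */
    static boolean nextBooleanNew(IntSupplier source) {
        return (source.getAsInt() >>> 31) != 0;
    }

    static double nextDoubleOld(IntSupplier source) {
        final long high = ((long) next(source, 26)) << 26;
        final int low = next(source, 26);
        return (high | low) * 0x1.0p-52d;
    }

    /** Same result: ">>> 6" keeps the 26 most significant bits of each draw. */
    static double nextDoubleNew(IntSupplier source) {
        final long high = ((long) (source.getAsInt() >>> 6)) << 26;
        final int low = source.getAsInt() >>> 6;
        return (high | low) * 0x1.0p-52d;
    }

    public static void main(String[] args) {
        // Two identically seeded JDK generators stand in for a CM generator.
        Random a = new Random(7L);
        Random b = new Random(7L);
        System.out.println(nextDoubleOld(a::nextInt) == nextDoubleNew(b::nextInt));
    }
}
```

Feeding both variants from identically seeded sources is the kind of check
that would show whether the sequences are unchanged.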

[However tiny a change, I had made a mistake... and dozens of tests
started to fail. Found the typo and all was quiet again...]

About "next(int)" being standard, it would be interesting to know
what that means.


Have a look at the source code for the JDK generators, for example.

As I indicated quite clearly in one of my first posts about this
refactoring
1. all the CM implementations generate random bits in batches
   of 32 bits, and
2. before returning, the "next(int bits)" method was truncating
   the generated "int":
     return x >>> (32 - bits);
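
The two points above can be checked on the JDK generator, which is used here
only as a stand-in (the CM classes are not reproduced). This is a sketch
under assumptions: "ExposedRandom" and "TruncationSketch" are hypothetical
names, and the subclass exists solely to reach the protected
"java.util.Random.next(int)". It compares "next(bits)" with a full
"nextInt()" truncated by the same shift.

```java
import java.util.Random;

/** Hypothetical subclass that exposes the protected Random.next(int). */
class ExposedRandom extends Random {
    private static final long serialVersionUID = 1L;

    ExposedRandom(long seed) {
        super(seed);
    }

    int nextBits(int bits) {
        return next(bits);
    }
}

class TruncationSketch {
    /**
     * Returns true if, over "count" draws, next(bits) equals
     * nextInt() >>> (32 - bits) for identically seeded generators.
     */
    static boolean sameSequence(long seed, int bits, int count) {
        ExposedRandom viaNext = new ExposedRandom(seed);
        Random viaNextInt = new Random(seed);
        for (int i = 0; i < count; i++) {
            int truncated = viaNextInt.nextInt() >>> (32 - bits);
            if (viaNext.nextBits(bits) != truncated) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int bits = 1; bits <= 32; bits++) {
            System.out.println(bits + " bits: " + sameSequence(42L, bits, 1000));
        }
    }
}
```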

In all implementations, that was the only place where the "bits"
parameter was used, from which I concluded that the randomness
provider does not care if the request was to create less than 32
random bits.
Taking "nextBoolean()" for example, it looks like a waste of 31
bits (or am I missing something?).


Quite possibly, yes, you are missing something.


Of course, if some implementation were able to store the bits not
requested by the last call to "next(int)", then I'd understand that
we must really provide access to a "next(int)" method.

Perhaps the overhead of such bookkeeping is why the practical
algorithms chose to store integers rather than bits (?).
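
For illustration, the bookkeeping alluded to above could look like the
following sketch. It is purely hypothetical: no CM generator is claimed to
work this way, and "BitBufferSketch", "batches()" and the "IntSupplier"
source are names assumed for the example. It keeps the unused bits of each
32-bit batch so that 32 booleans cost a single call to the source, at the
price of extra state and a branch on every call.

```java
import java.util.function.IntSupplier;

/** Hypothetical wrapper that stores the bits not yet handed out. */
class BitBufferSketch {
    private final IntSupplier source;
    private int buffer;     // bits left over from the last 32-bit batch
    private int available;  // number of unused bits in "buffer"
    private int batches;    // number of 32-bit batches drawn so far

    BitBufferSketch(IntSupplier source) {
        this.source = source;
    }

    /** Consumes one stored bit, refilling the buffer only when it is empty. */
    boolean nextBoolean() {
        if (available == 0) {
            buffer = source.getAsInt();
            available = 32;
            batches++;
        }
        final boolean bit = (buffer & 1) != 0;
        buffer >>>= 1;
        available--;
        return bit;
    }

    int batches() {
        return batches;
    }

    public static void main(String[] args) {
        BitBufferSketch g = new BitBufferSketch(new java.util.Random(1L)::nextInt);
        for (int i = 0; i < 64; i++) {
            g.nextBoolean();
        }
        System.out.println("batches drawn: " + g.batches());
    }
}
```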

As you dismissed my request about CM being able to care for a RNG
implementation based on a "long", I don't quite understand the
caring for a "next(int)" that serves no more purpose (as of current
CM).

This change is


Gilles

Phil

On 12/28/15 10:23 AM, er...@apache.org wrote:

Repository: commons-math
Updated Branches:
  refs/heads/master 7b62d0155 -> 81585a3c4


MATH-1307

New base class for RNG implementations.
The source of randomness is provided through the "nextInt()"
method (to be defined in subclasses).


[...]

[1] http://www.apachecon.com/eu2007/materials/ac2006.2.pdf


