Okay, folks, this is definitely getting out of hand. Let's put a moratorium on this thread for the weekend and come back together next week to move forward. To ground the technical points when we pick this up, I've appended a few code sketches at the bottom of this message. I would urge folks to watch this while we wait:
https://m.youtube.com/watch?v=rOWmrlft2FI

p.s. Phil, I do hope you'll reconsider.

On Fri, Feb 5, 2016 at 10:47 PM Phil Steitz <phil.ste...@gmail.com> wrote:
> OK, I give up. I am withdrawing as volunteer chair or member of the
> new TLP.
>
> Phil
>
> On 2/5/16 7:23 PM, Gilles wrote:
> > Phil,
> >
> > You talk again about me trying to push forward changes that serve
> > no purpose besides "trashing performance and correctness". This is
> > again baseless FUD to which I've already responded (with a
> > detailed list of facts, which you chose to ignore).
> > You declare anything for which you don't have an answer a "bogus
> > argument". Why is the reference to multi-threaded implementations
> > bogus? You contradict yourself in pretending that the CM RNGs
> > could be so good as to make people want to use them, while
> > refusing to consider whether another design might be better suited
> > to such high(er)-performance extensions. This particular case is a
> > long shot, but if any and all discussions are stopped dead, how do
> > you imagine that we can go anywhere?
> > As you could read from experts, micro-benchmarks are deceiving;
> > but you refuse to even consider alternative designs if there is
> > even a slight suspicion of degradation.
> > How can we ever set up a constructive discussion on how to make
> > everybody enjoy this project if the purported chair is so bent on
> > protecting existing code rather than nurturing a good relationship
> > with developers who may sometimes have other ideas?
> > I'm trying to improve the code (in a dimension which,
> > unfortunately, you don't seem to value) and respectfully request
> > data points from the users of said code, in order to be able to
> > prove that no harm will be done.
> > But you seem to prefer not to disclose anything that would get us
> > closer to agreement (a better design with similar performance and
> > room for improvement, to be discussed together as a real
> > development team -- not you requiring, like a bad boss, that I bow
> > to your standards for judging usefulness).
> > This 1% which you throw at me, where does it come from? What does
> > 1% mean when the benchmark shows standard deviations that vary
> > from 4 to 26% in the "nextInt" case and from 3 to 7% in the
> > "nextGaussian" case? This 1% looks meaningless without context;
> > context is what I'm asking for in order to try and establish
> > objectively whether another design will have a measurable impact
> > on actual tasks.
> > I'm not going to show any "damaged" benchmark because of how
> > unwelcome you make me feel every time I wish to talk about other
> > aspects of the code.
> > There is no development community here. Only solitary coders who
> > share a repository.
> >
> > Not sorry for the top-post,
> > Gilles
> >
> >
> > On Fri, 5 Feb 2016 17:07:16 -0700, Phil Steitz wrote:
> >> On 2/5/16 12:59 PM, Gilles wrote:
> >>> On Fri, 5 Feb 2016 06:50:10 -0700, Phil Steitz wrote:
> >>>> On 2/4/16 3:59 PM, Gilles wrote:
> >>>>> Hi.
> >>>>>
> >>>>> Here is a micro-benchmark report (performed with
> >>>>> "PerfTestUtils"):
> >>>>> -----
> >>>>> nextInt() (calls per timed block: 2000000, timed blocks: 100,
> >>>>> time unit: ms)
> >>>>> name                          time/call  std dev  total time  ratio  cv    difference
> >>>>> o.a.c.m.r.JDKRandomGenerator  1.088e-05  2.8e-06  2.1761e+03  1.000  0.26   0.0000e+00
> >>>>> o.a.c.m.r.MersenneTwister     1.024e-05  1.5e-06  2.0471e+03  0.941  0.15  -1.2900e+02
> >>>>> o.a.c.m.r.Well512a            1.193e-05  4.4e-07  2.3864e+03  1.097  0.04   2.1032e+02
> >>>>> o.a.c.m.r.Well1024a           1.348e-05  1.9e-06  2.6955e+03  1.239  0.14   5.1945e+02
> >>>>> o.a.c.m.r.Well19937a          1.495e-05  2.1e-06  2.9906e+03  1.374  0.14   8.1451e+02
> >>>>> o.a.c.m.r.Well19937c          1.577e-05  8.8e-07  3.1542e+03  1.450  0.06   9.7816e+02
> >>>>> o.a.c.m.r.Well44497a          1.918e-05  1.4e-06  3.8363e+03  1.763  0.08   1.6602e+03
> >>>>> o.a.c.m.r.Well44497b          1.953e-05  2.8e-06  3.9062e+03  1.795  0.14   1.7301e+03
> >>>>> o.a.c.m.r.ISAACRandom         1.169e-05  1.9e-06  2.3375e+03  1.074  0.16   1.6139e+02
> >>>>> -----
> >>>>> where "cv" is the ratio of the "std dev" column to the
> >>>>> "time/call" column.
> >>>>>
> >>>>> Questions are:
> >>>>> * How meaningful are micro-benchmarks when the timed operation
> >>>>>   has a very small duration (wrt e.g. the duration of the
> >>>>>   other machine instructions required to perform it)?
> >>>>
> >>>> It is harder to get good benchmarks for shorter-duration
> >>>> activities, but not impossible. One thing that it would be good
> >>>> to do is to compare these results with JMH [1].
> >>>
> >>> I was expecting insights based on the benchmark which I did run.
> >>
> >> You asked whether or not benchmarks are meaningful when the task
> >> being benchmarked is of short duration. I answered that question.
> >>>
> >>> We have a tool in CM; if it's wrong, we should remove it.
> >>> How its results compare with JMH is an interesting question,
> >>
> >> I will look into this.
> >>
> >>> I agree, but I don't have time to make an analysis of
> >>> benchmarking tools (on top of what I've been doing since
> >>> December because totally innocuous changes in the RNG classes
> >>> were frowned upon out of baseless fear).
> >>
> >> Please cut the hyperbole.
> >>>
> >>>>> * In a given environment (HW, OS, JVM), is there a lower limit
> >>>>>   (absolute duration) below which anything will be deemed good
> >>>>>   enough?
> >>>>
> >>>> That depends completely on the application.
> >>>
> >>> Sorry, I thought that it was obvious: I am not speaking of
> >>> applications that don't care about performance. :-)
> >>>
> >>> For those that do, I do not agree with the statement: the
> >>> question relates to finding a point below which it is the
> >>> environment that overwhelms the other conditions.
> >>> A point where there will be _unavoidable_ overhead (transferring
> >>> data from/to memory, JVM book-keeping, ...) and perturbations
> >>> (context switches, ...) such that their duration adds a constant
> >>> time (on average) that may render most enhancements to an
> >>> already efficient algorithm barely noticeable in practice.
> >>> Similarly, but in the opposite direction, some language
> >>> constructs or design choices might slow things down a bit, but
> >>> without endangering any user.
> >>>
> >>> A problem arises when any enhancement to the design is deemed
> >>> harmful because it degrades a micro-benchmark, even though that
> >>> benchmark may not reflect any real use case.
> >>> Then, the real harm is against development.
> >>>
> >>>>> * Can a library like CM admit a trade-off between ultimate
> >>>>>   performance and good design? IOW, is there an acceptable
> >>>>>   overhead in exchange for other qualities (clarity,
> >>>>>   non-redundancy, extensibility, etc.)?
> >>>>
> >>>> That is too general a question to be meaningful. We need to
> >>>> look at specific cases. What exactly are you proposing?
> >>>
> >>> <rant>
> >>> It is quite meaningful even if it refers to general principles.
> >>> Those could (should, IMO) be taken into account when managing a
> >>> project like CM, on a par with "performance" (whose intrinsic
> >>> value is never questioned).
> >>> </rant>
> >>
> >> Rant all you want. Vague generalities and hyperbole have no
> >> value.
> >>>
> >>> Two specific cases are:
> >>> * inheritance vs delegation (a.k.a. composition)
> >>> * generics (which could require runtime casts)
> >>
> >> This is getting closer to meaningful. Where exactly in the code
> >> are you wanting to use something and seeing benchmark damage?
> >>>
> >>>>> * Does ultimate performance for the base functionality
> >>>>>   (generation of a random number) trump any consideration of
> >>>>>   use cases that need an extension of that base functionality
> >>>>>   (such as a computation to match another distribution) which
> >>>>>   will unavoidably degrade the performance (so that the
> >>>>>   micro-benchmark will be completely misleading for those
> >>>>>   users)?
> >>>>
> >>>> Again, this is vague and the answer depends on what exactly
> >>>> you are talking about. Significantly damaging the performance
> >>>> of the PRNG implementations is a bad idea,
> >>>
> >>> Now, *this* is vague: what do you mean by "significantly"?
> >>> That was actually my question in the first place.
> >>
> >> If you are talking about PRNG performance, I would say a 1% hit
> >> is significant.
> >>
> >>> Referring to the benchmark above, people who know why they
> >>> require ultimate performance should be able to tell what range
> >>> of numbers they'd find acceptable in that table.
> >>>
> >>> <rant>
> >>> Actually my questions are very precise, but the answers would
> >>> require some decent analysis, rather than the usual "bad idea"
> >>> dismissal.
> >>> </rant>
> >>>
> >>> In the Javadoc of the "random" package, there is information
> >>> about performance but no reference as to the benchmarking
> >>> procedure.
> >>
> >> It would be great to repeat these using JMH, which is emerging
> >> as a de facto standard for Java benchmarking. I will look into
> >> this.
> >>>
> >>> I can consistently observe a totally different behaviour (using
> >>> "PerfTestUtils"):
> >>> 1. "MersenneTwister" is *always* faster than all of the WELL
> >>>    RNGs;
> >>> 2. moreover, the ratio *grows* for each of the longer-period
> >>>    members of the WELL family (see the above table).
> >>>
> >>> This makes me wonder how someone who purports to need "ultimate"
> >>> performance can have any objective basis for determining what is
> >>> good or bad for his own applications.
> >>>
> >>>> unless there are actual practical use cases you can point to
> >>>> that whatever changes you are proposing enable.
> >>>
> >>> As I've explained in great detail in another thread, I've
> >>> reviewed (from a design POV) the RNG code in "random" and, IMHO,
> >>> there is room for improvement (cf. above for what I mean by that
> >>> term).
> >>> <rant>
> >>> I have some code ready for review, but I had to resort to what
> >>> I consider a sub-optimal design (preemptively renouncing a
> >>> "delegation"-based design) solely because of the destructive
> >>> community process that takes place here.[1]
> >>> </rant>
> >>
> >> More vague hyperbole that serves no purpose. Please focus on
> >> actual code or design issues.
> >>>
> >>> The practical use case is anything that needs further
> >>> processing of the numbers produced according to a uniform
> >>> distribution:
> >>
> >> Isn't that what the samplers in the distributions package do?
> >> What we need from the PRNG implementations is just blocks of
> >> bits. Since we wanted a pluggable replacement for j.u.Random, we
> >> added uniform ints, longs and floats, and gaussian floats. The
> >> samplers just need uniform doubles. The practical use case we
> >> need is well supported in the code we have. What is missing,
> >> exactly?
> >>
> >>> I agree that there would be little sense in coding that latter
> >>> part in a "pure" OO way[2]. And Luc indeed made it quite
> >>> efficient, I think, in the various concrete classes.
> >>> What I want to reconsider is how those concrete low-level
> >>> algorithms can be plugged into a higher-level function that just
> >>> requires a "source of randomness", as I'd call a provider of
> >>> "int" (or "long") values, where the high-level functionality
> >>> does not care at all about the provider's inner workings (among
> >>> other things, how it is seeded!).
> >>
> >> This is why many higher-level samplers and other things that
> >> require random data inside [math] have a pluggable
> >> RandomGenerator.
> >>>
> >>> A case in point is the sampling of other distributions (namely
> >>> the Normal distribution).
> >>
> >> Or any of the others. We have a default, inversion-based method
> >> that the abstract distribution classes provide, and some pretty
> >> good specialized implementations within individual
> >> distributions. Most of these just require uniform random doubles
> >> as source.
> >>>
> >>> Here is the benchmark report:
> >>> -----
> >>> nextGaussian() (calls per timed block: 2000000, timed blocks:
> >>> 100, time unit: ms)
> >>> name                          time/call  std dev  total time  ratio  cv    difference
> >>> o.a.c.m.r.JDKRandomGenerator  1.200e-05  1.7e-06  2.4001e+03  1.000  0.14  0.0000e+00
> >>> o.a.c.m.r.JDKRandomGenerator  7.646e-05  5.1e-06  1.5292e+04  6.371  0.07  1.2892e+04
> >>> o.a.c.m.r.MersenneTwister     6.396e-05  3.6e-06  1.2793e+04  5.330  0.06  1.0393e+04
> >>> o.a.c.m.r.Well512a            6.880e-05  5.0e-06  1.3760e+04  5.733  0.07  1.1360e+04
> >>> o.a.c.m.r.Well1024a           6.956e-05  3.0e-06  1.3913e+04  5.797  0.04  1.1513e+04
> >>> o.a.c.m.r.Well19937a          7.262e-05  2.0e-06  1.4525e+04  6.052  0.03  1.2125e+04
> >>> o.a.c.m.r.Well19937c          7.164e-05  4.3e-06  1.4329e+04  5.970  0.06  1.1928e+04
> >>> o.a.c.m.r.Well44497a          8.166e-05  3.2e-06  1.6332e+04  6.804  0.04  1.3931e+04
> >>> o.a.c.m.r.Well44497b          8.259e-05  4.6e-06  1.6518e+04  6.882  0.06  1.4118e+04
> >>> o.a.c.m.r.ISAACRandom         6.724e-05  5.4e-06  1.3449e+04  5.603  0.08  1.1049e+04
> >>> -----
> >>> where the first line is the JDK's "nextInt()" and the remaining
> >>> lines are "nextGaussian()".
> >>>
> >>> The generation time of "nextGaussian()" is thus about 5- to
> >>> 7-fold that of "nextInt()" (see the "ratio" column). Thus,
> >>> degrading the performance of "nextInt()" by 10% would degrade
> >>> the performance of "nextGaussian()" by only a few percent, since
> >>> each Gaussian deviate consumes about two uniform draws, a
> >>> fraction of its total cost.
> >>>
> >>> For a performance discussion to be meaningful, I think that
> >>> we'd need to know how that fact would affect, even modestly, any
> >>> moderately complex post-processing of the generated values.
> >>>
> >>> Another case, for modularity, would be to consider that other
> >>> algorithms could be implemented to provide the required
> >>> distribution.[3]
> >>> In the current design (inheritance-based), that can only be
> >>> done by creating a subclass, even though the core functionality
> >>> ("nextDouble()") is not overridden.
> >>>
> >>>>> * What are the usages of the CM RNGs?
> >>>>>   Do those use cases strictly forbid "losing" a dozen
> >>>>>   milliseconds per million calls?
> >>>>
> >>>> There are many different use cases. My own applications use
> >>>> them in simulations to generate random deviates, to generate
> >>>> random hex strings as identifiers, and in stochastic
> >>>> algorithms, like some of our internal uses. The last case is
> >>>> definitely sensitive to PRNG performance.
> >>>
> >>> Thanks for giving examples, but since we are talking about
> >>> performance, I was hoping for some real flesh, like the relative
> >>> duration of number generation (e.g. the total duration of the
> >>> calls to the "RandomGenerator" instances relative to the total
> >>> duration of the application).
> >>>
> >>> I don't know if by "last case" you are referring to code that
> >>> is inside CM. I didn't spot anything that makes "heavy" usage of
> >>> an RNG (in the sense that generation would count as a sizable
> >>> part of the whole processing).
> >>
> >> monteCarloP in KolmogorovSmirnovTest is one to check.
> >>>
> >>> As I have pointed out many times: if an application is severely
> >>> dependent on the performance of the RNG, the user will probably
> >>> turn to specific tools (e.g. GPUs? [4]) rather than use CM.
> >>
> >> That is a bogus argument. We should make our PRNGs simple and
> >> fast so that their use can extend to performance-sensitive
> >> applications.
> >>>
> >>> Conversely, using Java might be preferred for its flexibility,
> >>> which is destroyed by a search for ultimate performance (which
> >>> nobody seems able to define reasonably).
> >>> Performance is not a goal in itself; it should not be a trophy
> >>> that sits uselessly on a shelf.
> >>
> >> Nor should "beautiful design" in the eyes of one person.
> >>>
> >>> My goal is not to deliberately slow things down; it is to allow
> >>> some leeway so that designs which are deemed better (on all
> >>> counts except, perhaps, performance) are given a chance to show
> >>> their strengths, in particular in areas where performance in
> >>> absolute terms is "good enough" for all use cases which CM
> >>> should care about (hence the need for actual data points[5]).
> >>
> >> I see no reason that we can't have it both ways - good design
> >> and good performance. What we have now, modulo maybe some small
> >> changes to reduce code duplication, works fine. If you want to
> >> play with 64-bit generators and can find reference
> >> implementations and verify that they do in fact perform better,
> >> great. If not, I don't see the point. You can rant and complain
> >> all you want; but I am not going to let us trash the performance
> >> or correctness of the code in the random package or anywhere
> >> else just because you think it is somehow "better designed",
> >> unless you can show specific, practical use cases demonstrating
> >> the value of the changes.
> >>
> >> Phil
> >>>
> >>> Gilles
> >>>
> >>> [1] "Is it faster?"
> >>>     "No."
> >>>     "Then, no."
> >>> [2] Although that is in some sense what you indirectly defend
> >>>     by wanting to stick with a meaningless "next(int bits)"
> >>>     method.
> >>> [3] http://www.doornik.com/research/ziggurat.pdf
> >>> [4] http://http.developer.nvidia.com/GPUGems3/gpugems3_ch37.html
> >>> [5] Hence the need to agree on a methodology/policy for
> >>>     benchmarking.
> >>>
> >>>> Phil
> >>>>
> >>>> [1] http://openjdk.java.net/projects/code-tools/jmh/
> >>>>
> >>>>> IOW, would those users for which such a difference matters
> >>>>> use CM at all?
> >>>>>
> >>>>> Thanks,
> >>>>> Gilles
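Sketch 1: JMH came up several times above. For whoever picks that up,
a JMH harness for "nextInt()" could look roughly like the following.
This is only a sketch: the package and class names are made up, and
it assumes JMH and the CM 3.x "random" classes are on the classpath.

-----
package bench; // hypothetical package

import java.util.concurrent.TimeUnit;

import org.apache.commons.math3.random.MersenneTwister;
import org.apache.commons.math3.random.RandomGenerator;
import org.apache.commons.math3.random.Well19937c;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5)
@Measurement(iterations = 10)
@Fork(1)
@State(Scope.Thread)
public class NextIntBench {
    private RandomGenerator mt;
    private RandomGenerator well;

    @Setup
    public void setUp() {
        mt = new MersenneTwister(42L);   // arbitrary seed
        well = new Well19937c(42L);
    }

    // Returning the value hands it to JMH's blackhole, so the JIT
    // cannot elide the call being measured.
    @Benchmark
    public int mersenneTwisterNextInt() {
        return mt.nextInt();
    }

    @Benchmark
    public int well19937cNextInt() {
        return well.nextInt();
    }
}
-----

Run through org.openjdk.jmh.Main, this prints per-call averages in
nanoseconds, directly comparable to the "time/call" column of the
"PerfTestUtils" reports above (1.088e-05 ms = 10.88 ns).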
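Sketch 2: to make the "inheritance vs delegation" disagreement
concrete, the shape of the design being argued for (as I read the
thread) is something like the following. None of these names exist in
CM; the bit-juggling is the usual 53-bit construction of a uniform
double, as in j.u.Random.

-----
/**
 * A "source of randomness": a provider of raw 32-bit blocks.
 * How it is seeded is its own business. (Hypothetical interface.)
 */
interface UniformIntSource {
    int nextInt();
}

/**
 * Derived functionality written once and obtained by composing any
 * source, instead of each algorithm inheriting it from a base class.
 */
final class RandomSupport {
    private final UniformIntSource source;

    RandomSupport(UniformIntSource source) {
        this.source = source;
    }

    /** Uniform double in [0, 1), built from 26 + 27 random bits. */
    double nextDouble() {
        long high = ((long) (source.nextInt() >>> 6)) << 27; // 26 bits
        long low  = source.nextInt() >>> 5;                  // 27 bits
        return (high | low) * 0x1.0p-53;
    }
}
-----

An algorithm such as a WELL generator would then only implement
"UniformIntSource", and a multi-threaded or GPU-backed source could
be slotted in without touching the derived methods.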
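Sketch 3: the "pluggable source of randomness" point, applied to the
Normal distribution. A Gaussian sampler can be written against
nothing but a supplier of uniform doubles, so any generator plugs in
without subclassing. Again not CM API: the Marsaglia polar method
below merely stands in for whatever algorithm (e.g. the ziggurat of
[3]) one would actually choose, and java.util.function is used for
brevity.

-----
import java.util.function.DoubleSupplier;

/** Gaussian deviates from any source of uniform doubles in [0, 1). */
final class PolarGaussianSampler {
    private final DoubleSupplier uniform;
    private double cached;
    private boolean hasCached;

    PolarGaussianSampler(DoubleSupplier uniform) {
        this.uniform = uniform;
    }

    /** Marsaglia polar method: one acceptance yields two deviates. */
    double next() {
        if (hasCached) {
            hasCached = false;
            return cached;
        }
        double x, y, s;
        do {
            x = 2 * uniform.getAsDouble() - 1;
            y = 2 * uniform.getAsDouble() - 1;
            s = x * x + y * y;
        } while (s >= 1 || s == 0);
        double f = Math.sqrt(-2 * Math.log(s) / s);
        cached = y * f;   // keep the second deviate for the next call
        hasCached = true;
        return x * f;
    }
}

// Usage: any RandomGenerator plugs in, e.g.
//   PolarGaussianSampler g =
//       new PolarGaussianSampler(new Well19937c()::nextDouble);
-----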
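Sketch 4: on the request for data points about the generator's share
of a real task. A crude first number can be had by timing generation
alone against generation plus post-processing. There is no warmup and
no forking here, so treat the output as a rough indication only (JMH
would do this properly); the generator, seed and stand-in workload
are arbitrary choices.

-----
import org.apache.commons.math3.random.RandomGenerator;
import org.apache.commons.math3.random.Well19937c;

/** Crude estimate of the RNG's share of a simple sampling task. */
public final class RngShare {
    public static void main(String[] args) {
        RandomGenerator rng = new Well19937c(12345L);
        final int n = 10_000_000;

        // (a) Generation alone.
        long t0 = System.nanoTime();
        double sink = 0;
        for (int i = 0; i < n; i++) {
            sink += rng.nextDouble();
        }
        long genOnly = System.nanoTime() - t0;

        // (b) Generation plus a stand-in post-processing step
        // (an inversion-style transform of the uniform deviate).
        t0 = System.nanoTime();
        double acc = 0;
        for (int i = 0; i < n; i++) {
            acc += Math.sqrt(-Math.log(1 - rng.nextDouble()));
        }
        long full = System.nanoTime() - t0;

        // Printing the sums keeps the JIT from removing the loops.
        System.out.printf("sums: %f %f; generation ~ %.0f%% of task%n",
                          sink, acc, 100.0 * genOnly / full);
    }
}
-----

If that percentage turns out small for the real applications
discussed above, a few percent on "nextInt()" is in the noise; if it
is large, it is exactly the data point the thread was asking for.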