On Mon, Jul 16, 2012 at 09:08:51PM -0700, Phil Steitz wrote: > On 7/16/12 5:17 PM, Gilles Sadowski wrote: > > On Mon, Jul 16, 2012 at 04:39:41PM -0700, Phil Steitz wrote: > >> On 7/16/12 2:46 PM, Gilles Sadowski wrote: > >>> Hi. > >>> > >>> Referring to the discussion here: > >>> https://issues.apache.org/jira/browse/MATH-764 > >>> https://issues.apache.org/jira/browse/MATH-823 > >>> > >>> Do we agree: > >>> * To add new constructors to the distribution classes (in package > >>> "o.a.c.m.distribution") that take a "RandomGenerator" (in package > >>> "o.a.c.m.random") as an argument? > >> +1 > >>> * That this argument (reference) will be stored in the distribution > >>> object but can be manipulated from outside (e.g. "setSeed") so that > >>> the class is not strictly immutable? [Immutability is impossible to > >>> enforce since there is no way to copy such an object.] > >> +1 > >>> * That the distribution classes do not need a setter for the RNG, nor a > >>> a method to re-seed the RNG? [I.e. if a user wants a new RNG, he must > >>> instantiate a new distribution object.] > >> +1 for no setter, -1 for no reseed. > > Why do you need a "reSeed" method since it can be called on the RNG object > > that was passed to the constructor of the distribution? > > I.e. it is the user's responsibility to keep a reference to the RNG object > > if he wants to later modify it. > > IMO, it makes it more obvious that the RNG is "owned" by the user's code, > > not by the distribution instance. > > I get your point, but in my view, any random data generation API > should expose a reseed method to reseed the underlying source of > randomness. So rather than force users to hold a reference to the > RandomData instance provided at construction time to do this > indirectly, I think it would be more convenient / more complete to > provide the method in the same API that does the data generation, > which is now going to be the distribution.
Currently, I only see bad usage examples of this functionality: ----- final RandomGenerator rng = new Well512a(); final int num = 5; final NormalDistribution[] dists = new NormalDistribution[num]; // Initialize the distributions. for (int i = 0; i < num; i++) { final double mean = ... final double sigma = ... dists[i] = new NormalDistribution(rng, mean, sigma); } // Do some work. // ... // Reseed the RNG. for (NormalDistribution g : dists) { final long seed = ... g.setSeed(seed); } ----- The "setSeed" method is very error-prone, as the last block shows: 1. No loop is necessary since there is a single instance of the RNG, shared by all the distributions. 2. The same seed is used by all the distributions, although the loop make it look like each is reseeded differently. Another problem is that "RandomGenerator" has 3 "setSeed" methods, and the distribution interfaces must then be expanded to accomodate all the variants; and this additional layer of delegation will not save a single typing stroke of the user. One situation where a "setSeed" is needed is when there is no more handle to the RNG: ----- // Initialize the distributions. for (int i = 0; i < num; i++) { final double mean = ... final double sigma = ... dists[i] = new NormalDistribution(new Well512a(), mean, sigma); } ----- But I would argue that if a user later needs access to the RNGs, the best way is indeed to keep the handles: ----- final RandomGenerator[] rng = new RandomGenerator[num]; // Initialize the RNGs and the distributions. for (int i = 0; i < num; i++) { rng[i] = new Well512a(); final double mean = ... final double sigma = ... dists[i] = new NormalDistribution(rng[i], mean, sigma); } // ... // Reseed the RNGs. for (RandomGenerator r : rng) { final long seed = ... r.setSeed(seed); } ----- > > > > >>> * That the ad-hoc code of the sampling methods currently in > >>> "RandomDataImpl" (in package "o.a.c.m.random") will be moved over to > >>> the > >>> "sample" method in the correpsonding distribution class? > >> +1 > >>> * That "RandomData" and "RandomDataImpl" will be refactored so that > >>> methods > >>> that duplicate functionality transferred to the distribution will > >>> deprecated in 3.1 and removed in 4.0? > >> -0 If we keep these classes, which I am inclined to support > >> (combined into one), they should delegate implementations to > >> distributions (exactly the reverse of how it is now). > > No problem. But IIUC, this will require instantiating a distribution object > > for each call; in some situations, this will be an obvious performance loss > > (e.g. when sampling from a distribution with same parameters). > > Yeah, we will probably want to cache these. This will be rather difficult, since there should be one instance for each combination (the number of which is infinite) of the parameters. The memory footprint will quickly become unbearable. Fixed size pools can be useless given a (un)suitable striping of the parameters in the successive calls to the method. Fecthing small objects (like distributions) from the cache might not be any faster than recreating them (the more so that the RNG itself can be reused) with the appropriate parameters. Regards, Gilles > > Phil > > > >>> * That the interface "RandomData" and its unique implementation > >>> ("RandomDataImpl") will be merged in 4.0? > >> +1 > >> > > Gilles > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org