Henri Yandell wrote:
Some thoughts from a general Commons perspective:

Thanks!

1) +0. This is worth getting right early on, as package renaming does tend to confuse users and takes a long time to work through deprecation, etc.

Ugh. Lots of changes incompatible with RC1 here... but if others agree...
What I don't understand is why the concept of univariate vs. multivariate is hard to understand. Univariate: the sample consists of a single array of data (one random variable / distribution). Multivariate: the sample has more than one column (random vector / joint distribution).
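To make the distinction concrete, here is a minimal Java sketch. The class and method names are hypothetical and not the [math] API. A univariate sample is a single double[], while a multivariate sample is a double[][] whose rows are observations and whose columns are the components of the random vector.

```java
/**
 * Hypothetical illustration (not the [math] API): a univariate sample is one
 * array of observations; a multivariate sample is a table of observations.
 */
public class SampleShapes {

    /** Mean of a univariate sample: a single array of observations. */
    static double mean(double[] sample) {
        double sum = 0.0;
        for (double x : sample) {
            sum += x;
        }
        return sum / sample.length;
    }

    /**
     * Column means of a multivariate sample: each row is one observation
     * of a random vector, each column is one component variable.
     */
    static double[] columnMeans(double[][] sample) {
        int cols = sample[0].length;
        double[] means = new double[cols];
        for (double[] row : sample) {
            for (int j = 0; j < cols; j++) {
                means[j] += row[j];
            }
        }
        for (int j = 0; j < cols; j++) {
            means[j] /= sample.length;
        }
        return means;
    }

    public static void main(String[] args) {
        double[] xs = {1.0, 2.0, 3.0};                  // one random variable
        double[][] pairs = {{1.0, 10.0}, {3.0, 30.0}};  // random vector (x, y)
        System.out.println(mean(xs));                                        // 2.0
        System.out.println(java.util.Arrays.toString(columnMeans(pairs)));   // [2.0, 20.0]
    }
}
```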


2)  -0. This is a new feature. If it's easy, add it. If it involves
effort, don't bother until 1.1.
3)  -0. Two options leap to mind: release the PRNG as it is, or don't
release the PRNG code. Yes it's a pain for users when you change the
functionality in a new version, but when the option is not having a
feature, users opt for the functionality and pain later. Most likely
the change would be a simple perl regexp anyway.

A point of clarification here. There is no PRNG code in [math]. The random package includes random data generation methods that *use* the JDK PRNG to generate random data, permutations or samples. I strongly disagree with the assertion that the JDK Random (and SecureRandom) implementations are worthless to the point where this package (which fully documents what PRNG is used) is not worth releasing. The valid issue here is that the PRNG should be pluggable (currently you have to subclass RandomDataImpl to do this). There also appears to be interest -- independently -- in adding other PRNG implementations, which could be among those plugged in to the random package.
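A minimal sketch of what "pluggable" could mean, assuming a hypothetical PluggableRandomData class (this is not RandomDataImpl's actual API): the generator receives a java.util.Random through its constructor, so a seeded Random, a SecureRandom, or any other Random subclass can be swapped in without subclassing the data generator itself.

```java
import java.security.SecureRandom;
import java.util.Random;

/**
 * Hypothetical sketch: random data generation with the PRNG injected through
 * the constructor, so callers choose the generator instead of subclassing.
 */
public class PluggableRandomData {

    private final Random rng;

    /** Any java.util.Random works, including SecureRandom (a subclass). */
    public PluggableRandomData(Random rng) {
        this.rng = rng;
    }

    /** Uniformly distributed integer in [lower, upper], both inclusive. */
    public int nextInt(int lower, int upper) {
        if (lower > upper) {
            throw new IllegalArgumentException("lower must not exceed upper");
        }
        return lower + rng.nextInt(upper - lower + 1);
    }

    /**
     * Returns k distinct values drawn from {0, ..., n-1} in random order,
     * using a partial Fisher-Yates shuffle driven by the injected PRNG.
     */
    public int[] permutation(int n, int k) {
        if (k < 0 || k > n) {
            throw new IllegalArgumentException("need 0 <= k <= n");
        }
        int[] index = new int[n];
        for (int i = 0; i < n; i++) {
            index[i] = i;
        }
        for (int i = 0; i < k; i++) {
            int j = i + rng.nextInt(n - i);  // pick from the not-yet-shuffled tail
            int tmp = index[i];
            index[i] = index[j];
            index[j] = tmp;
        }
        int[] result = new int[k];
        System.arraycopy(index, 0, result, 0, k);
        return result;
    }

    public static void main(String[] args) {
        PluggableRandomData seeded = new PluggableRandomData(new Random(42L));
        PluggableRandomData secure = new PluggableRandomData(new SecureRandom());
        System.out.println(seeded.nextInt(1, 6));
        System.out.println(java.util.Arrays.toString(secure.permutation(10, 3)));
    }
}
```

Constructor injection keeps the documented default (the JDK generator) while making the choice of PRNG explicit to the caller.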


4)  -0. I was never a statistician, but this sounds like new
functionality. Either release the code as is, or drop it. While 3) is
an API change, this sounds like a functional change and those are much
more painful for a user.

I disagree strongly with this change, for two reasons. First, we spent a long time debating whether or not statistics should be implemented as separate classes and decided in favor of this. To have a single class compute multiple statistics would be inconsistent with the design of the package. Second, even though the "population" version is computationally close to the "sample" version, there is an important and fundamental difference between them conceptually. I tried to explain this in earlier posts. Moreover, it is trivial to add additional classes implementing the "population" versions -- which actually supports the current one-statistic-per-class design. I have not added them because I did not see this as essential for 1.0, and frankly I am not sure they belong in .statistics, since statistics are usually associated with sample data (i.e., some people would not call the population versions "statistics" but rather "population parameters").
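The one-statistic-per-class design, with separate sample and population variants, might look like this in outline. Statistic, SampleVariance, PopulationVariance and SquaredDeviations are hypothetical names used only for illustration. The two variance classes differ only in the divisor: the sum of squared deviations is divided by n - 1 for the sample variance and by n for the population variance.

```java
/** Hypothetical minimal contract: each class computes exactly one statistic. */
interface Statistic {
    double evaluate(double[] values);
}

/** Shared helper: two-pass sum of squared deviations from the mean. */
final class SquaredDeviations {
    private SquaredDeviations() {}

    static double sum(double[] values) {
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.length;
        double ss = 0.0;
        for (double v : values) {
            double d = v - mean;
            ss += d * d;
        }
        return ss;
    }
}

/** Sample variance: squared deviations divided by n - 1 (needs n >= 2). */
class SampleVariance implements Statistic {
    public double evaluate(double[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("sample variance needs at least 2 values");
        }
        return SquaredDeviations.sum(values) / (values.length - 1);
    }
}

/** Population variance: squared deviations divided by n (needs n >= 1). */
class PopulationVariance implements Statistic {
    public double evaluate(double[] values) {
        if (values.length < 1) {
            throw new IllegalArgumentException("population variance needs at least 1 value");
        }
        return SquaredDeviations.sum(values) / values.length;
    }
}

public class VarianceDemo {
    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println(new PopulationVariance().evaluate(data)); // 4.0
        System.out.println(new SampleVariance().evaluate(data));     // 32.0 / 7, roughly 4.571
    }
}
```

Under this layout, adding a population variant is a new class rather than a mode flag on Variance.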


5) -0. Keep it as is. Again, it might mean an API change in the
future, but I doubt anyone knows the perfect solution so let's see how
this one goes.

So the only one I'd advise as really being worth the effort is 1.

Hen


Phil

The following changes have been suggested recently.  Before cutting 1.0
final, we should make sure we are all OK postponing or forgoing these:

1) Eliminate the univariate/multivariate distinction in the stat
package, because this seems confusing to some.  Change .univariate to
.descriptive and .multivariate to .regression

2) Add methods to create row or column matrices from double arrays and
to extract submatrices (to the interface itself, rather than adding
these to a utils class later)

3) Make the PRNG fully pluggable in the random package.

4) Modify Variance and StandardDeviation to compute multiple statistics
(with the variants being population, rather than sample statistics).

5) Drop the interface / implementation separation throughout the package.
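For suggestion 2), a rough illustration on plain double[][] arrays. rowMatrix, columnMatrix and subMatrix are hypothetical helper names, not methods of the [math] matrix interface; submatrix bounds are taken as inclusive here, which is an assumption.

```java
/**
 * Hypothetical helpers (not part of any released API) illustrating
 * suggestion 2 on plain double[][] arrays.
 */
public final class MatrixSketch {

    private MatrixSketch() {}

    /** 1 x n matrix whose single row is a copy of data. */
    static double[][] rowMatrix(double[] data) {
        return new double[][] { data.clone() };
    }

    /** n x 1 matrix whose single column is a copy of data. */
    static double[][] columnMatrix(double[] data) {
        double[][] m = new double[data.length][1];
        for (int i = 0; i < data.length; i++) {
            m[i][0] = data[i];
        }
        return m;
    }

    /** Copy of rows startRow..endRow and columns startCol..endCol, bounds inclusive. */
    static double[][] subMatrix(double[][] m, int startRow, int endRow,
                                int startCol, int endCol) {
        if (startRow < 0 || endRow >= m.length || startRow > endRow
                || startCol < 0 || endCol >= m[0].length || startCol > endCol) {
            throw new IllegalArgumentException("invalid submatrix bounds");
        }
        int rows = endRow - startRow + 1;
        int cols = endCol - startCol + 1;
        double[][] sub = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            System.arraycopy(m[startRow + i], startCol, sub[i], 0, cols);
        }
        return sub;
    }

    public static void main(String[] args) {
        double[][] m = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
        System.out.println(java.util.Arrays.deepToString(subMatrix(m, 1, 2, 0, 1)));            // [[4.0, 5.0], [7.0, 8.0]]
        System.out.println(java.util.Arrays.deepToString(columnMatrix(new double[] {1, 2})));   // [[1.0], [2.0]]
    }
}
```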

I am personally -1 on 4) and 5); -0 on 1) and 2); and +0 on 3). I did,
however, vote +1 on the release, which means that 3) is a wart that I am
willing to live with for 1.0.  It can be worked around now, and fixing it
correctly will require that we define a PRNG interface and introduce
factories, etc.
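One possible shape for "a PRNG interface and factories", sketched with hypothetical names (Prng, JdkPrngAdapter, PrngFactory) that are not part of [math]: the random package would code against the interface, and the factory decides which generator backs it.

```java
import java.security.SecureRandom;
import java.util.Random;

/** Hypothetical PRNG abstraction; names are illustrative only. */
interface Prng {
    void setSeed(long seed);
    int nextInt(int n);      // uniform in [0, n)
    double nextDouble();     // uniform in [0, 1)
}

/** Adapts any java.util.Random (including SecureRandom) to the Prng interface. */
final class JdkPrngAdapter implements Prng {
    private final Random delegate;

    JdkPrngAdapter(Random delegate) {
        this.delegate = delegate;
    }

    public void setSeed(long seed) { delegate.setSeed(seed); }
    public int nextInt(int n) { return delegate.nextInt(n); }
    public double nextDouble() { return delegate.nextDouble(); }
}

/** Hypothetical factory: callers ask for a generator by name. */
public final class PrngFactory {
    private PrngFactory() {}

    public static Prng create(String name) {
        switch (name) {
            case "jdk":
                return new JdkPrngAdapter(new Random());
            case "secure":
                // SecureRandom.setSeed supplements rather than replaces its seed,
                // so only the "jdk" variant should be relied on for reproducibility.
                return new JdkPrngAdapter(new SecureRandom());
            default:
                throw new IllegalArgumentException("unknown PRNG: " + name);
        }
    }

    public static void main(String[] args) {
        Prng a = PrngFactory.create("jdk");
        Prng b = PrngFactory.create("jdk");
        a.setSeed(123L);
        b.setSeed(123L);
        System.out.println(a.nextInt(100) == b.nextInt(100)); // true
    }
}
```

Compared with constructor injection alone, the interface lets non-JDK generators be plugged in without extending java.util.Random.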

Mark, since you voted to reopen API discussion, can you weigh in on
these issues and add any others that you see as show-stoppers?

Phil



