Re: [math] API changes for RC2

Phil Steitz Sun, 26 Sep 2004 11:23:51 -0700

Mark R. Diggory wrote:

1) Eliminate the univariate/multivariate distinction in the stat package, because this seems confusing to some. Change .univariate to .descriptive and .multivariate to .regression
Univariate and Multivariate are just "classifications". There is no suggestion of changing the structure of the packages. Perhaps we can begin building a "classification outline" now so that we have a better idea what are the classes of statistics and what we want our naming scheme to be based on. In the past I've always leaned towards a classification similar to the mathworld site.

Unfortunately, classification != hierarchical decomposition. The latter has got to be a tree with no overlap. This is like the LDAP DIT design problem -- unless you have a *very* immutable world with very natural boundaries, you are likely better off sticking to a relatively flat structure. This is why I am now leaning toward .descriptive (fits everything in there) and .regression. While they are not OO, SAS and R/S both present very flat "package" structures and I don't have that much trouble finding things in them.

The idea of moving SimpleRegression to a package called "regression" is a means to classify "regressions" as much as to classify "multivariates" or "univariates".

o.a.c.math.stat.regression.SimpleRegression

Yes.

o.a.c.math.stat.univariate.DescriptiveStatistics

No. Drop the "univariate"

o.a.c.math.stat.multivariate...

No.  Will eventually have things like
o.a.c.math.stat.cluster

Kim made a critique about the naming. Yet package names have little to do with the performance of the library. A simple package rename for clarification prior to release is ok with me as long as it "is clarifying".

The point is that we do not want our users to have to experience the pain associated with changing package structure later. I agree that we need to get this right and I may not be thinking about this correctly, so I will wait to make these changes until we all agree.

2) Add methods to create row or column matrices from double arrays and to extract submatrices (to the interface itself, rather than adding these to a utils class later)

Yes, abstracting the passing of a reference to a row, column or submatrix behind an interface gives us a means to perform operations on the matrix generically, independent of the primitive double[] type, which cannot be customized or extended. By passing the interface and not the array itself, we can hand around "references" to the original matrix instead of copies of it. This will be much more efficient for large matrices and will also allow us to implement the same methods on sparse matrix implementations, which may not actually be stored in a [][] structure.

If I understand you correctly, what you are suggesting above is to create *references* to submatrices based on the same underlying data as the "parent" rather than making copies. If we do this, we should implement the "copy semantics" as well and carefully document what is going on in each case (similar to the setData and setDataRef stuff now -- one set makes copies, one does not). The "reference" versions really break encapsulation and can lead to nasty bugs. I understand, however, that for large matrices limiting copy operations is necessary. I still think, however, that all of this would be better placed in a MatrixUtils class, and this could be added in 1.1 with no loss. These are new feature requests that came in after RC1 was cut, and they can be accommodated in 1.1 without breaking backward compatibility. I see no reason to hold the release for this.
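
To make the copy-versus-reference distinction concrete, here is a minimal Java sketch. All names in it (MatrixView, DenseMatrix, copySubMatrix, subMatrixView) are hypothetical and chosen only for illustration; none of them is the existing matrix API, and argument validation on the submatrix ranges is left out for brevity.

```java
/** Hypothetical read/write access to a rectangular block of doubles. */
interface MatrixView {
    int rows();
    int cols();
    double get(int row, int col);
    void set(int row, int col, double value);
}

/** Hypothetical dense matrix, assumed rectangular. */
final class DenseMatrix implements MatrixView {
    private final double[][] data;

    DenseMatrix(double[][] source) {
        // Defensive copy: later changes to the caller's array do not leak in.
        data = new double[source.length][];
        for (int i = 0; i < source.length; i++) {
            data[i] = source[i].clone();
        }
    }

    public int rows() { return data.length; }
    public int cols() { return data.length == 0 ? 0 : data[0].length; }
    public double get(int row, int col) { return data[row][col]; }
    public void set(int row, int col, double value) { data[row][col] = value; }

    /** Copy semantics: the result owns its storage (inclusive index ranges). */
    DenseMatrix copySubMatrix(int startRow, int endRow, int startCol, int endCol) {
        double[][] out = new double[endRow - startRow + 1][endCol - startCol + 1];
        for (int i = startRow; i <= endRow; i++) {
            for (int j = startCol; j <= endCol; j++) {
                out[i - startRow][j - startCol] = data[i][j];
            }
        }
        return new DenseMatrix(out);
    }

    /** Reference semantics: the view shares storage with this matrix. */
    MatrixView subMatrixView(final int startRow, final int endRow,
                             final int startCol, final int endCol) {
        return new MatrixView() {
            public int rows() { return endRow - startRow + 1; }
            public int cols() { return endCol - startCol + 1; }
            public double get(int row, int col) {
                check(row, col);
                return data[startRow + row][startCol + col];
            }
            public void set(int row, int col, double value) {
                check(row, col);
                // Writes through to the parent: this is the encapsulation risk.
                data[startRow + row][startCol + col] = value;
            }
            private void check(int row, int col) {
                if (row < 0 || row >= rows() || col < 0 || col >= cols()) {
                    throw new IndexOutOfBoundsException("(" + row + "," + col + ")");
                }
            }
        };
    }
}
```

With this sketch, a write to the parent after both calls is visible through the view but not through the copy, and a write through the view changes the parent. That is the behavior that would need to be documented carefully for each method.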

[+1]
3) Make the PRNG fully pluggable in the random package.
I think the challenge we end up with here is to simply provide an interface and base implementation that uses the JVM PRNG,

Well, that is what we have done. RandomDataImpl is the implementation of the RandomData interface that uses the JVM PRNG.

if a user wishes to override the PRNG they simply implement the interface and pass the implementation into the class that uses the PRNG. We can also provide a separate driver implementation based on RngPack and package that separately as well. If users wish to change the PRNG then they can pick up the RngPack distro and our driver for it.

What we need to do here, if we want to get this done correctly before 1.0, is design a "RandomSource" or "RandomGenerator" interface. Unfortunately, java.util.Random is not an interface, and what we need is to abstract an appropriate interface that will represent this and any other PRNG (or RNG) that users may want to plug in. This will be tricky and will require some research and discussion. We can do this now, but it will take some time. I would prefer to move forward with the release, adding a factory to produce RandomData impls, including a "PRNG-pluggable" version of RandomDataImpl in 1.1.
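
As a rough illustration of what such an abstraction could look like, here is a minimal Java sketch. The method set on RandomGenerator, and the JdkRandomAdapter and PluggableRandomData classes, are assumptions made up for this example; this is not a settled design and not the current RandomData or RandomDataImpl API. Only java.util.Random and java.lang.Math are real library classes here.

```java
import java.util.Random;

/** Hypothetical PRNG abstraction; the method set is an assumption. */
interface RandomGenerator {
    void setSeed(long seed);
    /** Uniform value in [0, 1). */
    double nextDouble();
    /** Uniform integer in [0, bound). */
    int nextInt(int bound);
}

/** Adapts the JVM PRNG, since java.util.Random is a class, not an interface. */
final class JdkRandomAdapter implements RandomGenerator {
    private final Random delegate;

    JdkRandomAdapter(Random delegate) { this.delegate = delegate; }

    public void setSeed(long seed) { delegate.setSeed(seed); }
    public double nextDouble() { return delegate.nextDouble(); }
    public int nextInt(int bound) { return delegate.nextInt(bound); }
}

/** Hypothetical data generator that takes its PRNG by injection. */
final class PluggableRandomData {
    private final RandomGenerator generator;

    PluggableRandomData(RandomGenerator generator) { this.generator = generator; }

    /** Uniform value in [lower, upper). */
    double nextUniform(double lower, double upper) {
        if (lower >= upper) {
            throw new IllegalArgumentException("lower must be less than upper");
        }
        return lower + generator.nextDouble() * (upper - lower);
    }

    /** Exponential deviate by inverse transform; 1 - u lies in (0, 1]. */
    double nextExponential(double mean) {
        if (mean <= 0.0) {
            throw new IllegalArgumentException("mean must be positive");
        }
        return -mean * Math.log(1.0 - generator.nextDouble());
    }
}
```

A user who wants a different generator would implement RandomGenerator and pass it to the constructor, for example `new PluggableRandomData(new JdkRandomAdapter(new Random(42L)))` for the JVM default. The hard part, which this sketch does not settle, is deciding which methods belong in the interface so that it fits other generators as well.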

I felt I could live with these issues unresolved for release 1.0 as well. Yet it sounded like others did not find it satisfactory. I'm willing to work on those I voted [+1] on (Matrix Methods, and PRNG Pluggability) to get the packages more satisfactory.

I think we should just implement the Variants of Variance and StandardDeviation as separate classes

If you think these absolutely must be in 1.0, go ahead and add the classes, tests and docs and I will hold RC2 until they are in. Personally, I see no reason that we need to hold the release for these additional features.

Phil

