I have a couple of problems with the recent commits to stat and util.

First, the testAddElementRolling test case in FixedDoubleArrayTest will not
compile, since it is trying to access what is now a private field in
FixedDoubleArray (internalArray). The changes to FixedDoubleArray should be
rolled back or the tests should be modified so that they compile and succeed.

Second, I do not see the value in all of the additional classes and overhead
introduced into stat. The goal of Univariate was to provide basic univariate
statistics via a simple interface and lightweight, numerically sound
implementation, consistent with the vision of commons-math and Jakarta Commons
in general. I fear that we may be straying off into statistical computation
framework-building, which I don't think belongs in commons-math (really Jakarta
Commons). More importantly, I don't think we need to add this complexity to
deliver the functionality that we are providing. The only problem that I see
with the structure prior to the recent commits is the confusion between the
collection-based and univariate addValue methods. I would favor eliminating
the List and BeanList univariates altogether and replacing their functionality
with methods added to StatUtils that take Lists or Collections (plus property
names) as input and compute statistics from them. Similarly, the Univariate
interface could be modified to include addValues(double[]), addValues(List)
(assuming the contents are Numbers), and addValues(Collection, propertyName).
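
To make that concrete, here is a rough sketch of the kind of additions I have
in mind. Nothing below is meant to be final; the names, the helper class, and
the use of java.beans property introspection are just illustrative:

  import java.beans.PropertyDescriptor;
  import java.util.Collection;
  import java.util.Iterator;
  import java.util.List;

  // Illustrative sketch of static convenience methods on StatUtils that
  // would replace the List and BeanList univariates (class name is
  // hypothetical).
  public class StatUtilsSketch {

      // Mean of a List whose elements are assumed to be Numbers.
      public static double mean(List values) {
          double sum = 0.0;
          for (Iterator it = values.iterator(); it.hasNext();) {
              sum += ((Number) it.next()).doubleValue();
          }
          return sum / values.size();
      }

      // Mean of the named numeric property across a Collection of beans.
      public static double mean(Collection beans, String propertyName)
          throws Exception {
          double sum = 0.0;
          for (Iterator it = beans.iterator(); it.hasNext();) {
              Object bean = it.next();
              PropertyDescriptor pd =
                  new PropertyDescriptor(propertyName, bean.getClass());
              Number value =
                  (Number) pd.getReadMethod().invoke(bean, new Object[0]);
              sum += value.doubleValue();
          }
          return sum / beans.size();
      }
  }

The corresponding bulk-add methods on the Univariate interface would look
something like this (again, a sketch, not a final signature):

  import java.util.Collection;
  import java.util.List;

  public interface Univariate {
      void addValue(double value);                           // existing
      void addValues(double[] values);                       // proposed
      void addValues(List values);                           // proposed; Numbers
      void addValues(Collection beans, String propertyName); // proposed
      // existing result getters unchanged
  }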

The checkin comment says that the new univariate framework is independent of
the existing implementations; but StatUtils has been modified to include
numerous static data members and to delegate computation to them. This adds
significant overhead, and I do not see the value in it: the cost of the
additional stack operations and object creations is real. I ran tests
comparing the previous version, which does direct computations on the
double[] arrays, to the modified version and found an average slowdown of
more than 6x with the new implementation. Repeated tests computing the mean
of 1000 doubles 100000 times using the old and new implementations averaged
1.5 and 10.2 seconds, respectively. I did not profile memory utilization, but
that is also a concern. I do not see the need for all of this additional
overhead.
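
For reference, the timing test was essentially just a loop like the one below
(a simplified sketch of what I ran, not the exact test code; the package name
for StatUtils is from memory):

  import org.apache.commons.math.stat.StatUtils;

  public class MeanTiming {
      public static void main(String[] args) {
          double[] data = new double[1000];
          for (int i = 0; i < data.length; i++) {
              data[i] = Math.random();
          }
          long start = System.currentTimeMillis();
          for (int i = 0; i < 100000; i++) {
              // direct double[] computation in the old version
              StatUtils.mean(data);
          }
          long elapsed = System.currentTimeMillis() - start;
          System.out.println("100000 means of 1000 doubles: " + elapsed + " ms");
      }
  }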

I suggest that we postpone introducing a statistical computation framework,
if one is needed at all, until after the initial release. In any case, I
would like to keep StatUtils and the core UnivariateImpl small, fast, and
lightweight, so I would like to request that the changes to these classes be
rolled back.

If others feel that this additional infrastructure is essential, then I just
need to be educated. It is quite possible that I am thinking too narrowly in
terms of the current scope and am missing some looming structural problems.
If that is the case, I am open to being convinced; I just need to see a)
exactly why we need to add this complexity at this time, and b) why breaking
univariate statistics into four packages and 17 classes is necessary when all
we are computing is basic statistics.

Phil


