Thank you, Phil, for the code. I've finished adding the StatUtils strategies to the evaluation methods of the new classes I've been working on. The timings are once again comparable for both packages, as they should be, since the code is the same for these approaches.

Mean (old=StatUtils, new=Mean class)
old: 3505    new: 2534
old: 3395    new: 2513
old: 3385    new: 2514
old: 3385    new: 2513
old: 3405    new: 2504
old: 3395    new: 2503
old: 3405    new: 2504
old: 3405    new: 2513
old: 3385    new: 2524
old: 3385    new: 2523
old: mean=3405.0 std=36.20926830400073
new: mean=2514.5 std=10.013879257199855

Variance (old=StatUtils, new=Variance class)
old: 38265    new: 40168
old: 38235    new: 40038
old: 38235    new: 40037
old: 38255    new: 40098
old: 38305    new: 40098
old: 38275    new: 40147
old: 38285    new: 40068
old: 38185    new: 39977
old: 38205    new: 39978
old: 38185    new: 39977
old: mean=38243.0 std=41.57990967870072
new: mean=40058.6 std=69.6661881961239
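
For anyone checking the summary lines: they appear to be the arithmetic mean and the sample (n - 1) standard deviation of the ten runs. The sketch below recomputes them for the "old" column of the Mean test. The class and method names are hypothetical and only for illustration; the result should land close to the reported mean=3405.0 and std of about 36.21.

```java
// Hypothetical helper, not part of [math]: recomputes the summary lines above.
class TimingSummary {
    /** Arithmetic mean of the timings. */
    static double mean(double[] v) {
        double sum = 0.0;
        for (int i = 0; i < v.length; i++) {
            sum += v[i];
        }
        return sum / v.length;
    }

    /** Sample standard deviation (n - 1 denominator), two-pass. */
    static double sampleStd(double[] v) {
        double m = mean(v);
        double ss = 0.0;
        for (int i = 0; i < v.length; i++) {
            ss += (v[i] - m) * (v[i] - m);
        }
        return Math.sqrt(ss / (v.length - 1));
    }
}

class TimingSummaryDemo {
    public static void main(String[] args) {
        // "old" timings (ms) from the Mean test above
        double[] oldMs = {3505, 3395, 3385, 3385, 3405, 3395, 3405, 3405, 3385, 3385};
        System.out.println("old: mean=" + TimingSummary.mean(oldMs)
                + " std=" + TimingSummary.sampleStd(oldMs));
    }
}
```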

I've also added tests that run the same set of values through both the evaluation and incremental methods, to show that both approaches return equal results within a tolerance of 10E-12 for the provided dataset.
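
A minimal sketch of what such a check could look like, assuming an incremental running-mean update on one side and a direct sum over the array on the other. MeanSketch and the sample data are hypothetical; this is not the actual test code or the [math] API.

```java
// Hypothetical sketch: names are illustrative, not the actual [math] classes.
class MeanSketch {
    private long n = 0;
    private double mean = 0.0;

    /** Incremental (storageless) path: update the running mean with one value. */
    void increment(double d) {
        n++;
        mean += (d - mean) / n;
    }

    double getResult() {
        return n == 0 ? Double.NaN : mean;
    }

    /** Evaluation path: direct sum over the array, StatUtils-style. */
    static double evaluate(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum / values.length;
    }
}

class MeanEquivalenceDemo {
    public static void main(String[] args) {
        double[] x = {1.0, 2.5, 4.0, 8.5, 16.0}; // made-up data
        MeanSketch m = new MeanSketch();
        for (int i = 0; i < x.length; i++) {
            m.increment(x[i]);
        }
        double tolerance = 10E-12; // tolerance quoted in the message
        double diff = Math.abs(m.getResult() - MeanSketch.evaluate(x));
        System.out.println("agree within tolerance: " + (diff <= tolerance));
    }
}
```

The two paths round differently, which is why the comparison uses a tolerance rather than exact equality.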

-Mark

Phil Steitz wrote:
Sorry, last reply got sent before I was done with it. Pls disregard and try
this....


This adds significant overhead and I do not see the value in it.  The cost of
the additional stack operations/object creations is significant.  I ran tests
comparing the previous version that does direct computations using the
double[] arrays to the modified version and found an average of more than 6x
slowdown using the new implementation. I did not profile memory utilization,
but that is also a concern. Repeated tests computing the mean of 1000 doubles
100000 times using the old and new implementations averaged 1.5 and 10.2
seconds, resp. I do not see the need for all of this additional overhead.



If you review the code, you'll find there is no added "object creation"; the static Variable objects calculate on double[] just as the Univariates did. I would have to see more substantial analysis to believe your claim. All that's going on here is that the static StatUtils methods are delegating to individual static instances of UnivariateStatistics. These are instantiated on JVM startup like all static objects; calling a method in such an object should not require any more overhead than having the method coded directly into the static method.
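
The delegation described here might look roughly like the sketch below. All names (UnivariateStatisticSketch, DirectMean, StatUtilsSketch) are hypothetical stand-ins, not the actual [math] source, and the sketch makes no claim about the measured timings.

```java
// Hypothetical names throughout; this only illustrates the delegation pattern.
interface UnivariateStatisticSketch {
    double evaluate(double[] values);
}

class DirectMean implements UnivariateStatisticSketch {
    public double evaluate(double[] values) {
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum / values.length;
    }
}

class StatUtilsSketch {
    // Created once, when this class is initialized; no allocation per call.
    private static final UnivariateStatisticSketch MEAN = new DirectMean();

    /** Static entry point that forwards to the shared instance. */
    static double mean(double[] values) {
        return MEAN.evaluate(values);
    }
}
```

In this shape the only extra cost per call is one interface method invocation; whether that matters in practice is what the benchmark in this thread is trying to settle.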


If there are performance considerations, let's discuss these.


Here is what I added to StatUtils.test

double[] x = new double[1000];
for (int i = 0; i < 1000; i++) {
    x[i] = (5 - i) * (i - 200);
}
long startTick = 0;
double res = 0;
// 10 rounds; each round times 100000 calls of each implementation
for (int j = 0; j < 10; j++) {
    // old implementation
    startTick = System.currentTimeMillis();
    for (int i = 0; i < 100000; i++) {
        res = OStatUtils.mean(x);
    }
    System.out.println("old: " + (System.currentTimeMillis() - startTick));
    // new implementation
    startTick = System.currentTimeMillis();
    for (int i = 0; i < 100000; i++) {
        res = StatUtils.mean(x);
    }
    System.out.println("new: " + (System.currentTimeMillis() - startTick));
}


The result was a mean of 10203 for the "new" and 1531.1 for the "old", with
standard deviations 81.1 and 13.4 resp. The overhead is the stack operations
and temp object creations.


I doubt (as the numerous discussions over the past week have pointed out) that what we really want to have in StatUtils is one monolithic Static class with all the implemented methods present in it. If I have misinterpreted this opinion in the group, then I'm sure there will be responses to this.


Well, I for one would prefer to have the simple computational methods in one
place.  I would support making the class require instantiation, however, i.e.
making the methods non-static.



There was a great deal of discussion about the benefit of not having the methods implemented directly in static StatUtils because they could not be "overridden" or worked with in an Instantiable form. This approach frees the implementations up to be overridden and frees up room for alternate implementations.


As I said above, the simplest way to deal with this is to make the methods
non-static.
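
A rough sketch of the non-static alternative, assuming a plain instantiable class whose methods a subclass can override. InstanceStatUtils and CorrectedMeanStatUtils are hypothetical names, and the corrected two-pass mean is just one example of an alternate implementation.

```java
// Hypothetical sketch of instance (non-static) computational methods.
class InstanceStatUtils {
    double mean(double[] values) {
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum / values.length;
    }
}

// A subclass swaps in a different algorithm: a two-pass mean that adds
// back the average residual left over from the first pass.
class CorrectedMeanStatUtils extends InstanceStatUtils {
    @Override
    double mean(double[] values) {
        double xbar = super.mean(values);
        double correction = 0.0;
        for (int i = 0; i < values.length; i++) {
            correction += values[i] - xbar;
        }
        return xbar + correction / values.length;
    }
}
```

Callers written against InstanceStatUtils pick up the override through ordinary dynamic dispatch, which is not possible with static methods.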


You may have your opinions of how you would like to see the packages organized and implemented. Others in the group do have alternate opinions to yours. I for one see a strong value in individually implemented Statistics. I also have a strong vision that the framework I have been working on provides substantial benefits.

(1a.) It allows both the storageless and storage-based implementations to function behind the same interface. No matter if you're calling

increment(double d)

or

evaluate(double[]...)

you're working with the same algorithm.
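
To make point (1a.) concrete, here is a minimal sketch of one statistic exposing both entry points over a single algorithm. The interface and class names are hypothetical, the running-update variance formula is my choice for illustration, and the NaN result for fewer than two values is an assumption rather than documented [math] behaviour.

```java
// Hypothetical interface and class names; a sketch, not the [math] source.
interface StatisticSketch {
    void increment(double d);          // storageless: feed one value at a time
    double getResult();                // current value of the statistic
    double evaluate(double[] values);  // storage-based: whole array at once
}

class VarianceSketch implements StatisticSketch {
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the mean

    public void increment(double d) {
        n++;
        double delta = d - mean;
        mean += delta / n;
        m2 += delta * (d - mean);
    }

    public double getResult() {
        // Sample variance (n - 1 denominator); NaN below two values is an assumption.
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }

    public double evaluate(double[] values) {
        // Same algorithm as increment(), run on a scratch instance so the
        // running state of this object is left untouched.
        VarianceSketch scratch = new VarianceSketch();
        for (int i = 0; i < values.length; i++) {
            scratch.increment(values[i]);
        }
        return scratch.getResult();
    }
}
```

Using a scratch instance per evaluate call is only one option; it keeps the two paths identical at the cost of a small allocation, which is the kind of trade-off under debate in this thread.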


That is true in the old implementation as well, with the core computational
methods in StatUtils.

(1b.) If you wish to have alternate implementations for evaluate and increment, it is easily possible to overload these methods in future versions of the implementations.


Just make the methods non-static and that will be possible.  I am not sure,
given the relative triviality of these methods, if this is really a big deal,
however.




Phil, it's clear we have very different "schools of thought" on the subject of how the library should be designed. As a developer on the project I have a right to promote my design model and interests. The architecture is something I have a strong interest in working with.


You certainly have the right to your opinions.  Others also have the right to
disagree with them.

Apache projects are "group" projects. If a project such as [math] cannot find community and room for multiple directions of development, if it cannot make room for alternate ideas and visions, if both revolutionary and evolutionary processes cannot coexist, I doubt the project will have much of a future at all.


I agree with this as well; but from what I have observed, open source projects
do best when they do not try to go off in divergent directions at the same
time. If we cannot agree on a consistent architecture direction, then I don't
think we will succeed. If we can and we stay focussed, then we will.  As I said
above, if others agree with the approach that you want to take, then that is
the direction that the project will go.  I am interested in the opinions of
Tim, Robert and the rest of the team.

Phil


-Mark



--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]



