Mean (old=StatUtils, new=Mean class)
old: 3505  new: 2534
old: 3395  new: 2513
old: 3385  new: 2514
old: 3385  new: 2513
old: 3405  new: 2504
old: 3395  new: 2503
old: 3405  new: 2504
old: 3405  new: 2513
old: 3385  new: 2524
old: 3385  new: 2523
old: mean=3405.0 std=36.20926830400073
new: mean=2514.5 std=10.013879257199855

Variance (old=StatUtils, new=Variance class)
old: 38265  new: 40168
old: 38235  new: 40038
old: 38235  new: 40037
old: 38255  new: 40098
old: 38305  new: 40098
old: 38275  new: 40147
old: 38285  new: 40068
old: 38185  new: 39977
old: 38205  new: 39978
old: 38185  new: 39977
old: mean=38243.0 std=41.57990967870072
new: mean=40058.6 std=69.6661881961239
I've also added a test that runs the same set of values through both the evaluation and incremental methods, to show that both approaches return equal results within a tolerance of 10E-12 for the provided dataset.
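A check of this kind might look roughly like the sketch below. It is only an illustration under stated assumptions: the class and method names are hypothetical, the incremental path uses Welford's update (which may not be what the Variance class actually does), and the small dataset in the usage note is made up for the example.

```java
// Hypothetical helper comparing a storage-based and a storageless variance.
// Both methods assume at least two values.
class VarianceCheck {
    // Storage-based path: two-pass sample variance over the whole array.
    static double evaluate(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / values.length;
        double sumSq = 0.0;
        for (double v : values) {
            sumSq += (v - mean) * (v - mean);
        }
        return sumSq / (values.length - 1);
    }

    // Storageless path: Welford's update, consuming one value at a time.
    static double incremental(double[] values) {
        long n = 0;
        double mean = 0.0;
        double m2 = 0.0; // running sum of squared deviations from the mean
        for (double v : values) {
            n++;
            double delta = v - mean;
            mean += delta / n;
            m2 += delta * (v - mean);
        }
        return m2 / (n - 1);
    }

    // True when both paths agree within the given absolute tolerance.
    static boolean agree(double[] values, double tolerance) {
        return Math.abs(evaluate(values) - incremental(values)) <= tolerance;
    }
}
```

For instance, VarianceCheck.agree(new double[] {1, 2, 3, 4, 5}, 10E-12) compares the two paths on a five-element array. An absolute tolerance like this only makes sense when the values are of modest magnitude.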
-Mark
Phil Steitz wrote:
Sorry, last reply got sent before I was done with it. Pls disregard and try this....
This adds significant overhead and I do not see the value in it. The cost of the additional stack operations/object creations is significant. I ran tests comparing the previous version that does direct computations using the double[] arrays to the modified version and found an average of more than 6x slowdown using the new implementation. I did not profile memory utilization, but that is also a concern. Repeated tests computing the mean of 1000 doubles 100000 times using the old and new implementations averaged 1.5 and 10.2 seconds, resp. I do not see the need for all of this additional overhead.
If you review the code, you'll find there is no added "object creation"; the static Variable objects calculate on double[] just as the Univariates did. I would have to see more substantial analysis to believe your claim. All that's going on here is that the static StatUtils methods are delegating to individual static instances of UnivariateStatistics. These are instantiated on JVM startup like all static objects, and calling a method on such an object should not require any more overhead than having the method coded directly into the static method.
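The delegation described here can be sketched roughly as follows. This is a minimal illustration, not the actual [math] code: UnivariateStatistic, Mean, and StatUtilsSketch are hypothetical stand-ins assumed only for the example. The static method forwards to a single shared instance held in a static field, so no object is allocated per call.

```java
// Hypothetical stand-in for the statistic interface discussed in the thread.
interface UnivariateStatistic {
    double evaluate(double[] values);
}

// Hypothetical mean implementation that works directly on the array.
class Mean implements UnivariateStatistic {
    @Override
    public double evaluate(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }
}

// Static facade: one shared instance, created once when the class is initialized.
class StatUtilsSketch {
    private static final UnivariateStatistic MEAN = new Mean();

    // Each call is one extra method invocation; nothing is allocated here.
    static double mean(double[] values) {
        return MEAN.evaluate(values);
    }
}
```

A caller would use StatUtilsSketch.mean(x) exactly as with a directly coded static method. Whether the extra invocation is measurable is the question the timings in this thread are trying to answer.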
If there are performance considerations, let's discuss these.
Here is what I added to StatUtils.test:
double[] x = new double[1000];
for (int i = 0; i < 1000; i++) {
    x[i] = (5 - i) * (i - 200);
}
long startTick = 0;
double res = 0;
for (int j = 0; j < 10; j++) {
    // time 100000 calls to the old implementation
    startTick = System.currentTimeMillis();
    for (int i = 0; i < 100000; i++) {
        res = OStatUtils.mean(x);
    }
    System.out.println("old: " + (System.currentTimeMillis() - startTick));
    // time 100000 calls to the new implementation
    startTick = System.currentTimeMillis();
    for (int i = 0; i < 100000; i++) {
        res = StatUtils.mean(x);
    }
    System.out.println("new: " + (System.currentTimeMillis() - startTick));
}
The result was a mean of 10203 for the "new" and 1531.1 for the "old", with standard deviations 81.1 and 13.4 resp. The overhead is the stack operations and temp object creations.
I doubt (as the numerous discussions over the past week have pointed out) that what we really want to have in StatUtils is one monolithic Static class with all the implemented methods present in it. If I have misinterpreted this opinion in the group, then I'm sure there will be responses to this.
Well, I for one would prefer to have the simple computational methods in one place. I would support making the class require instantiation, however, i.e. making the methods non-static.
There was a great deal of discussion about the benefit of not having the methods implemented directly in static StatUtils because they could not be "overridden" or worked with in an Instantiable form. This approach frees the implementations up to be overridden and frees up room for alternate implementations.
As I said above, the simplest way to deal with this is to make the methods non-static.
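As a rough sketch of what non-static methods would allow: the class names below are hypothetical and not part of the existing code, and the corrected-mean variant is just one example of an alternate implementation someone might plug in.

```java
// Hypothetical utility class whose methods are instance methods, not static.
class InstanceStatUtils {
    public double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }
}

// Alternate implementation: a second pass adds back the average residual
// to reduce accumulated rounding error.
class CorrectedStatUtils extends InstanceStatUtils {
    @Override
    public double mean(double[] values) {
        double first = super.mean(values);
        double correction = 0.0;
        for (double v : values) {
            correction += v - first;
        }
        return first + correction / values.length;
    }
}
```

Code written against InstanceStatUtils picks up the override when handed a CorrectedStatUtils instance, which is not possible with static methods.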
You may have your opinions of how you would like to see the packages organized and implemented. Others in the group do have alternate opinions to yours. I for one see a strong value in individually implemented Statistics. I also have a strong vision that the framework I have been working on provides substantial benefits.
(1a.) It allows both the storageless and storage-based implementations to function behind the same interface. No matter if you're calling
increment(double d)
or
evaluate(double[]...)
you're working with the same algorithm.
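To make (1a.) concrete, here is a minimal sketch of one statistic exposing both entry points. The interface name Statistic, the getResult() accessor, and the running-mean update are assumptions made for the illustration; only increment(double) and evaluate(double[]) come from the discussion above.

```java
// Hypothetical interface covering both usage styles.
interface Statistic {
    void increment(double d);          // storageless: feed one value at a time
    double getResult();                // result for the values incremented so far
    double evaluate(double[] values);  // storage-based: compute over an array
}

class Mean implements Statistic {
    private long n = 0;
    private double mean = 0.0;

    // Single running-mean step shared by both entry points.
    private static double update(double mean, long n, double d) {
        return mean + (d - mean) / n;
    }

    @Override
    public void increment(double d) {
        n++;
        mean = update(mean, n, d);
    }

    @Override
    public double getResult() {
        return n == 0 ? Double.NaN : mean;
    }

    @Override
    public double evaluate(double[] values) {
        // Uses local state only, so it does not disturb incremented values.
        double m = 0.0;
        for (int i = 0; i < values.length; i++) {
            m = update(m, i + 1, values[i]);
        }
        return values.length == 0 ? Double.NaN : m;
    }
}
```

Because both paths call the same update step, incrementing the values one by one and evaluating the array should give the same result up to floating-point rounding.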
That is true in the old implementation as well, with the core computational methods in StatUtils.
(1b.) If you wish to have alternate implementations for evaluate and increment, it is easily possible to overload these methods in future versions of the implementations.
Just make the methods non-static and that will be possible. I am not sure, given the relative triviality of these methods, if this is really a big deal, however.
Phil, it's clear we have very different "schools of thought" on the subject of how the library should be designed. As a developer on the project I have a right to promote my design model and interests. The architecture is something I have a strong interest in working with.
You certainly have the right to your opinions. Others also have the right to disagree with them.
Apache projects are "group" projects. If a project such as [math] cannot find community and room for multiple directions of development, if it cannot make room for alternate ideas and visions, if both revolutionary and evolutionary processes cannot coexist, I doubt the project will have much of a future at all.
I agree with this as well; but from what I have observed, open source projects do best when they do not try to go off in divergent directions at the same time. If we cannot agree on a consistent architecture direction, then I don't think we will succeed. If we can and we stay focussed, then we will. As I said above, if others agree with the approach that you want to take, then that is the direction that the project will go. I am interested in the opinions of Tim, Robert and the rest of the team.
Phil
-Mark
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]