Sorry, last reply got sent before I was done with it. Please disregard it and try this one.

> > This adds significant overhead and I do not see the value in it. The
> > cost of the additional stack operations/object creations is significant.
> > I ran tests comparing the previous version that does direct computations
> > using the double[] arrays to the modified version and found an average
> > of more than 6x slowdown using the new implementation. I did not profile
> > memory utilization, but that is also a concern. Repeated tests computing
> > the mean of 1000 doubles 100000 times using the old and new
> > implementations averaged 1.5 and 10.2 seconds, resp. I do not see the
> > need for all of this additional overhead.
>
> If you review the code, you'll find there is no added "object creation";
> the static Variable objects calculate on double[] just as the
> Univariates did. I would have to see more substantial analysis to
> believe your claim. All that's going on here is that the static StatUtil
> methods are delegating to individual static instances of
> UnivariateStatistics. These are instantiated on JVM startup like all
> static objects; calling a method in such an object should not require
> any more overhead than having the method coded directly into the static
> method.
>
> If there are performance considerations, let's discuss these.
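For anyone following along, here is a minimal sketch of the two call paths being compared. The class names (DirectStats, MeanStatistic, DelegatingStats) are made up for illustration and are not the actual [math] classes. The sketch only shows the shape of the two designs; it does not settle which one is faster, since that depends on what the delegate does internally.

```java
// Hypothetical names throughout; none of these are the real [math] classes.

/** Variant A: the computation is coded directly in the static method. */
final class DirectStats {
    private DirectStats() {}

    static double mean(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }
}

/** A statistic expressed as an object, so it can be swapped or subclassed. */
class MeanStatistic {
    double evaluate(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }
}

/** Variant B: the static method delegates to one shared instance. */
final class DelegatingStats {
    // Created once, when this class is initialized; not once per call.
    private static final MeanStatistic MEAN = new MeanStatistic();

    private DelegatingStats() {}

    static double mean(double[] values) {
        return MEAN.evaluate(values);
    }
}
```

In this form, variant B adds one extra method call per invocation and allocates nothing per call. Any temporary objects would have to come from inside the delegate's own evaluate method, which is the part worth inspecting in the real code.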
Here is what I added to StatUtils.test:

    double[] x = new double[1000];
    for (int i = 0; i < 1000; i++) {
        x[i] = (5 - i) * (i - 200);
    }
    long startTick = 0;
    double res = 0;
    // 10 rounds; each round times 100000 calls to the old and the new mean()
    for (int j = 0; j < 10; j++) {
        startTick = System.currentTimeMillis();
        for (int i = 0; i < 100000; i++) {
            res = OStatUtils.mean(x);
        }
        System.out.println("old: " + (System.currentTimeMillis() - startTick));
        startTick = System.currentTimeMillis();
        for (int i = 0; i < 100000; i++) {
            res = StatUtils.mean(x);
        }
        System.out.println("new: " + (System.currentTimeMillis() - startTick));
    }

The result was a mean of 10203 ms for the "new" and 1531.1 ms for the "old", with standard deviations of 81.1 and 13.4 resp. The overhead is the stack operations and temp object creations.

> I doubt (as the numerous discussions over the past week have pointed
> out) that what we really want to have in StatUtils is one monolithic
> Static class with all the implemented methods present in it. If I have
> misinterpreted this opinion in the group, then I'm sure there will be
> responses to this.

Well, I for one would prefer to have the simple computational methods in one place. I would support making the class require instantiation, however, i.e. making the methods non-static.

> There was a great deal of discussion about the benefit of not having the
> methods implemented directly in static StatUtils because they could not
> be "overridden" or worked with in an Instantiable form. This approach
> frees the implementations up to be overridden and frees up room for
> alternate implementations.

As I said above, the simplest way to deal with this is to make the methods non-static.

> You may have your opinions of how you would like to see the packages
> organized and implemented. Others in the group do have alternate
> opinions to yours. I for one see a strong value in individually
> implemented Statistics. I also have a strong vision that the framework I
> have been working on provides substantial benefits.
>
> (1a.)
> It allows both the storageless and storage-based implementations
> to function behind the same interface. No matter if you're calling
>
> increment(double d)
>
> or
>
> evaluate(double[]...)
>
> you're working with the same algorithm.

That is true in the old implementation as well, with the core computational methods in StatUtils.

> (1b.) If you wish to have alternate implementations for evaluate and
> increment, it is easily possible to overload these methods in future
> versions of the implementations.

Just make the methods non-static and that will be possible. I am not sure, given the relative triviality of these methods, if this is really a big deal, however.

> Phil, it's clear we have very different "schools of thought" on the
> subject of how the library should be designed. As a developer on the
> project I have a right to promote my design model and interests. The
> architecture is something I have a strong interest in working with.

You certainly have the right to your opinions. Others also have the right to disagree with them.

> Apache projects are "group" projects. If a project such as [math] cannot
> find community and room for multiple directions of development, if it
> cannot make room for alternate ideas and visions, if both revolutionary
> and evolutionary processes cannot coexist, I doubt the project will have
> much of a future at all.

I agree with this as well, but from what I have observed, open source projects do best when they do not try to go off in divergent directions at the same time. If we cannot agree on a consistent architecture direction, then I don't think we will succeed. If we can and we stay focused, then we will. As I said above, if others agree with the approach that you want to take, then that is the direction that the project will go. I am interested in the opinions of Tim, Robert and the rest of the team.
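To make the non-static suggestion concrete, here is a minimal sketch, assuming a plain instantiable class. RunningMean and TwoPassMean are hypothetical names, not existing [math] classes, and the two-pass correction is only one example of an alternate implementation a subclass might supply.

```java
// Hypothetical names; a sketch of instance (non-static) statistics methods.

/** Offers both a storageless path (increment) and an array path (evaluate). */
class RunningMean {
    private long n;
    private double sum;

    /** Storageless path: fold one value into the running state. */
    public void increment(double d) {
        n++;
        sum += d;
    }

    /** Mean of all values seen so far via increment(), NaN if none. */
    public double getResult() {
        return n == 0 ? Double.NaN : sum / n;
    }

    /** Array path: same summation, done with a local accumulator only. */
    public double evaluate(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double s = 0.0;
        for (double v : values) {
            s += v;
        }
        return s / values.length;
    }
}

/** Alternate implementation: overrides evaluate with a two-pass correction. */
class TwoPassMean extends RunningMean {
    @Override
    public double evaluate(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double first = super.evaluate(values);
        // Second pass: add back the average residual around the first estimate.
        double residual = 0.0;
        for (double v : values) {
            residual += v - first;
        }
        return first + residual / values.length;
    }
}
```

A caller holding a RunningMean reference gets whichever evaluate the concrete class provides, which is the overriding that static methods rule out. The array path here allocates nothing per call.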
Phil

> -Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]