--- "Mark R. Diggory" <[EMAIL PROTECTED]> wrote:
> > This adds significant overhead and I do not see the value in it. The
> > cost of the additional stack operations/object creations is
> > significant. I ran tests comparing the previous version that does
> > direct computations using the double[] arrays to the modified version
> > and found an average of more than 6x slowdown using the new
> > implementation. I did not profile memory utilization, but that is also
> > a concern. Repeated tests computing the mean of 1000 doubles 100000
> > times using the old and new implementations averaged 1.5 and 10.2
> > seconds, resp. I do not see the need for all of this additional
> > overhead.
>
> If you review the code, you'll find there is no added "object creation";
> the static Variable objects calculate on double[] just as the
> Univariates did. I would have to see more substantial analysis to
> believe your claim. All that's going on here is that the static StatUtil
> methods are delegating to individual static instances of
> UnivariateStatistics. These are instantiated on JVM startup like all
> static objects; calling a method in such an object should not require
> any more overhead than having the method coded directly into the static
> method.
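For readers following the thread, the two shapes of mean() being compared can be sketched roughly as below. This is only an illustration under assumptions: every name in it (UnivariateStatisticSketch, MeanSketch, DirectStatUtils, DelegatingStatUtils) is hypothetical and is not taken from the actual [math] source. Neither version allocates an object per call; whether the extra interface call costs anything measurable depends on the JVM, which is the point in dispute here.

```java
// Hypothetical sketch; none of these names come from the real [math] code.
interface UnivariateStatisticSketch {
    double evaluate(double[] values);
}

// One statistic, implemented as its own class.
final class MeanSketch implements UnivariateStatisticSketch {
    @Override
    public double evaluate(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length; // NaN for an empty array
    }
}

// "Old" shape: the computation is coded directly in the static method.
final class DirectStatUtils {
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }
}

// "New" shape: the static method delegates to a single shared instance,
// created once when the class is initialized (no allocation per call).
final class DelegatingStatUtils {
    private static final UnivariateStatisticSketch MEAN = new MeanSketch();

    static double mean(double[] values) {
        return MEAN.evaluate(values);
    }
}

class DelegationSketch {
    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0, 4.0};
        System.out.println("direct:     " + DirectStatUtils.mean(x));
        System.out.println("delegating: " + DelegatingStatUtils.mean(x));
    }
}
```

Both methods return the same value for the same input; the only structural difference is the one extra call through the interface.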
Here is what I added to one of the methods in StatUtilsTest, after copying
and renaming the old version OStatUtils:

    for (int j = 0; j < 10; j++) {
        // time 100000 calls to the old implementation
        startTick = System.currentTimeMillis();
        for (int i = 0; i < 100000; i++) {
            res = OStatUtils.mean(x);
        }
        System.out.println("old: " + (System.currentTimeMillis() - startTick));
        //oldStats.addValue(System.currentTimeMillis() - startTick);

        // time 100000 calls to the new implementation
        startTick = System.currentTimeMillis();
        for (int i = 0; i < 100000; i++) {
            res = StatUtils.mean(x);
        }
        //newStats.addValue(System.currentTimeMillis() - startTick);
        System.out.println("new: " + (System.currentTimeMillis() - startTick));
    }

> If there are performance considerations, let's discuss these.
>
> I doubt (as the numerous discussions over the past week have pointed
> out) that what we really want to have in StatUtils is one monolithic
> static class with all the implemented methods present in it. If I have
> misinterpreted this opinion in the group, then I'm sure there will be
> responses to this.
>
> > I suggest that we postpone introduction of a statistical computation
> > framework until after the initial release, if needed. In any case, I
> > would like to keep StatUtils and the core UnivariateImpl small, fast
> > and lightweight, so I would like to request that the changes to these
> > classes be rolled back.
>
> I would really like to see an architecture that's more than just one flat
> static class with a bunch of double[] methods in it. This is not very
> useful to me.
> > If others feel that this additional infrastructure is essential, then I
> > just need to be educated. It is quite possible that I am thinking too
> > narrowly in terms of current scope and I may be missing some looming
> > structural problems. If this is the case, I am open to being educated.
> > I just need to see a) exactly why we need to add more complexity at
> > this time and b) why breaking univariate statistics into four packages
> > and 17 classes when all we are computing is basic statistics is
> > necessary.
>
> The packages are categorical, the classes are implementations of each
> statistic. The framework provides an intuitive and organized means for
> others to easily implement and add statistics to the packages without
> being restricted to a fascist and monolithic Univariate interface or
> static StatUtils interface.
>
> If anything, the continued conflict between our two schools of thought
> shows the necessity of such an approach. Your school of thought can
> retain the monolithic interfaces for "Univariate" and "StatUtil", while
> the framework can provide others with the ability to extend and expand
> the library without such "heavy handed" restrictions that cripple the
> extendability of the project.
>
> There was a great deal of discussion about the benefit of not having the
> methods implemented directly in static StatUtils because they could not
> be "overridden" or worked with in an instantiable form. This approach
> frees the implementations up to be overridden and frees up room for
> alternate implementations.
>
> You may have your opinions of how you would like to see the packages
> organized and implemented. Others in the group do have alternate
> opinions to yours. I for one see a strong value in individually
> implemented statistics. I also have a strong vision that the framework I
> have been working on provides substantial benefits.
>
> (1a.) It allows both the storageless and storage-based implementations
> to function behind the same interface. No matter if you're calling
>
>     increment(double d)
>
> or
>
>     evaluate(double[]...)
>
> you're working with the same algorithm.
>
> (1b.) If you wish to have alternate implementations for evaluate and
> increment, it is easily possible to overload these methods in future
> versions of the implementations.
>
> (2.) With individual implementations, alternate approaches can be coded
> and included for the benefit of those who have an interest in such
> implementations. Thus there could be multiple versions of Variance,
> based on the strategy of interest and the numerical accuracy required.
>
> (3.) Having the same implementations of statistics usable across all
> Univariate implementations assures a standard behavior and the same
> expected results no matter if you're using incremental or evaluation-based
> approaches.
>
> (4.) The framework provides a formal structure for the future growth of
> the library. Knowing what a UnivariateStatistic is, and seeing the
> various implementations, it's obvious the route one will take to
> implement future statistics of interest.
>
> Phil, it's clear we have very different "schools of thought" on the
> subject of how the library should be designed. As a developer on the
> project I have a right to promote my design model and interests. The
> architecture is something I have a strong interest in working with.
>
> Apache projects are "group" projects. If a project such as [math] cannot
> find community and room for multiple directions of development, if it
> cannot make room for alternate ideas and visions, if both revolutionary
> and evolutionary processes cannot coexist, I doubt the project will have
> much of a future at all.
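To make points (1a.) and (3.) above concrete, here is a minimal sketch of one statistic that exposes both increment(double d) and evaluate(double[]) while running the same update step in each. It is an assumption-laden illustration, not the framework's actual code: the interface name IncrementalStatisticSketch, the class RunningMeanSketch, the getResult() accessor, and the choice of a running-mean update are all hypothetical. Only the increment/evaluate method names come from the message.

```java
// Hypothetical sketch; only increment(...) and evaluate(...) are named in
// the thread, everything else here is assumed for illustration.
interface IncrementalStatisticSketch {
    void increment(double d);         // storageless: feed one value at a time
    double getResult();               // assumed accessor for the running result
    double evaluate(double[] values); // storage based: compute over an array
}

final class RunningMeanSketch implements IncrementalStatisticSketch {
    private long n = 0;
    private double mean = Double.NaN;

    // The single update step shared by both entry points.
    private static double step(double mean, long n, double d) {
        return n == 1 ? d : mean + (d - mean) / n;
    }

    @Override
    public void increment(double d) {
        n++;
        mean = step(mean, n, d);
    }

    @Override
    public double getResult() {
        return mean;
    }

    @Override
    public double evaluate(double[] values) {
        // Uses local state only, so it does not disturb the running values.
        double m = Double.NaN;
        long count = 0;
        for (double v : values) {
            count++;
            m = step(m, count, v);
        }
        return m;
    }
}

class IncrementDemo {
    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0, 4.0};
        RunningMeanSketch mean = new RunningMeanSketch();
        for (double v : x) {
            mean.increment(v);
        }
        System.out.println("incremental: " + mean.getResult());
        System.out.println("evaluate:    " + mean.evaluate(x));
    }
}
```

Because both paths go through the same step, they return identical results for the same data. A different strategy, as in point (2.), would be another class implementing the same interface.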
>
> -Mark