> One more sort of philosophical point that makes me want to keep
> Univariates as objects with statistics as properties: to me a Univariate
> is in fact a Java bean. Its state is the data that it is characterizing
> and its properties are the statistics describing these data.

And why can't these statistics be objects? Objects that are smart, in that
they know how to update themselves. Currently, univariate holds all of that
knowledge, and it will get more complex with every new statistic. The
object approach places the responsibility for data update and computation
where it belongs: inside the statistic.

> You are confusing strategies with implementations. The rootfinding
> framework exists to support multiple strategies to do rootfinding, not to
> support arbitrary numerical methods. A better analogy would be to the
> distribution framework, which supports creation of different probability
> distributions. You could argue that a "statistic" is as natural an
> abstraction as a probability distribution. I disagree with that. There is
> lots of structure in a probability distribution, very little in a
> statistic from an abstract standpoint.

But the simple abstractions are always the most useful. They are more
easily adapted, reused, and understood.

> I disagree. Extending a class or adding a method to an interface is no
> harder than adding a new class (actually easier). It seems ridiculous to
> me to add a new class for each univariate statistic that we want to
> support.

Funny. You just suggested that one way to support additional statistics is
to create a new class via extension, yet you claim that adding a new class
for a statistic is ridiculous. Are you saying your idea is ridiculous?

> If the stats are going to be meaningfully integrated, they will have to
> be used/defined by the core univariate classes anyway, unless your idea
> is to eliminate these and force users to think about statistics one at a
> time instead of as part of a univariate statistical summary.

You can easily create a univariate class that is open-ended about the
statistics it computes and still treats them as a logical set. One would
create a univariate and any set of statistic objects.
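
A minimal sketch of this idea follows. The names Statistic, Mean, Max, and
OpenUnivariate are hypothetical and chosen only for illustration; they are
not existing commons-math classes, and the real design could differ.

```java
import java.util.ArrayList;
import java.util.List;

// Small demo: register two statistics, feed values, read the results.
class StatisticDemo {
    public static void main(String[] args) {
        Mean mean = new Mean();
        Max max = new Max();
        OpenUnivariate u = new OpenUnivariate();
        u.addStatistic(mean);
        u.addStatistic(max);
        for (double v : new double[] {1.0, 2.0, 6.0}) {
            u.addValue(v);
        }
        System.out.println("mean = " + mean.getResult()); // mean = 3.0
        System.out.println("max = " + max.getResult());   // max = 6.0
    }
}

// Hypothetical interface: a statistic that updates itself as values arrive.
interface Statistic {
    void increment(double value);
    double getResult();
}

// Running mean; NaN until at least one value has been added.
class Mean implements Statistic {
    private long n = 0;
    private double sum = 0.0;

    public void increment(double value) {
        n++;
        sum += value;
    }

    public double getResult() {
        return n == 0 ? Double.NaN : sum / n;
    }
}

// Running maximum; NaN until at least one value has been added.
class Max implements Statistic {
    private double max = Double.NaN;

    public void increment(double value) {
        if (Double.isNaN(max) || value > max) {
            max = value;
        }
    }

    public double getResult() {
        return max;
    }
}

// Hypothetical open-ended univariate: it knows nothing about any particular
// statistic and simply forwards each value to whatever was registered.
class OpenUnivariate {
    private final List<Statistic> statistics = new ArrayList<Statistic>();

    void addStatistic(Statistic s) {
        statistics.add(s);
    }

    void addValue(double value) {
        for (Statistic s : statistics) {
            s.increment(value);
        }
    }
}
```

Under these assumptions, a user who needs a statistic the library does not
ship would only write another small Statistic implementation and register
it; OpenUnivariate itself would not change.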

Then you would add data to the univariate, which would in turn pass the
data to each of the statistic objects. The statistic objects then take that
data and update themselves. Now we have a univariate that can compute any
statistic, either one provided by commons-math or one created by a user, on
an as-needed basis rather than with the all-or-nothing approach.

> This may be the crux of our disagreement. I see the statistics as natural
> properties of a set of data, not meaningful objects in their own right.

And what limits properties to being only dumb data values? By your logic,
objects such as Calendars, Colors, and InputStreams could not be used as
properties. Currently, univariate has the responsibility of computing a
mean. Taking that responsibility away from univariate and giving it to a
statistic object makes that object tremendously meaningful.

> We are always going to have to discuss what goes into commons-math and
> what does not go in, regardless of how packages are organized. For
> example, I would be opposed (as I suspect J, Al and Brent would be too)
> to adding a Newton's method solver now, since it would provide no value
> beyond what we already have. This has nothing to do with how the package
> is organized.
>
> I would like to propose the following compromise solution that allows the
> kind of flexibility that you want without breaking things apart as much.
>
> 1. Rename StoreUnivariate to ExtendedUnivariate and change all other
> "Store" names to "Extended".

Changing the name doesn't make the design any better. Do you think that if
Microsoft had named Windows "Portals", it would be a better OS?

> 2. Make the methods in StatUtils non-static. Continue to use these for
> basic computational methods shared by Univariate and ExtendedUnivariate
> implementations and for direct use by applications and elsewhere in
> commons-math. These methods do not have to be used by all Univariate
> implementation strategies.
>
> 3. Add addValues methods to Univariate that accept double[], List and
> Collection with property name and eliminate ListUnivariate and
> BeanListUnivariate.

With that you just tripled the complexity of univariate. And as a result,
you tripled the complexity of adding a statistic, tripled the likelihood of
introducing errors with each change, tripled this, tripled that. Isn't it
obvious yet that this is flawed?

> 4. Rename UnivariateImpl to SimpleUnivariate and add a UnivariateFactory
> with factory methods to create Simple, Extended and whatever other sorts
> of Univariates we may define.
>
> To add new statistics or computational strategies in this environment, we
> can a) add to the Univariate interface if we think that they are really
> basic -- I think that t-based confidence interval half-width for the mean
> is a basic stat that is now missing

Yes. And if you were a user, with the current implementation there is
nothing you could do about it but pray it will be added in the next release
of commons-math. With the object approach, however, you would create a
simple statistic object that can be used with univariate, and all your
troubles go away.

> for example b) add to the ExtendedUnivariate interface c) extend an
> existing Univariate implementation to add the new statistic or d) create
> a new Univariate including the new statistic or computational strategy.

Again, you yourself labeled c and d as ridiculous when you labeled Mark's
idea of adding a class for each statistic as such. The current univariates
have encapsulated way too much responsibility instead of delegating it to
other objects. This makes the code very unstable, as it will need to change
frequently. As I see it, the univariate types are responsible for two
things: maintaining a window of data and computing summary statistics. I
would suggest separating each of these responsibilities into separate
objects.
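
One way to picture that separation is sketched below. Every name here
(WindowPolicy, FixedWindow, StatisticsStrategy, MeanStrategy,
MediatingUnivariate) is hypothetical and not part of commons-math, and the
sketch makes the simplifying assumption that the window always exposes its
stored values as an array.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Small demo: a window of two values, summarized by a mean.
class WindowDemo {
    public static void main(String[] args) {
        MediatingUnivariate u =
            new MediatingUnivariate(new FixedWindow(2), new MeanStrategy());
        u.addValue(1.0);
        u.addValue(2.0);
        u.addValue(6.0); // evicts 1.0, leaving 2.0 and 6.0
        System.out.println("windowed mean = " + u.getResult()); // 4.0
    }
}

// Hypothetical: decides which values are retained as new ones arrive.
interface WindowPolicy {
    void add(double value);
    double[] values();
}

// Keeps only the most recent `size` values.
class FixedWindow implements WindowPolicy {
    private final int size;
    private final Deque<Double> window = new ArrayDeque<Double>();

    FixedWindow(int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.size = size;
    }

    public void add(double value) {
        if (window.size() == size) {
            window.removeFirst(); // drop the oldest value
        }
        window.addLast(value);
    }

    public double[] values() {
        double[] out = new double[window.size()];
        int i = 0;
        for (double v : window) {
            out[i++] = v;
        }
        return out;
    }
}

// Hypothetical: decides what to compute from the retained values.
interface StatisticsStrategy {
    double compute(double[] values);
}

// Arithmetic mean of the retained values; NaN when there are none.
class MeanStrategy implements StatisticsStrategy {
    public double compute(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }
}

// The univariate only mediates: it owns neither the retention rule
// nor the computation.
class MediatingUnivariate {
    private final WindowPolicy window;
    private final StatisticsStrategy strategy;

    MediatingUnivariate(WindowPolicy window, StatisticsStrategy strategy) {
        this.window = window;
        this.strategy = strategy;
    }

    void addValue(double value) {
        window.add(value);
    }

    double getResult() {
        return strategy.compute(window.values());
    }
}
```

In this sketch, swapping the retention rule or the computation means
passing a different object to the constructor; the mediating class stays
untouched.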

I would make a window policy object that knows if/when data values should
be removed when others are added, and whether individual data values are
accessible. I would make a statistics strategy object that knows what
statistics to compute and how to compute them based on the window policy.
The univariate would act as a mediator between the two objects.

I like Mark's approach, but I think I would take it a little further in
terms of abstraction by making univariate independent of the statistics it
is calculating.

Brent Worden
http://www.brent.worden.org