RE: [math] Recent commits to stat, util packages

Brent Worden Sun, 06 Jul 2003 23:56:13 -0700

>
> One more sort of philosophical point that makes me want to keep Univariates as
> objects with statistics as properties:  to me a Univariate is in fact a java
> bean.  Its state is the data that it is characterizing and its properties are
> the statistics describing these data.


And why can't these statistics be objects?  Objects that are smart in that
they know how to modify themselves.  Currently, univariate has all that
knowledge which will get more complex with every new statistic.  The object
approach places the responsibility of data update and computation where it
belongs, internal to the statistic.

> You are confusing strategies with implementations. The rootfinding framework
> exists to support multiple strategies to do rootfinding, not to support
> arbitrary numerical methods. A better analogy would be to the distribution
> framework which supports creation of different probability distributions.  You
> could argue that a "statistic" is as natural an abstraction as a probability
> distribution.  I disagree with that.  There is lots of structure in a
> probability distribution, very little in a statistic from an abstract
> standpoint.

But the simple abstractions are always the most useful.  They are more
easily adapted, reused, and understood.

> I disagree. Extending a class or adding a method to an interface is no harder
> than adding a new class (actually easier).  It seems ridiculous to me to add a
> new class for each univariate statistic that we want to support.

Funny.  You just suggested that one way to support additional statistics is
to create a new class via extension.  Yet you claim adding a new class for a
statistic is ridiculous.  Are you saying your idea is ridiculous?

> If the stats
> are going to be meaningfully integrated, they will have to be used/defined by
> the core univariate classes anyway, unless your idea is to eliminate these and
> force users to think about statistics one at a time instead of as part of a
> univariate statistical summary.

You can easily create a univariate class that is open-ended with respect to
the statistics that it computes and treats them as a logical set.  One would
create a univariate and any set of statistic objects.  Then you would add
data to the univariate, which would in turn pass the data to each of the
statistic objects.  The statistic objects then take the data and update
themselves.  Now we have a univariate that can compute any statistic, either
one provided by commons-math or one created by a user, on an as-needed basis
rather than the all-or-nothing approach.
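
A minimal sketch of the arrangement described above.  Every name in it
(Statistic, Mean, Max, OpenUnivariate, increment, getValue, addStatistic) is
hypothetical and chosen only for illustration; none of it is taken from the
commons-math code under discussion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: a statistic that knows how to update itself.
interface Statistic {
    void increment(double value);  // fold one new data value into the state
    double getValue();             // current value of the statistic
}

// Running mean; keeps only a count and a sum.
class Mean implements Statistic {
    private long n;
    private double sum;

    public void increment(double value) {
        n++;
        sum += value;
    }

    public double getValue() {
        return n == 0 ? Double.NaN : sum / n;
    }
}

// Running maximum; NaN until the first value arrives.
class Max implements Statistic {
    private double max = Double.NaN;

    public void increment(double value) {
        if (Double.isNaN(max) || value > max) {
            max = value;
        }
    }

    public double getValue() {
        return max;
    }
}

// Hypothetical univariate that is open-ended with respect to its statistics:
// it only forwards each value to whatever statistic objects were registered.
class OpenUnivariate {
    private final List<Statistic> statistics = new ArrayList<>();

    public void addStatistic(Statistic statistic) {
        statistics.add(statistic);
    }

    public void addValue(double value) {
        for (Statistic statistic : statistics) {
            statistic.increment(value);
        }
    }
}
```

In this sketch a user-written statistic is just another class implementing
Statistic; OpenUnivariate does not change when one is added.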

> This may be the crux of our disagreement.  I
> see the statistics as natural properties of a set of data, not meaningful
> objects in their own right.

And what limits properties to be only dumb data values?  With your logic,
objects such as Calendars, Colors, and InputStreams could not be used as
properties.  Currently, univariate has the responsibility of computing a
mean.  Taking that responsibility away from univariate and giving it to a
statistic object makes that object tremendously meaningful.

>
> We are always going to have to discuss what goes into commons-math and what
> does not go in, regardless of how packages are organized.  For example, I would
> be opposed (as I suspect J, Al and Brent would be too) to adding a Newton's
> method solver now, since it would provide no value beyond what we already have.
> This has nothing to do with how the package is organized.
>
> I would like to propose the following compromise solution that allows the kind
> of flexibility that you want without breaking things apart as much.
>
> 1. Rename StoreUnivariate to ExtendedUnivariate and change all other "Store"
> names to "Extended".

Changing the name doesn't make the design any better.  Do you think if
Microsoft had named Windows, Portals, it would be a better OS?

>
> 2. Make the methods in StatUtils non-static. Continue to use these for basic
> computational methods shared by Univariate and ExtendedUnivariate
> implementations and for direct use by applications and elsewhere in
> commons-math. These methods do not have to be used by all Univariate
> implementation strategies.

>
> 3. Add addValues methods to Univariate that accept double[], List and
> Collection with property name and eliminate ListUnivariate and
> BeanListUnivariate.

With that you just tripled the complexity of univariate.  And as a result,
tripled the complexity of adding a statistic, tripled the likelihood of
introducing errors with each change, tripled this, tripled that.  Isn't it
obvious yet that this is flawed?

>
> 4. Rename UnivariateImpl to SimpleUnivariate and add a UnivariateFactory with
> factory methods to create Simple, Extended and whatever other sorts of
> Univariates we may define.
>
> To add new statistics or computational strategies in this environment, we can
> a) add to the Univariate interface if we think that they are really basic -- I
> think that t-based confidence interval half-width for the mean is a basic stat
> that is now missing

Yes.  And if you were a user, with the current implementation, there is
nothing you could do about it but pray it'll be added in the next release of
commons-math.  However, with the object approach, you'd create a simple
statistic object that can be used with univariate and all your troubles go
away.

> for example b) add to the ExtendedUnivariate interface c)
> extend an existing Univariate implementation to add the new statistic or d)
> create a new Univariate including the new statistic or computational strategy.
> U

Again, you yourself labeled c and d as ridiculous when you labeled Mark's
idea of adding a class for each statistic as such.

The current univariates have encapsulated way too much responsibility
instead of delegating it to other objects.  This makes the code very
unstable as it will need to change frequently.  As I see it, the univariate
types are responsible for two things: maintaining a window of data and
computing summary statistics.

I would suggest separating each of these responsibilities into separate
objects.  I would make a window policy object that knows if/when data values
should be removed when others are added and if individual data values are
accessible.  I would make a statistics strategy object that knows what
statistics to compute and how to compute them based on the window policy.
The univariate would act as a mediator between the two objects.  I like
Mark's approach, but I think I would take it a little further in terms of
abstraction by making univariate independent of the statistics it is
calculating.
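
One way the separation described above could look, as a rough sketch.  The
names WindowPolicy, FixedWindow, StatisticsStrategy, MeanStrategy and
MediatedUnivariate are all hypothetical, and the sketch assumes the simplest
case in which the window policy makes its individual values accessible.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical: decides which values are retained as new ones are added.
interface WindowPolicy {
    void add(double value);
    double[] values();  // values currently inside the window
}

// Keeps only the most recent `size` values, discarding the oldest first.
class FixedWindow implements WindowPolicy {
    private final int size;
    private final Deque<Double> window = new ArrayDeque<>();

    FixedWindow(int size) {
        if (size < 1) {
            throw new IllegalArgumentException("window size must be at least 1");
        }
        this.size = size;
    }

    public void add(double value) {
        if (window.size() == size) {
            window.removeFirst();
        }
        window.addLast(value);
    }

    public double[] values() {
        double[] out = new double[window.size()];
        int i = 0;
        for (double v : window) {
            out[i++] = v;
        }
        return out;
    }
}

// Hypothetical: knows what to compute and how to compute it.
interface StatisticsStrategy {
    double evaluate(double[] values);
}

class MeanStrategy implements StatisticsStrategy {
    public double evaluate(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }
}

// The univariate only mediates: it holds a window policy and hands the
// retained values to whichever strategy is asked for.
class MediatedUnivariate {
    private final WindowPolicy window;

    MediatedUnivariate(WindowPolicy window) {
        this.window = window;
    }

    void addValue(double value) {
        window.add(value);
    }

    double compute(StatisticsStrategy strategy) {
        return strategy.evaluate(window.values());
    }
}
```

Here MediatedUnivariate contains no statistical code at all; a policy that
does not store values would need a different, incremental strategy interface,
which this sketch leaves out.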

Brent Worden
http://www.brent.worden.org

