Phil Steitz wrote:


Mark R. Diggory wrote:

Phil,


Phil Steitz wrote:


I think that apply() makes sense only for DescriptiveStats impls that store values. Unless the stat is somehow listening for changes in the DescriptiveStat (an interesting idea, but not implemented today), nothing meaningful can be returned by apply(). If there are no objections, I would like to move this down from AbstractStorelessDescriptiveStatistics to AbstractDescriptiveStatistics.



StorelessDescriptiveStatistics actually "stores" the values in the windowed case and all the implemented methods.

http://jakarta.apache.org/commons/math/xref/org/apache/commons/math/stat/AbstractStorelessDescriptiveStatistics.html#271



That's the code that I don't like. It makes it look like somehow the stat is going to be "applied" in all cases, but it only makes sense when there is a window:

public double apply(UnivariateStatistic stat) {
if (storage != null) {
return stat.evaluate(storage.getValues(), storage.start(),storage.getNumElements()); } else if (stat instanceof StorelessUnivariateStatistic) {
return ((StorelessUnivariateStatistic) stat).getResult(); <-- meaningless
}




Phil, that portion isn't meaningless, I explain why:

in apply(...) the following occurs,

> return ((StorelessUnivariateStatistic) stat).getResult();

in addValue the following occurs,

n++;
min.increment(value);
...
moment.increment(value);

http://jakarta.apache.org/commons/math/xref/org/apache/commons/math/stat/StorelessDescriptiveStatisticsImpl.html#116

the state of the StorelessUnivariateStatistic is stored internally, incremented in addValue() and returned in apply(...) by calling getResult(), this is how the StorelessUnivariateStatistic is designed to function.

In the StorelessDescriptiveStatisticsImpl, two behaviors are possible

1.) Infinite window, in which case there is no storage and the whole mechanism rely's on the getResult() method and the state of the statistic is maintained within the StorelessUnivariateStatistic

2.) Finite window, in which cases there is storage, and the statistic is applied to the storage to get the result

the code for evaluating these two cases is:

if (storage != null) {
        return stat.evaluate(
                storage.getValues(),
                storage.start(),
                storage.getNumElements());
} else if (stat instanceof StorelessUnivariateStatistic) {
        return ((StorelessUnivariateStatistic) stat).getResult();
}

The problem above is that (unless I am missing something) in the else if, stat is unrelated to *this. If you feel strongly about keeping it here, we should throw or return NaN if there is no stored window.


I would protect the apply method, I wouldn't return a null because it would break the StorelessDescriptiveStatisticsImpl implementation.


As I've stated above, there is a value stored in getResult(), for instance when the DescrStat.getMin() method is called in the StorelessDescriptiveStatisticsImpl case, the internal "min" UnivariateState had "state", there is a value stored there, and it is returned by getResult().

I think your percieving apply(...) as if its to "apply" externally instantiated UnivariateStatistics, it wasn't designed for that, it was designed to polymorphically implement the evaluations for the individual internal existing statistics which are incremented in addValue(...).

As I review this stuff, I am starting to like the idea of dumping the "window" concept from the "storeless" implementations entirely, possibly eventually replacing it with exponential smoothing (which would satisfy my needs) and cleanly separating the impls that don't require storage. The capability that it provides ("rolling") is available in the "stored" impls anyway. What do we think about dumping this from the "storeless" classes and interfaces?


I would say, write "another" implementation for exponential smoothing. I think having an implementation capable of that would be great. But remember, the whole idea is that there can be many different implementations of DescriptiveStatistics that could be made, there is not one "be all, end all" implementation, you don't have to "replace" StorelessDescriptiveStatisticsImpl to write a smoothing implementation. UnivariateStatistics are there to assist by generically encapsulating the statical methods, but even they, if neccessary for smoothing, could be "extended".


-Mark

--
Mark Diggory
Software Developer
Harvard MIT Data Center
http://osprey.hmdc.harvard.edu

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to