Thanks everybody for the feedback.

Since the ADEV calculation is just plain wrong, I will fix the code for that, 
and the docs that go with it.  I will leave the RMS calculation as it is, but 
will rework the docs and perhaps add a note saying that it doesn't make much 
sense to use RMS, and that PRMS should be used instead.  People probably use 
that one anyway out of convenience: I found snippets of my own code with things 
like "($mean, $rms) = stats($pdl);", where the second value returned is the 
PRMS, which is the correct calculation to use.  I don't want to add a new 
function like Joel suggested.  I've seen APIs with lists of functions like 
'func, func0, func1': what a mess.  But I think adding a note pointing to 
PDL::Stats for more statistical calculations would also be a good idea.
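For reference, here is a minimal plain-Perl sketch of the quantities under discussion. It is not PDL's statsover implementation: the dispersion() helper, its key names and the sample data are hypothetical, chosen only to put the corrected ADEV (no sqrt) next to the docs' sqrt version, the divide-by-N RMS, and the divide-by-(N-1) PRMS, assuming those normalizations as described in the original message. It uses only core Perl and List::Util.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper (not part of PDL): computes, in plain Perl, the
# dispersion measures discussed in this thread for a list of numbers.
sub dispersion {
    my @x = @_;
    my $n = scalar @x;
    die "need at least two values\n" if $n < 2;

    my $mean = sum(@x) / $n;

    # Average absolute deviation: sum(abs(x - mean)) / N.
    # Same units as the data.
    my $aadev = sum(map { abs($_ - $mean) } @x) / $n;

    # The formula currently in the docs: sqrt(sum(abs(x - mean)) / N).
    # Its units are the square root of the data's units.
    my $adev_docs = sqrt($aadev);

    # Root-mean-square deviation from the mean, two normalizations.
    my $sumsq = sum(map { ($_ - $mean) ** 2 } @x);
    my $rms   = sqrt($sumsq / $n);         # parent distribution, divide by N
    my $prms  = sqrt($sumsq / ($n - 1));   # sample, divide by N-1

    return (
        mean      => $mean,
        aadev     => $aadev,
        adev_docs => $adev_docs,
        rms       => $rms,
        prms      => $prms,
    );
}

my %d = dispersion(2, 4, 4, 4, 5, 5, 7, 9);
printf "%-9s %.4f\n", $_, $d{$_} for qw(mean aadev adev_docs rms prms);

# Units check: multiply the data by 100.  A proper dispersion measure
# scales by 100; the docs' sqrt formula only scales by 10.
my %s = dispersion(map { 100 * $_ } 2, 4, 4, 4, 5, 5, 7, 9);
printf "aadev ratio: %.1f, adev_docs ratio: %.1f\n",
    $s{aadev} / $d{aadev}, $s{adev_docs} / $d{adev_docs};
```

With this sample data the mean is 5, the average absolute deviation is 1.5, the RMS is 2 and the PRMS is about 2.1381. Scaling the data by 100 scales the average absolute deviation by 100 but the sqrt version only by 10, which is the unit problem Karl points out.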

cheers,
Derek

On Nov 16, 2011, at 7:06 AM, Karl Glazebrook wrote:

> Agreed, ADEV has to be fixed (in the code). For one thing, it has the wrong 
> units.
> 
> Karl
> 
> On 16/11/2011, at 10:19 AM, Chris Marshall wrote:
> 
>> Hi Derek-
>> 
>> The fix you refer to was for an inconsistency between the
>> algorithm used with badvals and the one used without
>> badvals.  I have the same problem with stats and statsover:
>> many of the values seem fairly redundant or unneeded when
>> all I want is a "quick look" at some data.  However, I'm a
>> bit leery of changing something that has been around so long.
>> 
>> --Chris
>> 
>> On Tue, Nov 15, 2011 at 5:59 PM, Derek Lamb <[email protected]> wrote:
>>> I would like to change some of the definitions of the quantities returned by
>>> statsover.  I find that either their names or their calculations are not
>>> consistent with standard statistical practice.  However, I also know that
>>> statistical terminology can differ between communities, so I wanted to
>>> make sure I wasn't stepping on too many toes first.  In particular:
>>> 1) the absolute deviation is given in the docs as:
>>> ADEV = sqrt(sum( abs(x-mean(x)) )/N)
>>> with a note that "This is also called the standard deviation".
>>> I can find nothing that supports either the sqrt in this formula or the
>>> note that follows it.  The average absolute deviation is given by my
>>> edition of Bevington & Robinson (p. 10) (not a statistics bible, I
>>> understand, but it was what was on my shelf) and also by
>>> http://en.wikipedia.org/wiki/Absolute_deviation#Average_absolute_deviation
>>> as
>>> AADEV = sum( abs(x-mean(x)) )/N.
>>> The Bevington & Robinson text says "the presence of the absolute value sign
>>> makes its use inconvenient for statistical analysis...a parameter that is
>>> easier to use analytically and that can be justified fairly well on
>>> theoretical grounds to be a more appropriate measure of the dispersion of
>>> the observations is the standard deviation σ."  So I would like to remove
>>> the sqrt from that formula and remove the note about it also being called
>>> the standard deviation.  As a side note, this was "fixed" back in February
>>> (see SF bug #3185864 and this git commit), but I think the fix should have
>>> gone the other way (changing the docs and the other code, and leaving the
>>> fixed code as it was).
>>> 2) the function example gives $prms second in the returned list and $rms
>>> last, but the detailed description below reverses this.  I will change the
>>> docs to avoid confusion.
>>> 3) We have two root-mean-square calculations: a parent-distribution form
>>> that divides by N, and a sample form that divides by (N-1).  I'm not sure
>>> why we have both.  Will a piddle ever be able to contain a parent
>>> distribution?  Probably not: by my definition, its averages are taken in
>>> the limit as the number of points goes to infinity.  If it were up to me I
>>> would remove the RMS calculation so that statsover would only return 6
>>> quantities (including the PRMS) instead of 7.  The difference between the
>>> two calculations is negligible for large datasets, and for small datasets
>>> one should not be using the RMS calculation anyway, correct?  But I worry
>>> about backwards compatibility, particularly with constructs like this:
>>> $rms = (statsover($pdl))[-1];
>>> (a list slice taking the last returned value: if RMS were removed, the poor
>>> user would silently get the ADEV instead)
>>> 4) If we keep the RMS calculation, then I would like to append "or the
>>> standard deviation" to the note following its definition in the docs.
>>> Comments welcome.
>>> cheers,
>>> Derek
>>> _______________________________________________
>>> Perldl mailing list
>>> [email protected]
>>> http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
>>> 
>>> 
>> 
> 

