I would like to change some of the definitions of the quantities returned by 
statsover.  As far as I can tell, either their names or their calculations are 
not consistent with standard statistical practice.  However, I also know that 
statistical terminology varies between communities, so I wanted to make sure 
I wasn't stepping on too many toes first.  In particular:

1) The absolute deviation is given in the docs as:
        ADEV = sqrt(sum( abs(x-mean(x)) )/N)
with a note that "This is also called the standard deviation"

I can find nothing that supports the sqrt in this formula, or the note that 
follows it.  The average absolute deviation is given by my edition of 
Bevington & Robinson (p. 10; not a statistics bible, I understand, but it was 
what was on my shelf) and also by 
http://en.wikipedia.org/wiki/Absolute_deviation#Average_absolute_deviation as
        AADEV = sum( abs(x-mean(x)) )/N.

The Bevington & Robinson text says "the presence of the absolute value sign 
makes its use inconvenient for statistical analysis...a parameter that is 
easier to use analytically and that can be justified fairly well on theoretical 
grounds to be a more appropriate measure of the dispersion of the observations 
is the standard deviation σ."  So I would like to remove the sqrt from that 
formula and drop the note about it also being called the standard deviation.  
As a side note, this was "fixed" back in February (see SF bug #3185864 and this 
git commit), but I think the fix should have gone the other way: change the 
docs and the other code, and leave the code that was "fixed" as it originally 
was.

2) The function example gives $prms second in the returned list and $rms 
last, but the detailed description below reverses this order.  I will change 
the docs so the two agree and avoid confusion.

3) We have two root-mean-square calculations: a parent-distribution version 
that divides by N, and a sample-population version that divides by (N-1).  I'm 
not sure why we have both.  Will a piddle ever be able to contain a parent 
distribution?  Probably not; my definition takes the parent average in the 
limit as the number of points goes to infinity.  If it were up to me, I would 
remove the RMS calculation so that statsover returns 6 quantities (including 
the PRMS) instead of 7.  The difference between the two calculations is 
negligible for large datasets, and for small datasets one should not be using 
the RMS calculation anyway, correct?  But I worry about backwards 
compatibility, particularly with constructs such as:

        $rms = (statsover($pdl))[-1];

where the poor user would silently get the ADEV instead.

4) If we keep the RMS calculation, then I would like to append "or the standard 
deviation" to the note following its definition in the docs.
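
To make items 1 and 3 concrete, here is a minimal plain-Perl sketch of the 
formulas discussed above.  It does not use PDL or call statsover; the sub names 
are hypothetical, and it assumes (as item 3 implies) that RMS is the 
divide-by-N version and PRMS the divide-by-(N-1) version.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# Plain-Perl sketch (no PDL, no statsover) of the dispersion measures
# discussed above, for a flat list of numbers.  Sub names are hypothetical.

# Arithmetic mean: sum(x)/N
sub mean {
    my @x = @_;
    return sum(@x) / @x;    # @x in numeric context is N
}

# Average absolute deviation: AADEV = sum(abs(x-mean(x)))/N
sub aadev {
    my @x = @_;
    my $m = mean(@x);
    return sum(map { abs($_ - $m) } @x) / @x;
}

# ADEV as currently documented: the same quantity wrapped in a sqrt
sub adev_documented {
    return sqrt(aadev(@_));
}

# Sum of squared deviations from the mean
sub sum_sq_dev {
    my @x = @_;
    my $m = mean(@x);
    return sum(map { my $d = $_ - $m; $d * $d } @x);
}

# Assumed RMS: parent-distribution version, divides by N
sub rms {
    my @x = @_;
    return sqrt(sum_sq_dev(@x) / @x);
}

# Assumed PRMS: sample-population version, divides by (N-1)
sub prms {
    my @x = @_;
    return sqrt(sum_sq_dev(@x) / (@x - 1));
}

my @data = (2, 4, 4, 4, 5, 5, 7, 9);
printf "%-26s %.4f\n", 'AADEV (no sqrt)',           aadev(@data);
printf "%-26s %.4f\n", 'ADEV as documented (sqrt)', adev_documented(@data);
printf "%-26s %.4f\n", 'RMS, divide by N',          rms(@data);
printf "%-26s %.4f\n", 'PRMS, divide by N-1',       prms(@data);
```

On that sample data it prints 1.5000 for AADEV, 1.2247 for the documented ADEV, 
2.0000 for RMS and 2.1381 for PRMS.  The sqrt changes the answer outright (and 
leaves ADEV in units of sqrt(x) rather than x), whereas RMS and PRMS differ only 
by a factor of sqrt(N/(N-1)), about 7% at N = 8, which tends to 1 as N grows.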

Comments welcome.

cheers,
Derek
_______________________________________________
Perldl mailing list
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
