Certainly the docs should be updated. What about adding a `newstatsover`
function that returns a better-thought-out set of statistics? The docs could
then include a note explaining why `newstatsover` is preferred.
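
A rough sketch of what such a function could look like is below. It is plain
Perl on a list of numbers rather than PDL code, so it runs without PDL
installed. The hash keys and the choice of quantities are assumptions made up
for illustration, not an existing or agreed API. Returning named values
instead of a positional list is one way to keep callers from depending on the
order of the results.

```perl
use strict;
use warnings;
use List::Util qw(sum min max);

# Hypothetical newstatsover: takes a list of numbers and returns a hash
# reference of named statistics (all key names are made up for this sketch).
sub newstatsover {
    my @x = @_;
    my $n = scalar(@x);
    die "newstatsover needs at least two values\n" if $n < 2;

    my $mean = sum(@x) / $n;

    # Median: middle value, or the average of the two middle values.
    my @sorted = sort { $a <=> $b } @x;
    my $mid    = int($n / 2);
    my $median = $n % 2 ? $sorted[$mid]
                        : ($sorted[$mid - 1] + $sorted[$mid]) / 2;

    # Average absolute deviation about the mean: sum(|x - mean|) / N.
    my $adev = sum(map { abs($_ - $mean) } @x) / $n;

    # Sample standard deviation, dividing by N-1.
    my $std = sqrt(sum(map { ($_ - $mean)**2 } @x) / ($n - 1));

    return {
        mean => $mean,   median => $median,
        min  => min(@x), max    => max(@x),
        adev => $adev,   std    => $std,
    };
}

my $s = newstatsover(2, 4, 4, 4, 5, 5, 7, 9);
printf "mean=%g median=%g adev=%g std=%g\n",
    @{$s}{qw(mean median adev std)};
```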

Joel

On Tue, Nov 15, 2011 at 6:26 PM, David Mertens <[email protected]> wrote:

> I second what Jarle said about changing docs instead of changing code.
>
> David
> On Nov 15, 2011 5:30 PM, "Jarle Brinchmann" <[email protected]> wrote:
>
>>
>> On 15 Nov 2011, at 23:59, Derek Lamb wrote:
>>
>> > I would like to change some of the definitions of the quantities
>> > returned by statsover.  I find that either their names or their
>> > calculations are not consistent with normal statistical practices.  However
>> > I also know that the statistical terminology used by different communities
>> > can be different, so I wanted to make sure I wasn't stepping on too many
>> > toes first.  In particular:
>> >
>> > 1) the absolute deviation is given in the docs as:
>> >       ADEV = sqrt(sum( abs(x-mean(x)) )/N)
>> > with a note that "This is also called the standard deviation"
>>
>> You are totally right about this one. a) This has never been called the
>> standard deviation, and b) the absolute deviation has never been defined in
>> this way. Even the units would be wrong with this usage. There is some
>> variation in the definition of the absolute deviation and in the terminology,
>> but it is never what you show there. The most common in my experience
>> is:
>>
>>   ADEV = Sum( |x-<x>|)/N,
>>
>> which is what you are suggesting, where <x> is the mean. Sometimes it is
>> the median instead (my personal preference). In this case it is known as
>> the average absolute deviation or the mean absolute deviation - in the
>> latter case you often find it with the acronym MAD.  There is also an even
>> more robust estimator called the median absolute deviation which is:
>>
>>   MedAD = median ( |x-<x>|)
>>
>> but I see this much less often. It could be good to have in PDL perhaps,
>> but as the name normally would be MAD it could be confusing.
>>
>> I'd suggest leaving ADEV as the average absolute deviation above, with
>> <x> being mean(x), which I think is exactly what you suggest. I do think
>> this has to be changed, as the current implementation is plain wrong.
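
To make the difference concrete, here is a small plain-Perl check (not PDL
code; the data and the helper names `mean` and `median` are made up for this
sketch). It compares the formula as currently documented with the two
definitions above. The median absolute deviation is taken about the median
here, which is one common convention and an assumption on my part.

```perl
use strict;
use warnings;
use List::Util qw(sum);

sub mean { return sum(@_) / scalar(@_) }

sub median {
    my @s   = sort { $a <=> $b } @_;
    my $mid = int(scalar(@s) / 2);
    return scalar(@s) % 2 ? $s[$mid] : ($s[$mid - 1] + $s[$mid]) / 2;
}

my @x = (1, 2, 3, 4, 100);    # made-up data with one outlier
my $m = mean(@x);

# Formula as given in the docs: sqrt(sum(|x - mean|) / N).
# The square root gives it units of sqrt(x), not x.
my $adev_docs = sqrt(sum(map { abs($_ - $m) } @x) / scalar(@x));

# Average absolute deviation about the mean: sum(|x - mean|) / N.
my $adev_mean = sum(map { abs($_ - $m) } @x) / scalar(@x);

# Median absolute deviation, taken about the median in this sketch.
my $med   = median(@x);
my $medad = median(map { abs($_ - $med) } @x);

printf "docs formula=%g  average abs dev=%g  median abs dev=%g\n",
    $adev_docs, $adev_mean, $medad;
```

With an outlier in the data, the median-based estimate stays close to the
spread of the bulk of the points, while the mean-based one is pulled upward.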
>>
>> > 3) We have two root-mean-square calculations, a regular parent
>> > distribution divide-by-N, and a sample population divide-by-(N-1).  I'm not
>> > sure why we have both of these--will a piddle ever be able to contain a
>> > parent distribution?  Probably not--my definition has it taking the average
>> > as the number of points goes to infinity.  If it were up to me I would
>> > remove the RMS calculation so that statsover would only return 6 quantities
>> > (including the PRMS) instead of 7--the difference in the two calculations
>> > is negligible for large datasets, and for small datasets one should not be
>> > using the RMS calculation anyway, correct?  But I worry about backwards
>> > compatibility, particularly with these sorts of constructs:
>> >
>> > $rms = @{statsover($pdl)}[-1]  (that doesn't work, I can never remember
>> > that syntax, but you probably get the point--the poor user is going to get
>> > the ADEV instead)
>>
>> Bah, I didn't realise we had two. The sample variance is probably the
>> most sensible to keep - but note that if you know (somehow) the mean, then
>> even the sample variance is divided by N. Anyway, I think it is dodgy to
>> make significant changes here in stats - changing the docs would be my
>> preferred solution here.
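
For reference, the list-slice syntax that the construct quoted above is
reaching for puts parentheses around the call: `(LIST)[-1]`. The sketch below
uses a made-up stand-in, `toy_stats`, which returns only three values and is
not PDL's statsover. It shows the slice, the more robust unpacking by name,
and the size of the N versus N-1 difference for a small sample.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical stand-in returning a positional list whose last element is
# the divide-by-N RMS. This is not PDL's statsover.
sub toy_stats {
    my @x    = @_;
    my $n    = scalar(@x);
    my $mean = sum(@x) / $n;
    my $ss   = sum(map { ($_ - $mean)**2 } @x);   # sum of squared deviations
    my $prms = sqrt($ss / ($n - 1));              # sample estimate, N-1
    my $rms  = sqrt($ss / $n);                    # divide-by-N
    return ($mean, $prms, $rms);
}

my @x = (2, 4, 4, 4, 5, 5, 7, 9);

# A list slice needs parentheses around the call, not @{...}.
my $rms = (toy_stats(@x))[-1];

# Unpacking from the front does not depend on what comes last.
my ($mean, $prms) = toy_stats(@x);

# The two estimates differ by a factor sqrt(N/(N-1)).
printf "rms=%.4f prms=%.4f ratio=%.4f\n", $rms, $prms, $prms / $rms;
```

Code that slices with `[-1]` silently picks up a different quantity if the
last return value is ever removed, whereas unpacking from the front keeps
working.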
>>
>>        Cheers,
>>                Jarle.
>>
>>
>> _______________________________________________
>> Perldl mailing list
>> [email protected]
>> http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
>>
>