Re: Numpy outlier removal

Steven D'Aprano Sun, 06 Jan 2013 21:18:32 -0800

On Mon, 07 Jan 2013 02:29:27 +0000, Oscar Benjamin wrote:

> On 7 January 2013 01:46, Steven D'Aprano
> <steve+comp.lang.pyt...@pearwood.info> wrote:
>> On Sun, 06 Jan 2013 19:44:08 +0000, Joseph L. Casale wrote:
>>
>>> I have a dataset that consists of a dict with text descriptions and
>>> values that are integers. If required, I collect the values into a
>>> list and create a numpy array running it through a simple routine:
>>>
>>> data[abs(data - mean(data)) < m * std(data)]
>>>
>>> where m is the number of std deviations to include.
>>
>> I'm not sure that this approach is statistically robust. No, let me be
>> even more assertive: I'm sure that this approach is NOT statistically
>> robust, and may be scientifically dubious.
> 
> Whether or not this is "statistically robust" requires more explanation
> about the OP's intention.


Not really. Statistical robustness is objectively defined (roughly: a robust 
statistic cannot be dragged arbitrarily far by a small fraction of wild 
values), and the user's intention doesn't come into it. The mean is not a 
robust measure of central tendency; the median is, regardless of why you 
pick one or the other.
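To make the mean/median point concrete, here is a minimal sketch (my own 
illustration, not code from the thread) comparing the quoted mean/std filter 
with a median-based variant that scales by the median absolute deviation 
(MAD). The sample values, the function names sigma_clip_mean and 
sigma_clip_median, the default m=3, and the 1.4826 MAD scale factor are all 
assumptions chosen for the example.

```python
import numpy as np


def sigma_clip_mean(data, m=3.0):
    """The quoted filter: keep points within m standard deviations of the mean."""
    data = np.asarray(data, dtype=float)
    # np.std uses ddof=0 by default, matching std(data) in the original snippet
    return data[np.abs(data - data.mean()) < m * data.std()]


def sigma_clip_median(data, m=3.0):
    """Hypothetical robust variant: centre on the median, scale by the MAD."""
    data = np.asarray(data, dtype=float)
    med = np.median(data)
    # 1.4826 makes the MAD comparable to the standard deviation for normal data
    mad = 1.4826 * np.median(np.abs(data - med))
    return data[np.abs(data - med) < m * mad]


values = np.array([10, 11, 12, 13, 14, 1000])

# One wild value drags the mean to about 176.7; the median stays at 12.5.
print(values.mean(), np.median(values))

# The wild value also inflates the standard deviation, so the mean/std filter
# keeps all six points, while the median/MAD filter drops 1000.
print(sigma_clip_mean(values))
print(sigma_clip_median(values))
```

This is the "masking" effect: the outlier hides itself by inflating the very 
spread it is measured against. Note that if more than half the values are 
identical, the MAD is zero and this sketch returns an empty array, so real 
code would need to guard against that case.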

There are sometimes good reasons for choosing non-robust statistics or 
techniques over robust ones, but some techniques are so dodgy that there 
is *never* a good reason for doing so. E.g. finding the line of best fit 
by eye, or taking more and more samples until you get a statistically 
significant result. Such techniques are not just non-robust in the 
statistical sense, but non-robust in the general sense, if not outright 
deceitful.
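The second example, optional stopping, is easy to demonstrate by 
simulation. The sketch below assumes normally distributed data with known 
standard deviation 1, so a simple two-sided z-test at the 5% level 
(|z| > 1.96) stands in for whatever test one might actually use; the 
function name false_positive_rates and its parameters are hypothetical. 
Because the true mean really is zero, every "significant" result is a false 
positive.

```python
import numpy as np


def false_positive_rates(trials=2000, start_n=10, max_n=1000, z_crit=1.96, seed=0):
    """Compare a fixed-sample test with 'keep sampling until significant'."""
    rng = np.random.default_rng(seed)
    # The null hypothesis is true: every sample is drawn from N(0, 1).
    samples = rng.standard_normal((trials, max_n))
    n = np.arange(1, max_n + 1)
    # z statistic of the running mean with known sigma = 1:
    # z_n = mean_n * sqrt(n) = cumsum_n / sqrt(n)
    z = np.cumsum(samples, axis=1) / np.sqrt(n)
    # Honest procedure: fix the sample size in advance and test once at max_n.
    fixed = np.mean(np.abs(z[:, max_n - 1]) > z_crit)
    # Optional stopping: test after every new sample from start_n onwards. A run
    # counts as a false positive if it ever crosses the threshold, since that is
    # where the experimenter would have stopped and reported a result.
    peeking = np.mean(np.any(np.abs(z[:, start_n - 1:]) > z_crit, axis=1))
    return fixed, peeking


fixed, peeking = false_positive_rates()
print(f"fixed n: {fixed:.3f}  optional stopping: {peeking:.3f}")
```

Testing once at a fixed sample size gives a false-positive rate near the 
nominal 5%. Checking after every new sample and stopping at the first 
significant result gives a much higher rate, and it keeps rising the longer 
you are willing to keep sampling.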



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list
