On Mon, 07 Jan 2013 02:29:27 +0000, Oscar Benjamin wrote:

> On 7 January 2013 01:46, Steven D'Aprano
> <steve+comp.lang.pyt...@pearwood.info> wrote:
>> On Sun, 06 Jan 2013 19:44:08 +0000, Joseph L. Casale wrote:
>>
>>> I have a dataset that consists of a dict with text descriptions and
>>> values that are integers. If required, I collect the values into a
>>> list and create a numpy array running it through a simple routine:
>>>
>>> data[abs(data - mean(data)) < m * std(data)]
>>>
>>> where m is the number of std deviations to include.
>>
>> I'm not sure that this approach is statistically robust. No, let me be
>> even more assertive: I'm sure that this approach is NOT statistically
>> robust, and may be scientifically dubious.
>
> Whether or not this is "statistically robust" requires more explanation
> about the OP's intention.

Not really. Statistical robustness is objectively defined, and the user's
intention doesn't come into it. The mean is not a robust measure of
central tendency; the median is, regardless of why you pick one or the
other.

There are sometimes good reasons for choosing non-robust statistics or
techniques over robust ones, but some techniques are so dodgy that there
is *never* a good reason for using them. E.g. finding the line of best fit
by eye, or taking more and more samples until you get a statistically
significant result. Such techniques are not just non-robust in the
statistical sense, but non-robust in the general sense, if not outright
deceitful.

-- 
Steven
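
P.S. A minimal sketch of the point about the mean and the median, assuming
NumPy and a made-up sample (the numbers are hypothetical, not the OP's
data). It also runs the quoted filter with m = 3 on the same sample.

```python
import numpy as np

# Hypothetical sample: seven ordinary readings plus one gross outlier.
clean = np.array([9, 10, 10, 11, 10, 9, 11])
dirty = np.append(clean, 1000)

# A single bad value moves the mean a long way; the median does not move.
mean_clean, mean_dirty = np.mean(clean), np.mean(dirty)
median_clean, median_dirty = np.median(clean), np.median(dirty)

# The filter quoted above, with m standard deviations either side of the mean.
# np.std defaults to the population standard deviation (ddof=0).
m = 3
kept = dirty[np.abs(dirty - np.mean(dirty)) < m * np.std(dirty)]

print("mean:  ", mean_clean, "->", mean_dirty)
print("median:", median_clean, "->", median_dirty)
print("kept by the mean/std filter:", kept)
```

On this sample the outlier drags the mean from 10 to 133.75 while the
median stays at 10, and the filter keeps all eight values, outlier
included. With the population standard deviation, no point in a sample of
8 can lie more than sqrt(7) (about 2.65) standard deviations from the
mean, so m = 3 can never reject anything here.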