Re: [Offtopic] Line fitting [was Re: Numpy outlier removal]

Alan Spence Fri, 11 Jan 2013 07:48:02 -0800

On 09 Jan 2013, at 00:02:11 Steven D'Aprano <st...@pearwood.info> wrote:


> The point I keep making, that everybody seems to be ignoring, is that 
> eyeballing a line of best fit is subjective, unreliable and impossible to 
> verify. How could I check that the line you say is the "best fit" 
> actually *is* the *best fit* for the given data, given that you picked 
> that line by eye? Chances are good that if you came back to the data a 
> month later, you'd pick a different line!

It might bring more insight to the debate to distinguish parameter error (error 
in the fitted coefficients of a chosen model) from model error (error from 
choosing the wrong form of model).  Steven is correct if you consider only 
parameter error.  However, model error is often the main problem, and here 
visual techniques may well improve your model selection, even if what you end 
up with is not a formal model but a visually based approximation to one.  If 
you work only by eye, though, the result is not rigorous from a modelling 
perspective, and other issues can arise from that.  Visual techniques can also 
help in dealing with outliers, but again not rigorously.  They can also bring 
in real-world knowledge (though this is really the same point as model 
selection).
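
As a rough illustration of the distinction, here is a minimal sketch using 
NumPy's polyfit.  The data are synthetic and the "true" quadratic relationship 
is purely an assumption for the example; fit_report is a hypothetical helper, 
not anything from the original thread.  The standard errors speak to parameter 
error within a chosen model, while the gap in residual sum of squares between 
the two fits speaks to model error.

```python
import numpy as np

# Synthetic data (assumption): a quadratic trend plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 0.5 * x + 0.3 * x**2 + rng.normal(0.0, 1.0, x.size)

def fit_report(deg):
    """Fit a polynomial of the given degree by least squares.

    Returns the coefficients, their standard errors (parameter error
    within this model), and the residual sum of squares.
    """
    coeffs, cov = np.polyfit(x, y, deg, cov=True)
    stderr = np.sqrt(np.diag(cov))
    residuals = y - np.polyval(coeffs, x)
    rss = float(np.sum(residuals**2))
    return coeffs, stderr, rss

lin_coeffs, lin_se, lin_rss = fit_report(1)     # straight line
quad_coeffs, quad_se, quad_rss = fit_report(2)  # quadratic

print("linear:    coeffs", lin_coeffs, "std err", lin_se, "RSS", lin_rss)
print("quadratic: coeffs", quad_coeffs, "std err", quad_se, "RSS", quad_rss)
```

A straight line fitted to such data can report tidy-looking coefficients and 
still be the wrong model; plotting the residuals against x is the kind of 
visual check that tends to show this.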

With regard to the original post on outliers, Steven made a lot of excellent 
points.  However, there are at least two important issues which he didn't 
mention.  (1) You must think long and hard about the outliers.  For example, 
can they recur, or have actions been taken in the real world that mean they 
can't happen again?  Are they actually data errors?  How you deal with them 
may depend on the answers to these questions.  (2) It is best to fit your 
model with and without the outliers and see what impact that has on the 
real-world application you're doing the analysis for.  It's also good to try 
more than one set of excluded outliers, to see how stable the results are as 
the number of outliers removed changes.  If the results change much, be very 
careful how you use them.
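
Point (2) can be sketched along the following lines.  This assumes NumPy, a 
straight-line model and synthetic data with three injected outliers; ranking 
points by their residual from the initial fit is just one possible way to 
choose which to exclude, and slopes_by_exclusion is a hypothetical helper 
written for this example.

```python
import numpy as np

# Synthetic data (assumption): a line plus noise, with three injected outliers.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 40)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, x.size)
y[[5, 20, 33]] += [15.0, -12.0, 18.0]

def slopes_by_exclusion(x, y, max_removed):
    """Refit a line after dropping the k worst-fitting points, k = 0..max_removed.

    Points are ranked once, by absolute residual from the fit to all the data.
    Returns a dict mapping k to the (slope, intercept) of the refit.
    """
    slope, intercept = np.polyfit(x, y, 1)
    # Indices sorted from largest to smallest absolute residual.
    order = np.argsort(np.abs(y - (slope * x + intercept)))[::-1]
    results = {}
    for k in range(max_removed + 1):
        keep = np.ones(x.size, dtype=bool)
        keep[order[:k]] = False
        results[k] = tuple(np.polyfit(x[keep], y[keep], 1))
    return results

results = slopes_by_exclusion(x, y, 5)
for k, (slope, intercept) in results.items():
    print(f"removed {k}: slope={slope:.3f} intercept={intercept:.3f}")
```

If the fitted values keep moving as k increases, that is the instability 
warned about above; what matters in the end is how much the downstream 
real-world result changes, not the coefficients themselves.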

Alan

-- 
http://mail.python.org/mailman/listinfo/python-list
