I wonder why it is still standard practice in some circles to search for "outliers" as opposed to using robust/resistant methods.
Here is a great paper with a scientific approach to "outliers":

@Article{fin06cal,
  author = {Finney, David J.},
  title = {Calibration guidelines challenge outlier practices},
  journal = {The American Statistician},
  year = 2006,
  volume = 60,
  pages = {309-313},
  annote = {anticoagulant therapy;bias;causation;ethics;objectivity;outliers;guidelines for treatment of outliers;overview of types of outliers;letter to the editor and reply 61:187 May 2007}
}

Frank

Rich Shepard wrote
>
> On Thu, 9 Feb 2012, mails wrote:
>
>> I need to analyse a data matrix with dimensions of 30x100. Before
>> analysing the data there is, however, a need to remove outliers from the
>> data. I read quite a lot about outlier removal already and I think the
>> most common technique for that seems to be Principal Component Analysis
>> (PCA). However, I think that this technique is quite subjective. When is
>> an outlier an outlier? I uploaded an example PCA plot here:
>
> Those more expert than I will certainly provide answers. What I do with
> new data is create box-and-whisker plots (I use the lattice package), which
> define outliers as those data more than 1.5 times the interquartile range
> beyond the first or third quartile.
>
> No one but you can answer your question on when an outlier is an
> outlier. It depends on your data set and the context of the data. For
> example, a water chemistry value that far exceeds a regulatory threshold
> might be meaningful in the context of a one-off excursion (in which case
> it's not an outlier but a real data point) or it might result from a
> handling, instrumentation, or analytical error (in which case toss it as
> an outlier).
>
> Rich
>
> ______________________________________________
> R-help@ mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
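To make the two approaches discussed in this thread concrete, here is a minimal sketch in Python (standard library only). The readings are made up for illustration, and `tukey_fences` and `mad` are hypothetical helper names, not functions from lattice or any other package. The quartile convention is also an assumption: R's boxplot code uses hinges, so its fences can differ slightly from these. The sketch flags points with the box-and-whisker rule and then compares how far the mean and the median move when the flagged point is dropped.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: the quartiles extended by k times the IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def mad(values):
    """Median absolute deviation from the median (unscaled), a resistant spread."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


# Made-up readings with one extreme value (illustrative only).
readings = [4.1, 4.3, 4.4, 4.6, 4.7, 4.9, 5.0, 5.2, 5.3, 19.8]

lower, upper = tukey_fences(readings)
flagged = [v for v in readings if v < lower or v > upper]
kept = [v for v in readings if lower <= v <= upper]

# How much each location estimate changes when the flagged point is removed.
mean_shift = abs(statistics.mean(readings) - statistics.mean(kept))
median_shift = abs(statistics.median(readings) - statistics.median(kept))

print("flagged by the 1.5 x IQR rule:", flagged)
print("shift in mean after removal:  ", round(mean_shift, 3))
print("shift in median after removal:", round(median_shift, 3))
print("MAD with all points kept:     ", round(mad(readings), 3))
```

The median and MAD barely react to the extreme reading, so a resistant summary gives nearly the same answer whether or not the point is removed. Whether the point is a real excursion or an error remains a question about the data, not something the rule can decide.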
-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/Outlier-removal-techniques-tp4372652p4373592.html