On Thu, 9 Feb 2012, mails wrote:

I need to analyse a data matrix with dimensions of 30x100. Before
analysing the data there is, however, a need to remove outliers from the
data. I read quite a lot about outlier removal already and I think the
most common technique for that seems to be Principal Component Analysis
(PCA). However, I think that this technique is quite subjective. When is
an outlier an outlier? I uploaded an example PCA plot here:

  Those more expert than I will certainly provide answers. What I do with
new data is create box-and-whisker plots (I use the lattice package), which
flag as outliers those data lying more than 1.5 times the interquartile
range below the first quartile or above the third quartile.
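
  A minimal sketch of the box-and-whisker rule in base R, for illustration
only. The vector x is hypothetical data, not the poster's matrix, and the
names lower, upper, flagged, and bs are made up for this example. Note that
quantile() and boxplot.stats() compute the quartiles slightly differently
(the latter uses Tukey's hinges), so the two approaches can disagree on
borderline points.

```r
# Hypothetical data: one variable with an obviously extreme value
x <- c(2.1, 2.4, 2.5, 2.7, 2.9, 3.0, 3.2, 3.3, 3.6, 9.8)

# First and third quartiles and the interquartile range (IQR)
q   <- quantile(x, c(0.25, 0.75))
iqr <- q[2] - q[1]

# Fences at 1.5 x IQR below the first and above the third quartile
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Points outside the fences are flagged for inspection, not deleted
flagged <- x[x < lower | x > upper]

# boxplot.stats() applies the same idea using Tukey's hinges
bs <- boxplot.stats(x, coef = 1.5)$out

print(flagged)
print(bs)
```

  For a 30x100 matrix the same check could be applied to one column at a
time, for example with apply(); whether a flagged point is then dropped is
still a judgement about the data, not something the rule decides.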

  No one but you can answer your question on when an outlier is an outlier.
It depends on your data set and the context of the data. For example, a
water chemistry value that far exceeds a regulatory threshold might be
meaningful in the context of a one-off excursion (in which case it's not an
outlier but a real data point) or it might result from a handling,
instrumentation, or analytical error (in which case toss it as an outlier).

Rich

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
