> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jahan
> Sent: Tuesday, November 30, 2010 12:16 PM
> To: r-help@r-project.org
> Subject: [R] Outlier statistics question
>
> I have a statistical question.
> The data sets I am working with are right-skewed, so I have been
> plotting the log transformations of my data. I am using a Grubbs test
> to detect outliers in the data, but I get different outcomes depending
> on whether I run the test on the original data or on log(data). Here
> is one of the problematic sets:
>
> fgf2p50 <- c(1.563, 2.161, 2.529, 2.726, 2.442, 5.047)
> stripchart(fgf2p50, vertical = TRUE)
> # This next step requires the 'outliers' package
> library(outliers)
> grubbs.test(fgf2p50)
> # The output says p < 0.05, so 5.047 is flagged as an outlier
> # Next, I run the test on the log10 of the data (rounded to 3 decimals);
> # the vector is not named log10, to avoid masking the base function
> log_fgf2p50 <- c(0.194, 0.335, 0.403, 0.436, 0.388, 0.703)
> grubbs.test(log_fgf2p50)
> # The output is p > 0.05, so the test does not flag an outlier
>
> The question is, which outlier test do I accept?
>
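For what it is worth, the disagreement is easy to reproduce without the package. Here is a minimal sketch in base R, assuming the one-sided Grubbs statistic for the largest value and the usual t-based critical value. The helpers grubbs_g and grubbs_crit are names invented for this sketch; they are not part of 'outliers'.

```r
# Data from the original post
fgf2p50 <- c(1.563, 2.161, 2.529, 2.726, 2.442, 5.047)

# Grubbs statistic for the largest value: (max - mean) / sd
# (grubbs_g is a hypothetical helper, not a function from 'outliers')
grubbs_g <- function(x) (max(x) - mean(x)) / sd(x)

# One-sided critical value from the standard t-distribution formula
# (grubbs_crit is likewise a hypothetical helper)
grubbs_crit <- function(n, alpha = 0.05) {
  t_val <- qt(alpha / n, df = n - 2, lower.tail = FALSE)
  (n - 1) / sqrt(n) * sqrt(t_val^2 / (n - 2 + t_val^2))
}

g_raw  <- grubbs_g(fgf2p50)          # statistic on the original scale
g_log  <- grubbs_g(log10(fgf2p50))   # statistic on the log10 scale
g_crit <- grubbs_crit(length(fgf2p50))

print(round(c(raw = g_raw, log10 = g_log, critical = g_crit), 3))
```

On the raw scale the statistic comes out near 1.92, on the log10 scale near 1.76, and the 5% critical value for n = 6 is about 1.82. The log transformation pulls in the right tail just enough to move 5.047 from one side of the cutoff to the other, so the two results reflect the choice of scale more than anything about the observation itself.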
You may not want to "accept" either test. What do YOU mean by an outlier, and why is it important for you to detect and handle "outliers" differently? Maybe you should model the data so that the model correctly predicts or explains the so-called outlier. So, what is it that you want to do?

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.