Thanks for the clear example. However, if there is a bug, it is only that
t.test.formula doesn't throw an error when given the paired=TRUE option.
The documentation says "The formula interface is only applicable for the 2-sample
tests.", but there probably should be an explicit check as well -- I didn't notice
that sentence until I realized that t.test.formula cannot work for paired tests,
because the formula gives it no way of knowing which observations are paired.
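Something along these lines is the sort of check I mean (a hypothetical wrapper,
not the actual stats:::t.test.formula source):

```r
## Hypothetical guard -- a sketch of the missing check, not real R internals.
t.test.formula.checked <- function(formula, data, paired = FALSE, ...) {
  if (paired)
    stop("'paired' cannot be used with the formula interface: ",
         "the formula does not say which observations are paired")
  t.test(formula, data = data, ...)  # fall through to the usual 2-sample test
}
```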
-thomas
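PS: To see concretely where the 0.01453 comes from: na.exclude removes the two
NA rows from the model frame before the split into groups, so group 0 loses its
3rd observation and group 1 its 10th, and the nine remaining values in each
group are then paired by position, matching the wrong individuals from the
third pair on. My reconstruction of this, using your data (an assumption about
the internals, not a reading of the source):

```r
testdata.A <- c(1.15, -0.2, NA, 1, -2, -0.5, 0.1, 1.2, -1.4, 0.01)
testdata.B <- c(1.2, 1.1, 3, -0.1, 3, 1.1, 0, 1.3, 4, NA)
## na.exclude drops whole rows *before* the split into groups:
mf <- na.omit(data.frame(value = c(testdata.A, testdata.B),
                         group = rep(0:1, each = 10)))
g0 <- mf$value[mf$group == 0]  # 9 values: original rows 1, 2, 4:10
g1 <- mf$value[mf$group == 1]  # 9 values: original rows 1:9
## positional pairing now matches the wrong individuals:
t.test(g0, g1, paired = TRUE)$p.value  # ~0.0145, the puzzling result
```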
On Fri, 13 Aug 2010, Johannes W. Dietrich wrote:
Hello all,
due to unexplained differences between statistical results from collaborators
and our lab, arising in the same large proteomics dataset, we re-evaluated
the t.test() function. We found weird behaviour that is also reproducible
in the following small test dataset.
Suppose we have two vectors of numbers, with some missing values, that refer
to the same individuals and should therefore be compared with a paired
t-test:
testdata.A <- c(1.15, -0.2, NA, 1, -2, -0.5, 0.1, 1.2, -1.4, 0.01)
testdata.B <- c(1.2, 1.1, 3, -0.1, 3, 1.1, 0, 1.3, 4, NA)
Then
print(t.test(testdata.A, testdata.B, paired=TRUE, alternative="two.sided",
na.action="na.pass"))
and
print(t.test(testdata.A, testdata.B, paired=TRUE, alternative="two.sided",
na.action="na.exclude"))
deliver the same p-value (0.1162, identical to Excel's result).
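As a cross-check of that figure: a paired t-test is just a one-sample t-test on
the within-pair differences, and the two missing values knock out their pairs,
leaving eight complete pairs (the data are repeated so the snippet stands
alone):

```r
testdata.A <- c(1.15, -0.2, NA, 1, -2, -0.5, 0.1, 1.2, -1.4, 0.01)
testdata.B <- c(1.2, 1.1, 3, -0.1, 3, 1.1, 0, 1.3, 4, NA)
d <- testdata.A - testdata.B  # NA wherever either member of a pair is missing
sum(!is.na(d))                # 8 complete pairs
t.test(d)$p.value             # ~0.1162, same as the paired two-sample call
```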
However, after combining the two vectors with
testdata <- c(testdata.A, testdata.B)
and defining a criterion vector with
criterion <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1)
(that is the type of data layout we have in our proteomics project) we get a
different p-value (0.01453) with
print(t.test(testdata ~ criterion, paired=TRUE, alternative="two.sided",
na.action="na.exclude")) .
The statement
print(t.test(testdata ~ criterion, paired=TRUE, alternative="two.sided",
na.action="na.pass"))
however, delivers a p-value of 0.1162 again.
With
print(t.test(testdata[criterion==0], testdata[criterion==1], paired=TRUE,
alternative="two.sided", na.action="na.exclude"))
which imitates the first form, we again get a p-value of 0.1162.
What is the reason for the different p-values? Shouldn't all calls to t.test
that exclude missing values be equivalent, and therefore deliver the same
results?
Excel, StatView and KaleidaGraph all display a p-value of 0.1162.
J. W. D.
--
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
-- Dr. Johannes W. Dietrich, M.D.
-- Laboratory XU44, Endocrine Research
-- Medical Hospital I, Bergmannsheil University Hospitals
-- Ruhr University of Bochum
-- Buerkle-de-la-Camp-Platz 1, D-44789 Bochum, NRW, Germany
-- Phone: +49:234:302-6400, Fax: +49:234:302-6403
-- eMail: "j.w.dietr...@medical-cybernetics.de"
-- WWW: http://medical-cybernetics.de
-- WWW: http://www.bergmannsheil.de
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle