Thanks for the clear example. However, if there is a bug it is only that 
t.test.formula doesn't throw an error when given the paired=TRUE option.

The documentation says "The formula interface is only applicable for the 2-sample 
tests.",  but there probably should be an explicit check -- I didn't see that in the 
documentation until I realized that t.test.formula couldn't work for paired tests because 
you don't tell it which observations are paired.

   -thomas


On Fri, 13 Aug 2010, Johannes W. Dietrich wrote:

Hello all,

due to unexplained differences between statistical results from collaborators and our lab that arose in the same large proteomics dataset we reevaluated the t.test() function. Here, we found a weird behaviour that is also reproducible in the following small test dataset:

Suppose, we have two vectors with numbers and some missing values that refer to the same individuals and that should therefore be evaluated with a paired t-test:

 testdata.A <- c(1.15, -0.2, NA, 1, -2, -0.5, 0.1, 1.2, -1.4, 0.01);
 testdata.B <- c(1.2, 1.1, 3, -0.1, 3, 1.1, 0, 1.3, 4, NA);

Then

print(t.test(testdata.A, testdata.B, paired=TRUE, alternative="two.sided", na.action="na.pass"))

and

print(t.test(testdata.A, testdata.B, paired=TRUE, alternative="two.sided", na.action="na.exclude"))

deliver the same p value (0.1162, identical to Excel's result).

However, after combining the two vectors with

 testdata <- c(testdata.A, testdata.B);

and defining a criterion vector with

 criterion <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1);

(that is the type of data layout we have in our proteomics project) we get a different p-value (0.01453) with

print(t.test(testdata ~ criterion, paired=TRUE, alternative="two.sided", na.action="na.exclude")) .

The statement

print(t.test(testdata ~ criterion, paired=TRUE, alternative="two.sided", na.action="na.pass"))

however, delivers a p-value of 0.1162 again.

With

print(t.test(testdata[criterion==0], testdata[criterion==1], paired=TRUE, alternative="two.sided", na.action="na.exclude"))

that imitates the first form, we get again a p value of 0.1162.

What is the reason for the different p values? Should not all calls to t.test that exlude missing values be equivalent and therefore deliver the same results?

Excel, StatView and KaleidaGraph all display a p-value of 0.1162.

J. W. D.
--
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
-- Dr. Johannes W. Dietrich, M.D.
-- Laboratory XU44, Endocrine Research
-- Medical Hospital I, Bergmannsheil University Hospitals
-- Ruhr University of Bochum
-- Buerkle-de-la-Camp-Platz 1, D-44789 Bochum, NRW, Germany
-- Phone: +49:234:302-6400, Fax: +49:234:302-6403
-- eMail: "j.w.dietr...@medical-cybernetics.de"
-- WWW: http://medical-cybernetics.de
-- WWW: http://www.bergmannsheil.de
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to