Thanks for the clear example. However, if there is a bug, it is only that
t.test.formula doesn't throw an error when given the paired=TRUE option.
The documentation says "The formula interface is only applicable for the 2-sample
tests.", but there probably should be an explicit check as well -- I didn't notice
that sentence until I realized that t.test.formula cannot work for paired tests,
because the formula gives it no way of knowing which observations are paired.
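Something along these lines is the sort of check I mean (a hypothetical wrapper,
not the actual stats:::t.test.formula source):

```r
## Hypothetical guard -- a sketch of the missing check, not real R internals.
t.test.formula.checked <- function(formula, data, paired = FALSE, ...) {
  if (paired)
    stop("'paired' cannot be used with the formula interface: ",
         "the formula does not say which observations are paired")
  t.test(formula, data = data, ...)  # fall through to the usual 2-sample test
}
```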
-thomas
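PS: To see concretely where the 0.01453 comes from: na.exclude removes the two
NA rows from the model frame before the split into groups, so group 0 loses its
3rd observation and group 1 its 10th, and the nine remaining values in each
group are then paired by position, matching the wrong individuals from the
third pair on. My reconstruction of this, using your data (an assumption about
the internals, not a reading of the source):

```r
testdata.A <- c(1.15, -0.2, NA, 1, -2, -0.5, 0.1, 1.2, -1.4, 0.01)
testdata.B <- c(1.2, 1.1, 3, -0.1, 3, 1.1, 0, 1.3, 4, NA)
## na.exclude drops whole rows *before* the split into groups:
mf <- na.omit(data.frame(value = c(testdata.A, testdata.B),
                         group = rep(0:1, each = 10)))
g0 <- mf$value[mf$group == 0]  # 9 values: original rows 1, 2, 4:10
g1 <- mf$value[mf$group == 1]  # 9 values: original rows 1:9
## positional pairing now matches the wrong individuals:
t.test(g0, g1, paired = TRUE)$p.value  # ~0.0145, the puzzling result
```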
On Fri, 13 Aug 2010, Johannes W. Dietrich wrote:
Hello all,
due to unexplained differences between statistical results from collaborators
and our lab, arising in the same large proteomics dataset, we re-evaluated
the t.test() function. We found weird behaviour that is also reproducible
in the following small test dataset.
Suppose we have two vectors of numbers, with some missing values, that refer
to the same individuals and should therefore be compared with a paired
t-test:
testdata.A <- c(1.15, -0.2, NA, 1, -2, -0.5, 0.1, 1.2, -1.4, 0.01)
testdata.B <- c(1.2, 1.1, 3, -0.1, 3, 1.1, 0, 1.3, 4, NA)
Then
print(t.test(testdata.A, testdata.B, paired=TRUE, alternative="two.sided",
na.action="na.pass"))
and
print(t.test(testdata.A, testdata.B, paired=TRUE, alternative="two.sided",
na.action="na.exclude"))
deliver the same p-value (0.1162, identical to Excel's result).
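As a cross-check of that figure: a paired t-test is just a one-sample t-test on
the within-pair differences, and the two missing values knock out their pairs,
leaving eight complete pairs (the data are repeated so the snippet stands
alone):

```r
testdata.A <- c(1.15, -0.2, NA, 1, -2, -0.5, 0.1, 1.2, -1.4, 0.01)
testdata.B <- c(1.2, 1.1, 3, -0.1, 3, 1.1, 0, 1.3, 4, NA)
d <- testdata.A - testdata.B  # NA wherever either member of a pair is missing
sum(!is.na(d))                # 8 complete pairs
t.test(d)$p.value             # ~0.1162, same as the paired two-sample call
```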
However, after combining the two vectors with
testdata <- c(testdata.A, testdata.B)
and defining a criterion vector with
criterion <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1)
(that is the type of data layout we have in our proteomics project) we get a
different p-value (0.01453) with
print(t.test(testdata ~ criterion, paired=TRUE, alternative="two.sided",
na.action="na.exclude")) .
The statement
print(t.test(testdata ~ criterion, paired=TRUE, alternative="two.sided",
na.action="na.pass"))
however, delivers a p-value of 0.1162 again.
With
print(t.test(testdata[criterion==0], testdata[criterion==1], paired=TRUE,
alternative="two.sided", na.action="na.exclude"))
which imitates the first form, we again get a p-value of 0.1162.
What is the reason for the different p-values? Shouldn't all calls to t.test
that exclude missing values be equivalent, and therefore deliver the same
results?
Excel, StatView and KaleidaGraph all display a p-value of 0.1162.
J. W. D.
--
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
-- Dr. Johannes W. Dietrich, M.D.
-- Laboratory XU44, Endocrine Research
-- Medical Hospital I, Bergmannsheil University Hospitals
-- Ruhr University of Bochum
-- Buerkle-de-la-Camp-Platz 1, D-44789 Bochum, NRW, Germany
-- Phone: +49:234:302-6400, Fax: +49:234:302-6403
-- eMail: "j.w.dietr...@medical-cybernetics.de"
-- WWW: http://medical-cybernetics.de
-- WWW: http://www.bergmannsheil.de
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle