Peter,
Regarding 1) I do not agree. See the following, simplified example:
x <- data.frame(ID=rep(1:2, each=4), Visit=rep(c(1:4), 2),
ptA=c(7,8,9,10,17,18,19,20), ptB=c(5,6,7,8,21,20,19,18))
This data frame has only 2 patients with 4 visits each, but the
correlation of ptA and ptB runs in opposite directions in the two patients.
See the plot:
plot(ptB~ptA, x)
If you run 'cor.test(x$ptA, x$ptB)' you get a very high correlation
(0.962) and a significant p-value (0.0001356). However, doing it by patient:
xx <- x[x$ID==1,]; cor.test(xx$ptA, xx$ptB)
xx <- x[x$ID==2,]; cor.test(xx$ptA, xx$ptB)
you get 2 opposite correlation values (1 and -1). So for patient 2 the
correlation at the individual level is _very_ far from the one
estimated on the whole dataset. My problem is: how can I
estimate the correlation between ptA and ptB while taking the
repeated measures into account?
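
For convenience, the pooled and per-patient comparison above can be sketched in one pass. This is only an illustration on the same toy data; the names r_pooled and r_by_id are introduced here and are not part of any package.

```r
# Toy data from the example above: 2 patients, 4 visits each
x <- data.frame(ID = rep(1:2, each = 4), Visit = rep(1:4, 2),
                ptA = c(7, 8, 9, 10, 17, 18, 19, 20),
                ptB = c(5, 6, 7, 8, 21, 20, 19, 18))

# Pooled correlation, ignoring which patient each point belongs to
r_pooled <- cor(x$ptA, x$ptB)

# One correlation per patient: split the rows by ID, then apply cor() to each piece
r_by_id <- sapply(split(x, x$ID), function(d) cor(d$ptA, d$ptB))

print(r_pooled)
print(r_by_id)
```

The pooled value is driven almost entirely by the difference between the two patients, which is why it says so little about what happens within either of them.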
Regarding 2) This is not as much of a problem. The simplest solution is to
build a model with and without the correlation and compare them with anova.
The p-value from anova will indicate the significance of the correlation.
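
A minimal sketch of one possible reading of this, assuming "with and without correlation" means two nested linear models that differ only in the ptA term, with patient entered as a fixed factor. The model names m0 and m1 are hypothetical, and other formulations (for example a mixed model) would be equally consistent with the description.

```r
# Same toy data as above
x <- data.frame(ID = rep(1:2, each = 4), Visit = rep(1:4, 2),
                ptA = c(7, 8, 9, 10, 17, 18, 19, 20),
                ptB = c(5, 6, 7, 8, 21, 20, 19, 18))

# Model without ptA: each patient only gets its own mean level of ptB
m0 <- lm(ptB ~ factor(ID), data = x)

# Model with ptA: adds one common within-patient slope of ptB on ptA
m1 <- lm(ptB ~ factor(ID) + ptA, data = x)

# F test comparing the nested models; the second row tests the ptA term
cmp <- anova(m0, m1)
print(cmp)
p_value <- cmp[["Pr(>F)"]][2]
```

On this toy data the two opposite within-patient slopes cancel, so the common slope is essentially zero and the test is far from significant, even though the pooled correlation is very high.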
Regarding 3) I know of this solution; the Bland & Altman paper in BMJ
(1994) recommended it. I'm looking for something more sophisticated...
Best regards,
--
Michal J. Figurski, PhD
HUP, Pathology & Laboratory Medicine
Biomarker Research Laboratory
3400 Spruce St. 7 Maloney S
Philadelphia, PA 19104
tel. (215) 662-3413
On 3/24/2011 1:58 PM, Peter Langfelder wrote:
I see, so it's more of a statistics than R question. A couple thoughts:
1. The fact that 4 measurements in each single patient are possibly
highly related should not change the correlation, only the p-value.
Here's an example: generate two variables a and b
a = c(1:10);
b = sample(a) + a
cor(a,b)
          [,1]
[1,] 0.4735424
cor (rep(a, 4), rep(b, 4))
          [,1]
[1,] 0.4735424
Notice that the correlation of a,b, and the correlation of 4-times
repeated a with 4-times repeated b is exactly the same.
2. The calculation of a p-value is more complicated and I don't have a
good answer, but an upper bound on the p-value can be obtained by
calculating the p-value pretending that there are only 10
measurements. In the package WGCNA we have a function for that, it's
called corPvalueStudent.
3. If the 4 measurements for each patient are very similar, you could
simply average them, then proceed as if you had 10 independent
measurements.
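
The three points above can be sketched together in base R. This is a hedged illustration, not the WGCNA implementation: cor_p_student is a hand-written stand-in for a Student t p-value of a Pearson correlation, the seed is an assumption added for reproducibility, and the data frame d with its id column is hypothetical.

```r
set.seed(42)   # assumption: seed added only so the sketch is reproducible
a <- 1:10
b <- sample(a) + a

# Student t p-value for a Pearson correlation r based on n independent pairs
cor_p_student <- function(r, n) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)
}

# Point 1: repeating every pair 4 times leaves the estimate unchanged
r     <- cor(a, b)
r_rep <- cor(rep(a, 4), rep(b, 4))

# Point 2: the p-value depends on how many pairs are treated as independent
p_10 <- cor_p_student(r, 10)       # conservative: only 10 independent pairs
p_40 <- cor_p_student(r_rep, 40)   # pretends all 40 rows are independent

# Point 3: averaging the 4 copies per subject recovers the 10 original pairs
d <- data.frame(id = rep(1:10, 4), a = rep(a, 4), b = rep(b, 4))
means <- aggregate(cbind(a, b) ~ id, data = d, FUN = mean)
r_avg <- cor(means$a, means$b)

print(c(r = r, r_rep = r_rep, r_avg = r_avg))
print(c(p_10 = p_10, p_40 = p_40))
```

Here the repeats are exact copies, so averaging loses nothing; with real repeated measurements the copies differ and the per-subject means only summarise them.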
Peter
On Thu, Mar 24, 2011 at 10:38 AM, Michal Figurski
<figur...@mail.med.upenn.edu> wrote:
Peter,
This is actually too simple - it doesn't take into account the fact that the
data were measured several times on the same subject. This is one thing I
know for sure, that one should not just lump such data together and pretend
that each point comes from a different patient...