Re: [R] Estimating correlation in multiple measures data

Michal Figurski Thu, 24 Mar 2011 13:17:06 -0700

Peter,

Regarding 1) I do not agree. See the following, simplified example:

x <- data.frame(ID = rep(1:2, each = 4), Visit = rep(1:4, 2),
                ptA = c(7, 8, 9, 10, 17, 18, 19, 20),
                ptB = c(5, 6, 7, 8, 21, 20, 19, 18))

In this data frame you have only 2 patients with 4 visits each, but the correlation of ptA and ptB runs in opposite directions in these 2 patients. See the plot:

plot(ptB~ptA, x)

If you do 'cor.test(x$ptA, x$ptB)' you get a very high correlation (0.961) and a significant p-value (0.0001356). However, doing it by patient:

xx <- x[x$ID==1,]; cor.test(xx$ptA, xx$ptB)
xx <- x[x$ID==2,]; cor.test(xx$ptA, xx$ptB)

you get 2 opposite correlation values (1 and -1). So for patient 2 the correlation at the individual level is _very_ far from the one estimated on the whole dataset. My problem is: how can I estimate the correlation between ptA and ptB while taking the multiple measures into account?
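
One way to separate the two levels, sketched here under assumptions rather than offered as the answer: subtract each patient's own mean from ptA and ptB and correlate what remains. That is the same as the partial correlation after adjusting for ID as a factor. The names wA, wB, r_by_id and r_within are made up for this illustration.

```r
# Toy data from the example above
x <- data.frame(ID = rep(1:2, each = 4), Visit = rep(1:4, 2),
                ptA = c(7, 8, 9, 10, 17, 18, 19, 20),
                ptB = c(5, 6, 7, 8, 21, 20, 19, 18))

# Per-patient correlations, one value per ID
r_by_id <- sapply(split(x, x$ID), function(d) cor(d$ptA, d$ptB))

# Within-patient correlation: remove each patient's mean, then correlate
wA <- x$ptA - ave(x$ptA, x$ID)
wB <- x$ptB - ave(x$ptB, x$ID)
r_within <- cor(wA, wB)

r_by_id
r_within
```

On this toy data the per-patient values are 1 and -1 and the within-patient value is 0, because the two opposite slopes cancel. The pooled 0.96 comes entirely from the difference between the two patients' means.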

Regarding 2) This is not as much of a problem. The simplest solution is to build a model with and without correlation and compare them with anova. The p-value from anova will indicate the significance of the correlation.
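
One possible reading of "with and without correlation", sketched on simulated data: fit ptB on patient alone, then on patient plus ptA, and let anova() test the added term. The simulated data frame d, the seed, and the choice of plain lm() with ID as a fixed factor are all assumptions for illustration; a mixed model would be another route.

```r
set.seed(1)                                  # arbitrary seed for repeatability
d <- data.frame(ID = factor(rep(1:10, each = 4)))
level <- rep(rnorm(10, sd = 5), each = 4)    # assumed patient-level offsets
d$ptA <- level + rnorm(40)
d$ptB <- level + 0.5 * (d$ptA - level) + rnorm(40)

m0 <- lm(ptB ~ ID, data = d)                 # patient effect only
m1 <- lm(ptB ~ ID + ptA, data = d)           # adds the within-patient ptA term

cmp <- anova(m0, m1)                         # F test for the added ptA term
cmp
```

The second row of cmp carries the F statistic and p-value for ptA after the patient means have been accounted for.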

Regarding 3) I know of this solution - Bland & Altman paper from BMJ 1994 recommended that. I'm looking for something more sophisticated...


Best regards,

--
Michal J. Figurski, PhD
HUP, Pathology & Laboratory Medicine
Biomarker Research Laboratory
3400 Spruce St. 7 Maloney S
Philadelphia, PA 19104
tel. (215) 662-3413


On 3/24/2011 1:58 PM, Peter Langfelder wrote:

I see, so it's more of a statistics than R question. A couple thoughts:

1. The fact that 4 measurements in each single patient are possibly
highly related should not change the correlation, only the p-value.
Here's an example: generate two variables a and b

a = c(1:10);
b = sample(a) + a

cor(a,b)

           [,1]
[1,] 0.4735424

cor (rep(a, 4), rep(b, 4))

           [,1]
[1,] 0.4735424

Notice that the correlation of a and b and the correlation of 4-times
repeated a with 4-times repeated b are exactly the same.
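
A small sketch of the same point using cor.test from base R, so the p-value is visible next to the estimate. The seed is arbitrary and only there to make the run repeatable; t1 and t4 are illustrative names.

```r
set.seed(42)                     # arbitrary seed
a <- 1:10
b <- sample(a) + a

t1 <- cor.test(a, b)                     # 10 pairs
t4 <- cor.test(rep(a, 4), rep(b, 4))     # same 10 pairs, each repeated 4 times

c(r_once = unname(t1$estimate), r_rep = unname(t4$estimate))
c(p_once = t1$p.value, p_rep = t4$p.value)
```

The two estimates agree, while the p-value for the repeated data is smaller only because the test counts 40 observations instead of 10.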

2. The calculation of a p-value is more complicated and I don't have a
good answer, but an upper bound on the p-value can be obtained by
calculating the p-value pretending that there are only 10
measurements. In the package WGCNA we have a function for that, it's
called corPvalueStudent.
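
The bound in point 2 can also be written out in base R. The helper cor_p_student below is a hypothetical name, not the WGCNA function; it is a sketch of the usual Student t test for a Pearson correlation with the number of samples passed in explicitly.

```r
# Student t p-value for a Pearson correlation r computed from n pairs
cor_p_student <- function(r, n) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)
}

set.seed(42)                       # arbitrary seed
a <- 1:10
b <- sample(a) + a
r <- cor(rep(a, 4), rep(b, 4))     # same r as cor(a, b)

p_naive <- cor_p_student(r, 40)    # treats all 40 rows as independent
p_upper <- cor_p_student(r, 10)    # counts only the 10 distinct subjects
c(p_naive = p_naive, p_upper = p_upper)
```

With n = 10 the result matches cor.test(a, b)$p.value, and it is never smaller than the value obtained with n = 40.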

3. If the 4 measurements for each patient are very similar, you could
simply average them, then proceed as if you had 10 independent
measurements.
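
A minimal sketch of point 3 on simulated data. The data frame d, the seed, and the column names ID, ptA and ptB are assumptions chosen to match the rest of the thread.

```r
set.seed(7)                                   # arbitrary seed
n_subj <- 10
d <- data.frame(ID = rep(seq_len(n_subj), each = 4))
level <- rep(rnorm(n_subj, sd = 5), each = 4) # assumed patient-level offsets
d$ptA <- level + rnorm(nrow(d))
d$ptB <- level + rnorm(nrow(d))

# One row per patient: mean of ptA and ptB over the 4 visits
m <- aggregate(cbind(ptA, ptB) ~ ID, data = d, FUN = mean)

res <- cor.test(m$ptA, m$ptB)                 # n = 10, one point per patient
res$estimate
res$p.value
```

This estimates the between-patient correlation only; any relationship within a patient across visits is averaged out.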

Peter

On Thu, Mar 24, 2011 at 10:38 AM, Michal Figurski
<figur...@mail.med.upenn.edu>  wrote:

Peter,

This is actually too simple - it doesn't take into account the fact that the
data were measured several times on the same subject. This is one thing I
know for sure, that one should not just lump such data together and pretend
that each point comes from a different patient...

--
Michal J. Figurski, PhD
HUP, Pathology & Laboratory Medicine
Biomarker Research Laboratory
3400 Spruce St. 7 Maloney S
Philadelphia, PA 19104
tel. (215) 662-3413


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
