hello,

i am a bit of a statistical neophyte and am currently trying to make sense of 
confidence intervals for correlation coefficients. i am using the cor.test() 
function. the documentation is quite terse and i am having trouble tying 
the output from this function to what i have read in the literature. 
so, for example, i make two sequences and calculate the correlation coefficient:

> x <- runif(20)
> y <- jitter(x, amount = 0.7)
> cor(x, y)
[1] 0.5198252

now i want to establish what confidence i can attach to this value. from the 
table in the article "Understanding Correlation" by R. J. Rummel 
[online] i get that the probability of a correlation coefficient of 0.5198252 
arising by chance from two sequences of length 20 is less than 0.01, so it 
seems that i can attach some significance to the result. i still don't 
understand where the table comes from, and it only goes up as far as sequences 
of length 1000. the data i want to analyse has a length of more than 70000, 
so i need to calculate these confidence levels myself. i assume that cor.test() 
is the way to do this. so i tried:

> cor.test(x, y, "greater", conf.level = 0.95)

        Pearson's product-moment correlation

data:  x and y 
t = 2.5816, df = 18, p-value = 0.009405
alternative hypothesis: true correlation is greater than 0 
95 percent confidence interval:
 0.1753340 1.0000000 
sample estimates:
      cor 
0.5198252 

> cor.test(x, y, "less", conf.level = 0.95)

        Pearson's product-moment correlation

data:  x and y 
t = 2.5816, df = 18, p-value = 0.9906
alternative hypothesis: true correlation is less than 0 
95 percent confidence interval:
 -1.0000000  0.7509089 
sample estimates:
      cor 
0.5198252 

> cor.test(x, y, "two.sided", conf.level = 0.95)

        Pearson's product-moment correlation

data:  x and y 
t = 2.5816, df = 18, p-value = 0.01881
alternative hypothesis: true correlation is not equal to 0 
95 percent confidence interval:
 0.1003997 0.7823738 
sample estimates:
      cor 
0.5198252
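
as a sanity check, the t and p-value lines above seem to be reproducible from the correlation coefficient and the sequence length alone. the sketch below assumes the textbook relation t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom; that relation is my assumption from reading around, and the variable names are just illustrative.

```r
# values copied from the output above
r <- 0.5198252   # sample correlation reported by cor(x, y)
n <- 20          # length of each sequence

# assumed textbook relation: if the true correlation is 0 (and the data are
# roughly normal), this statistic follows a t distribution with n - 2 df
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
df <- n - 2

# tail probabilities matching the three alternatives tried above
p_greater   <- pt(t_stat, df, lower.tail = FALSE)           # "greater"
p_less      <- pt(t_stat, df)                                # "less"
p_two_sided <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)   # "two.sided"

print(c(t = t_stat, greater = p_greater, less = p_less, two.sided = p_two_sided))
```

if this is right, nothing in the calculation is tied to a table, so the same lines should work with n set to more than 70000.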

i reckon that the first invocation of the function is closest to what i am 
looking for. now the rest of the output from the function is a total mystery to 
me. could anyone please tell me:

o what is a p-value?
o how to interpret the quoted confidence interval?

i do see that as i increase the conf.level argument to cor.test() (with 
alternative "greater") the lower bound of the confidence interval gets lower:

        conf.level      ->      confidence interval
        0.95            ->       0.1753340   1.0000000
        0.975           ->       0.1003997   1.0000000
        0.995           ->      -0.04859184  1.00000000

does this mean that with 99.5% certainty the correlation coefficient should lie 
in the range -0.04859184 to 1.00000000? hmmm. i am doubtful. plus this doesn't 
really answer my question, which is more about what confidence i can assign to 
the measured correlation coefficient (0.5198252).
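
for what it is worth, the lower bounds quoted above can apparently be reproduced by hand. the sketch below assumes the interval comes from the fisher z transformation, atanh(r), with standard error 1 / sqrt(n - 3); that is my guess at the method, and lower_bound() is a made-up helper, not part of cor.test().

```r
# values copied from the output above
r <- 0.5198252
n <- 20

# assumption: interval is built on the Fisher z scale and transformed back
z  <- atanh(r)          # Fisher z transform of the sample correlation
se <- 1 / sqrt(n - 3)   # approximate standard error of z

# hypothetical helper: one-sided ("greater") lower bound at a given level
lower_bound <- function(conf.level) tanh(z - qnorm(conf.level) * se)

levels <- c(0.95, 0.975, 0.995)
lower  <- sapply(levels, lower_bound)
print(data.frame(conf.level = levels, lower = lower, upper = 1))

# two-sided 95% interval for comparison with the "two.sided" call
two_sided <- tanh(z + c(-1, 1) * qnorm(0.975) * se)
print(two_sided)
```

since se shrinks like 1 / sqrt(n - 3), the interval should become much narrower for sequences of length 70000 than for length 20.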

an alternative question would be: given two sequences and a calculated 
correlation coefficient, with what probability could i assert that the 
underlying processes are indeed correlated and that the calculated correlation 
coefficient does not simply arise by chance?
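
one rough way i can imagine probing the "by chance" part is to shuffle one sequence many times and see how often a correlation at least as large as the observed one turns up. this is a permutation-style sketch, not what cor.test() does; the seed and the number of shuffles are arbitrary choices of mine. note that it estimates how often chance alone gives such a correlation, which is not the same thing as the probability that the processes are correlated.

```r
set.seed(1)                    # arbitrary seed, only so the run is repeatable
x <- runif(20)
y <- jitter(x, amount = 0.7)

r_obs <- cor(x, y)             # observed correlation for this sample

# shuffle y to break any association with x, and recompute the correlation
n_perm <- 10000                # arbitrary number of shuffles
r_null <- replicate(n_perm, cor(x, sample(y)))

# fraction of shuffles with a correlation at least as large as observed
# (the +1 terms count the observed arrangement itself)
p_perm <- (sum(r_null >= r_obs) + 1) / (n_perm + 1)

print(c(r_obs = r_obs, p_perm = p_perm))
```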

please forgive my ignorance. any help will be vastly appreciated. thanks!

best regards,
andrew.
