Re: [R] Correlation discrepancy

2011-08-23 Thread Kohta Ishikawa
Dividing by 8 (the sample size n) gives a biased estimator of the covariance.
R's cov() function computes the unbiased estimator, which divides by n - 1
(sample size minus one).
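
To make the relation between the two estimators concrete, here is a minimal
sketch using the x and y vectors from the original post. The variable names
n, cov_unbiased and cov_biased are only illustrative, and rescaling by
(n - 1)/n is just one way to obtain the divide-by-n value from cov().

```r
# Data from the original post
x <- c(44, 46, 46, 47, 45, 43, 45, 44)
y <- c(44, 43, 41, 41, 46, 48, 44, 43)
n <- length(x)

# cov() divides the sum of cross-products by n - 1 (unbiased estimator)
cov_unbiased <- cov(x, y)

# Rescaling by (n - 1) / n gives the divide-by-n (biased) version
cov_biased <- cov_unbiased * (n - 1) / n

print(cov_unbiased)  # -2.428571
print(cov_biased)    # -2.125
```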

Regards,
Kohta

--
View this message in context: 
http://r.789695.n4.nabble.com/Correlation-discrepancy-tp3762457p3762491.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Correlation discrepancy

2011-08-23 Thread Vincy Pyne
Dear Mr Dimitris and Mr Harding, by mistake I typed my colleague's name 
(i.e. Ashok) while thanking you. Please excuse me for that.

Regards

Vincy



Re: [R] Correlation discrepancy

2011-08-23 Thread Vincy Pyne
Dear Mr. Dimitris and Mr Harding, thanks a lot for your guidance. It will be 
interesting to find out how Excel deals with this formula. I will try it. 
Thanks again.

Regards

Ashok



Re: [R] Correlation discrepancy

2011-08-23 Thread Ted Harding
In addition, something has gone wrong, Vincy, with your data x,y
between evaluating cov(x,y) and evaluating your explicit formula.

If I repeat your commands:

  x = c(44,46,46,47,45,43,45,44)
  y = c(44,43,41,41,46,48,44,43)
  cov(x, y)
  # [1] -2.428571

  sum((x-mean(x))*(y-mean(y)))/8
  # [1] -2.125

which has the right sign and, when changed to use the
correct denominator (n-1 = 7) as suggested by Dimitris:

  sum((x-mean(x))*(y-mean(y)))/7
  # [1] -2.428571

gives exact agreement. With regard to your second formula, this
should correspondingly be:

  sum(x*y)/7 - (mean(x)*mean(y))*8/7
  # [1] -2.428571

again agreeing exactly. Your result:

>> covariance = sum((x-mean(x))*(y-mean(y)))/8   # no. of paired obs. = 8
>>
>> or
>>
>> covariance = sum(x*y)/8-(mean(x)*mean(y))
>>
>> gives
>>
>> covariance = 2.125

agrees in numerical magnitude with the "1/8" form, but has
the wrong sign. Or maybe you simply mis-typed "-2.125" as "2.125".
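
As a quick check of the two corrected formulas, the following sketch computes
the centred sum of cross-products once and compares both forms with cov().
The variable names s_xy, direct and shortcut are my own and do not come from
the thread. It also shows that the result is negative whichever denominator
is used, since the sign is fixed by the sum itself.

```r
x <- c(44, 46, 46, 47, 45, 43, 45, 44)
y <- c(44, 43, 41, 41, 46, 48, 44, 43)
n <- length(x)

# Centred sum of cross-products; its sign fixes the sign of the covariance
s_xy <- sum((x - mean(x)) * (y - mean(y)))

# Definition form and "shortcut" form, both with denominator n - 1
direct   <- s_xy / (n - 1)
shortcut <- sum(x * y) / (n - 1) - mean(x) * mean(y) * n / (n - 1)

print(s_xy)                            # -17
print(all.equal(direct, cov(x, y)))    # TRUE
print(all.equal(shortcut, cov(x, y)))  # TRUE
```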

Hoping this helps,
Ted.



E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 23-Aug-11   Time: 12:38:36
-- XFMail --



Re: [R] Correlation discrepancy

2011-08-23 Thread Dimitris Rizopoulos
Well, you don't have the correct denominator, i.e., n-1, with n denoting 
the sample size. Have a look at the *Details* section of the online help 
file for cov(), and also try


sum((x-mean(x))*(y-mean(y)))/7
cov(x, y)
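
Since the original question was phrased in terms of correlation, it may be
worth adding that the choice of denominator does not affect cor(): the same
factor appears in the covariance and in both variances, so it cancels. A
minimal sketch, where the helper cov_n() is a hypothetical name of mine and
not part of R:

```r
x <- c(44, 46, 46, 47, 45, 43, 45, 44)
y <- c(44, 43, 41, 41, 46, 48, 44, 43)

# Hypothetical helper: covariance with denominator n instead of n - 1
cov_n <- function(a, b) sum((a - mean(a)) * (b - mean(b))) / length(a)

# Correlation built from the divide-by-n quantities
r_n <- cov_n(x, y) / sqrt(cov_n(x, x) * cov_n(y, y))

# Correlation as computed by R (based on the n - 1 versions)
r_r <- cor(x, y)

print(all.equal(r_n, r_r))  # TRUE
```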


I hope it helps.

Best,
Dimitris


On 8/23/2011 1:18 PM, Vincy Pyne wrote:

> Dear R list, I have one very elementary question regarding correlation
> between two variables.
>
> x = c(44,46,46,47,45,43,45,44)
> y = c(44,43,41,41,46,48,44,43)
>
> > cov(x, y)
> [1] -2.428571
>
> However, if I try to calculate the covariance using the formula
>
> covariance = sum((x-mean(x))*(y-mean(y)))/8   # no. of paired obs. = 8
>
> or
>
> covariance = sum(x*y)/8-(mean(x)*mean(y))
>
> it gives
>
> covariance = 2.125
>
> I am not able to figure out where I am going wrong w.r.t. the covariance
> formula. Kindly guide.
>
> Regards
>
> Vincy


--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/
