Re: [R] R's Spearman

2007-06-03 Thread Raymond Wan

Dear Frank and Felipe,

Thank you both for your replies.  The code looks exactly like the 
formula in the Japanese Wikipedia, which I was trying to make sense of (it 
wasn't in the English Wikipedia).  Thank you for sharing your code, Felipe!

And thank you for the clarification, Frank.  Knowing several ways of 
calculating it helps in understanding it... thanks to both of you!

Ray


Frank E Harrell Jr wrote:
> Mendiburu, Felipe (CIP) wrote:
>> Dear Ray,
>>
>> The Spearman correlation calculated by R is correct with or without 
>> ties; what is not correct is the probability (p-value) in the case of 
>> ties. I am sending you the formula for the correlation with ties, which 
>> gives the same value as R.
>> Regards,
>>
>> Felipe de Mendiburu
>> Statistician
>
> Just use midranks for ties (as with rank()) and compute the ordinary 
> correlation on those (see also the spearman2 and rcorr functions in the 
> Hmisc package).  No need to use complex formulas.  And the t 
> approximation for p-values works pretty well.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R's Spearman

2007-05-31 Thread Frank E Harrell Jr
Mendiburu, Felipe (CIP) wrote:
> Dear Ray,
> 
> The Spearman correlation calculated by R is correct with or without ties; 
> what is not correct is the probability (p-value) in the case of ties. I am 
> sending you the formula for the correlation with ties, which gives the same 
> value as R. 
> 
> Regards,
> 
> Felipe de Mendiburu
> Statistician

Just use midranks for ties (as with rank()) and compute the ordinary 
correlation on those (see also the spearman2 and rcorr functions in the 
Hmisc package).  No need to use complex formulas.  And the t 
approximation for p-values works pretty well.
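A minimal R sketch of the suggestion above, assuming Ray's six (IQ, Hours) pairs are typed in as plain vectors (the names `iq` and `hours` are illustrative, not from the thread). `rank()` assigns midranks to ties by default, the ordinary Pearson correlation of those ranks should reproduce `cor(method = "spearman")`, and the p-value uses the usual t statistic with n - 2 degrees of freedom. This only sketches the idea; it is not the Hmisc implementation.

```r
# Ray's data, typed in from the table in his message (illustrative names)
iq    <- c(106, 86, 97, 113, 120, 110)
hours <- c(7, 0, 20, 12, 12, 17)

# rank() uses ties.method = "average" by default, i.e. midranks
rx <- rank(iq)
ry <- rank(hours)   # the two 12s both receive rank 3.5

# Spearman's rho = ordinary (Pearson) correlation of the midranks
rho         <- cor(rx, ry)                          # 0.2319084
rho_builtin <- cor(iq, hours, method = "spearman")

# t approximation for a two-sided p-value, n - 2 degrees of freedom
n       <- length(iq)
t_stat  <- rho * sqrt((n - 2) / (1 - rho^2))
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

print(c(rho = rho, builtin = rho_builtin, t = t_stat, p = p_value))
```

No tie-correction terms appear anywhere: the midranks already carry the tie information.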

Frank Harrell



-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University



Re: [R] R's Spearman

2007-05-31 Thread Mendiburu, Felipe (CIP)
Dear Ray,

The Spearman correlation calculated by R is correct with or without ties; 
what is not correct is the probability (p-value) in the case of ties. I am 
sending you the formula for the correlation with ties, which gives the same 
value as R. 
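On the p-value side, a hedged R sketch, again assuming Ray's data typed in as vectors `iq` and `hours` (illustrative names). As far as I know, current versions of `cor.test()` warn "Cannot compute exact p-value with ties" for tied data under the default settings, and its documentation says that outside the exact case the Spearman p-value comes from the asymptotic t approximation. Passing `exact = FALSE` requests that approximation explicitly.

```r
# Ray's data (illustrative names)
iq    <- c(106, 86, 97, 113, 120, 110)
hours <- c(7, 0, 20, 12, 12, 17)

# exact = FALSE: skip the exact null distribution, which is not
# available with ties, and use the large-sample approximation
res <- cor.test(iq, hours, method = "spearman", exact = FALSE)

unname(res$estimate)  # rho, same value as cor(iq, hours, method = "spearman")
res$p.value           # approximate two-sided p-value
```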

Regards,

Felipe de Mendiburu
Statistician


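# Spearman correlation "rs" with or without ties
rs <- function(x, y) {
  d  <- rank(x) - rank(y)          # differences of midranks
  tx <- as.numeric(table(x))       # sizes of the tie groups in x
  ty <- as.numeric(table(y))       # sizes of the tie groups in y
  Lx <- sum((tx^3 - tx) / 12)      # tie correction for x (0 if no ties)
  Ly <- sum((ty^3 - ty) / 12)      # tie correction for y (0 if no ties)
  N  <- length(x)
  SX2 <- (N^3 - N) / 12 - Lx       # sum of squared deviations of rank(x)
  SY2 <- (N^3 - N) / 12 - Ly       # sum of squared deviations of rank(y)
  (SX2 + SY2 - sum(d^2)) / (2 * sqrt(SX2 * SY2))
}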

# Application
> cor(y[,1],y[,2],method="spearman")
[1] 0.2319084
> rs(y[,1],y[,2])
[1] 0.2319084
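Written out, the function above computes the tie-corrected formula below, where each tie group of size t reduces the corresponding sum of squares by (t^3 - t)/12. Plugging in the thread's data (N = 6, no ties in IQ, one pair of tied values t = 2 in Hours, and the sum of squared rank differences 26.5 from Ray's table) reproduces R's value:

$$
r_s = \frac{S_x + S_y - \sum_i d_i^2}{2\sqrt{S_x S_y}}, \qquad
S_x = \frac{N^3 - N}{12} - \sum_{\text{tie groups in } x} \frac{t^3 - t}{12}, \qquad
S_y = \frac{N^3 - N}{12} - \sum_{\text{tie groups in } y} \frac{t^3 - t}{12}
$$

$$
S_x = \frac{216 - 6}{12} = 17.5, \qquad
S_y = 17.5 - \frac{2^3 - 2}{12} = 17, \qquad
r_s = \frac{17.5 + 17 - 26.5}{2\sqrt{17.5 \times 17}} = \frac{8}{2\sqrt{297.5}} \approx 0.2319
$$

With no ties, S_x = S_y = (N^3 - N)/12 and the formula reduces to the familiar shortcut:

$$
r_s = 1 - \frac{6\sum_i d_i^2}{N(N^2 - 1)}
$$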



-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Raymond Wan
Sent: Monday, May 28, 2007 10:29 PM
To: r-help@stat.math.ethz.ch
Subject: [R] R's Spearman





Re: [R] R's Spearman

2007-05-28 Thread Raymond Wan

Hi,

Chung-hong Chan wrote:
> Hi,
>
> You can try with
> cor.test(rank(y[1]),rank(y[2]))
>   

Thanks for this!  It didn't solve my problem, but it helped me realize 
that the formula I was using by hand is invalid when there are ties.  I 
just realized that with R's cor function, the Pearson correlation of the 
ranks equals the Spearman correlation of the original values.

I've yet to find the formula for the tied case for Spearman, but at 
least now I know what the problem is (the formula I was using by hand).  
Thanks!
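To see this numerically, here is a small R sketch using Ray's data (the vector names are illustrative). The textbook shortcut 1 - 6*sum(d^2)/(n(n^2 - 1)) reproduces his hand value, while the Pearson correlation of the ranks reproduces R's value. The untied variant, in which one of the two 12s is changed to 13, is a hypothetical edit for illustration only; with no ties the two formulas agree.

```r
# Ray's data (illustrative names)
iq    <- c(106, 86, 97, 113, 120, 110)
hours <- c(7, 0, 20, 12, 12, 17)
n <- length(iq)

# Classic shortcut: exact only when there are no ties
d <- rank(iq) - rank(hours)
rho_shortcut <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))   # 0.2428571, the hand value

# Pearson correlation of the (mid)ranks: what cor(method = "spearman") returns
rho_ranks <- cor(rank(iq), rank(hours))              # 0.2319084

# Hypothetical untied version: change one 12 to 13
hours_no_tie <- c(7, 0, 20, 12, 13, 17)
d2 <- rank(iq) - rank(hours_no_tie)
agree <- all.equal(1 - 6 * sum(d2^2) / (n * (n^2 - 1)),
                   cor(rank(iq), rank(hours_no_tie)))

print(c(shortcut = rho_shortcut, ranks = rho_ranks))
```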

Ray



Re: [R] R's Spearman

2007-05-28 Thread Chung-hong Chan
Hi,

You can try with
cor.test(rank(y[1]),rank(y[2]))


On 5/29/07, Raymond Wan <[EMAIL PROTECTED]> wrote:


-- 
"The scientists of today think deeply instead of clearly. One must be
sane to think clearly, but one can think deeply and be quite insane."
Nikola Tesla
http://www.macgrass.com



[R] R's Spearman

2007-05-28 Thread Raymond Wan

Hi all,

I am trying to figure out the formula used by R's Spearman rho (using 
cor(method="spearman")) because I can't seem to get the same value as by 
calculating "by hand".  Perhaps I'm using "cor" wrong, but I don't know 
where.  Basically, I am running these commands:

 > y=read.table(file="tmp",header=TRUE,sep="\t")
 > y
   IQ Hours
1 106     7
2  86     0
3  97    20
4 113    12
5 120    12
6 110    17
 > cor(y[1],y[2],method="spearman")
       Hours
IQ 0.2319084

[it's an abbreviated example of one I took from Wikipedia].  I 
calculated by hand (apologies if the table looks strange when pasted 
into e-mail):

     IQ  Hours  rank(IQ)  rank(Hours)  diff  diff^2
 1  106      7         3            2     1       1
 2   86      0         1            1     0       0
 3   97     20         2            6    -4      16
 4  113     12         5          3.5   1.5    2.25
 5  120     12         6          3.5   2.5    6.25
 6  110     17         4            5    -1       1
                                       Sum:    26.5

  rho = 0.242857

where rho = 1 - (6 * 26.5) / (6 * (6^2 - 1)).  I kept modifying the 
table and realized that the difference in result comes from ties: if I 
remove the tie in rows 4 and 5, I get the same result from both cor and 
my calculation by hand.  Perhaps I'm handling ties wrong... does anyone 
know how R does it, or do I need to change how I'm using it?

Thank you!

Ray
