Re: [R] R's Spearman
Dear Frank and Felipe,

Thank you both for your replies. The code looks exactly like the formula
in the Japanese Wikipedia, which I was trying to make sense of (as it
wasn't in the English Wikipedia). Thank you for sharing your code, Felipe!
And thank you for the clarification, Frank. Knowing several ways of
calculating it helps in understanding it... thanks to both of you!

Ray

Frank E Harrell Jr wrote:
> Mendiburu, Felipe (CIP) wrote:
>> Dear Ray,
>>
>> The Spearman correlation calculated by R is correct with or without
>> ties; what is not correct is the probability in the case of ties. I am
>> sending you a formula for the correlation with ties, which gives the
>> same value as R.
>>
>> Regards,
>>
>> Felipe de Mendiburu
>> Statistician
>
> Just use midranks for ties (as with rank()) and compute the ordinary
> correlation on those (see also the spearman2 and rcorr functions in
> Hmisc package). No need to use complex formulas. And the t
> approximation for p-values works pretty well.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] R's Spearman
Mendiburu, Felipe (CIP) wrote:
> Dear Ray,
>
> The Spearman correlation calculated by R is correct with or without
> ties; what is not correct is the probability in the case of ties. I am
> sending you a formula for the correlation with ties, which gives the
> same value as R.
>
> Regards,
>
> Felipe de Mendiburu
> Statistician

Just use midranks for ties (as with rank()) and compute the ordinary
correlation on those (see also the spearman2 and rcorr functions in
Hmisc package). No need to use complex formulas. And the t
approximation for p-values works pretty well.

Frank Harrell

> # Spearman correlation "rs" with ties or no ties
> rs <- function(x, y) {
>   d <- rank(x) - rank(y)
>   tx <- as.numeric(table(x))
>   ty <- as.numeric(table(y))
>   Lx <- sum((tx^3 - tx) / 12)
>   Ly <- sum((ty^3 - ty) / 12)
>   N <- length(x)
>   SX2 <- (N^3 - N) / 12 - Lx
>   SY2 <- (N^3 - N) / 12 - Ly
>   rs <- (SX2 + SY2 - sum(d^2)) / (2 * sqrt(SX2 * SY2))
>   return(rs)
> }
>
> # Application
> > cor(y[,1],y[,2],method="spearman")
> [1] 0.2319084
> > rs(y[,1],y[,2])
> [1] 0.2319084

--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
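The midrank approach described in this reply can be sketched in a few lines of R. This is only an illustration: the vectors `iq` and `hours` are retyped from the table in the original post, the variable names are made up here, and the p-value assumes that "the t approximation" means the usual t statistic with n - 2 degrees of freedom.

```r
# Data retyped from the table in the original post
iq    <- c(106, 86, 97, 113, 120, 110)
hours <- c(7, 0, 20, 12, 12, 17)

# rank() assigns midranks (average ranks) to tied values by default
r_iq    <- rank(iq)
r_hours <- rank(hours)

# Ordinary (Pearson) correlation computed on the midranks
r_s <- cor(r_iq, r_hours)

# The same quantity requested directly from cor()
r_builtin <- cor(iq, hours, method = "spearman")

# t approximation for a two-sided p-value (assumed form: n - 2 df)
n      <- length(iq)
t_stat <- r_s * sqrt((n - 2) / (1 - r_s^2))
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)

print(c(r_s = r_s, r_builtin = r_builtin, t = t_stat, p = p_val))
```

For this data the first two printed values agree (0.2319084, the figure quoted in the thread).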
Re: [R] R's Spearman
Dear Ray,

The Spearman correlation calculated by R is correct with or without
ties; what is not correct is the probability in the case of ties. I am
sending you a formula for the correlation with ties, which gives the
same value as R.

Regards,

Felipe de Mendiburu
Statistician

# Spearman correlation "rs" with ties or no ties
rs <- function(x, y) {
  d   <- rank(x) - rank(y)         # differences of midranks
  tx  <- as.numeric(table(x))      # sizes of the tie groups in x
  ty  <- as.numeric(table(y))      # sizes of the tie groups in y
  Lx  <- sum((tx^3 - tx) / 12)     # tie correction for x
  Ly  <- sum((ty^3 - ty) / 12)     # tie correction for y
  N   <- length(x)
  SX2 <- (N^3 - N) / 12 - Lx       # sum of squares of the x midranks about their mean
  SY2 <- (N^3 - N) / 12 - Ly       # same for y
  rs  <- (SX2 + SY2 - sum(d^2)) / (2 * sqrt(SX2 * SY2))
  return(rs)
}

# Application
> cor(y[,1],y[,2],method="spearman")
[1] 0.2319084
> rs(y[,1],y[,2])
[1] 0.2319084

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Raymond Wan
Sent: Monday, May 28, 2007 10:29 PM
To: r-help@stat.math.ethz.ch
Subject: [R] R's Spearman

Hi all,

I am trying to figure out the formula used by R's Spearman rho (using
cor(method="spearman")) because I can't seem to get the same value as by
calculating "by hand". Perhaps I'm using "cor" wrong, but I don't know
where. [...]
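Read off the rs() function in this message, the tie-corrected formula can be written as below. The notation is chosen here for illustration and does not come from the thread: N is the number of observations, d_i is the difference of the two midranks for observation i, and t_j and u_k are the sizes of the tie groups in x and y respectively.

```latex
r_s = \frac{S_x + S_y - \sum_{i=1}^{N} d_i^2}{2\sqrt{S_x S_y}},
\qquad
S_x = \frac{N^3 - N}{12} - \sum_j \frac{t_j^3 - t_j}{12},
\qquad
S_y = \frac{N^3 - N}{12} - \sum_k \frac{u_k^3 - u_k}{12}.
```

Without ties every t_j and u_k equals 1, so S_x = S_y = (N^3 - N)/12 and the expression reduces to the familiar 1 - 6 * sum(d^2) / (N^3 - N). For the data in this thread, S_x = 17.5, S_y = 17 and sum(d^2) = 26.5, which gives 8 / (2 * sqrt(297.5)), approximately 0.2319, the value that cor reports.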
Re: [R] R's Spearman
Hi,

Chung-hong Chan wrote:
> Hi,
>
> You can try with
> cor.test(rank(y[1]),rank(y[2]))

Thanks for this! It didn't solve my problem, but it helped me realize
that the formula I was using by hand is invalid when there are ties. I
just realized that, with R's cor function, the Pearson correlation of the
ranks equals the Spearman correlation of the original values. I've yet to
find the Spearman formula for the tied case, but at least now I know what
the problem is (the formula I was using by hand).

Thanks!

Ray
Re: [R] R's Spearman
Hi,

You can try with
cor.test(rank(y[1]),rank(y[2]))

On 5/29/07, Raymond Wan <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I am trying to figure out the formula used by R's Spearman rho (using
> cor(method="spearman")) because I can't seem to get the same value as by
> calculating "by hand". Perhaps I'm using "cor" wrong, but I don't know
> where. [...]

--
"The scientists of today think deeply instead of clearly. One must be
sane to think clearly, but one can think deeply and be quite insane."
Nikola Tesla
http://www.macgrass.com
[R] R's Spearman
Hi all,

I am trying to figure out the formula used by R's Spearman rho (using
cor(method="spearman")) because I can't seem to get the same value as by
calculating "by hand". Perhaps I'm using "cor" wrong, but I don't know
where. Basically, I am running these commands:

> y=read.table(file="tmp",header=TRUE,sep="\t")
> y
   IQ Hours
1 106     7
2  86     0
3  97    20
4 113    12
5 120    12
6 110    17
> cor(y[1],y[2],method="spearman")
       Hours
IQ 0.2319084

[it's an abbreviated example of one I took from Wikipedia]. I calculated
by hand (apologies if the table looks strange when pasted into e-mail):

    IQ  Hours  rank(IQ)  rank(hours)  diff  diff^2
1  106      7         3          2       1    1
2   86      0         1          1       0    0
3   97     20         2          6      -4   16
4  113     12         5          3.5   1.5    2.25
5  120     12         6          3.5   2.5    6.25
6  110     17         4          5      -1    1
                                             26.5

rho=0.242857

where rho = 1 - (6 * 26.5) / (6 * (6^2 - 1)). I kept modifying the table
and realized that the difference in result comes from ties. That is, if I
remove the tie in rows 4 and 5, I get the same result from both cor and
calculating by hand. Perhaps I'm handling ties wrong... does anyone know
how R does it, or perhaps I need to change how I'm using it?

Thank you!

Ray
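The discrepancy described in this post can be reproduced with a short sketch. The helper `rho_shortcut` is a hypothetical name for the textbook no-ties formula 1 - 6 * sum(d^2) / (n * (n^2 - 1)); the data are retyped from the table, and the tie-free variant `hours2` (12 changed to 13 in row 5) is an invented example, not part of the original post.

```r
# Textbook shortcut, valid only when there are no ties (hypothetical helper)
rho_shortcut <- function(x, y) {
  d <- rank(x) - rank(y)   # rank differences (midranks if ties occur)
  n <- length(x)
  1 - 6 * sum(d^2) / (n * (n^2 - 1))
}

# Data retyped from the table in the post
iq    <- c(106, 86, 97, 113, 120, 110)
hours <- c(7, 0, 20, 12, 12, 17)

# With the tie in rows 4 and 5 the two results differ
with_ties <- c(shortcut = rho_shortcut(iq, hours),
               cor      = cor(iq, hours, method = "spearman"))

# Invented variant with the tie broken: the two results coincide
hours2  <- c(7, 0, 20, 12, 13, 17)
no_ties <- c(shortcut = rho_shortcut(iq, hours2),
             cor      = cor(iq, hours2, method = "spearman"))

print(with_ties)
print(no_ties)
```

With the tie present, the shortcut gives the hand-calculated 0.242857 while cor gives 0.2319084; once the tie is broken the two values are identical.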