[Rd] What to do with an inconsistency in rank() that's in S+ and R ever since?

2006-10-27 Thread Jens Oehlschlägel
Dear R-developers,

I just realized that rank() behaves inconsistently when one of na.last in
{TRUE|FALSE} is combined with a ties.method in {average|random|max|min}.
The documentation suggests that, e.g., with na.last=TRUE, NAs are treated like the
last (=highest) value, which is evidently not the case:

> rank(c(1,2,2,NA,NA), na.last = TRUE, ties.method = c("average", "first",
+ "random", "max", "min")[1])
[1] 1.0 2.5 2.5 4.0 5.0

I'd expect 

[1] 1.0 2.5 2.5 4.5 4.5

rather, but in fact NAs always seem to be treated as if ties.method = "first".
I cannot think of a situation in which one would want e.g. ties.method =
"average" applied to everything except the NAs!?

I am aware that the prototype behaves like this and that R has behaved this way
ever since; however, to me this appears very unfortunate. In order not to 'break'
existing code, what about adding ties.methods
{NAaverage|NArandom|NAmax|NAmin} that behave consistently?
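
To make the proposal concrete, here is a rough sketch of a wrapper that gives the result I would expect. The name rank_na_tied() is hypothetical and not part of R; the sketch assumes that all NAs form one tied group, and it covers only "average", "max" and "min" (no "first" or "random").

```r
# Hypothetical helper, not part of base R: rank x so that all NAs are
# treated as one tied group, placed last (na.last = TRUE) or first
# (na.last = FALSE), using the same ties.method as the other values.
rank_na_tied <- function(x, na.last = TRUE,
                         ties.method = c("average", "max", "min")) {
  ties.method <- match.arg(ties.method)
  is_na <- is.na(x)
  n_na  <- sum(is_na)
  n_ok  <- length(x) - n_na
  r <- numeric(length(x))

  # Rank the non-missing values on their own.
  r[!is_na] <- rank(x[!is_na], ties.method = ties.method)

  if (n_na > 0) {
    # Rank positions the NAs occupy as a block.
    block <- if (na.last) n_ok + seq_len(n_na) else seq_len(n_na)
    r[is_na] <- switch(ties.method,
                       average = mean(block),
                       max     = max(block),
                       min     = min(block))
    # With NAs first, shift the non-missing ranks past the NA block.
    if (!na.last) r[!is_na] <- r[!is_na] + n_na
  }
  r
}

rank_na_tied(c(1, 2, 2, NA, NA))
# [1] 1.0 2.5 2.5 4.5 4.5
```

Because this is a separate function, existing code that relies on the current rank() behaviour is not affected.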

Best regards


Jens Oehlschlägel


P.S. Please cc. me, I am not on the list


> version
               _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          4.0
year           2006
month          10
day            03
svn rev        39566
language       R
version.string R version 2.4.0 (2006-10-03)

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] What to do with an inconsistency in rank() that's in S+ and R ever since?

2006-10-27 Thread Andrew Piskorski
On Fri, Oct 27, 2006 at 11:14:25AM +0200, Jens Oehlschlägel wrote:

> rather, but in fact NAs seem to be always treated ties.method =
> first. I have no idea in which situation one could desire
> e.g. ties.method = average except for NAs!?

Interesting.  I was aware of the S-Plus vs. R difference, but I didn't
realize that it appears to be because R rank() ignores
ties.method=average for NA values.

> I am aware that the prototype behaves like this and R ever since
> behaves like this, however to me this appears very unfortunate. In
> order not to 'break' existing code, what about adding ties.methods

If you only care about ranking integers and floating point numbers,
it's pretty straightforward to take the S-Plus implementation of
rank(), call it my.rank(), and use it in both R and S-Plus.  (Since
the R rank() makes calls to .Internal(), you can't re-use its
implementation in S-Plus.)

Note though that the S-Plus-style my.rank() will still sort strings
differently in R than in S-Plus.  I never looked into why.

Some old notes I have on this issue:

  R and S-Plus rank() treat NAs differently (which can magnify other
  floating point differences):

  # S-Plus 6.2.1:            # R 2.1.0:
  > rank(1:5)                > rank(1:5)
  [1] 1 2 3 4 5              [1] 1 2 3 4 5
  > rank(c(1,2,NA,4,NA))     > rank(c(1,2,NA,4,NA))
  [1] 1.0 2.0 4.5 3.0 4.5    [1] 1 2 4 3 5
  > rank(c(1,NA,3,4,NA))     > rank(c(1,NA,3,4,NA))
  [1] 1.0 4.5 2.0 3.0 4.5    [1] 1 4 2 3 5
  > rank(c(1,NA,3))          > rank(c(1,NA,3))
  [1] 1 3 2                  [1] 1 3 2
  > rank(c(NA,NA,3))         > rank(c(NA,NA,3))
  [1] 2.5 2.5 1.0            [1] 2 3 1
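
For illustration, the S-Plus column above can be reproduced in R without calling .Internal() directly. The following is only a sketch and is not the S-Plus source: my_rank_sketch() is a hypothetical name, it assumes average ranks with NAs sorted last and tied with each other, and it has not been checked that order(), match() and ave() behave identically in S-Plus.

```r
# Hypothetical sketch, not the S-Plus implementation: average ranks
# built from order(), with all NAs sorted last and tied with each other.
my_rank_sketch <- function(x) {
  n <- length(x)
  ord <- order(x, na.last = TRUE)   # NAs go to the end
  r <- numeric(n)
  r[ord] <- seq_len(n)              # provisional ranks, ties broken by position

  # Group equal values together; match() treats NA as equal to NA,
  # so all NAs end up in the same group.
  key <- match(x, unique(x))

  # Replace each provisional rank by the mean rank of its group.
  ave(r, key)
}

my_rank_sketch(c(1, 2, NA, 4, NA))
# [1] 1.0 2.0 4.5 3.0 4.5
my_rank_sketch(c(NA, NA, 3))
# [1] 2.5 2.5 1.0
```

Strings are still ordered by R's own collation here, so the sketch does not address the string-sorting difference mentioned above.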

-- 
Andrew Piskorski [EMAIL PROTECTED]
http://www.piskorski.com/
