This is an RFC / announcement related to the 2nd part of PR#16885
    https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16885
about  complex NA's.

The (somewhat rare) incompatibility in R 3.3.0's match() behavior for the
case of complex numbers with NA & NaN's {which has been fixed for R 3.3.0
patched in the meantime} triggered some more comprehensive "research".

I found that we have had a long-standing inconsistency, at least between the
documented and the real behavior.  I am claiming that the documented
behavior is desirable and hence R's current "real" behavior is buggy, and
I am proposing to change it, in R-devel (to be 3.4.0) for now.

In help(match) we have been saying

 |  Exactly what matches what is to some extent a matter of definition.
 |  For all types, \code{NA} matches \code{NA} and no other value.
 |  For real and complex values, \code{NaN} values are regarded
 |  as matching any other \code{NaN} value, but not matching \code{NA}.
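For comparison, in the real (double) case match() does follow this documented
rule.  A small base-R check (the vectors are just an illustration, not taken
from the regression tests):

```r
## Documented rule for doubles: NA matches NA, NaN matches NaN,
## but NA and NaN never match each other.
x     <- c(NA, NaN, 1)   # the NA here is NA_real_
table <- c(NaN, NA)
m <- match(x, table)
m  # 2 1 NA : NA -> the NA in 'table', NaN -> the NaN, 1 -> no match
```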

for at least 10 years.  But we don't do that at all in the
complex case (and AFAIK never got a bug report about it).

Also, e.g., print(.) or format(.) simply use "NA" for all
the different complex NA-containing numbers, whereas the
non-NA NaN's (numbers with a NaN, but no NA, in their real or imaginary part)
are shown by format() or print() with the NaN in the real and/or imaginary
part; for an example, look at the "format" column of the matrix
below, after 'print(cbind' ...

The current match(), and duplicated() and unique() which are based on the same
C code, *do* distinguish almost all complex NA / NaN's, which is
NOT according to the documentation. I have found that this is just because
our hashing function for the complex case, chash() in R/src/main/unique.c,
is bogus in the sense that it is not compatible with the above documentation,
nor with the cequal() function (in the same file unique.c) for checking
equality of complex numbers.  A hash function must give the same value to
all numbers that cequal() considers equal; otherwise such numbers land in
different hash buckets, are never compared, and end up treated as different.

As I have found, a *simplified* version of the chash() function,
making it compatible with cequal(), does solve all the problems I've
indicated, and the current plan is to commit that change (after some
discussion time, here on R-devel) to the code base.
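The principle can be sketched at R level.  This is only an illustration under
my own assumptions, not the C code of chash() or cequal(): give every value a
canonical key such that values meant to match share one key, and match on the
keys.  The hypothetical helper cplxKey() treats any value with an NA part as
"NA" and any other value with a NaN part as "NaN"; on the 12 numbers from the
session below it reproduces the proposed match(z, z):

```r
## The 12 complex numbers with NA and/or NaN parts, as in the session below
x0 <- c(0, 1, NA, NaN)
z  <- outer(x0, x0, complex, length.out = 1)
z  <- z[is.na(z)]

## Hypothetical helper (not part of R): one canonical key per complex value;
## values that should match get the same key.
cplxKey <- function(z) {
    re <- Re(z); im <- Im(z)
    isNA   <- function(v) is.na(v) & !is.nan(v)  # NA proper, not NaN
    hasNA  <- isNA(re) | isNA(im)
    hasNaN <- is.nan(re) | is.nan(im)
    key <- rep("NaN", length(z))                  # NaN part(s), no NA part
    key[hasNA] <- "NA"                            # any NA part wins
    ok <- !hasNA & !hasNaN                        # ordinary numbers
    ## exact hex representation; "+ 0" maps -0 to +0, since 0 == -0
    key[ok] <- sprintf("%a %a", re[ok] + 0, im[ok] + 0)
    key
}

key  <- cplxKey(z)
mKey <- match(key, key)
mKey  # 1 2 1 2 1 1 1 1 2 2 1 2, the same as 'mz' in the session below
```

A string key keeps the sketch short; the C code works with integer hash
values instead, but the same compatibility requirement applies.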

My change passes 'make check-all' fine, but I'm 100% sure that there will
be effects in package space ... one reason for this posting.

As mentioned above, note that the chash() function has been in
use for all three functions
     match()
     duplicated()
     unique()
and the change will affect all three --- but just for the case of complex
vectors with NA or NaN's.
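As a small worked illustration of how the three are tied together (assuming
the matching relation is a proper equivalence, as in the proposal): an element
is a duplicate exactly when match(x, x) points to an earlier position.  Using
the proposed 'mz' from the session below:

```r
## Proposed match(z, z) for the 12 values in the session below
mz <- c(1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L)

## duplicated(z) would be TRUE wherever the first match lies before i
dup <- mz != seq_along(mz)
## unique(z) would keep only these positions: one "NA" and one "NaN" value
keep <- which(!dup)
keep  # 1 2
```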

To show more, here is a small R session using my version of R-devel,
i.e., the proposed behavior.
The R script ('complex-NA-short.R') for (a bit more than) the
session is attached {{you can attach  text/plain easily}}:

> x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
> ##           --- = NA_real_  but that does not exist e.g., in R 2.3.1
> ##                   similarly,  '1L', '2L', .. do not exist e.g., in R 2.3.1
> (z <- z[is.na(z)])
 [1]       NA NaN+  0i       NA NaN+  1i       NA       NA       NA       NA
 [9]   0+NaNi   1+NaNi       NA NaN+NaNi
> outerID <- function(x,y, ...) { ## ugly; can we get outer() to work ?
+     r <- matrix( , length(x), length(y))
+     for(i in seq(along=x))
+         for(j in seq(along=y))
+             r[i,j] <- identical(x[i], y[j], ...)
+     r
+ }
> ## Very strictly - in the sense of identical() -- these 12 complex numbers all differ:
> ## a version that works in older versions of R, where identical() had fewer arguments!
> outerID.picky <- function(x,y) {
+     nF <- length(formals(identical)) - 2
+     do.call("outerID", c(list(x, y), as.list(rep(FALSE, nF))))
+ }
> oldR <- !exists("getRversion") || getRversion() < "3.0.0" ## << FIXME: 3.0.0 is a wild guess
> symnum(id.z <- outerID.picky(z,z)) ## == Diagonal matrix [newer versions of R]
                             
 [1,] | . . . . . . . . . . .
 [2,] . | . . . . . . . . . .
 [3,] . . | . . . . . . . . .
 [4,] . . . | . . . . . . . .
 [5,] . . . . | . . . . . . .
 [6,] . . . . . | . . . . . .
 [7,] . . . . . . | . . . . .
 [8,] . . . . . . . | . . . .
 [9,] . . . . . . . . | . . .
[10,] . . . . . . . . . | . .
[11,] . . . . . . . . . . | .
[12,] . . . . . . . . . . . |
> try(# for older R versions
+ stopifnot(identical(id.z, outerID(z,z)), oldR || identical(id.z, diag(12) == 1))
+ )
> (mz <- match(z, z)) # currently different {NA,NaN} patterns differ - not in print()/format() _FIXME_
 [1] 1 2 1 2 1 1 1 1 2 2 1 2
> zRI <- rbind(Re=Re(z), Im=Im(z)) # and see the pattern :
> print(cbind(format = format(z), t(zRI), mz), quote=FALSE)
      format   Re   Im   mz
 [1,]       NA <NA> 0    1 
 [2,] NaN+  0i NaN  0    2 
 [3,]       NA <NA> 1    1 
 [4,] NaN+  1i NaN  1    2 
 [5,]       NA 0    <NA> 1 
 [6,]       NA 1    <NA> 1 
 [7,]       NA <NA> <NA> 1 
 [8,]       NA NaN  <NA> 1 
 [9,]   0+NaNi 0    NaN  2 
[10,]   1+NaNi 1    NaN  2 
[11,]       NA <NA> NaN  1 
[12,] NaN+NaNi NaN  NaN  2 
>
-------------------------------
Note that 'mz <- match(z, z)', and hence the last column of the matrix above,
is very different in current R, which distinguishes most kinds of NA / NaN,
contrary to the documentation (and to the real/numeric case).
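To put a number on "very different": counting the distinct match positions in
two of the m1z rows recorded in the attached script (values copied from its
comment table) gives six for R 3.2.5 and two for the proposal:

```r
## m1z rows copied from the comment table in the attached script
r325     <- c(1, 2, 3, 4, 1, 3, 7, 4, 2, 4, 4, 12)  # R 3.2.5 and 3.3.0 patched
proposed <- c(1, 2, 1, 2, 1, 1, 1, 1, 2, 2, 1, 2)   # proposed R-devel
## number of distinct "first match" positions among the 12 NA/NaN values
nGroups <- c(R.3.2.5 = length(unique(r325)), proposed = length(unique(proposed)))
nGroups  # R.3.2.5 = 6, proposed = 2
```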

Martin Maechler
R Core Team


### Basically a shortened version of  the PR#16885 -- complex part b)
### of  R/tests/reg-tests-1c.R

## b) complex 'x' with different kinds of NaN
x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
##           --- = NA_real_  but that does not exist e.g., in R 2.3.1
##                   similarly,  '1L', '2L', .. do not exist e.g., in R 2.3.1
(z <- z[is.na(z)])
outerID <- function(x,y, ...) { ## ugly; can we get outer() to work ?
    r <- matrix( , length(x), length(y))
    for(i in seq(along=x))
        for(j in seq(along=y))
            r[i,j] <- identical(x[i], y[j], ...)
    r
}
## Very strictly - in the sense of identical() -- these 12 complex numbers all differ:
## a version that works in older versions of R, where identical() had fewer arguments!
outerID.picky <- function(x,y) {
    nF <- length(formals(identical)) - 2
    do.call("outerID", c(list(x, y), as.list(rep(FALSE, nF))))
}
oldR <- !exists("getRversion") || getRversion() < "3.0.0" ## << FIXME: 3.0.0 is a wild guess
symnum(id.z <- outerID.picky(z,z)) ## == Diagonal matrix [newer versions of R]
try(# for older R versions
stopifnot(identical(id.z, outerID(z,z)), oldR || identical(id.z, diag(12) == 1))
)
(mz <- match(z, z)) # currently different {NA,NaN} patterns differ - not in print()/format() _FIXME_
zRI <- rbind(Re=Re(z), Im=Im(z)) # and see the pattern :
print(cbind(format = format(z), t(zRI), mz), quote=FALSE)

## compute  match(z[i], z) , for  i = 1,2,..,12  :
(m1z <- sapply(z, match, table = z))
## 1 2 1 2 2 2 1 2 2 2 1 2   # R 1.2.3  (2001-04-26)
## 1 2 3 4 1 3 7 8 2 4 8 7   # R 1.4.1  (2002-01-30)
## 1 2 3 4 1 3 7 8 2 4 8 12  # R 1.5.1  (2002-06-17)
## 1 2 3 4 1 3 7 8 2 4 8 12  # R 1.8.1  (2003-11-21)
## 1 2 3 4 1 3 7 8 2 4 8 12  # R 2.0.1  (2004-11-15)
## 1 2 3 4 1 3 7 4 2 4 4 12  # R 2.1.1  (2005-06-20)
## 1 2 3 4 1 3 7 4 2 4 4 12  # R 2.3.1  (2006-06-01)
## 1 2 3 4 1 3 7 8 2 4 8 12  # R 2.5.1  (2007-06-27)
## 1 2 3 4 1 3 7 4 2 4 4 12  # R 2.10.1 (2009-12-14)
## 1 2 3 4 1 3 7 4 2 4 4 12  # R 3.1.1  (2014-07-10)
## 1 2 3 4 1 3 7 4 2 4 4 12  # R 3.2.5 -- and 3.3.0 patched
## 1 2 1 2 1 1 1 1 2 2 1 2   # <<-- Martin's R-devel and proposed future R

if(!exists("anyNA", mode="function")) anyNA <- function(x) any(is.na(x))
stopifnot(apply(zRI, 2, anyNA)) # *all* are  NA *or* NaN (or both)
is.NA <- function(.) is.na(.) & !is.nan(.)
(iNaN <- apply(zRI, 2, function(.) any(is.nan(.))))
(iNA <-  apply(zRI, 2, function(.) any(is.NA (.)))) # has non-NaN NA's
## In Martin's version of R-devel :
stopifnot(identical(m1z == 1, iNA),
          identical(m1z == 2, !iNA))
## m1z uses match(x, *) with length(x) == 1 and failed in R 3.3.0
stopifnot(identical(m1z, mz))
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
