On 'factor', I meant the case where 'levels' is not specified, where 'unique' is called.

> factor(c(complex(real=NaN), complex(imaginary=NaN)))
[1] NaN+0i <NA>
Levels: NaN+0i

Look at <NA> in the result above. Yes, it happens in earlier versions of R, too.

On matching both NA and NaN, another consequence is that length(unique(.)) may depend on order. Example using R devel r70604:

> x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
> (z <- z[is.na(z)])
[1] NA NaN+ 0i NA NaN+ 1i NA NA NA NA
[9] 0+NaNi 1+NaNi NA NaN+NaNi
> length(print(unique(z)))
[1] NA NaN+0i
[1] 2
> length(print(unique(c(z[8], z[-8]))))
[1] NA
[1] 1

--------------------------------------------
On Mon, 23/5/16, Martin Maechler <maech...@stat.math.ethz.ch> wrote:

Subject: Re: [Rd] complex NA's match(), etc: not back-compatible change proposal
Cc: R-devel@r-project.org
Date: Monday, 23 May, 2016, 11:06 PM

>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel@r-project.org>
>>>>> on Fri, 13 May 2016 16:33:05 +0000 writes:

> That, for example, complex(real=NaN) and complex(imaginary=NaN) are regarded as equal makes it possible that
> length(unique(as.character(x))) > length(unique(x))
> (current code of function 'factor' doesn't expect it).

Thank you, that is an interesting remark - but is already true, in [[elided Yahoo spam]]
.. and of course this is because we do *print* 0+NaNi etc, i.e., we differentiate the non-NA-but-NaN complex values in formatting / printing but not in match(), unique() ...

and indeed, with the 'z' example below,
fz <- factor(z,z)
gives a warning about duplicated levels, and gives such warnings also in current (and previous) versions of R, at least for the slightly larger z I've used in the tests/reg-tests-1c.R example.

For the moment I can live with that warning, as I don't think factor()s are constructed from complex numbers "often"... and the performance of factor() in the more regular cases is important.

> Yes, an argument for the behavior is that NA and NaN are of one kind.
> On my system, using 32-bit R for Windows from binary from CRAN, the result of sapply(z, match, table = z) (not in current R-devel) may be different from below:
> 1 2 3 4 1 3 7 8 2 4 8 12 # R 2.10.1, different from below
> 1 2 3 4 1 3 7 8 2 4 8 12 # R 3.2.5, different from below

interesting, thank you... and another reason why the change (currently only in R-devel) may have been a good one: More uniformity.

> I noticed that, by function 'cequal' in unique.c, a complex number that has both NA and NaN matches NA and also matches NaN.

>> x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
>> (z <- z[is.na(z)])
> [1] NA NaN+ 0i NA NaN+ 1i NA NA NA NA
> [9] 0+NaNi 1+NaNi NA NaN+NaNi
>> sapply(z, match, table = z[8])
> [1] 1 1 1 1 1 1 1 1 1 1 1 1
>> match(z, z[8])
> [1] 1 1 1 1 1 1 1 1 1 1 1 1

Yes, I see the same. But isn't it what we expect: all of our z[] entries have at least one NA or NaN in their real or imaginary part, and since z[8] has both, it does match all z[]'s, either because of the NA or because of the NaN in common.

Hence, currently, I don't think this needs to be changed... but if there are other reasons / arguments ...

Thank you again,
Martin Maechler

>> sessionInfo()
> R Under development (unstable) (2016-05-12 r70604)
> Platform: i386-w64-mingw32/i386 (32-bit)
> Running under: Windows XP (build 2600) Service Pack 2

> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252

> attached base packages:
> [1] stats graphics grDevices utils datasets methods base

> -----------------

>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>> on Tue, 10 May 2016 16:08:39 +0200 writes:

>> This is an RFC / announcement related to the 2nd part of PR#16885
>> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16885
>> about complex NA's.
>> The (somewhat rare) incompatibility in R's 3.3.0 match() behavior for the
>> case of complex numbers with NA & NaN's {which has been fixed for R 3.3.0
>> patched in the mean time} triggered some more comprehensive "research".

>> I found that we have had a long-standing inconsistency at least between the
>> documented and the real behavior. I am claiming that the documented
>> behavior is desirable and hence R's current "real" behavior is bugous, and
>> I am proposing to change it, in R-devel (to be 3.4.0) for now.

> After the "roaring unanimous" assent (one private msg
> encouraging me to go forward, no dissenting voice, hence an
> "odds ratio" of +Inf in favor ;-)
> I have now committed my proposal to R-devel (svn rev. 70597) and
> some of us will be seeing the effect in package space within a
> day or so, in the CRAN checks against R-devel (not for
> bioconductor AFAIK; their checks use R-devel only when it is less
> than ca. 6 months from release).

> It's still worthwhile to discuss the issue, if you come late
> to it, notably as ---paraphrasing Dirk on the R-package-devel list---
> the release of 3.4.0 is almost a year away, and so now is the
> best time to tinker with the API, in other words, consider breaking
> rarely used legacy APIs..

> Martin

>> In help(match) we have been saying

>> | Exactly what matches what is to some extent a matter of definition.
>> | For all types, \code{NA} matches \code{NA} and no other value.
>> | For real and complex values, \code{NaN} values are regarded
>> | as matching any other \code{NaN} value, but not matching \code{NA}.

>> for at least 10 years. But we don't do that at all in the
>> complex case (and AFAIK never got a bug report about it).

>> Also, e.g., print(.) or format(.) do simply use "NA" for all
>> the different complex NA-containing numbers, where OTOH,
>> non-NA NaN's { <=> !is.nan(z) & is.na(z) }
>> in format() or print() do show the NaN in real and/or imaginary
>> parts; for an example, look at the "format" column of the matrix
>> below, after 'print(cbind' ...

>> The current match()---and duplicated(), unique() which are based on the same
>> C code---*do* distinguish almost all complex NA / NaN's which is
>> NOT according to documentation. I have found that this is just because
>> our hashing function for the complex case, chash() in R/src/main/unique.c,
>> is bogus in the sense that it is not compatible with the above documentation
>> and also not with the cequal() function (in the same file unique.c) for checking
>> equality of complex numbers.

>> As I have found, a *simplified* version of the chash() function
>> to make it compatible with cequal() does solve all the problems I've
>> indicated, and the current plan is to commit that change --- after some
>> discussion time, here on R-devel --- to the code base.

>> My change passes 'make check-all' fine, but I'm 100% sure that there will
>> be effects in package-space. ... one reason for this posting.

>> As mentioned above, note that the chash() function has been in
>> use for all three functions
>> match()
>> duplicated()
>> unique()
>> and the change will affect all three --- but just for the case of complex
>> vectors with NA or NaN's.

>> To show more, a small R session -- using my version of R-devel
>> == the proposition:
>> The R script ('complex-NA-short.R') for (a bit more than) the
>> session is attached {{you can attach text/plain easily}}:

>>> x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
>>> ## --- = NA_real_ but that does not exist e.g., in R 2.3.1
>>> ## similarly, '1L', '2L', .. do not exist e.g., in R 2.3.1
>>> (z <- z[is.na(z)])
>> [1] NA NaN+ 0i NA NaN+ 1i NA NA NA NA
>> [9] 0+NaNi 1+NaNi NA NaN+NaNi

>>> outerID <- function(x,y, ...) { ## ugly; can we get outer() to work ?
>> +     r <- matrix( , length(x), length(y))
>> +     for(i in seq(along=x))
>> +         for(j in seq(along=y))
>> +             r[i,j] <- identical(x[i], y[j], ...) # compare the arguments, not the global 'z'
>> +     r
>> + }
>>> ## Very strictly - in the sense of identical() -- these 12 complex numbers all differ:
>>> ## a version that works in older versions of R, where identical() had fewer arguments!
>>> outerID.picky <- function(x,y) {
>> +     nF <- length(formals(identical)) - 2
>> +     do.call("outerID", c(list(x, y), as.list(rep(FALSE, nF))))
>> + }
>>> oldR <- !exists("getRversion") || getRversion() < "3.0.0" ## << FIXME: 3.0.0 is a wild guess
>>> symnum(id.z <- outerID.picky(z,z)) ## == Diagonal matrix [newer versions of R]
>>  [1,] | . . . . . . . . . . .
>>  [2,] . | . . . . . . . . . .
>>  [3,] . . | . . . . . . . . .
>>  [4,] . . . | . . . . . . . .
>>  [5,] . . . . | . . . . . . .
>>  [6,] . . . . . | . . . . . .
>>  [7,] . . . . . . | . . . . .
>>  [8,] . . . . . . . | . . . .
>>  [9,] . . . . . . . . | . . .
>> [10,] . . . . . . . . . | . .
>> [11,] . . . . . . . . . . | .
>> [12,] . . . . . . . . . . . |
>>> try(# for older R versions
>> +     stopifnot(identical(id.z, outerID(z,z)), oldR || identical(id.z, diag(12) == 1))
>> + )
>>> (mz <- match(z, z)) # currently different {NA,NaN} patterns differ - not in print()/format() _FIXME_
>> [1] 1 2 1 2 1 1 1 1 2 2 1 2
>>> zRI <- rbind(Re=Re(z), Im=Im(z)) # and see the pattern :
>>> print(cbind(format = format(z), t(zRI), mz), quote=FALSE)
>>       format   Re   Im   mz
>>  [1,] NA       <NA> 0    1
>>  [2,] NaN+ 0i  NaN  0    2
>>  [3,] NA       <NA> 1    1
>>  [4,] NaN+ 1i  NaN  1    2
>>  [5,] NA       0    <NA> 1
>>  [6,] NA       1    <NA> 1
>>  [7,] NA       <NA> <NA> 1
>>  [8,] NA       NaN  <NA> 1
>>  [9,] 0+NaNi   0    NaN  2
>> [10,] 1+NaNi   1    NaN  2
>> [11,] NA       <NA> NaN  1
>> [12,] NaN+NaNi NaN  NaN  2
>>>
>> -------------------------------

>> Note that 'mz <- match(z, z)' and hence the last column of the matrix above
>> are very different in current R,
>> distinguishing most kinds of NA / NaN against the documentation (and the
>> real/numeric case).

>> Martin Maechler
>> R Core Team

>> ### Basically a shortened version of the PR#16885 -- complex part b)
>> ### of R/tests/reg-tests-1c.R

>> ## b) complex 'x' with different kinds of NaN
>> x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
>> ## --- = NA_real_ but that does not exist e.g., in R 2.3.1
>> ## similarly, '1L', '2L', .. do not exist e.g., in R 2.3.1
>> (z <- z[is.na(z)])
>> outerID <- function(x,y, ...) { ## ugly; can we get outer() to work ?
>>     r <- matrix( , length(x), length(y))
>>     for(i in seq(along=x))
>>         for(j in seq(along=y))
>>             r[i,j] <- identical(x[i], y[j], ...) # compare the arguments, not the global 'z'
>>     r
>> }
>> ## Very strictly - in the sense of identical() -- these 12 complex numbers all differ:
>> ## a version that works in older versions of R, [[elided Yahoo spam]]
>> outerID.picky <- function(x,y) {
>>     nF <- length(formals(identical)) - 2
>>     do.call("outerID", c(list(x, y), as.list(rep(FALSE, nF))))
>> }
>> oldR <- !exists("getRversion") || getRversion() < "3.0.0" ## << FIXME: 3.0.0 is a wild guess
>> symnum(id.z <- outerID.picky(z,z)) ## == Diagonal matrix [newer versions of R]
>> try(# for older R versions
>>     stopifnot(identical(id.z, outerID(z,z)), oldR || identical(id.z, diag(12) == 1))
>> )
>> (mz <- match(z, z)) # currently different {NA,NaN} patterns differ - not in print()/format() _FIXME_
>> zRI <- rbind(Re=Re(z), Im=Im(z)) # and see the pattern :
>> print(cbind(format = format(z), t(zRI), mz), quote=FALSE)

>> ## compute match(z[i], z) , for i = 1,2,..,12 :
>> (m1z <- sapply(z, match, table = z))
>> ## 1 2 1 2 2 2 1 2 2 2 1 2 # R 1.2.3 (2001-04-26)
>> ## 1 2 3 4 1 3 7 8 2 4 8 7 # R 1.4.1 (2002-01-30)
>> ## 1 2 3 4 1 3 7 8 2 4 8 12 # R 1.5.1 (2002-06-17)
>> ## 1 2 3 4 1 3 7 8 2 4 8 12 # R 1.8.1 (2003-11-21)
>> ## 1 2 3 4 1 3 7 8 2 4 8 12 # R 2.0.1 (2004-11-15)
>> ## 1 2 3 4 1 3 7 4 2 4 4 12 # R 2.1.1 (2005-06-20)
>> ## 1 2 3 4 1 3 7 4 2 4 4 12 # R 2.3.1 (2006-06-01)
>> ## 1 2 3 4 1 3 7 8 2 4 8 12 # R 2.5.1 (2007-06-27)
>> ## 1 2 3 4 1 3 7 4 2 4 4 12 # R 2.10.1 (2009-12-14)
>> ## 1 2 3 4 1 3 7 4 2 4 4 12 # R 3.1.1 (2014-07-10)
>> ## 1 2 3 4 1 3 7 4 2 4 4 12 # R 3.2.5 -- and 3.3.0 patched
>> ## 1 2 1 2 1 1 1 1 2 2 1 2 # <<-- Martin's R-devel and proposed future R

>> if(!exists("anyNA", mode="function")) anyNA <- function(x) any(is.na(x))
>> stopifnot(apply(zRI, 2, anyNA)) # *all* are NA *or* NaN (or both)
>> is.NA <- function(.) is.na(.) & !is.nan(.)
>> (iNaN <- apply(zRI, 2, function(.) any(is.nan(.))))
>> (iNA <- apply(zRI, 2, function(.) any(is.NA (.)))) # has non-NaN NA's
>> ## In Martin's version of R-devel :
>> stopifnot(identical(m1z == 1, iNA),
>>           identical(m1z == 2, !iNA))
>> ## m1z uses match(x, *) with length(x) == 1 and failed in R 3.3.0
>> stopifnot(identical(m1z, mz))

>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
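
The order dependence of length(unique(.)) noted at the top of this message can be reproduced without relying on the hashing code of any particular R version. Below is a minimal sketch in plain R, assuming the matching rule described in the thread: two complex values with an NA or NaN part match if both contain an NA, or if both contain a NaN. The helpers is_NA, has_NA, has_NaN, ceq and first_seen are hypothetical names introduced here for illustration only; they are not part of R and do not reproduce the C code in unique.c.

```r
## Hypothetical helpers (not part of R): a pure-R emulation of the matching
## rule described in the thread for complex values containing NA / NaN.
is_NA   <- function(v) is.na(v) & !is.nan(v)            # NA proper, excluding NaN
has_NA  <- function(z) is_NA(Re(z)) || is_NA(Im(z))     # some part is NA
has_NaN <- function(z) is.nan(Re(z)) || is.nan(Im(z))   # some part is NaN

## Assumed rule: ordinary values compare with ==; otherwise two values match
## if both contain an NA part, or both contain a NaN part.
ceq <- function(a, b) {
    if (!is.na(a) && !is.na(b)) return(a == b)
    (has_NA(a) && has_NA(b)) || (has_NaN(a) && has_NaN(b))
}

## Naive "first seen wins" scan: keep an element only if it matches
## none of the elements kept so far.
first_seen <- function(z) {
    keep <- integer(0)
    for (i in seq_along(z)) {
        hit <- vapply(keep, function(k) ceq(z[i], z[k]), logical(1))
        if (!any(hit)) keep <- c(keep, i)
    }
    z[keep]
}

zNA   <- complex(real = NA_real_, imaginary = 0)        # NA only
zNaN  <- complex(real = NaN,      imaginary = 0)        # NaN only
zBoth <- complex(real = NaN,      imaginary = NA_real_) # both NA and NaN

n_NA_first   <- length(first_seen(c(zNA, zNaN, zBoth)))
n_both_first <- length(first_seen(c(zBoth, zNA, zNaN)))
```

Under this rule zBoth matches both zNA and zNaN, while zNA and zNaN do not match each other, so the relation is not transitive. A first-seen scan therefore keeps two values when zNA comes first and only one when zBoth comes first, in line with the lengths 2 and 1 reported for unique(z) and unique(c(z[8], z[-8])) above.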