With 'z' of length 8 below, or of length 12 previously, one may try

sapply(rev(z), match, table = rev(z))
match(rev(z), rev(z))

I found that the two results were different in R-devel r70604. A shorter example:

> z <- complex(real = c(0,NaN,NaN), imaginary = c(NA,NA,0))
> sapply(z, match, table = z)
[1] 1 1 2
> match(z, z)
[1] 1 1 3

An explanation of the behavior: with ordinary equality, if z[2] is equal to z[1] and z[3] is not equal to z[1], then z[3] is not equal to z[2]. That does not hold here with 'cequal': z[2] matches z[1] (both contain an NA) and z[3] matches z[2] (both contain a NaN), yet z[3] does not match z[1]. However, it seems that this property is assumed in the usual case of 'match'.

To fix that, just changing 'cequal' so that a complex number that has both NA and NaN matches NA and doesn't match NaN is enough. It also makes length(unique(.)) independent of order.

For a more extensive change, I am fine with '1 A'.

--------------------------------------------
On Mon, 30/5/16, Martin Maechler <maech...@stat.math.ethz.ch> wrote:

 Subject: Re: [Rd] complex NA's match(), etc: not back-compatible change proposal
 Cc: R-devel@r-project.org
 Date: Monday, 30 May, 2016, 5:48 PM

>>>>> Suharto Anggono
>>>>>     on Sat, 28 May 2016 09:34:08 +0000 writes:

> On 'factor', I meant the case where 'levels' is not
> specified, where 'unique' is called.

I see, thank you.

>> factor(c(complex(real=NaN), complex(imaginary=NaN)))
> [1] NaN+0i <NA>
> Levels: NaN+0i

> Look at <NA> in the result above. Yes, it happens in
> earlier versions of R, too.

Yes; let's call this "problem 1".

> On matching both NA and NaN, another consequence is that
> length(unique(.)) may depend on order.
> Example using R devel r70604:
>> x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
>> (z <- z[is.na(z)])
> [1] NA NaN+ 0i NA NaN+ 1i NA NA NA NA
> [9] 0+NaNi 1+NaNi NA NaN+NaNi
>> length(print(unique(z)))
> [1] NA NaN+0i
> [1] 2
>> length(print(unique(c(z[8], z[-8]))))
> [1] NA
> [1] 1
> --------------------------------------------

Thank you, Suharto. I agree these are even more convincing reasons to consider changing. Let's call this ("matching both NA and NaN") "problem 2".
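To make the "matching both NA and NaN" behavior concrete, here is a minimal R sketch. The helpers has_NA(), has_NaN(), ceq_sketch() and first_match() are hypothetical and only encode the rule as Suharto describes it for r70604 (two complex numbers with missing parts match when both contain an NA, or both contain a NaN); they are not R's C-level cequal(). It is also assumed that numbers without any NA/NaN simply compare with ==. With the three-element z above, the relation is not transitive.

```r
## Hypothetical helpers, written only to illustrate the rule discussed here.
## A "true" NA part is NA_real_, i.e. is.na() but not is.nan().
has_NA  <- function(w) { p <- c(Re(w), Im(w)); any(is.na(p) & !is.nan(p)) }
has_NaN <- function(w) any(is.nan(c(Re(w), Im(w))))

## Sketch of the described rule: fully non-missing numbers compare with ==;
## otherwise two numbers match if both contain an NA or both contain a NaN.
ceq_sketch <- function(x, y) {
  if (!is.na(x) && !is.na(y)) return(x == y)
  (has_NA(x) && has_NA(y)) || (has_NaN(x) && has_NaN(y))
}

## Linear scan: first index j such that ceq_sketch(x, table[j]) is TRUE.
first_match <- function(x, table) {
  for (j in seq_along(table)) if (ceq_sketch(x, table[j])) return(j)
  NA_integer_
}

z <- complex(real = c(0, NaN, NaN), imaginary = c(NA, NA, 0))
ceq_sketch(z[1], z[2])  # TRUE  (both contain an NA)
ceq_sketch(z[2], z[3])  # TRUE  (both contain a NaN)
ceq_sketch(z[1], z[3])  # FALSE (so the relation is not transitive)
vapply(seq_along(z), function(i) first_match(z[i], z), integer(1))  # 1 1 2
```

The linear scan reproduces the sapply(z, match, table = z) result 1 1 2 reported above, while match(z, z) gave 1 1 3 instead.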
I think we agree that R-devel -- compared to previous versions -- *is* consistent in its (C-level) functions cequal() and chash(), and is also consistent with the documentation of match()/unique()/duplicated().
Hence I think a change would have to affect all of the above, including a change of the documentation.

Also, the resolutions of "problem 1" and "problem 2" are related, but -- I think -- almost separate.

For the following, let's use a vector notation for complex numbers, say

  (a, b) :== complex(real = a, imaginary = b)

With R (showing relevant examples):

##------------------------------------------------------------------------------
options(width = max(85, getOption("width"))) # so 'z' prints in one line
p.z <- function(z) print(noquote(paste0("(",Re(z),",",Im(z),")")))
z <- c(1,NA,NaN); z <- outer(z,z, complex, length.out=1); (z <- z[is.na(z)])
## NA NaN+ 1i NA NA NA 1+NaNi NA NaN+NaNi
p.z(z)
## (NA,1) (NaN,1) (1,NA) (NA,NA) (NaN,NA) (1,NaN) (NA,NaN) (NaN,NaN)
length(p.z(unique(z[ 1:8 ])))
## [1] (NA,1) (NaN,1)
## [1] 2
length(p.z(unique(z[ c(8,1:7) ])))
## [1] (NaN,NaN) (NA,1)
## [1] 2
length(p.z(unique(z[ c(7:8,1:6) ])))
## [1] (NA,NaN)
## [1] 1
##------------------------------------------------------------------------------

Problem 1:
To me, at the moment, it would seem most "natural" to consider a change where the match()/unique()/duplicated() behavior matched the behavior of print()/format()/as.character() for such complex vectors.
I think this would automatically solve the issue that sometimes

  length(unique(as.character(x))) > length(unique(x))

There are principally two solutions to this:

 A: change match()/unique()/duplicated()
 B: change print()/format()/as.character()

For A -- which seems "less disruptive" and more desirable to me -- we would have to change cequal() {and chash()!} and say that complex numbers with NA|NaN "match" if they both have any NA; otherwise, both the regular (r,i) parts and the NaNs must be at exactly the same places (and *different* NaNs should match, of course).

Problem 2: unique(z[i]) depends on the permutation 'i'.

What should a change be here ... notably after the "proposed" (rather only "considered") change '1 A' above? Can "the" new behavior easily be described in words (if '1 A' above is already assumed)?

At the moment, I would not tackle Problem 2. It would become less problematic once Problem 1 is solved according to '1 A', because at least length(unique(.)) would not change: the result would contain *one* z[] with an NA, and all the other z[]s.

Opinions? Thank you in advance for chiming in.

Martin Maechler,
ETH Zurich

> On Mon, 23/5/16, Martin Maechler <maech...@stat.math.ethz.ch> wrote:
>
>  Subject: Re: [Rd] complex NA's match(), etc: not back-compatible change proposal
>  Cc: R-devel@r-project.org
>  Date: Monday, 23 May, 2016, 11:06 PM
>
> >>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel@r-project.org>
> >>>>>     on Fri, 13 May 2016 16:33:05 +0000 writes:
>
> > That, for example, complex(real=NaN) and complex(imaginary=NaN) are
> > regarded as equal makes it possible that
> >   length(unique(as.character(x))) > length(unique(x))
> > (current code of function 'factor' doesn't expect it).
>
> Thank you, that is an interesting remark - but it is already true
> in [...] ..
> and of course this is because we do *print* 0+NaNi etc,
> i.e., we differentiate the non-NA-but-NaN complex values in
> formatting / printing but not in match(), unique() ...
> and indeed, with the 'z' example below,
>
>   fz <- factor(z,z)
>
> gives a warning about duplicated levels, and gives such warnings
> also in current (and previous) versions of R, at least for the
> slightly larger z I've used in the tests/reg-tests-1c.R example.
> For the moment I can live with that warning, as I don't think
> factor()s are constructed from complex numbers "often"...
> and the performance of factor() in the more regular cases is important.
>
> > Yes, an argument for the behavior is that NA and NaN are of one kind.
>
> > On my system, using 32-bit R for Windows from binary from CRAN,
> > the result of sapply(z, match, table = z) (not in current R-devel)
> > may be different from below:
> >
> > 1 2 3 4 1 3 7 8 2 4 8 12 # R 2.10.1, different from below
> > 1 2 3 4 1 3 7 8 2 4 8 12 # R 3.2.5, different from below
>
> interesting, thank you... and another reason why the change
> (currently only in R-devel) may have been a good one: More uniformity.
>
> > I noticed that, by function 'cequal' in unique.c, a complex number that
> > has both NA and NaN matches NA and also matches NaN.
> >> x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
> >> (z <- z[is.na(z)])
> > [1] NA NaN+ 0i NA NaN+ 1i NA NA NA NA
> > [9] 0+NaNi 1+NaNi NA NaN+NaNi
> >> sapply(z, match, table = z[8])
> > [1] 1 1 1 1 1 1 1 1 1 1 1 1
> >> match(z, z[8])
> > [1] 1 1 1 1 1 1 1 1 1 1 1 1
>
> Yes, I see the same. But isn't that what we expect:
> All of our z[] entries have at least one NA or a NaN in their real
> or imaginary part, and since z[8] has both, it does match all z[]s,
> either because of the NA or because of the NaN in common.
> Hence, currently, I don't think this needs to be changed...
> but if there are other reasons / arguments ...
> Thank you again,
> Martin Maechler
>
> >> sessionInfo()
> > R Under development (unstable) (2016-05-12 r70604)
> > Platform: i386-w64-mingw32/i386 (32-bit)
> > Running under: Windows XP (build 2600) Service Pack 2
> >
> > locale:
> > [1] LC_COLLATE=English_United States.1252
> > [2] LC_CTYPE=English_United States.1252
> > [3] LC_MONETARY=English_United States.1252
> > [4] LC_NUMERIC=C
> > [5] LC_TIME=English_United States.1252
> >
> > attached base packages:
> > [1] stats graphics grDevices utils datasets methods base
> >
> > -----------------
> >>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
> >>>>>>     on Tue, 10 May 2016 16:08:39 +0200 writes:
> >
> >> This is an RFC / announcement related to the 2nd part of PR#16885
> >>
> >>   https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16885
> >>
> >> about complex NA's.
> >>
> >> The (somewhat rare) incompatibility in R 3.3.0's match() behavior for the
> >> case of complex numbers with NA & NaN's {which has been fixed for R 3.3.0
> >> patched in the meantime} triggered some more comprehensive "research".
> >>
> >> I found that we have had a long-standing inconsistency, at least between the
> >> documented and the real behavior. I am claiming that the documented
> >> behavior is desirable and hence R's current "real" behavior is buggy, and
> >> I am proposing to change it, in R-devel (to be 3.4.0) for now.
> >
> > After the "roaring unanimous" assent (one private msg
> > encouraging me to go forward, no dissenting voice, hence an
> > "odds ratio" of +Inf in favor ;-)
> > I have now committed my proposal to R-devel (svn rev. 70597), and
> > some of us will be seeing the effect in package space within a
> > day or so, in the CRAN checks against R-devel (not for
> > Bioconductor AFAIK; their checks use R-devel only when it is less
> > than ca. 6 months from release).
> >
> > It's still worthwhile to discuss the issue, if you come late
> > to it, notably as ---paraphrasing Dirk on the R-package-devel list---
> > the release of 3.4.0 is almost a year away, and so now is the
> > best time to tinker with the API, in other words, consider breaking
> > rarely used legacy APIs.
> >
> > Martin
> >
> >> In help(match) we have been saying
> >>
> >> | Exactly what matches what is to some extent a matter of definition.
> >> | For all types, \code{NA} matches \code{NA} and no other value.
> >> | For real and complex values, \code{NaN} values are regarded
> >> | as matching any other \code{NaN} value, but not matching \code{NA}.
> >>
> >> for at least 10 years. But we don't do that at all in the
> >> complex case (and AFAIK never got a bug report about it).
> >>
> >> Also, e.g., print(.) or format(.) simply use "NA" for all
> >> the different complex NA-containing numbers, whereas
> >> non-NA NaN's { <=> !is.nan(z) & is.na(z) }
> >> in format() or print() do show the NaN in real and/or imaginary
> >> parts; for an example, look at the "format" column of the matrix
> >> below, after 'print(cbind' ...
> >>
> >> The current match()---and duplicated(), unique() which are based on the same
> >> C code---*do* distinguish almost all complex NA / NaN's, which is
> >> NOT according to the documentation. I have found that this is just because
> >> our hashing function for the complex case, chash() in R/src/main/unique.c,
> >> is bogus in the sense that it is compatible neither with the above
> >> documentation nor with the cequal() function (in the same file unique.c)
> >> for checking equality of complex numbers.
> >>
> >> As I have found, a *simplified* version of the chash() function
> >> to make it compatible with cequal() does solve all the problems I've
> >> indicated, and the current plan is to commit that change --- after some
> >> discussion time, here on R-devel --- to the code base.
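The requirement behind that fix is that a hash must be compatible with the equality test it is paired with: whenever two values compare equal, they must get the same hash. One way to guarantee this by construction is to define equality as "same canonical key". As a sketch only, the hypothetical R function key_1A() below encodes the '1 A' rule from the 30 May message (any number with a true NA part matches every other such number; otherwise the non-NaN parts must be equal and NaNs must sit in the same positions, with all NaNs matching each other). It is not the C code in R/src/main/unique.c. Because equality becomes plain string equality, it is transitive, so the number of distinct keys cannot depend on the order of z.

```r
## Hypothetical canonical key for proposal '1 A' (an illustration only):
## every number with a true NA part (NA_real_) maps to "NA"; otherwise the
## key is "Re,Im", where every NaN (whatever its payload) prints as "NaN".
key_1A <- function(z) {
  vapply(z, function(w) {
    p <- c(Re(w), Im(w))
    if (any(is.na(p) & !is.nan(p))) "NA" else paste(p, collapse = ",")
  }, character(1))
}

## Martin's 8-element example from this thread
z <- c(1, NA, NaN); z <- outer(z, z, complex, length.out = 1); z <- z[is.na(z)]
key_1A(z)
## "NA" "NaN,1" "NA" "NA" "NA" "1,NaN" "NA" "NaN,NaN"

## The number of distinct keys is the same for each permutation tried:
sapply(list(1:8, c(8, 1:7), c(7:8, 1:6)),
       function(i) length(unique(key_1A(z[i]))))
## 4 4 4

## Problem 1: NaN+0i and 0+NaNi get different keys, just as they print differently
key_1A(c(complex(real = NaN), complex(imaginary = NaN)))
## "NaN,0" "0,NaN"
```

The count of 4 is "one z[] with an NA, and all the other z[]s", as described for '1 A'. Note that paste() formats doubles to 15 significant digits, so such string keys are only suitable for illustrating the NA/NaN classes, not as a general-purpose hash.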
> >>
> >> My change passes 'make check-all' fine, but I'm 100% sure that there will
> >> be effects in package space ... one reason for this posting.
> >>
> >> As mentioned above, note that the chash() function has been in
> >> use for all three functions
> >>    match()
> >>    duplicated()
> >>    unique()
> >> and the change will affect all three --- but just for the case of complex
> >> vectors with NA or NaN's.
> >>
> >> To show more, a small R session -- using my version of R-devel
> >> == the proposition:
> >>
> >> The R script ('complex-NA-short.R') for (a bit more than) the
> >> session is attached {{you can attach text/plain easily}}:
> >>
> >>> x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
> >>> ## --- = NA_real_ but that does not exist e.g., in R 2.3.1
> >>> ## similarly, '1L', '2L', .. do not exist e.g., in R 2.3.1
> >>> (z <- z[is.na(z)])
> >>  [1] NA NaN+ 0i NA NaN+ 1i NA NA NA NA
> >>  [9] 0+NaNi 1+NaNi NA NaN+NaNi
> >>> outerID <- function(x,y, ...) { ## ugly; can we get outer() to work ?
> >> +   r <- matrix( , length(x), length(y))
> >> +   for(i in seq(along=x))
> >> +     for(j in seq(along=y))
> >> +       r[i,j] <- identical(z[i], z[j], ...)
> >> +   r
> >> + }
> >>> ## Very strictly - in the sense of identical() -- these 12 complex numbers all differ:
> >>> ## a version that works in older versions of R, where identical() had fewer [...]
> >>> outerID.picky <- function(x,y) {
> >> +   nF <- length(formals(identical)) - 2
> >> +   do.call("outerID", c(list(x, y), as.list(rep(FALSE, nF))))
> >> + }
> >>> oldR <- !exists("getRversion") || getRversion() < "3.0.0" ## << FIXME: 3.0.0 is a wild guess
> >>> symnum(id.z <- outerID.picky(z,z)) ## == Diagonal matrix [newer versions of R]
> >>
> >>  [1,] | . . . . . . . . . . .
> >>  [2,] . | . . . . . . . . . .
> >>  [3,] . . | . . . . . . . . .
> >>  [4,] . . . | . . . . . . . .
> >>  [5,] . . . . | . . . . . . .
> >>  [6,] . . . . . | . . . . . .
> >>  [7,] . . . . . . | . . . . .
> >>  [8,] . . . . . . . | . . . .
> >>  [9,] . . . . . . . . | . . .
> >> [10,] . . . . . . . . . | . .
> >> [11,] . . . . . . . . . . | .
> >> [12,] . . . . . . . . . . . |
> >>> try(# for older R versions
> >> +   stopifnot(identical(id.z, outerID(z,z)), oldR || identical(id.z, diag(12) == 1))
> >> + )
> >>> (mz <- match(z, z)) # currently different {NA,NaN} patterns differ - not in print()/format() _FIXME_
> >>  [1] 1 2 1 2 1 1 1 1 2 2 1 2
> >>> zRI <- rbind(Re=Re(z), Im=Im(z)) # and see the pattern :
> >>> print(cbind(format = format(z), t(zRI), mz), quote=FALSE)
> >>       format   Re   Im   mz
> >>  [1,] NA       <NA> 0    1
> >>  [2,] NaN+ 0i  NaN  0    2
> >>  [3,] NA       <NA> 1    1
> >>  [4,] NaN+ 1i  NaN  1    2
> >>  [5,] NA       0    <NA> 1
> >>  [6,] NA       1    <NA> 1
> >>  [7,] NA       <NA> <NA> 1
> >>  [8,] NA       NaN  <NA> 1
> >>  [9,] 0+NaNi   0    NaN  2
> >> [10,] 1+NaNi   1    NaN  2
> >> [11,] NA       <NA> NaN  1
> >> [12,] NaN+NaNi NaN  NaN  2
> >>>
> >> -------------------------------
> >>
> >> Note that 'mz <- match(z, z)' and hence the last column of the matrix above
> >> are very different in current R, distinguishing most kinds of NA / NaN,
> >> against the documentation (and the real/numeric case).
> >>
> >> Martin Maechler
> >> R Core Team
> >>
> >> ### Basically a shortened version of the PR#16885 -- complex part b)
> >> ### of R/tests/reg-tests-1c.R
> >>
> >> ## b) complex 'x' with different kinds of NaN
> >> x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
> >> ## --- = NA_real_ but that does not exist e.g., in R 2.3.1
> >> ## similarly, '1L', '2L', .. do not exist e.g., in R 2.3.1
> >> (z <- z[is.na(z)])
> >> outerID <- function(x,y, ...) { ## ugly; can we get outer() to work ?
> >>     r <- matrix( , length(x), length(y))
> >>     for(i in seq(along=x))
> >>         for(j in seq(along=y))
> >>             r[i,j] <- identical(x[i], y[j], ...) # compare x[i] with y[j]
> >>     r
> >> }
> >> ## Very strictly - in the sense of identical() -- these 12 complex numbers all differ:
> >> ## a version that works in older versions of R, [...]
> >> outerID.picky <- function(x,y) {
> >>     nF <- length(formals(identical)) - 2
> >>     do.call("outerID", c(list(x, y), as.list(rep(FALSE, nF))))
> >> }
> >> oldR <- !exists("getRversion") || getRversion() < "3.0.0" ## << FIXME: 3.0.0 is a wild guess
> >> symnum(id.z <- outerID.picky(z,z)) ## == Diagonal matrix [newer versions of R]
> >> try(# for older R versions
> >>     stopifnot(identical(id.z, outerID(z,z)), oldR || identical(id.z, diag(12) == 1))
> >> )
> >> (mz <- match(z, z)) # currently different {NA,NaN} patterns differ - not in print()/format() _FIXME_
> >> zRI <- rbind(Re=Re(z), Im=Im(z)) # and see the pattern :
> >> print(cbind(format = format(z), t(zRI), mz), quote=FALSE)
> >>
> >> ## compute match(z[i], z) , for i = 1,2,..,12 :
> >> (m1z <- sapply(z, match, table = z))
> >> ## 1 2 1 2 2 2 1 2 2 2 1 2 # R 1.2.3 (2001-04-26)
> >> ## 1 2 3 4 1 3 7 8 2 4 8 7 # R 1.4.1 (2002-01-30)
> >> ## 1 2 3 4 1 3 7 8 2 4 8 12 # R 1.5.1 (2002-06-17)
> >> ## 1 2 3 4 1 3 7 8 2 4 8 12 # R 1.8.1 (2003-11-21)
> >> ## 1 2 3 4 1 3 7 8 2 4 8 12 # R 2.0.1 (2004-11-15)
> >> ## 1 2 3 4 1 3 7 4 2 4 4 12 # R 2.1.1 (2005-06-20)
> >> ## 1 2 3 4 1 3 7 4 2 4 4 12 # R 2.3.1 (2006-06-01)
> >> ## 1 2 3 4 1 3 7 8 2 4 8 12 # R 2.5.1 (2007-06-27)
> >> ## 1 2 3 4 1 3 7 4 2 4 4 12 # R 2.10.1 (2009-12-14)
> >> ## 1 2 3 4 1 3 7 4 2 4 4 12 # R 3.1.1 (2014-07-10)
> >> ## 1 2 3 4 1 3 7 4 2 4 4 12 # R 3.2.5 -- and 3.3.0 patched
> >> ## 1 2 1 2 1 1 1 1 2 2 1 2 # <<-- Martin's R-devel and proposed future R
> >>
> >> if(!exists("anyNA", mode="function")) anyNA <- function(x) any(is.na(x))
> >> stopifnot(apply(zRI, 2, anyNA)) # *all* are NA *or* NaN (or both)
> >> is.NA <- function(.) is.na(.) & !is.nan(.)
> >> (iNaN <- apply(zRI, 2, function(.) any(is.nan(.))))
> >> (iNA <- apply(zRI, 2, function(.) any(is.NA (.)))) # has non-NaN NA's
> >> ## In Martin's version of R-devel :
> >> stopifnot(identical(m1z == 1, iNA),
> >>           identical(m1z == 2, !iNA))
> >> ## m1z uses match(x, *) with length(x) == 1 and failed in R 3.3.0
> >> stopifnot(identical(m1z, mz))
> >>
> >> ______________________________________________
> >> R-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
any(is.nan(.)))) > >> > (iNA <- apply(zRI, 2, function(.) any(is.NA (.)))) # > has non-NaN NA's > >> ## In > Martin's version of R-devel : > >>> stopifnot(identical(m1z == 1, iNA), > >> identical(m1z == 2, !iNA)) > >> ## m1z uses match(x, *) with > length(x) == 1 and failed in R 3.3.0 > >>> stopifnot(identical(m1z, mz)) > >>> ______________________________________________ > >> R-devel at r-project.org mailing > list > >> https://stat.ethz.ch/mailman/listinfo/r-devel > > > ______________________________________________ > > R-devel@r-project.org > mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > ______________________________________________ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel