Function factor() in the current development version (2009-05-22) guarantees that levels are different character strings. However, they may represent the same decimal number. The following example is derived from a posting by Stavros Macrakis in thread "Match .3 in a sequence" in March

  nums <- 0.3 + 2e-16 * c(-2,-1,1,2)
  f <- factor(nums)
  levels(f)
  # [1] "0.300000000000000" "0.3"

The levels differ in trailing zeros, but represent the same decimal number. Besides not being really meaningful, this may cause a problem when using as.numeric(levels(f)).

In the above case, as.numeric() works fine and maps the two levels to the same number. However, there are cases where the difference in trailing zeros leads to different values in as.numeric(levels(f)), and these values may even form a decreasing sequence although the levels were constructed from an increasing sequence of numbers.

Examples are platform dependent, but may be found by the following code. Tested on Intel under Linux (both with and without SSE) and also under Windows with an older version of R.

  for (i in 1:100000) {
      # random x with decimal exponent 61 or 62, rounded to 14 significant digits
      x <- 10^(floor(runif(1, 61, 63)) + runif(1)/2)
      x <- as.numeric(sprintf("%.14g", x))
      # spacing between adjacent doubles near x
      eps <- 2^(floor(log2(x)) - 52)
      # offsets, in units of eps, at relative distance 5e-16 and 1e-15 from x
      k <- round(x * c(5e-16, 1e-15) / eps)
      # for x above 1e62, scan just below x instead of just above
      if (x > 1e62) { k <- rev( - k) }
      y <- x + k[1]:k[2] * eps
      # positions where the round trip through character reverses the order
      ind <- which(diff(as.numeric(as.character(y))) < 0)
      for (j in ind) {
          u1 <- y[c(j, j+1)]
          u2 <- factor(u1)
          print(levels(u2))
          print(diff(as.numeric(levels(u2))))
          aux <- readline("next")
      }
  }

An example of the output is

  [1] "1.2296427920313e+61" "1.22964279203130e+61"
  [1] -1.427248e+45
  next
  [1] "1.82328862326830e+62" "1.8232886232683e+62"
  [1] -2.283596e+46
  next

A negative number in diff(as.numeric(levels(u2))) marks a case where as.numeric(levels(u2)) is decreasing. We can also see that the reason is that the two strings in levels(u2) differ in trailing zeros.

I did a quite intensive search for such examples for all possible exponents (not only 61 and 62; about a week of CPU time on three processors), and all the examples obtained were caused by a difference in trailing zeros. So, I believe that removing trailing zeros from the output of as.character(x) solves the problem with the reversed order in as.numeric(levels(factor(x))) entirely.
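
The search loop above walks through consecutive doubles near x by adding integer multiples of eps = 2^(floor(log2(x)) - 52). The following C sketch illustrates why this enumerates adjacent doubles. It is only an illustration under the assumption of IEEE 754 binary64 doubles; the helper names bits_of, double_of, spacing and next_up are hypothetical and are not part of R or of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its 64-bit pattern and back
   (assumes IEEE 754 binary64, as on the platforms tested above). */
static uint64_t bits_of(double x)
{
    uint64_t b;
    memcpy(&b, &x, sizeof b);
    return b;
}

static double double_of(uint64_t b)
{
    double x;
    memcpy(&x, &b, sizeof x);
    return x;
}

/* Spacing of doubles in the binade of x, i.e. 2^(floor(log2(x)) - 52).
   Restricted to positive normal x with a biased exponent above 52,
   so that the result is itself a normal power of two. */
double spacing(double x)
{
    uint64_t expo = (bits_of(x) >> 52) & 0x7ff;   /* biased exponent field */
    assert(x > 0 && expo > 52 && expo < 0x7ff);
    return double_of((expo - 52) << 52);          /* zero mantissa: power of two */
}

/* Next representable double above a positive finite x
   (yields +Inf for the largest finite double). */
double next_up(double x)
{
    assert(x > 0);
    return double_of(bits_of(x) + 1);
}
```

For a value such as 1.2296427920313e61, next_up(x) - x equals spacing(x), so x + k * eps for small integers k visits neighbouring doubles one by one, as long as the steps stay within the same binade.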

A patch against R-devel_2009-05-22, which eliminates trailing zeros from as.character(x) but makes no other changes to as.character(x), is attached. With the patch, we also obtain a better result in the following example.

  nums <- 0.3 + 2e-16 * c(-2,-1,1,2)
  factor(nums)
  # [1] 0.3 0.3 0.3 0.3
  # Levels: 0.3

Petr.

--- R-devel/src/main/coerce.c 2009-04-17 17:53:35.000000000 +0200
+++ R-devel-elim-trailing/src/main/coerce.c 2009-05-23 08:39:03.914774176 +0200
@@ -294,12 +294,33 @@
     else return mkChar(EncodeInteger(x, w));
 }
 
+const char *elim_trailing(const char *s, char cdec)
+{
+    const char *p;
+    char *replace;
+    for (p = s; *p; p++) {
+        if (*p == cdec) {
+            replace = (char *) p++;
+            while ('0' <= *p & *p <= '9') {
+                if (*(p++) != '0') {
+                    replace = (char *) p;
+                }
+            }
+            while (*(replace++) = *(p++)) {
+                ;
+            }
+            break;
+        }
+    }
+    return s;
+}
+
 SEXP attribute_hidden StringFromReal(double x, int *warn)
 {
     int w, d, e;
     formatReal(&x, 1, &w, &d, &e, 0);
     if (ISNA(x)) return NA_STRING;
-    else return mkChar(EncodeReal(x, w, d, e, OutDec));
+    else return mkChar(elim_trailing(EncodeReal(x, w, d, e, OutDec), OutDec));
 }
 
 SEXP attribute_hidden StringFromComplex(Rcomplex x, int *warn)
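
The effect of elim_trailing() can be tried outside R. The following is a minimal standalone sketch, not the patch itself: the function name strip_trailing_zeros is hypothetical, and unlike the patch it works on a caller-owned writable buffer instead of casting away const on the string returned by EncodeReal(). It assumes the decimal separator is not the NUL character.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical standalone variant of the patch's elim_trailing():
 * removes trailing zeros from the fractional part of a formatted
 * number, in place. `cdec` is the decimal separator (OutDec in the
 * patch). If the fractional part consists of zeros only, the
 * separator is removed as well, as in the patch.
 */
char *strip_trailing_zeros(char *buf, char cdec)
{
    char *dec, *keep, *p;

    assert(buf != NULL && cdec != '\0');

    dec = strchr(buf, cdec);
    if (dec == NULL)
        return buf;               /* no fractional part, nothing to do */

    keep = dec;                   /* everything before `keep` is retained */
    p = dec + 1;
    while (*p >= '0' && *p <= '9') {
        if (*p != '0')
            keep = p + 1;         /* retain up to and including this digit */
        p++;
    }

    /* p now points at the exponent part (e.g. "e+61") or at the
       terminating NUL; move it down over the dropped zeros. */
    memmove(keep, p, strlen(p) + 1);
    return buf;
}
```

Applied to the strings from the examples above, "0.300000000000000" becomes "0.3" and "1.22964279203130e+61" becomes "1.2296427920313e+61", so the two levels in each pair collapse into one.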

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel