>>>>> "PS" == Petr Savicky <savi...@cs.cas.cz>
>>>>>     on Sun, 10 May 2009 13:52:53 +0200 writes:

    PS> On Sat, May 09, 2009 at 10:55:17PM +0200, Martin Maechler wrote:
    PS> [...]
    >> If we'd revert to such a solution,
    >> we'd have to get back to Peter's point about the issue that
    >> he'd think table(.) should be more tolerant than as.character()
    >> about "almost equality".
    >> For compatibility reasons, we could also return to the
    >> reasoning that useR should use {something like}
    >>     table(signif(x, 14))
    >> instead of
    >>     table(x)
    >> for numeric x in "typical" cases.

    PS> In the released versions 2.8.1 and 2.9.0, function factor() satisfies
    PS>   identical(as.character(factor(x)), as.character(x))     (*)
    PS> for all numeric x. This follows from the code (levels are computed by
    PS> as.character() from unmodified input values) and may be verified
    PS> even for the problematic cases, for example
    PS>   x <- (0.3 + 2e-16 * c(-2,-1,1,2))
    PS>   factor(x)
    PS>   # [1] 0.300000000000000 0.3 0.3 0.300000000000000
    PS>   # Levels: 0.300000000000000 0.3 0.3 0.300000000000000
    PS>   as.character(x)
    PS>   # [1] "0.300000000000000" "0.3" "0.3"
    PS>   # [4] "0.300000000000000"
    PS>   identical(as.character(factor(x)), as.character(x))
    PS>   # [1] TRUE

    PS> In my opinion, it is reasonable to require that (*) be
    PS> preserved also in future versions of R.

    PS> Function as.character(x) has disadvantages. Besides
    PS> the platform dependence, it also does not always perform
    PS> the rounding needed to eliminate FP errors. Usually,
    PS> as.character(x) rounds to at most 15 digits, so we get,
    PS> for example
    PS>   as.character(0.1 + 0.2) # [1] "0.3"
    PS> as required. However, there are also exceptions, for example
    PS>   as.character(1e19 + 1e5) # [1] "10000000000000100352"

    PS> Here, the number is printed exactly, so the resulting
    PS> string contains the FP error caused by the fact that
    PS> 1e19 + 1e5 has more than 53 significant digits in binary
    PS> representation, namely 59.
    PS> binary representation of 1e19 + 1e5 is
    PS>   1000101011000111001000110000010010001001111010011000011010100000
    PS> binary representation of 10000000000000100352 is
    PS>   1000101011000111001000110000010010001001111010011000100000000000

    PS> However, as.character(x) seems to do enough rounding for
    PS> most purposes, otherwise it would not be suitable as the
    PS> basic numeric to character conversion. If table() needs
    PS> factor() with a different conversion than
    PS> as.character(x), it may be done explicitly as discussed
    PS> by Martin above.

    PS> So, I suggest using as.character() as the default
    PS> conversion in factor(), so that
    PS>   identical(as.character(factor(x)), as.character(x))
    PS> is satisfied for the default usage of factor().

    PS> Of course, I would appreciate it if factor() had parameters
    PS> that allow better control of the underlying conversion,
    PS> as is done in the current development versions.

The version I committed a few hours ago is indeed a much
re-simplified version, using as.character(.) explicitly and
consequently no longer providing the extra optional arguments
that we have had for a couple of days.

Keeping such a basic function factor() as simple as possible
seems a good strategy to me.

Martin Maechler

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
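
For illustration only, here is a minimal sketch of how the property (*) and the table(signif(x, 14)) workaround discussed above could be checked on a given installation. It uses only base R; the variable names inv, t_raw and t_rounded are made up for this sketch. The strings produced by as.character() depend on the R version and platform, so no printed output is shown.

```r
# Four doubles that differ from 0.3 only in the last few bits
# (the same example values as in the message above).
x <- 0.3 + 2e-16 * c(-2, -1, 1, 2)

# Property (*): the labels of factor(x) agree with as.character(x).
inv <- identical(as.character(factor(x)), as.character(x))

# Tabulating the raw values relies on the as.character() conversion,
# so the number of cells may differ between R versions.
t_raw <- table(x)

# Rounding to 14 significant digits first is expected to merge the
# near-equal values into a single cell.
t_rounded <- table(signif(x, 14))

print(inv)
print(t_raw)
print(t_rounded)
```

Whether 14 digits is the right tolerance is a choice for the caller; the point is only that the rounding is done explicitly before table(), rather than inside factor().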