On Mon, Mar 16, 2009 at 07:39:23PM -0400, Stavros Macrakis wrote:
...
> Let's look at the extraordinarily poor behavior I was mentioning. Consider:
>
> nums <- (.3 + 2e-16 * c(-2,-1,1,2)); nums
> [1] 0.3 0.3 0.3 0.3
>
> Though they all print as .3 with the default precision (which is
> normal and expected), they are all different from .3:
>
> nums - .3 => -3.885781e-16 -2.220446e-16 2.220446e-16 3.885781e-16
>
> When we convert nums to a factor, we get:
>
> fact <- as.factor(nums); fact
> [1] 0.300000000000000 0.3 0.3 0.300000000000000
> Levels: 0.300000000000000 0.3 0.3 0.300000000000000
>
> Not clear what the difference between 0.300000000000000 and 0.3 is
> supposed to be, nor why some 0.300000000000000 are < .3 and others are
...

When creating a factor from a numeric vector, the list of levels and the
assignment of original elements to the levels are done using double
precision. Since the four elements in the vector are distinct, we get
four distinct levels. After this is done, the levels attribute is formed
using as.character(). This can map different numbers to the same string,
so in the example above, this leads to a factor that contains repeated
levels.

This part of the problem may be avoided using

  fact <- as.factor(as.character(nums)); fact
  [1] 0.300000000000000 0.3 0.3 0.300000000000000
  Levels: 0.3 0.300000000000000

The reason for having 0.300000000000000 and 0.3 is that as.character()
works the same as printing with digits=15. The R printing mechanism
works in two steps. In the first step, it tries to determine the shortest
format needed to achieve the required relative precision of the output.
This step uses an algorithm that need not provide an accurate result,
so it may choose more digits than necessary. In the second step, the
number is printed using the C function sprintf with the chosen format.
This step is accurate, so we cannot get wrong digits. We can only get
the wrong number of digits.

In order to avoid using 15 digits in as.character(), we can use
round(,digits), with the digits argument appropriate for the current
situation.

  fact <- as.factor(round(nums,digits=1)); fact
  [1] 0.3 0.3 0.3 0.3
  Levels: 0.3

Petr.

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
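
For reference, the steps described in the reply can be retraced with a
short R sketch. This is only an illustration, not the code that
as.factor() runs internally. The names lev_num, lev_chr, rounded and
fact2 are made up for the example, and the strings produced by
as.character() may differ between R versions, so no output is shown
for that step.

```r
# Illustrative sketch only; lev_num, lev_chr, rounded and fact2 are
# hypothetical names, not part of R.
nums <- 0.3 + 2e-16 * c(-2, -1, 1, 2)

# Step 1: levels are determined in double precision, so the four
# values are all distinct and none of them equals 0.3.
lev_num <- sort(unique(nums))
stopifnot(length(lev_num) == 4, all(nums != 0.3))

# Step 2: labels are formed with as.character(), which may map
# different doubles to the same string (depends on the R version).
lev_chr <- as.character(lev_num)
print(lev_chr)
print(anyDuplicated(lev_chr) > 0)   # TRUE if two levels share a label

# Workaround from the reply: round first, so that numbers which look
# equal really are equal before the factor is built.
rounded <- round(nums, digits = 1)
fact2 <- as.factor(rounded)
print(fact2)
```

The value of digits has to be chosen for the data at hand; digits = 1
is appropriate here only because the intended values have one decimal
place.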