The factor approach is horrifically ugly and dangerous. Even if it didn't have the extraordinarily poor behavior documented below, it simply isn't well defined what it should do. The explicit approximation route is far preferable in every way: more predictable, more controllable, and even (though it usually hardly matters) faster.
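For concreteness, here is a minimal sketch of what I mean by the approximation route, written as a match()-like lookup. The name approx_match and the default tolerance of 1e-8 are my own assumptions for illustration, not anything in base R, and the vector vals is just a stand-in for a computed sequence whose second element is not bit-identical to 0.3.

```r
# Hypothetical helper (not part of base R): return the index of the first
# element of `table` that lies within absolute tolerance `tol` of `x`,
# or NA if no element is that close.
approx_match <- function(x, table, tol = 1e-8) {
  hits <- which(abs(table - x) <= tol)
  if (length(hits) > 0) hits[1] else NA_integer_
}

# Stand-in data: 0.1 + 0.2 is not bit-identical to 0.3 in double precision.
vals <- c(0.2, 0.1 + 0.2)

match(0.3, vals)         # NA: exact comparison finds nothing
approx_match(0.3, vals)  # 2: the second element is within tolerance
```

The tolerance is an explicit argument, so the caller decides what "close enough" means instead of leaving it to how the numbers happen to print.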
Let's look at the extraordinarily poor behavior I was mentioning. Consider:

    > nums <- (.3 + 2e-16 * c(-2,-1,1,2)); nums
    [1] 0.3 0.3 0.3 0.3

Though they all print as .3 with the default precision (which is normal and expected), they are all different from .3:

    nums - .3 => -3.885781e-16 -2.220446e-16 2.220446e-16 3.885781e-16

When we convert nums to a factor, we get:

    > fact <- as.factor(nums); fact
    [1] 0.300000000000000 0.3 0.3 0.300000000000000
    Levels: 0.300000000000000 0.3 0.3 0.300000000000000

Not clear what the difference between 0.300000000000000 and 0.3 is supposed to be, nor why some 0.300000000000000 are < .3 and others are > .3, but let's put that aside for the moment.

Now let's look at the relations among the factor values:

    > fact[1]==fact[2]
    [1] FALSE
    > fact[1]==fact[4]
    [1] TRUE

So though nums[1] < nums[2] < nums[3] < nums[4], fact[1] compares *unequal* to fact[2] though it compares *equal* to fact[4]. Apparently R is comparing the *names* of the levels rather than the indexes in the factor. This would be weird even if it didn't lead to this very bad case.

Hope this helps,

-s

On Mon, Mar 16, 2009 at 6:53 PM, Daniel Murphy <chiefmur...@gmail.com> wrote:
> I have a matrix whose columns were filled with values which were functions
> of cvseq<-seq(.2,.3,by=.1) (and a row value of mode integer). To do a lookup
> for cv=.3 later, I wanted to match(.3,cvseq), which gave me NA, hence my
> question. I thought R would match .3 in cvseq within .Machine$double.eps,
> but I can understand it if .3 and the second element of cvseq would not have
> identical bits.
> Besides the helpful suggestions below, I also tried
>> cvseqf <- as.factor(cvseq)
>> match(.3,cvseq)
> [1] 2
> which worked.
> In general, would it be better to go the enumeration route via as.factor or
> the approximation route?
> Thanks for the help.
> -Dan
>
> On Mon, Mar 16, 2009 at 8:24 AM, Stavros Macrakis <macra...@alum.mit.edu>
> wrote:
>>
>> Well, first of all, seq(from=.2,to=.3) gives c(0.2), so I assume you
>> really mean something like seq(from=.2,to=.3,by=.1), which gives
>> c(0.2, 0.3).
>>
>> %in% tests for exact equality, which is almost never a good idea with
>> floating-point numbers.
>>
>> You need to define what exactly you mean by "in" for floating-point
>> numbers. What sort of tolerance are you willing to allow?
>>
>> Some possibilities would be for example:
>>
>> approxin <- function(x,list,tol) any(abs(list-x)<tol) # absolute tolerance
>>
>> rapproxin <- function(x,list,tol) (x==0 && 0 %in% list) ||
>> any(abs((list-x)/x)<=tol,na.rm=TRUE)
>> # relative tolerance; only exact 0 will match 0
>>
>> Hope this helps,
>>
>> -s
>>
>> On Mon, Mar 16, 2009 at 9:36 AM, Daniel Murphy <chiefmur...@gmail.com>
>> wrote:
>> > Hello:I am trying to match the value 0.3 in the sequence seq(.2,.3). I
>> > get
>> >> 0.3 %in% seq(from=.2,to=.3)
>> > [1] FALSE
>> > Yet
>> >> 0.3 %in% c(.2,.3)
>> > [1] TRUE
>> > For arbitrary sequences, this "invisible .3" has been problematic. What
>> > is the best way to work around this?
>
>

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel