On Thu, 2005-10-13 at 10:02 -0400, Duncan Murdoch wrote:
> Sorry, a typo in my previous message (parens in the wrong place in the
> conversion).
>
> Here it is corrected:
>
> I'm doing a big slow computation, and profiling shows that it is
> spending a lot of time in match(), apparently because I have code like
>
> x %in% listofxvals
>
> Both x and listofxvals are factors with the same levels, so I could
> probably speed this up by stripping off the levels and just treating
> them as integer vectors, then restoring the levels at the end.
>
> What is the safest way to do this? I am worried that at some point x
> and listofxvals will *not* have the same levels, and the optimization
> will give the wrong answer. So I need code that guarantees they have
> the same coding.
>
> I think this works, where "master" is a factor with the master list of
> levels (guaranteed to be a superset of the levels of x and listofxvals),
> but can anyone spot anything that might go wrong?
>
> # Strip the levels
> x <- as.integer( factor(x, levels = levels(master) ) )
>
> # Restore the levels
> x <- structure( x, levels = levels(master), class = "factor" )
>
> Thanks for any advice...
>
> Duncan Murdoch
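The recoding described in the quoted message can be sketched as follows. The example data and the helper name codes_for() are hypothetical, made up for illustration; only the names master, x, and listofxvals come from the message. The sketch checks that a membership test on the recoded integers agrees with %in% on the factors, and shows how stripping the levels without recoding can go wrong when the two factors do not share the same levels.

```r
# Hypothetical example data; 'master' holds the full set of levels.
master      <- factor(c("a", "b", "c", "d"))
x           <- factor(c("b", "d", "a", "b"), levels = c("a", "b", "d"))
listofxvals <- factor(c("d", "c"), levels = c("c", "d"))

# Hypothetical helper: recode a factor against the master levels and
# return the integer codes. Values absent from 'master' become NA.
codes_for <- function(f, master) {
  as.integer(factor(f, levels = levels(master)))
}

x_codes    <- codes_for(x, master)
list_codes <- codes_for(listofxvals, master)

# Membership test on the recoded integers versus on the original factors.
fast <- x_codes %in% list_codes
slow <- x %in% listofxvals
stopifnot(identical(fast, slow))

# Stripping the levels WITHOUT recoding compares unrelated codings,
# because the two factors here have different level sets.
naive <- as.integer(x) %in% as.integer(listofxvals)
stopifnot(!identical(naive, slow))

# Guard for the "not a superset" case: a non-missing value that turns
# into an NA code means 'master' lacked one of its levels.
stopifnot(!any(is.na(x_codes) & !is.na(x)))
```

The last check is one possible safeguard, not part of the original proposal; it fails loudly if master ever stops being a superset of the levels in x.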
Duncan,

With the predicate that 'master' has the full superset of all possible
factor levels defined, it would seem that this would be a reasonable way
to go.

This approach would also seem to eliminate whatever overhead is
encountered as a result of the coercion of 'x' as a factor to a
character vector, which is done by match().

One question I have is, what is the advantage of using structure()
versus:

  x <- factor(x, levels = levels(master))

?

Thanks,

Marc

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
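On the structure() versus factor() question, a minimal sketch with hypothetical data may help. It assumes that by the restore step x already holds integer codes, as produced by the quoted "Strip the levels" line. Under that assumption, factor(x, levels = levels(master)) compares the codes (converted to character) against the level labels, so it would not recover the original values, whereas structure() reattaches the levels to the existing codes.

```r
# Hypothetical example data.
master <- factor(c("a", "b", "c"))
x      <- factor(c("c", "a"), levels = levels(master))

# Strip the levels, as in the quoted message; the codes are 3 and 1.
codes <- as.integer(factor(x, levels = levels(master)))

# Restore with structure(): reattaches the levels to the integer codes.
restored <- structure(codes, levels = levels(master), class = "factor")
stopifnot(identical(restored, x))

# factor() on the codes looks for "3" and "1" among "a", "b", "c",
# finds no match, and yields NA for every element.
via_factor <- factor(codes, levels = levels(master))
stopifnot(all(is.na(via_factor)))

# A factor()-based restore needs the codes mapped to labels explicitly.
via_labels <- factor(codes, levels = seq_along(levels(master)),
                     labels = levels(master))
stopifnot(identical(via_labels, x))
```

One trade-off to keep in mind: structure() performs no validity checks, so codes outside the range 1 to length(levels(master)) would produce a malformed factor, while the factor() form with explicit labels turns such codes into NA.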