On Thu, 2005-10-13 at 14:31 -0400, Duncan Murdoch wrote:
> On 10/13/2005 1:07 PM, Marc Schwartz (via MN) wrote:
> > On Thu, 2005-10-13 at 10:02 -0400, Duncan Murdoch wrote:
> >> Sorry, a typo in my previous message (parens in the wrong place in the
> >> conversion).
> >>
> >> Here it is corrected:
> >>
> >> I'm doing a big slow computation, and profiling shows that it is
> >> spending a lot of time in match(), apparently because I have code like
> >>
> >> x %in% listofxvals
> >>
> >> Both x and listofxvals are factors with the same levels, so I could
> >> probably speed this up by stripping off the levels and just treating
> >> them as integer vectors, then restoring the levels at the end.
> >>
> >> What is the safest way to do this? I am worried that at some point x
> >> and listofxvals will *not* have the same levels, and the optimization
> >> will give the wrong answer. So I need code that guarantees they have
> >> the same coding.
> >>
> >> I think this works, where "master" is a factor with the master list of
> >> levels (guaranteed to be a superset of the levels of x and listofxvals),
> >> but can anyone spot anything that might go wrong?
> >>
> >> # Strip the levels
> >> x <- as.integer( factor(x, levels = levels(master) ) )
> >>
> >> # Restore the levels
> >> x <- structure( x, levels = levels(master), class = "factor" )
> >>
> >> Thanks for any advice...
> >>
> >> Duncan Murdoch
> >
> > Duncan,
> >
> > With the predicate that 'master' has the full superset of all possible
> > factor levels defined, it would seem that this would be a reasonable way
> > to go.
> >
> > This approach would also seem to eliminate whatever overhead is
> > encountered as a result of the coercion of 'x' as a factor to a
> > character vector, which is done by match().
> >
> > One question I have is, what is the advantage of using structure()
> > versus:
> >
> > x <- factor(x, levels = levels(master))
> >
> > ?
>
> That one doesn't work.
> What "factor(x, levels=levels(master))" says is
> to convert x to a factor, coding the values in it according to the levels
> in master. But at this point x has values which are integers, so they
> won't match the levels of master, which are probably character strings.
>
> For example:
>
> > master <- factor(letters)
> > print(x <- factor(letters[1:3]))
> [1] a b c
> Levels: a b c
> > print(x <- as.integer( factor(x, levels = levels(master) ) ) )
> [1] 1 2 3
> > print(x <- factor(x, levels = levels(master)))
> [1] <NA> <NA> <NA>
> Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
>
> I get NA's at the end because the values 1,2,3 aren't in the vector of
> factor levels (which are the lowercase letters).
As opposed to:

> print(x <- structure(x, levels = levels(master), class = "factor" ))
[1] a b c
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z

OK. Makes sense. Thanks for the clarification.

Marc

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
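
For the archive, here is a minimal sketch of the whole round trip discussed in this thread: recode both factors against the levels of 'master', do the %in% test on the integer codes, then restore the levels with structure(). The helper names to_codes() and from_codes() are hypothetical, made up for illustration only, and the check for values missing from levels(master) is an added assumption about how one might guard against the "levels differ" case; it is not something proposed in the messages above.

```r
# Hypothetical helpers; 'master' is assumed to carry a superset of all levels.
to_codes <- function(f, master) {
  # Recode f against the master levels, then drop down to plain integers.
  codes <- as.integer(factor(f, levels = levels(master)))
  # A value that is not in levels(master) turns into NA here. Stop loudly
  # instead of silently comparing codes that do not mean the same thing.
  if (any(is.na(codes) & !is.na(f))) {
    stop("value not found in levels(master)")
  }
  codes
}

from_codes <- function(codes, master) {
  # Reattach the levels directly; factor(codes, levels = ...) would give NAs.
  structure(codes, levels = levels(master), class = "factor")
}

master      <- factor(letters)
x           <- factor(c("a", "c", "e"))
listofxvals <- factor(c("c", "d", "e"))

xi <- to_codes(x, master)
li <- to_codes(listofxvals, master)

hits <- xi %in% li               # match on integer codes, not on factors
x2   <- from_codes(xi, master)   # back to a factor with the master levels
```

Because both vectors are coded against the same level set, 'hits' should agree with x %in% listofxvals on the original factors. Whether this is actually faster depends on the data and the R version, so it is worth profiling again after the change.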