Re: [R] Removing and restoring factor levels (TYPO CORRECTED)

2005-10-13 Thread Marc Schwartz (via MN)
On Thu, 2005-10-13 at 14:31 -0400, Duncan Murdoch wrote:
> On 10/13/2005 1:07 PM, Marc Schwartz (via MN) wrote:
> > On Thu, 2005-10-13 at 10:02 -0400, Duncan Murdoch wrote:
> >> Sorry, a typo in my previous message (parens in the wrong place in the 
> >> conversion).
> >> 
> >> Here it is corrected:
> >> 
> >> I'm doing a big slow computation, and profiling shows that it is
> >> spending a lot of time in match(), apparently because I have code like
> >> 
> >> x %in% listofxvals
> >> 
> >> Both x and listofxvals are factors with the same levels, so I could
> >> probably speed this up by stripping off the levels and just treating
> >> them as integer vectors, then restoring the levels at the end.
> >> 
> >> What is the safest way to do this?  I am worried that at some point x
> >> and listofxvals will *not* have the same levels, and the optimization
> >> will give the wrong answer.  So I need code that guarantees they have
> >> the same coding.
> >> 
> >> I think this works, where "master" is a factor with the master list of
> >> levels (guaranteed to be a superset of the levels of x and listofxvals),
> >> but can anyone spot anything that might go wrong?
> >> 
> >> # Strip the levels
> >> x <- as.integer( factor(x, levels = levels(master) ) )
> >> 
> >> # Restore the levels
> >> x <- structure( x, levels = levels(master), class = "factor" )
> >> 
> >> Thanks for any advice...
> >> 
> >> Duncan Murdoch
> > 
> > Duncan,
> > 
> > With the premise that 'master' has the full superset of all possible
> > factor levels defined, it would seem that this would be a reasonable way
> > to go.
> > 
> > This approach would also seem to eliminate whatever overhead is
> > encountered as a result of the coercion of 'x' as a factor to a
> > character vector, which is done by match().
> > 
> > One question I have is, what is the advantage of using structure()
> > versus:
> > 
> >    x <- factor(x, levels = levels(master))
> > 
> > ?
> 
> That one doesn't work.  What "factor(x, levels=levels(master))" says is
> to convert x to a factor, coding the values in it according to the levels
> in master.  But at this point x has values which are integers, so they
> won't match the levels of master, which are probably character strings.
> 
> For example:
> 
>  > master <- factor(letters)
>  > print(x <- factor(letters[1:3]))
> [1] a b c
> Levels: a b c
>  > print(x <- as.integer( factor(x, levels = levels(master) ) ) )
> [1] 1 2 3
>  > print(x <- factor(x, levels = levels(master)))
> [1] <NA> <NA> <NA>
> Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
> 
> I get NA's at the end because the values 1,2,3 aren't in the vector of 
> factor levels (which are the lowercase letters).

As opposed to:

> print(x <- structure(x, levels = levels(master), class = "factor" ))
[1] a b c
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z


OK.  Makes sense. Thanks for the clarification.

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Removing and restoring factor levels (TYPO CORRECTED)

2005-10-13 Thread Duncan Murdoch
On 10/13/2005 1:07 PM, Marc Schwartz (via MN) wrote:
> On Thu, 2005-10-13 at 10:02 -0400, Duncan Murdoch wrote:
>> Sorry, a typo in my previous message (parens in the wrong place in the 
>> conversion).
>> 
>> Here it is corrected:
>> 
>> I'm doing a big slow computation, and profiling shows that it is
>> spending a lot of time in match(), apparently because I have code like
>> 
>> x %in% listofxvals
>> 
>> Both x and listofxvals are factors with the same levels, so I could
>> probably speed this up by stripping off the levels and just treating
>> them as integer vectors, then restoring the levels at the end.
>> 
>> What is the safest way to do this?  I am worried that at some point x
>> and listofxvals will *not* have the same levels, and the optimization
>> will give the wrong answer.  So I need code that guarantees they have
>> the same coding.
>> 
>> I think this works, where "master" is a factor with the master list of
>> levels (guaranteed to be a superset of the levels of x and listofxvals),
>> but can anyone spot anything that might go wrong?
>> 
>> # Strip the levels
>> x <- as.integer( factor(x, levels = levels(master) ) )
>> 
>> # Restore the levels
>> x <- structure( x, levels = levels(master), class = "factor" )
>> 
>> Thanks for any advice...
>> 
>> Duncan Murdoch
> 
> Duncan,
> 
> With the premise that 'master' has the full superset of all possible
> factor levels defined, it would seem that this would be a reasonable way
> to go.
> 
> This approach would also seem to eliminate whatever overhead is
> encountered as a result of the coercion of 'x' as a factor to a
> character vector, which is done by match().
> 
> One question I have is, what is the advantage of using structure()
> versus:
> 
>    x <- factor(x, levels = levels(master))
> 
> ?

That one doesn't work.  What "factor(x, levels=levels(master))" says is
to convert x to a factor, coding the values in it according to the levels
in master.  But at this point x has values which are integers, so they
won't match the levels of master, which are probably character strings.

For example:

 > master <- factor(letters)
 > print(x <- factor(letters[1:3]))
[1] a b c
Levels: a b c
 > print(x <- as.integer( factor(x, levels = levels(master) ) ) )
[1] 1 2 3
 > print(x <- factor(x, levels = levels(master)))
[1] <NA> <NA> <NA>
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z

I get NA's at the end because the values 1,2,3 aren't in the vector of 
factor levels (which are the lowercase letters).
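For comparison, here is a minimal sketch of the failing call next to two
ways of getting the labels back. The names codes, bad, good_a and good_b
are made up for illustration, and the second option (indexing the levels
by the codes) is an alternative I am assuming would serve equally well; it
is not something proposed in this thread.

```r
master <- factor(letters)

# Integer codes relative to the master levels (a = 1, b = 2, c = 3)
codes <- as.integer(factor(letters[1:3], levels = levels(master)))

# factor() matches the *values* of its input against the levels, so the
# values 1, 2, 3 are looked up among "a".."z" and none of them is found
bad <- factor(codes, levels = levels(master))

# Option A: reattach the attributes directly, as in the original post
good_a <- structure(codes, levels = levels(master), class = "factor")

# Option B (assumption, not from the thread): turn the codes back into
# labels first, then let factor() re-encode them
good_b <- factor(levels(master)[codes], levels = levels(master))

print(bad)
print(good_a)
print(good_b)
```

Option A avoids any matching at all, which is presumably the point when
the aim is speed; Option B goes through factor() again and so pays for
another match().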

Duncan Murdoch



Re: [R] Removing and restoring factor levels (TYPO CORRECTED)

2005-10-13 Thread Marc Schwartz (via MN)
On Thu, 2005-10-13 at 10:02 -0400, Duncan Murdoch wrote:
> Sorry, a typo in my previous message (parens in the wrong place in the 
> conversion).
> 
> Here it is corrected:
> 
> I'm doing a big slow computation, and profiling shows that it is
> spending a lot of time in match(), apparently because I have code like
> 
> x %in% listofxvals
> 
> Both x and listofxvals are factors with the same levels, so I could
> probably speed this up by stripping off the levels and just treating
> them as integer vectors, then restoring the levels at the end.
> 
> What is the safest way to do this?  I am worried that at some point x
> and listofxvals will *not* have the same levels, and the optimization
> will give the wrong answer.  So I need code that guarantees they have
> the same coding.
> 
> I think this works, where "master" is a factor with the master list of
> levels (guaranteed to be a superset of the levels of x and listofxvals),
> but can anyone spot anything that might go wrong?
> 
> # Strip the levels
> x <- as.integer( factor(x, levels = levels(master) ) )
> 
> # Restore the levels
> x <- structure( x, levels = levels(master), class = "factor" )
> 
> Thanks for any advice...
> 
> Duncan Murdoch

Duncan,

With the premise that 'master' has the full superset of all possible
factor levels defined, it would seem that this would be a reasonable way
to go.

This approach would also seem to eliminate whatever overhead is
encountered as a result of the coercion of 'x' as a factor to a
character vector, which is done by match().
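As a rough sketch of what that would look like (the example data and the
names by_label and by_code are made up for illustration), the comparison
on integer codes should agree with the comparison on labels, provided
both factors carry exactly the same levels vector:

```r
master <- factor(letters)
x <- factor(c("a", "c", "z"), levels = levels(master))
listofxvals <- factor(c("c", "d"), levels = levels(master))

# The integer comparison is only meaningful when the codings agree
stopifnot(identical(levels(x), levels(listofxvals)))

# Comparison on the factors themselves (matched via their labels)
by_label <- x %in% listofxvals

# Comparison on the underlying integer codes
by_code <- as.integer(x) %in% as.integer(listofxvals)

print(by_label)
print(by_code)
```

Whether the integer version is actually faster would need to be confirmed
by profiling on the real data.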

One question I have is, what is the advantage of using structure()
versus:

   x <- factor(x, levels = levels(master))

?

Thanks,

Marc



[R] Removing and restoring factor levels (TYPO CORRECTED)

2005-10-13 Thread Duncan Murdoch
Sorry, a typo in my previous message (parens in the wrong place in the 
conversion).

Here it is corrected:

I'm doing a big slow computation, and profiling shows that it is
spending a lot of time in match(), apparently because I have code like

x %in% listofxvals

Both x and listofxvals are factors with the same levels, so I could
probably speed this up by stripping off the levels and just treating
them as integer vectors, then restoring the levels at the end.

What is the safest way to do this?  I am worried that at some point x
and listofxvals will *not* have the same levels, and the optimization
will give the wrong answer.  So I need code that guarantees they have
the same coding.

I think this works, where "master" is a factor with the master list of
levels (guaranteed to be a superset of the levels of x and listofxvals),
but can anyone spot anything that might go wrong?

# Strip the levels
x <- as.integer( factor(x, levels = levels(master) ) )

# Restore the levels
x <- structure( x, levels = levels(master), class = "factor" )

Thanks for any advice...

Duncan Murdoch
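
One thing that might go wrong: a value that is not in the levels of master
is silently coded as NA. Below is a minimal sketch of the round trip with
a guard against that. The helper names to_codes and from_codes are
hypothetical, and stopping with an error is just one assumed way of
handling the mismatch.

```r
# Strip the levels: recode f against master's levels and return integers.
# Stops if a non-missing value of f has no counterpart in master.
to_codes <- function(f, master) {
  codes <- as.integer(factor(f, levels = levels(master)))
  if (any(is.na(codes) & !is.na(f))) {
    stop("some values are not among the levels of 'master'")
  }
  codes
}

# Restore the levels: reattach master's levels to the integer codes.
from_codes <- function(codes, master) {
  structure(codes, levels = levels(master), class = "factor")
}

master <- factor(letters)
x <- factor(c("b", "a", NA))

codes <- to_codes(x, master)
restored <- from_codes(codes, master)
print(codes)
print(restored)
```

Missing values in x pass through as NA in both directions; only values
that are present in x but absent from master trigger the error.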
