[R] drop unused levels in subset.data.frame

2009-11-10 Thread baptiste auguie
Dear list,

subset() has a 'drop' argument that I had often mistaken for the one in
[.factor, which removes unused levels.
Clearly it doesn't work that way, as shown below:

d <- data.frame(x = factor(letters[1:15]), y = factor(LETTERS[1:3]))
s <- subset(d, y == "A", drop = TRUE)
str(s)
'data.frame':   5 obs. of  2 variables:
 $ x: Factor w/ 15 levels "a","b","c","d",..: 1 4 7 10 13
 $ y: Factor w/ 3 levels "A","B","C": 1 1 1 1 1

The subset still retains all the unused factor levels. I wonder how
people usually get rid of all unused levels in a data.frame after
subsetting? I came up with this, but I may have missed a better
built-in solution:

dropit <- function (d, columns = names(d), ...)
{
    d[columns] = lapply(d[columns], "[", drop = TRUE, ...)
    d
}

str(dropit(s))
'data.frame':   5 obs. of  2 variables:
 $ x: Factor w/ 5 levels "a","d","g","j",..: 1 2 3 4 5
 $ y: Factor w/ 1 level "A": 1 1 1 1 1
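
To make the distinction between the two 'drop' arguments concrete, here is a minimal sketch using the same example data. It assumes that subset() passes 'drop' on to the data frame indexing method, so its visible effect is to collapse a single selected column to a plain vector; the variable name v is only for illustration.

```r
# Same data as in the post above.
d <- data.frame(x = factor(letters[1:15]), y = factor(LETTERS[1:3]))

# With a single selected column, drop = TRUE in subset() collapses the
# result from a data frame to a plain factor ...
v <- subset(d, y == "A", select = x, drop = TRUE)
is.data.frame(v)   # FALSE
nlevels(v)         # 15: the unused levels are still there

# ... whereas 'drop' in [.factor discards the unused levels.
nlevels(v[drop = TRUE])   # 5
```

So the two arguments share a name but act on different things: one on the dimensions of the result, the other on the level set of a factor.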


Best regards,

baptiste

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] drop unused levels in subset.data.frame

2009-11-10 Thread David Winsemius


On Nov 10, 2009, at 10:49 AM, baptiste auguie wrote:


> I wonder how people usually get rid of all unused levels in a
> data.frame after subsetting?


If you are looking for a one-liner, then consider:

data.frame(lapply(s, function(x) if (is.factor(x)) { factor(x) } else { x }))


I added a numeric column to make sure I had not clobbered a non-factor  
variable.


d <- data.frame(x = factor(letters[1:15]), y = factor(LETTERS[1:3]), N = 1:15)

s <- subset(d, y == "A", drop = TRUE)
str(data.frame(lapply(s, function(x)
    if (is.factor(x)) { factor(x) } else { x })))

'data.frame':   5 obs. of  3 variables:
 $ x: Factor w/ 5 levels "a","d","g","j",..: 1 2 3 4 5
 $ y: Factor w/ 1 level "A": 1 1 1 1 1
 $ N: int  1 4 7 10 13
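
A possible variation on the one-liner, offered as a sketch rather than a recommendation: wrapping the columns in data.frame() builds a new data frame, so the row names are renumbered, while assigning into df[] keeps the original row names. The helper name drop_unused is hypothetical and not part of base R.

```r
d <- data.frame(x = factor(letters[1:15]), y = factor(LETTERS[1:3]), N = 1:15)
s <- subset(d, y == "A")

# drop_unused() is a hypothetical helper name, not a base R function.
drop_unused <- function(df) {
  # Assigning into df[] replaces the columns but keeps the data frame's
  # other attributes, including its row names.
  df[] <- lapply(df, function(col) if (is.factor(col)) factor(col) else col)
  df
}

s2 <- drop_unused(s)
str(s2)
rownames(s2)   # same row labels as in s
```

Whether keeping the original row names matters depends on how the subset is used afterwards.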





David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] drop unused levels in subset.data.frame

2009-11-10 Thread Marc Schwartz

On Nov 10, 2009, at 9:49 AM, baptiste auguie wrote:


> I wonder how people usually get rid of all unused levels in a
> data.frame after subsetting?


There is a page in the R wiki here:

  http://wiki.r-project.org/rwiki/doku.php?id=tips:data-manip:drop_unused_levels

that has some approaches.
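
For readers on later versions of R: base R gained a droplevels() function after this thread was written (in R 2.12.0, as far as I know), and it covers the case discussed here. The sketch below is not taken from the wiki page; it reuses the example data from the thread and assumes a version of R that provides droplevels().

```r
d <- data.frame(x = factor(letters[1:15]), y = factor(LETTERS[1:3]), N = 1:15)
s <- subset(d, y == "A")

# droplevels() has a data frame method: it drops unused levels from
# every factor column and leaves the other columns untouched.
s_clean <- droplevels(s)
str(s_clean)

# It also works on a single factor.
levels(droplevels(s$x))
```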

HTH,

Marc Schwartz



Re: [R] drop unused levels in subset.data.frame

2009-11-10 Thread baptiste auguie
Neat, I reinvented the wheel! Would that seem like a useful example at
the end of the help page for ?subset ? (it currently has very little
to say about drop).

Thanks also to David for the alternative idea.

Best regards,

baptiste


2009/11/10 Marc Schwartz <marc_schwa...@me.com>:
> There is a page in the R wiki here:
>
>   http://wiki.r-project.org/rwiki/doku.php?id=tips:data-manip:drop_unused_levels
>
> that has some approaches.





Re: [R] drop unused levels in subset.data.frame

2009-11-10 Thread Hadley Wickham
If you don't want to preserve factor levels when subsetting, use
character vectors instead of factors. There are very few other
differences in behavior.
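
A minimal sketch of this suggestion, assuming the columns can be stored as character vectors from the start. The names d_chr, s_chr, d_fac and s_fac are illustrative only.

```r
# stringsAsFactors = FALSE keeps the columns as character vectors.
# It is the default from R 4.0.0 on and is spelled out here so the
# sketch behaves the same on older versions.
d_chr <- data.frame(x = letters[1:15], y = LETTERS[1:3],
                    stringsAsFactors = FALSE)
s_chr <- subset(d_chr, y == "A")

# Only the values actually present show up in a tabulation.
table(s_chr$x)

# With factors, the unused levels appear as zero counts.
d_fac <- data.frame(x = factor(letters[1:15]), y = factor(LETTERS[1:3]))
s_fac <- subset(d_fac, y == "A")
table(s_fac$x)
```

A character column carries no level set, so there is nothing left over to drop after subsetting.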

Hadley

On Tuesday, November 10, 2009, baptiste auguie
<baptiste.aug...@googlemail.com> wrote:
> I wonder how people usually get rid of all unused levels in a
> data.frame after subsetting?


-- 
http://had.co.nz/
