[R] drop unused levels in subset.data.frame
Dear list,

subset has a 'drop' argument that I had often mistaken for the one in [.factor, which removes unused levels. Clearly it doesn't work that way, as shown below:

d <- data.frame(x = factor(letters[1:15]), y = factor(LETTERS[1:3]))
s <- subset(d, y == "A", drop = TRUE)
str(s)
'data.frame': 5 obs. of 2 variables:
 $ x: Factor w/ 15 levels "a","b","c","d",..: 1 4 7 10 13
 $ y: Factor w/ 3 levels "A","B","C": 1 1 1 1 1

The subset still retains all the unused factor levels. I wonder how people usually get rid of all unused levels in a data.frame after subsetting? I came up with this, but I may have missed a better built-in solution:

dropit <- function(d, columns = names(d), ...) {
  d[columns] <- lapply(d[columns], "[", drop = TRUE, ...)
  d
}
str(dropit(s))
'data.frame': 5 obs. of 2 variables:
 $ x: Factor w/ 5 levels "a","d","g","j",..: 1 2 3 4 5
 $ y: Factor w/ 1 level "A": 1 1 1 1 1

Best regards,

baptiste

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
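A short sketch of what is going on, assuming a current R installation. In subset.data.frame, `drop` is passed on to `[.data.frame`, where it only controls whether a single selected column is simplified to a vector; factor levels are left alone. For the levels themselves, base R has since gained droplevels() (added in R 2.12.0, after this thread), whose data.frame method treats every factor column at once. The object names `v` and `s` are just illustrative.

```r
# Rebuild the example data frame from the original post
d <- data.frame(x = factor(letters[1:15]), y = factor(LETTERS[1:3]))

# subset()'s `drop` is handed on to `[.data.frame`: it only decides whether a
# single selected column comes back as a bare vector. Levels are untouched.
v <- subset(d, y == "A", select = x, drop = TRUE)
is.factor(v)  # a factor vector, not a one-column data frame
nlevels(v)    # still all 15 levels of the full data set

# droplevels() (base R, 2.12.0 and later) removes the unused levels of
# every factor column in a data frame in a single call
s <- droplevels(subset(d, y == "A"))
str(s)
```

On R versions older than 2.12.0, the dropit() helper above does the same job.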
Re: [R] drop unused levels in subset.data.frame
On Nov 10, 2009, at 10:49 AM, baptiste auguie wrote:

> I wonder how people usually get rid of all unused levels in a data.frame after subsetting? I came up with this, but I may have missed a better built-in solution:
>
> dropit <- function(d, columns = names(d), ...) {
>   d[columns] <- lapply(d[columns], "[", drop = TRUE, ...)
>   d
> }

If you are looking for a one-liner, then consider:

data.frame(lapply(s, function(x) if (is.factor(x)) factor(x) else x))

I added a numeric column to make sure I had not clobbered a non-factor variable:

d <- data.frame(x = factor(letters[1:15]), y = factor(LETTERS[1:3]), N = 1:15)
s <- subset(d, y == "A", drop = TRUE)
str(data.frame(lapply(s, function(x) if (is.factor(x)) factor(x) else x)))
'data.frame': 5 obs. of 3 variables:
 $ x: Factor w/ 5 levels "a","d","g","j",..: 1 2 3 4 5
 $ y: Factor w/ 1 level "A": 1 1 1 1 1
 $ N: int 1 4 7 10 13

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
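Wrapping the lapply() result in data.frame() rebuilds the frame from scratch, so the row names that subset() kept (1, 4, 7, 10, 13) become 1 to 5, and column names go through check.names again. A minimal variant of the same one-liner, assuming you want to keep those attributes, assigns the list back into `df[]` instead. `drop_unused` is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper: same idea as the one-liner above, but the list
# returned by lapply() is written back into the existing data frame, so its
# row names, column names and class are preserved
drop_unused <- function(df) {
  # factor() on a factor keeps only the levels in use, in their original
  # order, and keeps ordered factors ordered; other columns pass through
  df[] <- lapply(df, function(x) if (is.factor(x)) factor(x) else x)
  df
}

d <- data.frame(x = factor(letters[1:15]), y = factor(LETTERS[1:3]), N = 1:15)
s <- subset(d, y == "A")
res <- drop_unused(s)
str(res)
rownames(res)  # the original row names "1" "4" "7" "10" "13" survive
```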
Re: [R] drop unused levels in subset.data.frame
On Nov 10, 2009, at 9:49 AM, baptiste auguie wrote:

> I wonder how people usually get rid of all unused levels in a data.frame after subsetting?

There is a page in the R wiki here:

http://wiki.r-project.org/rwiki/doku.php?id=tips:data-manip:drop_unused_levels

that has some approaches.

HTH,

Marc Schwartz
Re: [R] drop unused levels in subset.data.frame
Neat, I reinvented the wheel! Would that seem like a useful example for the end of the ?subset help page? (It currently has very little to say about drop.)

Thanks also to David for the alternative idea.

Best regards,

baptiste

2009/11/10 Marc Schwartz marc_schwa...@me.com:
> There is a page in the R wiki here:
> http://wiki.r-project.org/rwiki/doku.php?id=tips:data-manip:drop_unused_levels
> that has some approaches.
Re: [R] drop unused levels in subset.data.frame
If you don't want to preserve factor levels when subsetting, use characters. There are very few other differences in behavior.

Hadley

On Tuesday, November 10, 2009, baptiste auguie baptiste.aug...@googlemail.com wrote:
> I wonder how people usually get rid of all unused levels in a data.frame after subsetting?

--
http://had.co.nz/
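A small sketch of the character-column approach, assuming the labels are stored as plain strings. `stringsAsFactors = FALSE` is spelled out so the example behaves the same on R versions before 4.0.0, where data.frame() converted strings to factors by default. A character column carries no level set, so a subset only contains the values that are actually present; a factor can still be built afterwards when one is needed.

```r
# Keep the labels as character vectors instead of factors
d <- data.frame(x = letters[1:15], y = rep(LETTERS[1:3], length.out = 15),
                stringsAsFactors = FALSE)
s <- subset(d, y == "A")

# A character column has no levels to carry along: only the values that
# occur in the subset remain
unique(s$x)

# If a factor is needed later (e.g. for modelling), build it from the
# subset, so its levels are exactly the values that occur there
fx <- factor(s$x)
levels(fx)
```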