[R] drop levels problem
Hi all:

I am having trouble dropping levels. I found a few hints online, but none of them worked. Please consider the dataset below. I was under the impression that subset(..., drop = TRUE) would work, but it doesn't.

library(ggplot2)
library(Hmisc)
x <- structure(list(first = c(38.2086, 43.1768, 43.146, 41.8044,
    42.4232, 46.3646, 38.0813, 40.0745, 40.4889, 38.6246, 40.2826,
    41.6056, 34.5353, 40.0768), second = c(43.3295, 42.4326, 38.8994,
    37.0894, 42.3218, 46.1726, 39.1206, 41.2072, 42.4874, 40.2657,
    38.7766, 40.8822, 42.0165, 49.2055), third = c(42.24, 42.992,
    37.7419, 42.3448, 41.9131, 44.385, 42.7811, 44.1963, 40.8088,
    43.9634, 38.7079, 38.0791, 44.3136, 39.5333)),
    .Names = c("first", "second", "third"), class = "data.frame",
    row.names = c(NA, -14L))
head(x); str(x)
xmelt <- melt(x)
names(xmelt) <- c("year", "fatPerc")
# Year variable is a factor with three levels
# Subset to plot only 'first' year
firstyear <- subset(xmelt, year == 'first'); str(firstyear)
# Plot showing three levels still after I made the subset
ggplot(firstyear, aes(year, fatPerc)) + geom_boxplot() + geom_jitter()
# Try to drop the levels but dropUnusedLevels() doesn't seem to work here
dropUnusedLevels()
ggplot(firstyear, aes(year, fatPerc)) + geom_boxplot() + geom_jitter()
# code below also should drop levels but it doesn't
#data.frame(lapply(firstyear, function(x) if (is.factor(x)){ factor(x)} else{x}))
str(firstyear)

Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] drop levels problem
Hi Felipe,

On Mon, Nov 29, 2010 at 11:01 AM, Felipe Carrillo mazatlanmex...@yahoo.com wrote:
> Hi all:
> I am having trouble dropping levels, got a few hints online without success.
> Please consider the dataset below:
> I was under the impression that subset(..drop=TRUE) would work but it doesn't

Here drop is referring to:

data.frame(1:10)[, 1]
data.frame(1:10)[, 1, drop = FALSE]

not to levels of a factor.

> library(ggplot2)
> library(Hmisc)
> x <- structure(list(first = c(38.2086, 43.1768, 43.146, 41.8044,
>     42.4232, 46.3646, 38.0813, 40.0745, 40.4889, 38.6246, 40.2826,
>     41.6056, 34.5353, 40.0768), second = c(43.3295, 42.4326, 38.8994,
>     37.0894, 42.3218, 46.1726, 39.1206, 41.2072, 42.4874, 40.2657,
>     38.7766, 40.8822, 42.0165, 49.2055), third = c(42.24, 42.992,
>     37.7419, 42.3448, 41.9131, 44.385, 42.7811, 44.1963, 40.8088,
>     43.9634, 38.7079, 38.0791, 44.3136, 39.5333)),
>     .Names = c("first", "second", "third"), class = "data.frame",
>     row.names = c(NA, -14L))

Thanks for the nice example!

> head(x); str(x)
> xmelt <- melt(x)
> names(xmelt) <- c("year", "fatPerc")
> # Year variable is a factor with three levels
> # Subset to plot only 'first' year
> firstyear <- subset(xmelt, year == 'first'); str(firstyear)
> # Plot showing three levels still after I made the subset
> ggplot(firstyear, aes(year, fatPerc)) + geom_boxplot() + geom_jitter()

Right, because a factor can have levels with no observations, and sometimes those are the most interesting ones (e.g., if you subset by smoking and found that there were no instances of lung cancer in non-smokers; not that extreme, but you get the point).

> # Try to drop the levels but dropUnusedLevels() doesn't seem to work here
> dropUnusedLevels()

Sorry, I have had some difficulty installing Hmisc on my Linux system and never got around to working it out.

> ggplot(firstyear, aes(year, fatPerc)) + geom_boxplot() + geom_jitter()
> # code below also should drop levels but it doesn't
> #data.frame(lapply(firstyear, function(x) if (is.factor(x)){ factor(x)} else{x}))

It would if you assigned the result back to firstyear. As written, you do it and then just print to screen, and the changed data goes off to oblivion.

firstyear <- data.frame(lapply(firstyear, function(x) if (is.factor(x)) {factor(x)} else {x}))
str(firstyear) # should now just have one level

Cheers,

Josh

> str(firstyear)

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
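The two points in the reply above can be checked with base R alone. The following is only a sketch: the small data frame is a hypothetical stand-in for the melted data, built by hand from the first two values of each column in the original post, so neither ggplot2 nor melt() is needed.

```r
# Hand-built stand-in for the melted data (first two values of each year)
xmelt <- data.frame(
  year = factor(rep(c("first", "second", "third"), each = 2)),
  fatPerc = c(38.2086, 43.1768, 43.3295, 42.4326, 42.24, 42.992)
)

# In data-frame indexing, `drop` controls dimension dropping, not factor levels
as_vector <- xmelt[, "fatPerc"]                # a single column collapses to a vector
as_frame  <- xmelt[, "fatPerc", drop = FALSE]  # the column stays a data frame
class(as_vector)  # "numeric"
class(as_frame)   # "data.frame"

# Subsetting rows keeps every level of the factor, used or not
firstyear <- subset(xmelt, year == "first")
levels_before <- nlevels(firstyear$year)  # 3

# Re-create each factor column and assign the result back
firstyear <- data.frame(lapply(firstyear, function(col) {
  if (is.factor(col)) factor(col) else col
}))
levels_after <- nlevels(firstyear$year)  # 1
```

Without the assignment on the last step, the rebuilt data frame would be discarded and firstyear would keep all three levels.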
Re: [R] drop levels problem
Take a look at the droplevels function (R >= 2.12).

On Mon, Nov 29, 2010 at 5:01 PM, Felipe Carrillo mazatlanmex...@yahoo.com wrote:
> [snip]

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
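For readers who want to see the droplevels() suggestion in action, here is a minimal sketch. The data frame is a hypothetical four-row stand-in using a few values from the original post, and the names fat and firstyear are chosen only for illustration.

```r
# Hypothetical four-row stand-in for the melted data
fat <- data.frame(
  year = factor(c("first", "first", "second", "third")),
  fatPerc = c(38.2086, 43.1768, 43.3295, 42.24)
)

firstyear <- subset(fat, year == "first")
before <- levels(firstyear$year)  # "first" "second" "third"

# droplevels() on a data frame drops unused levels from every factor column;
# as with the lapply() approach, the result has to be assigned
firstyear <- droplevels(firstyear)
after <- levels(firstyear$year)  # "first"
```

droplevels() also accepts a single factor, so droplevels(firstyear$year) would give the same levels for that column alone.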
Re: [R] drop levels problem
Thanks Joshua, I get it now. Levels sometimes drive me loco.

Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA

----- Original Message ----
From: Joshua Wiley jwiley.ps...@gmail.com
To: Felipe Carrillo mazatlanmex...@yahoo.com
Cc: r-h...@stat.math.ethz.ch
Sent: Mon, November 29, 2010 11:18:45 AM
Subject: Re: [R] drop levels problem

[snip]
Re: [R] drop levels problem
Just to follow up on my own post a bit:

xmelt$year[xmelt$year == "first", drop = TRUE]

will do what you want. I think that because the subset has multiple columns, not all of which are factors, the method for '[' being used is not the factor one that would drop unused levels. I did not make that clear at all the first time around (and probably still butchered it, which some knowledgeable soul may correct me on).

Also, I did get Hmisc installed, but I think dropUnusedLevels() does not work in this case for a similar reason. Henrique's solution is, as usual, the shortest :)

Josh

[snip]
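The difference described above can be seen on a bare factor. This is a small sketch with a hypothetical four-element factor standing in for xmelt$year:

```r
# Hypothetical factor standing in for xmelt$year
year <- factor(c("first", "second", "third", "first"))

# Plain indexing of a factor keeps all levels
kept <- year[year == "first"]
levels(kept)     # "first" "second" "third"

# The factor method for `[` takes drop = TRUE and discards unused levels
dropped <- year[year == "first", drop = TRUE]
levels(dropped)  # "first"
```

Here drop = TRUE is handled by the factor method of `[`, which is why it behaves differently from the drop argument seen in data-frame indexing.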