This comes up every now and then. The fact is that R's behavior of not throwing away information unless explicitly told to is a feature, and one that I don't want to see go away.
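As a small illustration of that behavior (only a sketch, using the built-in iris data; the name species.dropped is made up for this example): subsetting keeps every level of the factor, and the unused ones go away only when you ask, here by calling factor() again.

```r
# Subset the built-in iris data; the unused levels are retained
iris1 <- iris[iris$Species == 'setosa', ]
levels(iris1$Species)    # still lists all three species
table(iris1$Species)     # zero counts for the species not in the subset

# Explicitly drop the unused levels by re-creating the factor
species.dropped <- factor(iris1$Species)
levels(species.dropped)  # only 'setosa' remains
```

Nothing is lost until the second step, which is the point: dropping levels is a decision the user makes, not something the subsetting does silently.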

Yes, in your example a table or plot based on iris1$Species gives meaningless results, but anything you do with that column is now meaningless. Why do you care if there is extra information in a column that you should not be doing anything further with anyway? Does it really make sense to use that column for anything now? It is a bit like a teacher bemoaning the fact that half of his/her students scored below the class median.

Now some propose that all factors should have their levels dropped after subsetting. This is worse than useless; consider the following made-up example:

# survey responses: first 80 rows are males, last 80 are females
tmp1 <- rep( c(1:5,1:5), c(10,20,30,20,0,0,10,20,30,20) )
result <- factor(tmp1, levels=1:5,
    labels=c('Strongly Disagree', 'Disagree', 'No Opinion', 'Agree', 'Strongly Agree') )
my.df <- data.frame( result=result, sex = rep( c('M','F'), each=80 ) )

df.m.2 <- df.m.1 <- my.df[ my.df$sex=='M', ]
df.f.2 <- df.f.1 <- my.df[ my.df$sex=='F', ]

# drop unused levels in the *.1 copies; the *.2 copies keep all levels
df.m.1[] <- lapply( df.m.1, factor )
df.f.1[] <- lapply( df.f.1, factor )

# plots with unused levels dropped
dev.new()
par(mfrow=c(2,1))
barplot(table(df.m.1$result), main='Males')
barplot(table(df.f.1$result), main='Females')

# plots with all levels kept
dev.new()
par(mfrow=c(2,1))
barplot(table(df.m.2$result), main='Males')
barplot(table(df.f.2$result), main='Females')

Which pair of plots is more meaningful? Easier to read? Not misleading?

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ivan Calandra
> Sent: Tuesday, September 21, 2010 7:23 AM
> To: r-help@r-project.org
> Subject: Re: [R] removed data is still there!
>
> Hi,
>
> I knew about that way already, with factor(). Isn't there another
> possibility, directly at the subsetting step?
> That would be of great help
> iris1 <- iris[iris$Species == 'setosa',] ## I mean here
>
> Ivan
>
> On 9/21/2010 15:14, David Winsemius wrote:
> >
> > On Sep 21, 2010, at 9:04 AM, David Winsemius wrote:
> >
> >> On Sep 21, 2010, at 8:39 AM, pdb wrote:
> >>
> >>> Thanks, but that was what I just discovered myself the hard way.
> >>>
> >>> What I really wanted to know was how to solve this issue.
> >>
> >> Although that was _not_ what you requested in your first post.
> >>
> >> 2 options:
> >>
> >> ?table
> >>
> >> ?factor
> >>
> >> iris1$Species <-factor(iris$Species) # removes "extraneous" levels
> >
> > And that was not what I meant to type. Meant for factor to be applied
> > to second dataframe:
> >
> > iris1$Species <-factor(iris1$Species) # removes "extraneous" levels
> >
> > David Winsemius, MD
> > West Hartford, CT
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Ivan CALANDRA
> PhD Student
> University of Hamburg
> Biozentrum Grindel und Zoologisches Museum
> Abt. Säugetiere
> Martin-Luther-King-Platz 3
> D-20146 Hamburg, GERMANY
> +49(0)40 42838 6231
> ivan.calan...@uni-hamburg.de
>
> **********
> http://www.for771.uni-bonn.de
> http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php