On Jan 14, 2014, at 1:38 PM, Jeff Johnson <mrjeffto...@gmail.com> wrote:
> I'm running the following to get what I would expect is a subset of
> countries that are not equal to "US" AND COUNTRY is not in one of my
> validcountries values.
>
> non_us <- subset(mydf, (COUNTRY %in% validcountries) & COUNTRY != "US",
> select = COUNTRY, na.rm=TRUE)
>
> however, when I then do table(non_us) I get:
> > table(non_us)
> non_us
>    AE AN AR AT AU BB BD BE BH BM BN BO BR BS CA CH CM CN CO CR CY DE DK DO EC ES
>  0  3  0  2  1 31  4  1  1  1 45  1  1  4  5 86  3  1  8  1  2  1  8  2  1  2  4
> FI FR GB GR GU HK ID IE IL IN IO IT JM JP KH KR KY LU LV MO MX MY NG NL NO NZ PA
>  2  4 35  3  3 14  3  5  2  5  1  2  1 15  1 11  2  2  1  1 23  7  1  6  1  3  1
> PE PG PH PR PT RO RU SA SE SG TC TH TT TW TZ US ZA
>  2  1  1  8  1  1  1  1  1 18  1  1  2 11  1  0  3
> >
>
> Notice US appears as the second to last. I expected it to NOT appear.
>
> Do you know if I'm using incorrect syntax? Is the & symbol equivalent to
> AND (notice I have 2 criteria for subsetting)? Also, is COUNTRY != "US"
> valid syntax? I don't get errors, but then again I don't get what I expect
> back.
>
> Thanks in advance!
>
> --
> Jeff

Review the Details section of ?subset, where you will find the following:

"Factors may have empty levels after subsetting; unused levels are not automatically removed. See droplevels for a way to drop all unused levels from a data frame."

Your syntax is fine and the behavior is as expected.

Regards,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
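
A minimal sketch of the behavior described in the reply, assuming COUNTRY is a factor. The data here are made up: this mydf and validcountries are hypothetical stand-ins for the poster's objects, which are not shown in the thread, and non_us_clean is a name introduced only for illustration.

```r
# Hypothetical stand-ins for the poster's data; the real mydf is not shown.
mydf <- data.frame(COUNTRY = factor(c("US", "CA", "GB", "US", "CA", "FR")))
validcountries <- c("US", "CA", "GB")

# Same filter as in the question: the rows are dropped, but the factor
# keeps all of its original levels.
non_us <- subset(mydf, (COUNTRY %in% validcountries) & COUNTRY != "US",
                 select = COUNTRY)

levels(non_us$COUNTRY)   # still lists "FR" and "US"
table(non_us)            # "FR" and "US" appear with a count of 0

# droplevels() removes unused levels from every factor in the data frame.
non_us_clean <- droplevels(non_us)
table(non_us_clean)      # only "CA" and "GB" remain
```

The zero-count columns come from the factor levels, not from leftover rows, so the subset itself is already correct. If COUNTRY were stored as a character vector instead of a factor, there would be no levels to carry over and table() would list only the values actually present.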