On Jul 9, 2014, at 12:19 PM, Sam Albers <tonightstheni...@gmail.com> wrote:

> Hello,
>
> I have recently received a dataset from a metal analysis company. The
> dataset is filled with less than symbols. What I am looking for is an
> efficient way to subset for any whole numbers from the dataset. The column
> is automatically formatted as a factor because of the "<" symbols, making it
> difficult to deal with the numbers in a useful way.
>
> So in sum, any ideas on how I could subset the example below for only whole
> numbers?
>
> Thanks in advance!
>
> Sam
>
> #code
>
> metals <-
> structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L,
> 8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L),
> .Label = c("Antimony",
> "Arsenic", "Barium", "Beryllium", "Boron (Hot Water Soluble)",
> "Cadmium", "Chromium", "Cobalt", "Copper", "Lead", "Mercury",
> "Molybdenum", "Nickel", "pH 1:2", "Selenium", "Silver", "Thallium",
> "Tin", "Vanadium", "Zinc"), class = "factor"), Cedar.Creek = structure(c(3L,
> 3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L,
> 4L, 4L, 3L), .Label = c("<1", "<10", "<100", "<1000", "<200",
> "<5", "<500", "0.1", "0.13", "0.5", "0.8", "1.07", "1.1", "1.4",
> "1.5", "137", "154", "163", "165", "169", "178", "2.3", "2.4",
> "22", "24", "244", "27.2", "274", "3", "3.1", "40.2", "43", "50",
> "516", "53.3", "550", "569", "65", "66.1", "68", "7.6", "72",
> "77", "89", "951"), class = "factor")), .Names = c("Parameter",
> "Cedar.Creek"), row.names = c(NA, 19L), class = "data.frame")


Sam,

You can use ?gsub to remove the '<' characters from the column and then use ?subset to select the records you wish. Note that gsub() returns a character vector, so you want to coerce to numeric.

> as.numeric(gsub("<", "", metals$Cedar.Creek))
 [1]  100  100  500  100   10 1000  100  516  550   10  200  500  100
[14]  500  100  951 1000 1000  100


For example:

> subset(metals, as.numeric(gsub("<", "", Cedar.Creek)) == 100)
   Parameter Cedar.Creek
1   Antimony        <100
2    Arsenic        <100
4  Beryllium        <100
7     Cobalt        <100
13  Selenium        <100
15  Thallium        <100
19  Antimony        <100


> subset(metals, as.numeric(gsub("<", "", Cedar.Creek)) <= 500)
    Parameter Cedar.Creek
1    Antimony        <100
2     Arsenic        <100
3      Barium        <500
4   Beryllium        <100
5     Cadmium         <10
7      Cobalt        <100
10    Mercury         <10
11 Molybdenum        <200
12     Nickel        <500
13   Selenium        <100
14     Silver        <500
15   Thallium        <100
19   Antimony        <100


You can also just create a new column that is numeric and go from there:

metals$CC.Num <- as.numeric(gsub("<", "", metals$Cedar.Creek))

> str(metals)
'data.frame':	19 obs. of  3 variables:
 $ Parameter  : Factor w/ 20 levels "Antimony","Arsenic",..: 1 2 3 4 6 7 8 9 10 11 ...
 $ Cedar.Creek: Factor w/ 45 levels "<1","<10","<100",..: 3 3 7 3 2 4 3 34 36 2 ...
 $ CC.Num     : num  100 100 500 100 10 1000 100 516 550 10 ...

> metals
    Parameter Cedar.Creek CC.Num
1    Antimony        <100    100
2     Arsenic        <100    100
3      Barium        <500    500
4   Beryllium        <100    100
5     Cadmium         <10     10
6    Chromium       <1000   1000
7      Cobalt        <100    100
8      Copper         516    516
9        Lead         550    550
10    Mercury         <10     10
11 Molybdenum        <200    200
12     Nickel        <500    500
13   Selenium        <100    100
14     Silver        <500    500
15   Thallium        <100    100
16        Tin         951    951
17   Vanadium       <1000   1000
18       Zinc       <1000   1000
19   Antimony        <100    100


Regards,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
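
A side note on the approach in this thread, not part of the reply itself: stripping the "<" turns "<100" into 100, so entries reported as below a limit can no longer be told apart from measured values. If the goal is instead to keep only the rows reported without a "<", one possible sketch is to flag those entries with grepl() before converting. The small data frame here is a hypothetical stand-in holding five rows copied from the post, and the names `censored` and `measured` are made up for illustration.

```r
# Hypothetical stand-in for the 'metals' data frame from the post:
# five rows only, with values copied from the original data.
metals <- data.frame(
  Parameter   = c("Antimony", "Barium", "Copper", "Lead", "Tin"),
  Cedar.Creek = c("<100", "<500", "516", "550", "951"),
  stringsAsFactors = TRUE
)

# TRUE where the entry starts with "<", i.e. was reported as "less than".
# grepl() coerces the factor to character before matching.
censored <- grepl("^<", metals$Cedar.Creek)

# Keep only the rows that carry a plain reported value.
measured <- metals[!censored, ]

# Convert via character first: as.numeric() applied directly to a factor
# returns the internal level codes rather than the printed values.
measured$CC.Num <- as.numeric(as.character(measured$Cedar.Creek))

print(measured)
```

Keeping the logical vector `censored` around, rather than dropping rows straight away, also leaves room to treat the "<" entries separately later on, if that turns out to be what the analysis needs.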