Thanks for all the responses. It is sometimes difficult to outline exactly what you need, and these responses helped me get there. Speaking to Bert's point a bit, I needed a column identifying where the < symbol was used. If I knew more about R I might be embarrassed to post my solution, but here is how I used Sarah's approach while still keeping the information about detection limits. I'm sure there is a more elegant way:
metals <- structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L), .Label = c("Antimony",
"Arsenic", "Barium", "Beryllium", "Boron (Hot Water Soluble)",
"Cadmium", "Chromium", "Cobalt", "Copper", "Lead", "Mercury",
"Molybdenum", "Nickel", "pH 1:2", "Selenium", "Silver", "Thallium",
"Tin", "Vanadium", "Zinc"), class = "factor"), Cedar.Creek = structure(c(3L,
3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L,
4L, 4L, 3L), .Label = c("<1", "<10", "<100", "<1000", "<200",
"<5", "<500", "0.1", "0.13", "0.5", "0.8", "1.07", "1.1", "1.4",
"1.5", "137", "154", "163", "165", "169", "178", "2.3", "2.4",
"22", "24", "244", "27.2", "274", "3", "3.1", "40.2", "43", "50",
"516", "53.3", "550", "569", "65", "66.1", "68", "7.6", "72",
"77", "89", "951"), class = "factor")), .Names = c("Parameter",
"Cedar.Creek"), row.names = c(NA, 19L), class = "data.frame")

# keep a copy of the original factor (still containing the "<" strings)
metals$temp1 <- metals$Cedar.Creek
# Sarah's steps: factor -> character, drop the "<", convert to numeric
metals$Cedar.Creek <- as.character(metals$Cedar.Creek)
metals$Cedar.Creek <- gsub("<", "", metals$Cedar.Creek)
metals$Cedar.Creek <- as.numeric(metals$Cedar.Creek)
# TRUE where the original text equals the number, i.e. there was no "<"
metals$temp2 <- metals$temp1 == metals$Cedar.Creek
metals$Detection <- factor(ifelse(metals$temp2 == "TRUE", "Measured", "Limit"))
# show Parameter, Cedar.Creek and Detection
metals[, c(1, 2, 5)]

Thanks again!
Sam

On Wed, Jul 9, 2014 at 10:41 AM, Bert Gunter <gunter.ber...@gene.com> wrote:
> Well, ?grep and ?regex are clearly apropos here -- dealing with
> character data is an essential skill for handling input from diverse
> sources with various formatting conventions. I suggest you go through
> one of the many regular expression tutorials on the web to learn more.
>
> But this may not be the important issue here at all. If "<k" means the
> value is left censored at k -- i.e. we know it's less than k but not
> how much less -- then Sarah's proposal is not what you want to do.
> Exactly what you do want to do depends on context, and as it concerns
> statistical methodology, is not something that should be discussed
> here. Consult a local statistician if this is a correct guess.
> Otherwise ignore.
>
> ... and please post in plain text in future (as requested) as HTML can
> get garbled.
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
> (650) 467-7374
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
> Clifford Stoll
>
> On Wed, Jul 9, 2014 at 10:26 AM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
>> Hi Sam,
>>
>> I'd take the similar tack of removing the < instead. Note that if you
>> import the data frame using the stringsAsFactors=FALSE argument, you
>> don't need the first step.
>>
>> metals$Cedar.Creek <- as.character(metals$Cedar.Creek)
>> metals$Cedar.Creek <- gsub("<", "", metals$Cedar.Creek)
>> metals$Cedar.Creek <- as.numeric(metals$Cedar.Creek)
>>
>> R> str(metals)
>> 'data.frame': 19 obs. of 2 variables:
>> $ Parameter : Factor w/ 20 levels "Antimony","Arsenic",..: 1 2 3 4 6 7 8 9 10 11 ...
>> $ Cedar.Creek: num 100 100 500 100 10 1000 100 516 550 10 ...
>>
>> Sarah
>>
>>
>> On Wed, Jul 9, 2014 at 1:19 PM, Sam Albers <tonightstheni...@gmail.com> wrote:
>>> Hello,
>>>
>>> I have recently received a dataset from a metal analysis company. The
>>> dataset is filled with less-than symbols. What I am looking for is an
>>> efficient way to subset for any whole numbers from the dataset. The column
>>> is automatically formatted as a factor because of the "<" symbols, making it
>>> difficult to deal with the numbers in a useful way.
>>>
>>> So in sum, any ideas on how I could subset the example below for only whole
>>> numbers?
>>>
>>> Thanks in advance!
>>>
>>> Sam
>>>
>>> #code
>>>
>>> metals <-
>>> structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L,
>>> 8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L), .Label = c("Antimony",
>>> "Arsenic", "Barium", "Beryllium", "Boron (Hot Water Soluble)",
>>> "Cadmium", "Chromium", "Cobalt", "Copper", "Lead", "Mercury",
>>> "Molybdenum", "Nickel", "pH 1:2", "Selenium", "Silver", "Thallium",
>>> "Tin", "Vanadium", "Zinc"), class = "factor"), Cedar.Creek = structure(c(3L,
>>> 3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L,
>>> 4L, 4L, 3L), .Label = c("<1", "<10", "<100", "<1000", "<200",
>>> "<5", "<500", "0.1", "0.13", "0.5", "0.8", "1.07", "1.1", "1.4",
>>> "1.5", "137", "154", "163", "165", "169", "178", "2.3", "2.4",
>>> "22", "24", "244", "27.2", "274", "3", "3.1", "40.2", "43", "50",
>>> "516", "53.3", "550", "569", "65", "66.1", "68", "7.6", "72",
>>> "77", "89", "951"), class = "factor")), .Names = c("Parameter",
>>> "Cedar.Creek"), row.names = c(NA, 19L), class = "data.frame")
>>>
>>
>> --
>> Sarah Goslee
>> http://www.functionaldiversity.org
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
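A possibly more compact way to keep the detection-limit information is sketched below in R. It is only a sketch, assuming every censored entry is written as "<k" (optionally with spaces) and everything else is a plain number; the function name parse_detection() and its value/lower/upper columns are hypothetical, not part of base R or of anyone's post in this thread. It flags censoring directly with grepl() instead of comparing the cleaned numbers back to the original factor, and it also stores each result as an interval, which records the "less than k, but not how much less" information without deciding how to analyse it.

```r
# Hypothetical helper (not from base R or the thread): turn lab results such
# as "<10" or "516" into numbers while recording which ones were "<" limits.
parse_detection <- function(x) {
  x <- as.character(x)                           # accepts factors or character vectors
  censored <- grepl("^\\s*<", x)                 # TRUE where the lab reported "<k"
  value <- as.numeric(sub("^\\s*<\\s*", "", x))  # drop a leading "<", then convert
  data.frame(
    value = value,
    Detection = factor(ifelse(censored, "Limit", "Measured"),
                       levels = c("Measured", "Limit")),
    # Interval form, assuming concentrations cannot be negative:
    # "<k" is only known to lie in [0, k); a measured value is exact.
    lower = ifelse(censored, 0, value),
    upper = value
  )
}

# Small example using values taken from the Cedar.Creek column
res <- parse_detection(factor(c("<100", "<10", "516", "27.2")))
print(res)
```

Applied to the thread's data, something like cbind(metals["Parameter"], parse_detection(metals$Cedar.Creek)) should give one row per parameter with the flag alongside the number. Which of value, lower or upper belongs in an analysis is the statistical question Bert raises, and this bookkeeping does not settle it.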