Well, ?grep and ?regex are clearly apropos here -- dealing with character data is an essential skill for handling input from diverse sources with various formatting conventions. I suggest you go through one of the many regular expression tutorials on the web to learn more.
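For what it's worth, here is a minimal sketch of the kind of thing ?grep and ?regex make possible, assuming the column has already been read in as character (e.g. with stringsAsFactors=FALSE). The vector x and the names censored, value and res are made up for illustration and are not part of the original data. It flags the "<" entries rather than silently discarding that information:

```r
# Hypothetical values in the same style as the Cedar.Creek column
x <- c("<100", "516", "<10", "0.13", "550")

# TRUE where the entry starts with "<", i.e. a "less than" report
censored <- grepl("^<", x)

# Strip the leading "<" (and any spaces after it), then convert to numeric
value <- as.numeric(sub("^<[[:space:]]*", "", x))

# Keep the flag next to the number so the "<" information is not lost
res <- data.frame(raw = x, value = value, censored = censored,
                  stringsAsFactors = FALSE)

# Rows that were reported as actual measurements
res[!res$censored, ]
```

Whether the flagged rows should then be dropped, kept at the limit, or handled some other way is a separate question from the string handling itself.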
But this may not be the important issue here at all. If "<k" means the value is left censored at k -- i.e. we know it's less than k but not how much less -- then Sarah's proposal is not what you want to do. Exactly what you do want to do depends on context and, as it concerns statistical methodology, is not something that should be discussed here. Consult a local statistician if this is a correct guess. Otherwise ignore.

... and please post in plain text in future (as requested), as HTML can get garbled.

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is
certainly not wisdom."
Clifford Stoll

On Wed, Jul 9, 2014 at 10:26 AM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
> Hi Sam,
>
> I'd take the similar tack of removing the < instead. Note that if you
> import the data frame using the stringsAsFactors=FALSE argument, you
> don't need the first step.
>
> metals$Cedar.Creek <- as.character(metals$Cedar.Creek)
> metals$Cedar.Creek <- gsub("<", "", metals$Cedar.Creek)
> metals$Cedar.Creek <- as.numeric(metals$Cedar.Creek)
>
> R> str(metals)
> 'data.frame': 19 obs. of 2 variables:
> $ Parameter : Factor w/ 20 levels "Antimony","Arsenic",..: 1 2 3 4 6 7 8 9 10 11 ...
> $ Cedar.Creek: num 100 100 500 100 10 1000 100 516 550 10 ...
>
> Sarah
>
>
> On Wed, Jul 9, 2014 at 1:19 PM, Sam Albers <tonightstheni...@gmail.com> wrote:
>> Hello,
>>
>> I have recently received a dataset from a metal analysis company. The
>> dataset is filled with less than symbols. What I am looking for is an
>> efficient way to subset for any whole numbers from the dataset. The column
>> is automatically formatted as a factor because of the "<" symbols, making it
>> difficult to deal with the numbers in a useful way.
>>
>> So in sum, any ideas on how I could subset the example below for only whole
>> numbers?
>>
>> Thanks in advance!
>>
>> Sam
>>
>> #code
>>
>> metals <-
>> structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L,
>> 8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L),
>> .Label = c("Antimony",
>> "Arsenic", "Barium", "Beryllium", "Boron (Hot Water Soluble)",
>> "Cadmium", "Chromium", "Cobalt", "Copper", "Lead", "Mercury",
>> "Molybdenum", "Nickel", "pH 1:2", "Selenium", "Silver", "Thallium",
>> "Tin", "Vanadium", "Zinc"), class = "factor"), Cedar.Creek = structure(c(3L,
>> 3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L,
>> 4L, 4L, 3L), .Label = c("<1", "<10", "<100", "<1000", "<200",
>> "<5", "<500", "0.1", "0.13", "0.5", "0.8", "1.07", "1.1", "1.4",
>> "1.5", "137", "154", "163", "165", "169", "178", "2.3", "2.4",
>> "22", "24", "244", "27.2", "274", "3", "3.1", "40.2", "43", "50",
>> "516", "53.3", "550", "569", "65", "66.1", "68", "7.6", "72",
>> "77", "89", "951"), class = "factor")), .Names = c("Parameter",
>> "Cedar.Creek"), row.names = c(NA, 19L), class = "data.frame")
>>
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.