Re: [R] < symbols in a data frame

Bert Gunter Wed, 09 Jul 2014 10:44:16 -0700

Well, ?grep and ?regex are clearly apropos here -- dealing with
character data is an essential skill for handling input from diverse
sources with various formatting conventions. I suggest you go through
one of the many regular expression tutorials on the web to learn more.
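
As a rough illustration of what ?grepl and ?regex offer for a column like this, the sketch below flags the entries that start with "<" and keeps only those without one. The vector x is a made-up stand-in for a few values of metals$Cedar.Creek, not the full data, and the variable names are only illustrative.

```r
# Hypothetical stand-in for a few entries of metals$Cedar.Creek
x <- c("<100", "516", "<10", "550", "0.13")

# "^<" is a regular expression: "^" anchors the match at the start of the string
is_lt <- grepl("^<", x)

# Entries reported as plain numbers (no "<" prefix), converted to numeric
plain <- as.numeric(x[!is_lt])

# Strip a leading "<" (sub replaces only the first match, here at the start)
stripped <- sub("^<", "", x)
```

The same logical vector can index the rows of a data frame, e.g. metals[!is_lt, ] once is_lt has been computed from the real column.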


But this may not be the important issue here at all. If "<k" means the
value is left censored at k -- i.e. we know it's less than k but not
how much less -- then Sarah's proposal is not what you want to do.
Exactly what you do want to do depends on context, and as it concerns
statistical methodology, is not something that should be discussed
here. Consult a local statistician if this is a correct guess.
Otherwise ignore.
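
If the "<" does mark censoring, one way to get a numeric column without throwing that information away is to store the reported limit alongside a separate flag. This is only a sketch of the bookkeeping, using a made-up vector x in place of metals$Cedar.Creek and illustrative names (value, censored); it says nothing about how the censored values should then be analysed.

```r
# Hypothetical stand-in for a few entries of metals$Cedar.Creek
x <- c("<100", "516", "<10", "550", "0.13")

# TRUE where the value was reported as "<k"
censored <- grepl("^<", x)

# k for censored entries, the measured value otherwise
value <- as.numeric(sub("^<", "", x))

# Keep both pieces together so the censoring is not silently lost
res <- data.frame(value = value, censored = censored)
```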

... and please post in plain text in future (as requested) as HTML can
get garbled.

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll




On Wed, Jul 9, 2014 at 10:26 AM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
> Hi Sam,
>
> I'd take the similar tack of removing the < instead. Note that if you
> import the data frame using the stringsAsFactors=FALSE argument, you
> don't need the first step.
>
> metals$Cedar.Creek <- as.character(metals$Cedar.Creek)
> metals$Cedar.Creek <- gsub("<", "", metals$Cedar.Creek)
> metals$Cedar.Creek <- as.numeric(metals$Cedar.Creek)
>
> R> str(metals)
> 'data.frame':    19 obs. of  2 variables:
>  $ Parameter  : Factor w/ 20 levels "Antimony","Arsenic",..: 1 2 3 4 6 7 8 9 10 11 ...
>  $ Cedar.Creek: num  100 100 500 100 10 1000 100 516 550 10 ...
>
> Sarah
>
>
> On Wed, Jul 9, 2014 at 1:19 PM, Sam Albers <tonightstheni...@gmail.com> wrote:
>> Hello,
>>
>> I have recently received a dataset from a metal analysis company. The
>> dataset is filled with less-than symbols. What I am looking for is an
>> efficient way to subset for any whole numbers from the dataset. The column
>> is automatically formatted as a factor because of the "<" symbols, making it
>> difficult to deal with the numbers in a useful way.
>>
>> So in sum any ideas on how I could subset the example below for only whole
>> numbers?
>>
>> Thanks in advance!
>>
>> Sam
>>
>> #code
>>
>> metals <- structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L,
>> 8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L),
>> .Label = c("Antimony",
>> "Arsenic", "Barium", "Beryllium", "Boron (Hot Water Soluble)",
>> "Cadmium", "Chromium", "Cobalt", "Copper", "Lead", "Mercury",
>> "Molybdenum", "Nickel", "pH 1:2", "Selenium", "Silver", "Thallium",
>> "Tin", "Vanadium", "Zinc"), class = "factor"), Cedar.Creek = structure(c(3L,
>> 3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L,
>> 4L, 4L, 3L), .Label = c("<1", "<10", "<100", "<1000", "<200",
>> "<5", "<500", "0.1", "0.13", "0.5", "0.8", "1.07", "1.1", "1.4",
>> "1.5", "137", "154", "163", "165", "169", "178", "2.3", "2.4",
>> "22", "24", "244", "27.2", "274", "3", "3.1", "40.2", "43", "50",
>> "516", "53.3", "550", "569", "65", "66.1", "68", "7.6", "72",
>> "77", "89", "951"), class = "factor")), .Names = c("Parameter",
>> "Cedar.Creek"), row.names = c(NA, 19L), class = "data.frame")
>>
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.