On 17/11/2015 2:25 PM, Duncan Murdoch wrote:
On 17/11/2015 2:14 PM, Karl Schilling wrote:
> Dear all,
>
> I have one observation that I do not quite understand. Maybe someone
> can clarify this issue for me.
>
> I have a data frame which I want to subset based on a grouping variable,
> say "group". Actually, "group" is a numeric value, but it is saved as a
> character. I give some code to generate an exemplary data frame below.
>
> Now, if I use
>
> MySubset <- subset(Data, Data$group == "..")
>
> everything works fine, as expected. ".." stands here for the value of
> group given as a character string.
>
> Surprisingly, I also get a correct subsetting if I simply give the plain
> numeric value of group (like MySubset <- subset(Data, Data$group == ..)),
> AS LONG AS this numeric value is less than 100000.
>
> If the numeric value is 100000 or larger, I get an empty subset.
>
> OK, I know how to avoid this situation, but I wonder what the
> explanation might be for this behavior, which seems rather strange to me.
>
> Thank you so much for your suggestions.

If you are comparing a character value to a numeric value, the numeric
value is converted to character using as.character(), and the two
strings are then compared.  as.character(100000), or of a larger
number, is likely not "100000"; try it.  (With the options I have on my
computer, I get "1e+05", which is not equal to the string "100000".)
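
A small sketch of that coercion, assuming default options (settings such
as scipen may change the result, depending on the R version). The
variable names are made up for illustration:

```r
# Assumes default options; the names below are illustrative only.
n_small <- as.character(99999)    # "99999"
n_big   <- as.character(100000)   # "1e+05" with the options described above

# == converts the number to character, then compares the strings
eq_small <- "99999"  == 99999     # TRUE
eq_big   <- "100000" == 100000    # FALSE whenever n_big is not "100000"
eq_sci   <- n_big    == 100000    # TRUE: both sides are the same string

print(c(n_small, n_big))
print(c(eq_small, eq_big, eq_sci))
```

The last comparison is the giveaway: the string form of the number
matches, while the "obvious" spelling "100000" does not.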

If you want a numeric comparison, be explicit:

subset(Data, as.numeric(Data$group) == ..)

On reflection, this might be bad advice.  If Data$group is a factor (as
it tends to be when character data is put in a data frame),
as.numeric() will return the underlying factor codes, not the visible
labels.  You need to use

as.numeric(as.character(Data$group))

to do the conversion you probably want.
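
A minimal sketch of the difference, using a made-up factor g as a
stand-in for Data$group (g, codes and vals are illustrative names only):

```r
# Hypothetical stand-in for Data$group stored as a factor
g <- factor(c("20000", "99999", "100000"))

# Levels are sorted as strings: "100000" "20000" "99999"
codes <- as.numeric(g)                # 2 3 1: internal codes, not the labels
vals  <- as.numeric(as.character(g))  # 20000 99999 100000

# A numeric comparison that works for character and factor columns alike
keep <- vals == 100000                # FALSE FALSE TRUE

print(codes)
print(vals)
print(keep)
```

With as.numeric(g) alone, a test like == 100000 would be run against the
codes 2, 3, 1 and silently match nothing.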

Duncan Murdoch


>
>
> Karl Schilling
>
>
> #####
> # Example code for reproducing the problem described above:
>
> options(stringsAsFactors = FALSE)
>
> # set up some data frame
> value <- c(1:6)
> group <- rep(c("20000", "99999", "100000"), each = 2)
> Data <- data.frame(value = value, group = group)
> str(Data)
>
> # subset data frame based on the value of the variable "group",
> # treating this value once as a character, and once as a number:
>
> Data20 <- subset(Data, Data$group =="20000")
> str(Data20)
> Data20N <- subset(Data, Data$group ==20000)
> str(Data20N)
>
>
> Data99 <- subset(Data, Data$group =="99999")
> str(Data99)
> Data99N <- subset(Data, Data$group ==99999)
> str(Data99N)
> Data100 <- subset(Data, Data$group =="100000")
> str(Data100)
> Data100N <- subset(Data, Data$group ==100000)
> str(Data100N)
>


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
