Re: [R] Strange result when subsetting a data frame based on a character variable
The conversion seems to be controlled by the scipen setting:

> options("scipen")
$scipen
[1] 0

> as.character(100000)
[1] "1e+05"
> options(scipen=5)
> as.character(100000)
[1] "100000"
> as.character(100)
[1] "100"
> as.character(1000)
[1] "1000"

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
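The scipen effect described in this message can be sketched with format(), which is documented to honour the scipen option. This is only an illustration: it assumes the value in question is 100000 (the number that prints as "1e+05"), and it deliberately does not claim that as.character() responds to scipen in every R version.

```r
# Sketch: options(scipen) shifts the choice between fixed and scientific notation.
old <- options(scipen = 0)           # use the default penalty, remember the old value

sci_default <- format(100000)        # "100000" is wider than "1e+05", so scientific wins
options(scipen = 5)                  # penalise scientific notation by 5 characters
fixed_penalised <- format(100000)    # now the fixed representation is preferred

# Independent of scipen: ask for fixed notation explicitly.
always_fixed <- format(100000, scientific = FALSE)

options(old)                         # restore the caller's setting

print(c(sci_default, fixed_penalised, always_fixed))
```

Requesting the format explicitly, as in the last call, avoids depending on a global option that other code may change.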
Re: [R] Strange result when subsetting a data frame based on a character variable
Thanks, David. Probably as one should expect. But it reinforces what others said about first doing explicit conversions, so that comparisons are not made between differing types.

Cheers,
Bert

Bert Gunter

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
   -- Clifford Stoll

On Tue, Nov 17, 2015 at 3:03 PM, David L Carlson <dcarl...@tamu.edu> wrote:
> The conversion seems to be controlled by the scipen setting:
Re: [R] Strange result when subsetting a data frame based on a character variable
> 2 == "2"
[1] TRUE

?"==" says:

"If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw."

> as.character(9)
[1] "9"
> as.character(100000)
[1] "1e+05"
> as.character(100000) == "100000"
[1] FALSE

Cheers,
Bert

Bert Gunter

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
   -- Clifford Stoll

On Tue, Nov 17, 2015 at 11:14 AM, Karl Schilling wrote:
> Dear all,
>
> I have one observation that I do not quite understand. Maybe someone
> can clarify this issue for me.
>
> I have a data frame which I want to subset based on a grouping variable, say
> "group". Actually, "group" is a numeric value, but it is saved as a
> character. I give some code to generate an exemplary data frame below.
>
> Now, if I use
>
> MySubset <- subset(Data, Data$group == "..")
>
> everything works fine, as expected. ".." stands here for the value of group
> given as a character string.
>
> Surprisingly, I also get a correct subsetting if I simply give the plain
> numeric value of group (like MySubset <- subset(Data, Data$group == ..), AS
> LONG AS this numeric value is less than 100000.
>
> If the numeric value is 100000 or larger, I get an empty subset.
>
> OK, I know how to avoid this situation, but I wonder what the explanation
> for this (to me rather strange) behavior might be.
>
> Thank you so much for your suggestions.
>
> Karl Schilling
>
> #
> Exemplary code for reproducing the above described problem:
>
> options(stringsAsFactors = F)
>
> # set up some data frame
> value <- c(1:6)
> group <- rep(c("2", "9", "100000"), each = 2)
> Data <- data.frame(value = value, group = group)
> str(Data)
>
> # subset data frame based on the value of the variable "group",
> # treating this value once as a character, and once as a number:
>
> Data20 <- subset(Data, Data$group == "2")
> str(Data20)
> Data20N <- subset(Data, Data$group == 2)
> str(Data20N)
>
> Data99 <- subset(Data, Data$group == "9")
> str(Data99)
> Data99N <- subset(Data, Data$group == 9)
> str(Data99N)
>
> Data100 <- subset(Data, Data$group == "100000")
> str(Data100)
> Data100N <- subset(Data, Data$group == 100000)
> str(Data100N)
>
> --
> Karl Schilling
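The coercion rule quoted from ?"==" can be checked directly. The snippet below is a small sketch rather than anything from the original posts; it assumes the problematic value is 100000, which is the double whose character form is "1e+05".

```r
# Sketch: when a double meets a string in "==", the double is coerced to character.
x <- 100000                       # a double (100000L would be an integer)

as_text   <- as.character(x)      # the text that the string is compared against
small_ok  <- 2 == "2"             # as.character(2) is "2", so the texts match
large_bad <- x == "100000"        # "1e+05" is not the same text as "100000"

print(as_text)
print(c(small_ok = small_ok, large_bad = large_bad))
```

The comparison is purely textual once the coercion has happened, which is why the result depends on how the number happens to be printed.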
Re: [R] Strange result when subsetting a data frame based on a character variable
Dear Karl,

Since you compare a character with a numeric, R silently converts the numeric to character. And then you're in trouble.

as.character(9) # "9"
as.character(100000) # "1e+05"

Bottom line: use the same type on both sides of the binary operator.

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
Re: [R] Strange result when subsetting a data frame based on a character variable
Dear Duncan,

I'd rather convert the numeric to character, e.g. with sprintf(), or with format() in case it is a numeric vector.

subset(Data, group == "100000")
subset(Data, group == sprintf("%.f", 100000))
sprintf("%.f", 100000) # "100000"

It requires the user to think about the format, which can reduce errors.

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance

2015-11-17 21:27 GMT+01:00 Duncan Murdoch:
> On 17/11/2015 2:25 PM, Duncan Murdoch wrote:
>> If you want a numeric comparison, be explicit:
>>
>> subset(Data, as.numeric(Data$group) == ..)
>
> This might be bad advice. If Data$group is a factor (as it tends to be
> when character data is put in a dataframe), this will use the underlying
> factor code, not the visible one. You need to use
>
> as.numeric(as.character(Data$group))
>
> to do the conversion you probably want.
>
> Duncan Murdoch
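The suggestion to build the character key explicitly can be sketched as follows. The data frame mirrors the example from the original question, with 100000 assumed as the large group value; the variable names key, key_sprintf, key_format and key_integer are made up for this illustration.

```r
# Sketch: turn the number into text yourself, then compare text with text.
Data <- data.frame(value = 1:6,
                   group = rep(c("2", "9", "100000"), each = 2),
                   stringsAsFactors = FALSE)

key <- 100000

key_sprintf <- sprintf("%.f", key)               # fixed notation, zero decimals
key_format  <- format(key, scientific = FALSE)   # fixed notation via format()
key_integer <- as.character(100000L)             # integers are never written as 1e+05

hit <- subset(Data, group == key_sprintf)        # character == character
print(hit)
```

For a whole vector, format() pads its results to a common width unless trim = TRUE is given, so sprintf() is often the simpler choice when the strings are used as lookup keys.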
Re: [R] Strange result when subsetting a data frame based on a character variable
R silently converts the number to a character for the comparison in the subset operation. But if we do the conversion explicitly, we see that it does not give the expected string with the default R settings.

> as.character(100000)
[1] "1e+05"
> as.character(9)
[1] "9"

--
W. Michael Conklin
EVP Marketing & Data Sciences
GfK
T +1 763 417 4545 | M +1 612 567 8287
Re: [R] Strange result when subsetting a data frame based on a character variable
Are you sure that wasn't oh-3 rather than 03?

---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On November 17, 2015 1:57:15 PM PST, peter dalgaard wrote:
> (At least in most collations. I recently discovered that OSX Finder
> sorted 2dnorm.R between 02-Probability.toc and
> 03-Combinatorics-2x2.pdf.)
Re: [R] Strange result when subsetting a data frame based on a character variable
> On 18 Nov 2015, at 01:59 , Jeff Newmiller wrote:
>
> Are you sure that wasn't oh-3 rather than 03?

Sure I'm sure. I even cut+pasted the filenames from the offending dir...

It's all just Apple trying to be helpful (and failing, again).

O2 < 2d < O3 had been even stranger, no?

-p

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com
Re: [R] Strange result when subsetting a data frame based on a character variable
peter dalgaard wrote:
> O2 < 2d < O3 had been even stranger, no?

Don't give those dudes in Cupertino any more bright ideas, okay?

Jim
Re: [R] Strange result when subsetting a data frame based on a character variable
On 17/11/2015 2:14 PM, Karl Schilling wrote:
> Surprisingly, I also get a correct subsetting if I simply give the plain
> numeric value of group (like MySubset <- subset(Data, Data$group == ..),
> AS LONG AS this numeric value is less than 100000.
>
> If the numeric value is 100000 or larger, I get an empty subset.
>
> OK, I know how to avoid this situation, but I wonder what the
> explanation for this (to me rather strange) behavior might be.

If you are comparing a character value to a numeric value, the numeric value is converted to character using as.character() for the comparison. as.character(100000), or of a larger number, is likely not "100000"; try it. (With the options I have on my computer, I get "1e+05".)

If you want a numeric comparison, be explicit:

subset(Data, as.numeric(Data$group) == ..)

Duncan Murdoch
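A minimal sketch of the explicit numeric comparison suggested here, next to the implicit one that fails. It assumes a character column, as in the original example, and 100000 as the large group value; the names implicit and explicit are only for illustration.

```r
# Sketch: implicit (text) comparison versus explicit numeric comparison.
Data <- data.frame(value = 1:6,
                   group = rep(c("2", "9", "100000"), each = 2),
                   stringsAsFactors = FALSE)

# The number is coerced to "1e+05", which matches no entry of group.
implicit <- subset(Data, group == 100000)

# The column is converted to numbers first, so the comparison is numeric.
explicit <- subset(Data, as.numeric(group) == 100000)

print(nrow(implicit))
print(explicit)
```

Note that as.numeric() returns NA (with a warning) for strings that are not numbers, and subset() drops rows whose condition is NA, so stray non-numeric codes disappear silently with this approach.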
Re: [R] Strange result when subsetting a data frame based on a character variable
On 17/11/2015 2:25 PM, Duncan Murdoch wrote:
> If you want a numeric comparison, be explicit:
>
> subset(Data, as.numeric(Data$group) == ..)

This might be bad advice. If Data$group is a factor (as it tends to be when character data is put in a dataframe), this will use the underlying factor code, not the visible one. You need to use

as.numeric(as.character(Data$group))

to do the conversion you probably want.

Duncan Murdoch
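The factor pitfall mentioned above can be made concrete with a short sketch. The values are assumptions chosen to match the thread's example; since R 4.0.0 data.frame() no longer converts strings to factors by default, so the factor is created explicitly here.

```r
# Sketch: as.numeric() on a factor returns level codes, not the visible labels.
group_chr <- c("2", "9", "100000")
group_fac <- factor(group_chr)       # levels are sorted as text: "100000", "2", "9"

codes  <- as.numeric(group_fac)                  # positions in levels(group_fac)
values <- as.numeric(as.character(group_fac))    # the labels, read as numbers

# Equivalent idiom that converts each level only once:
values_via_levels <- as.numeric(levels(group_fac))[group_fac]

print(codes)
print(values)
```

Comparing codes against 100000 would never match, while comparing values does, which is the difference the message above warns about.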
Re: [R] Strange result when subsetting a data frame based on a character variable
> On 17 Nov 2015, at 20:37 , Bert Gunter wrote:
>
>> 2 == "2"
> [1] TRUE
>
> ?"==" says:
>
> "If the two arguments are atomic vectors of different types, one is
> coerced to the type of the other, the (decreasing) order of precedence
> being character, complex, numeric, integer, logical and raw."
>
>> as.character(9)
> [1] "9"
>> as.character(100000)
> [1] "1e+05"
>> as.character(100000) == "100000"
> [1] FALSE
>

Also notice that, for similar reasons

> 10 > "2"
[1] FALSE

(At least in most collations. I recently discovered that OSX Finder sorted 2dnorm.R between 02-Probability.toc and 03-Combinatorics-2x2.pdf.)

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com
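The ordering remark can be illustrated in the same spirit. This is a sketch, not part of the original post; it uses sort(..., method = "radix"), which sorts strings in the C locale, so that the string ordering shown does not depend on the local collation.

```r
# Sketch: ordering comparisons on mixed types are done on text as well.
mixed      <- 10 > "2"                 # 10 becomes "10", and "10" sorts before "2"
as_numbers <- 10 > as.numeric("2")     # numeric comparison

labels <- c("10", "2", "9")
text_order    <- sort(labels, method = "radix")   # character by character
numeric_order <- sort(as.numeric(labels))         # by magnitude

print(c(mixed = mixed, as_numbers = as_numbers))
print(text_order)
print(numeric_order)
```

Without method = "radix", sort() on strings follows the collation of the current locale, which is the caveat raised in the message above.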