Re: [R] Strange result when subsetting a data frame based on a character variable

2015-11-17 Thread David L Carlson
The conversion seems to be controlled by the scipen setting:

> options("scipen")
$scipen
[1] 0
> as.character(100000)
[1] "1e+05"
> options(scipen=5)
> as.character(100000)
[1] "100000"
> as.character(1000000)
[1] "1000000"
> as.character(10000000)
[1] "10000000"
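
As a hedged aside on the scipen demonstration above: a conversion that does not depend on the session's options can be requested with format(x, scientific = FALSE). The sketch below assumes a single numeric value; the variable names are illustrative and do not come from the thread.

```r
# Illustrative value (not from the original session): a number large enough
# that the default conversion may switch to scientific notation.
x <- 100000

# format() with scientific = FALSE asks for fixed notation explicitly,
# so the result does not rely on the current scipen setting.
fixed <- format(x, scientific = FALSE)
print(fixed)

# The fixed-notation string can then be compared with character data.
print(fixed == "100000")
```

For a vector, format() pads all elements to a common width unless trim = TRUE is given, so that argument is worth adding before comparing against unpadded strings.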

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Strange result when subsetting a data frame based on a character variable

2015-11-17 Thread Bert Gunter
Thanks, David.

Probably as one should expect.

But it reinforces what others said about first doing explicit conversions
so that comparisons are not made between differing types.

Cheers,
Bert


Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll




Re: [R] Strange result when subsetting a data frame based on a character variable

2015-11-17 Thread Bert Gunter
> 2 == "2"
[1] TRUE

?"=="  says:

"If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw."

> as.character(9)
[1] "9"
> as.character(100000)
[1] "1e+05"
> as.character(100000) == "100000"
[1] FALSE
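
The coercion rule quoted from ?"==" can be checked directly. This is a small sketch using only base R; the variable name is illustrative. How a large number is printed as a string can depend on the R version and the scipen option, so the last line deliberately carries no expected output.

```r
# Combining a number and a string yields a character vector; c() follows
# the same type hierarchy that == uses for mixed arguments.
mixed <- c(2, "2")
print(typeof(mixed))        # "character"

# Because 2 is converted to "2", the mixed comparison succeeds.
print(2 == "2")             # TRUE

# For larger numbers, inspect the string that the number is converted to
# before relying on a mixed comparison.
print(as.character(100000))
```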


Cheers,
Bert




Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Tue, Nov 17, 2015 at 11:14 AM, Karl Schilling wrote:
> Dear all,
>
> I have one observation that I do not quite understand. Maybe someone
> can clarify this issue for me.
>
> I have a data frame which I want to subset based on a grouping variable, say
> "group". Actually, "group" is a numeric value, but it is saved as a
> character. I give some code to generate an exemplary data frame below.
>
> Now, if I use
>
> MySubset <- subset(Data, Data$group == "..")
>
> everything works fine, as expected. ".." stands here for the value of group
> given as a character string.
>
> Surprisingly, I also get a correct subsetting if I simply give the plain
> numeric value of group (like MySubset <- subset(Data, Data$group == ..)), AS
> LONG AS this numeric value is less than 100000.
>
> If the numeric value is 100000 or larger, I get an empty subset.
>
> OK, I know how to avoid this situation, but I wonder what the explanation
> for this behavior, which seems rather strange to me, might be.
>
> Thank you so much for your suggestions.
>
>
> Karl Schilling
>
>
> # Exemplary code for reproducing the problem described above:
>
> options(stringsAsFactors = FALSE)
>
> # set up some data frame
> value <- 1:6
> group <- rep(c("2", "9", "100000"), each = 2)
> Data <- data.frame(value = value, group = group)
> str(Data)
>
> # subset data frame based on the value of the variable "group",
> # treating this value once as a character, and once as a number:
>
> Data20 <- subset(Data, Data$group == "2")
> str(Data20)
> Data20N <- subset(Data, Data$group == 2)
> str(Data20N)
>
> Data99 <- subset(Data, Data$group == "9")
> str(Data99)
> Data99N <- subset(Data, Data$group == 9)
> str(Data99N)
>
> Data100 <- subset(Data, Data$group == "100000")
> str(Data100)
> Data100N <- subset(Data, Data$group == 100000)
> str(Data100N)
>
> --
> Karl Schilling
>


Re: [R] Strange result when subsetting a data frame based on a character variable

2015-11-17 Thread Thierry Onkelinx
Dear Karl,

Since you compare a character with a numeric, R silently converts the
numeric to character. And then you're in trouble.

as.character(9) # "9"
as.character(100000) # "1e+05"

Bottom line, use the same type on both sides of the binary operator.

Best regards,


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey



Re: [R] Strange result when subsetting a data frame based on a character variable

2015-11-17 Thread Thierry Onkelinx
Dear Duncan,

I'd rather convert the numeric to character. E.g. with sprintf() or
format() in case it is a numeric vector.

subset(Data, group == "100000")
subset(Data, group == sprintf("%.f", 100000))

sprintf("%.f", 100000) # "100000"

It requires the user to think about the format, which can reduce errors.
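
A minimal sketch of this approach on a small data frame follows. The data are hypothetical, modelled on the example in the original question, and "%.0f" is used as the explicit spelling of a zero-decimal fixed format.

```r
# Hypothetical data, modelled on the example in the original question.
Data <- data.frame(value = 1:6,
                   group = rep(c("2", "9", "100000"), each = 2),
                   stringsAsFactors = FALSE)

# Build the comparison string with an explicit format: "%.0f" prints the
# number in fixed notation with no decimal places.
target <- sprintf("%.0f", 100000)
print(target)

# Both sides of == are now character, so no implicit conversion happens.
hits <- subset(Data, group == target)
print(hits)
```

Choosing the format by hand means the string on the right-hand side is decided by the code rather than by session options.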

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey



Re: [R] Strange result when subsetting a data frame based on a character variable

2015-11-17 Thread Conklin, Mike (GfK)
R silently converts the integer to a character for comparison in the subset 
operation.  But if we explicitly do the conversion we see that it does not work 
with the default R settings.

> as.character(100000)
[1] "1e+05"
> as.character(9)
[1] "9"


--
W. Michael Conklin
EVP Marketing & Data Sciences
GfK 
T +1 763 417 4545 | M +1 612 567 8287 




Re: [R] Strange result when subsetting a data frame based on a character variable

2015-11-17 Thread Jeff Newmiller
Are you sure that wasn't oh-3 rather than 03?
---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.



Re: [R] Strange result when subsetting a data frame based on a character variable

2015-11-17 Thread peter dalgaard

> On 18 Nov 2015, at 01:59 , Jeff Newmiller  wrote:
> 
> Are you sure that wasn't oh-3 rather than 03?

Sure I'm sure. I even cut+pasted the filenames from the offending dir... It's 
all just Apple trying to be helpful (and failing, again).

O2 < 2d < O3 had been even stranger, no?

-p


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] Strange result when subsetting a data frame based on a character variable

2015-11-17 Thread Jim Lemon
peter dalgaard wrote:

> O2 < 2d < O3 had been even stranger, no?

Don't give those dudes in Cupertino any more bright ideas, okay?

Jim



Re: [R] Strange result when subsetting a data frame based on a character variable

2015-11-17 Thread Duncan Murdoch



If you are comparing a character value to a numeric value, the numeric
value is converted to character using as.character() for the
comparison.  as.character(100000) or a larger number is likely not
"100000"; try it.  (With the options I have on my computer, I get
"1e+05".)

If you want a numeric comparison, be explicit:

subset(Data, as.numeric(Data$group) == ..)


Duncan Murdoch






Re: [R] Strange result when subsetting a data frame based on a character variable

2015-11-17 Thread Duncan Murdoch

On 17/11/2015 2:25 PM, Duncan Murdoch wrote:


> If you are comparing a character value to a numeric value, the numeric
> value is converted to character using as.character() for the
> comparison.  as.character(100000) or a larger number is likely not
> "100000"; try it.  (With the options I have on my computer, I get
> "1e+05".)
>
> If you want a numeric comparison, be explicit:
>
> subset(Data, as.numeric(Data$group) == ..)


This might be bad advice.  If Data$group is a factor (as it tends to be 
when character data is put in a dataframe), this will use the underlying 
factor code, not the visible one.  You need to use


as.numeric(as.character(Data$group))

to do the conversion you probably want.
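
The difference between the two conversions can be seen on a small factor. This is a sketch with hypothetical values; the names f, codes and values are illustrative.

```r
# Hypothetical factor holding numbers stored as text.
f <- factor(c("2", "9", "100000"))

# as.numeric() on a factor returns the internal level codes,
# not the numbers shown in the labels.
codes <- as.numeric(f)
print(codes)

# Converting to character first recovers the visible values.
values <- as.numeric(as.character(f))
print(values)
```

The help page ?factor also mentions as.numeric(levels(f))[f] as an equivalent idiom.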

Duncan Murdoch





Re: [R] Strange result when subsetting a data frame based on a character variable

2015-11-17 Thread peter dalgaard

> On 17 Nov 2015, at 20:37 , Bert Gunter  wrote:
> 
>> 2 == "2"
> [1] TRUE
> 
> ?"=="  says:
> 
> "If the two arguments are atomic vectors of different types, one is
> coerced to the type of the other, the (decreasing) order of precedence
> being character, complex, numeric, integer, logical and raw."
> 
>> as.character(9)
> [1] "9"
>> as.character(100000)
> [1] "1e+05"
>> as.character(100000) == "100000"
> [1] FALSE
> 

Also notice that, for similar reasons

> 10 > "2"
[1] FALSE

(At least in most collations. I recently discovered that OSX Finder sorted 
2dnorm.R between 02-Probability.toc and 03-Combinatorics-2x2.pdf.)   
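
A short sketch of the same effect when sorting follows; the strings are illustrative and not from the thread. Since character ordering can vary with the collation in use, the example assumes method = "radix", which sorts character vectors in the C locale.

```r
ids <- c("10", "2", "9")   # illustrative strings, not from the thread

# Character sorting compares character by character, so "10" comes before "2".
# method = "radix" sorts in the C locale, which avoids collation differences.
print(sort(ids, method = "radix"))

# Converting to numeric first gives the numeric order.
print(sort(as.numeric(ids)))
```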



-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
