Re: [R] Confusing behavior when using gsub to insert unicode character (minimal working example provided)

2014-06-04 Thread Thomas Stewart
Yep.  You are right.  That is better.
-tgs


On Thu, May 29, 2014 at 5:23 PM, Ista Zahn  wrote:

> Why not just set the encoding after the fact, as I suggested?
>
> string1 <- "X"; string1
> new_string1 <- gsub("X","\u2265",string1); new_string1
> Encoding(new_string1) <- "UTF-8"; new_string1


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Confusing behavior when using gsub to insert unicode character (minimal working example provided)

2014-05-29 Thread Ista Zahn
Hi Thomas,

On Thu, May 29, 2014 at 9:15 AM, Thomas Stewart wrote:
> Thanks to Ista Zahn, I was able to find a workaround solution.  The key
> seems to be that string1 needs to be encoded as UTF-8 prior to being passed
> to gsub.  For whatever reason,
>
> Encoding(string1) <- "UTF-8"
>
> does not change the encoding on my Windows machine.

Right, because "ASCII strings will never be marked with a declared
encoding" (read ?Encoding again).
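A minimal sketch of the ?Encoding rule quoted above, assuming a reasonably recent R session and using only base functions. Assigning a declared encoding to a pure-ASCII string is accepted but has no effect, which is why Encoding(string1) <- "UTF-8" appeared to do nothing:

```r
# ASCII-only strings are never marked with a declared encoding (see ?Encoding).
string1 <- "text X"
Encoding(string1)              # "unknown"

# The assignment raises no error but is ignored, because every supported
# encoding represents these ASCII bytes identically.
Encoding(string1) <- "UTF-8"
Encoding(string1)              # still "unknown"
```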

> The workaround:  I
> paste an obvious UTF-8 character "\u00A0" to the start of the string, send
> the string through gsub, then remove the "\u00A0" character from the output.
>
> string1 <- "\u00A0text X"; string1
> Encoding(string1)
> new_string1 <- gsub("X","\u2265",string1); new_string1
> new_string2 <- substring(new_string1,2); new_string2
>
> If you know of a less hackish way to accomplish this, I'm interested to
> hear it.

Why not just set the encoding after the fact, as I suggested?

string1 <- "X"; string1
new_string1 <- gsub("X","\u2265",string1); new_string1
Encoding(new_string1) <- "UTF-8"; new_string1
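A quick check of that suggestion, written as a sketch for a current R session (the printed output on R 3.0.2 under Windows may differ): the bytes that gsub() produces are already valid UTF-8, and the assignment only adds the declaration R uses to interpret them.

```r
string1 <- "text X"
new_string1 <- gsub("X", "\u2265", string1)

# Declare the result as UTF-8 after the substitution.
# Only appropriate when the bytes really are UTF-8, as here:
# ASCII input plus a replacement written with a \u escape.
Encoding(new_string1) <- "UTF-8"

Encoding(new_string1)        # "UTF-8"
charToRaw(new_string1)       # 74 65 78 74 20 e2 89 a5
utf8ToInt(new_string1)[6]    # 8805, i.e. U+2265
```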

Best,
Ista


Re: [R] Confusing behavior when using gsub to insert unicode character (minimal working example provided)

2014-05-29 Thread Thomas Stewart
Thanks to Ista Zahn, I was able to find a workaround solution.  The key
seems to be that string1 needs to be encoded as UTF-8 prior to being passed
to gsub.  For whatever reason,

Encoding(string1) <- "UTF-8"

does not change the encoding on my Windows machine.  The workaround:  I
paste an obvious UTF-8 character "\u00A0" to the start of the string, send
the string through gsub, then remove the "\u00A0" character from the output.

string1 <- "\u00A0text X"; string1
Encoding(string1)
new_string1 <- gsub("X","\u2265",string1); new_string1
new_string2 <- substring(new_string1,2); new_string2

If you know of a less hackish way to accomplish this, I'm interested to
hear it.  However, this workaround is sufficient for now.
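The prefix workaround can be compared with the mark-afterwards approach suggested in Ista Zahn's reply. The sketch below wraps that approach in a small helper; `gsub_utf8()` is a hypothetical name, not a base R function, and it assumes the input strings are ASCII or already UTF-8, since declaring native Windows-1252 bytes as UTF-8 would mislabel them.

```r
# Hypothetical helper, not part of base R: substitute, then declare UTF-8.
# Assumes x is ASCII or already UTF-8; marking native-encoded non-ASCII
# text (e.g. Windows-1252) as UTF-8 would be wrong.
gsub_utf8 <- function(pattern, replacement, x, ...) {
  out <- gsub(pattern, replacement, x, ...)
  Encoding(out) <- "UTF-8"  # ignored for elements that are still pure ASCII
  out
}

# The prefix workaround: prepend U+00A0, substitute, then drop the prefix.
hack <- substring(gsub("X", "\u2265", "\u00A0text X"), 2)

# The mark-afterwards approach via the helper.
marked <- gsub_utf8("X", "\u2265", "text X")

# Both routes should hold the same bytes: 74 65 78 74 20 e2 89 a5.
identical(charToRaw(hack), charToRaw(marked))
```

The helper avoids the extra substring() step and does not depend on the first character being removable.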

Thanks,
-tgs


Re: [R] Confusing behavior when using gsub to insert unicode character (minimal working example provided)

2014-05-28 Thread David Winsemius

On May 28, 2014, at 7:25 PM, Thomas Stewart wrote:

> Minimal Working Example:
> 
> string1 <- "text X"; string1
> new_string1 <- gsub("X","\u2265",string1); new_string1

Try this instead:

> new_string1 <- gsub("X","\\\u2265",string1); new_string1
[1] "text ≥"

Each "\" needs to be escaped, both the "\" in \u2265 as well as the "\" that 
escapes it.

> nchar("\\")
[1] 1
> nchar("\\\u2265")
[1] 2

You would be well served by spending some effort reading:

?Quotes

-- 
David.
>> charToRaw(new_string1)
> [1] 74 65 78 74 20 e2 89 a5


> charToRaw("\\\u2265")
[1] 5c e2 89 a5
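To see what becomes of that leading 5c byte, here is a sketch for a current R session, assuming the default regex engine (fixed = FALSE, perl = FALSE). As far as the base R implementation goes (this is an observation, not documented behaviour), a backslash in the replacement that is not followed by a back-reference digit is dropped and the next character is copied literally, so both calls below should yield the same bytes:

```r
repl <- "\\\u2265"
nchar(repl)        # 2: a literal backslash followed by the U+2265 character
charToRaw(repl)    # 5c e2 89 a5

# With the default engine, gsub() drops a backslash that is not followed by
# a back-reference digit and keeps the next character as-is.
with_backslash    <- gsub("X", "\\\u2265", "text X")
without_backslash <- gsub("X", "\u2265", "text X")
identical(charToRaw(with_backslash), charToRaw(without_backslash))
```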



>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> 

It was a good idea to post sessionInfo(), but it would have been even better to 
have posted in plain text.


-- 

David Winsemius
Alameda, CA, USA


[R] Confusing behavior when using gsub to insert unicode character (minimal working example provided)

2014-05-28 Thread Thomas Stewart
Can anyone help me understand the following behavior?

I want to replace the letter 'X' in the string 'text X' with '≥' (\u2265).
The output from gsub is not what I expect.  It gives: "text ≥".

Now, suppose I want to replace the character '≤' in the string 'text ≤'
with '≥'.  Then, gsub gives the expected, desired output.

What am I missing?

Thanks for any insight.
-tgs

Minimal Working Example:

string1 <- "text X"; string1
new_string1 <- gsub("X","\u2265",string1); new_string1

string2 <- "text \u2264"; string2
new_string2 <- gsub("\u2264","\u2265",string2); new_string2

charToRaw(new_string1)
charToRaw(new_string2)

sessionInfo()

## OUTPUT

> string1 <- "text X"; string1
[1] "text X"

> new_string1 <- gsub("X","\u2265",string1); new_string1
[1] "text ≥"

> string2 <- "text \u2264"; string2
[1] "text ≤"

> new_string2 <- gsub("\u2264","\u2265",string2); new_string2
[1] "text ≥"

> charToRaw(new_string1)
[1] 74 65 78 74 20 e2 89 a5

> charToRaw(new_string2)
[1] 74 65 78 74 20 e2 89 a5

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.0.2
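One way to narrow down the difference, sketched with base functions only and assuming a current R session: since charToRaw() shows identical bytes for both results, the next thing to compare is the declared encoding of each string, which R takes into account (together with the locale) when printing.

```r
string1 <- "text X"
new_string1 <- gsub("X", "\u2265", string1)

string2 <- "text \u2264"
new_string2 <- gsub("\u2264", "\u2265", string2)

# The underlying bytes are the same, matching the charToRaw() output above.
identical(charToRaw(new_string1), charToRaw(new_string2))

# What can still differ is the declared encoding ("unknown", "latin1",
# "UTF-8" or "bytes"), which matters when printing in a non-UTF-8 locale.
Encoding(new_string1)
Encoding(new_string2)
```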
