[R] Obtain the hex code for a given character.

2014-02-04 Thread Rolf Turner



If I have a character such as "£" stored in an object called "xxx", how 
can I obtain the hex code representation of this character?  In this 
case I know that the hex code is "\u00A3", but if I didn't, how would I 
find out?


I would like a function "foo()" such that foo(xxx) would return, say, 
the string "00A3".


I have googled and otherwise searched around and have come up with 
nothing that seemed at all helpful to me.  If I am missing something 
obvious, please point me at it.


(I have found a table on the web, which contains the information that I 
need, but it is only accessible "by eye" as far as I can discern.)


Supplementary question:  Suppose I have the string "00A3" stored in
an object called "yyy".  How do I put that string together with "\u"
so as to obtain "£"?  I thought I could do

xxx <- paste("\u",yyy,sep="")

but R won't let me use "\u" "without hex digits".  How can I get around 
this?


Thanks.

cheers,

Rolf Turner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Obtain the hex code for a given character.

2014-02-04 Thread Duncan Murdoch

On 14-02-04 7:57 PM, Rolf Turner wrote:



> If I have a character such as "£" stored in an object called "xxx", how
> can I obtain the hex code representation of this character?  In this
> case I know that the hex code is "\u00A3", but if I didn't, how would I
> find out?


charToRaw will give you the bytes used to store it:

> charToRaw("£")
[1] c2 a3

That was on MacOS, which uses UTF-8 encoding.  On Windows, using Latin1,

> charToRaw("£")
[1] a3

You won't see 00A3, because that's not an encoding that R uses, that's 
the Unicode "code point".  It's not too hard to get to that from the 
UTF-8 encoding, but I don't know any R function that does it.
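
A minimal sketch of that step, assuming the character is stored as a two-byte UTF-8 sequence (code points U+0080 to U+07FF). The helper name decode_utf8_2byte is hypothetical, not an existing R function; it strips the marker bits from each byte and joins the remaining payload bits:

```r
# Hypothetical helper: decode one 2-byte UTF-8 sequence into its code point.
decode_utf8_2byte <- function(bytes) {
  b <- as.integer(bytes)
  stopifnot(length(b) == 2L,
            bitwAnd(b[1], 0xE0L) == 0xC0L,  # lead byte has the form 110xxxxx
            bitwAnd(b[2], 0xC0L) == 0x80L)  # continuation byte has the form 10xxxxxx
  # 5 payload bits from the lead byte, 6 from the continuation byte
  bitwOr(bitwShiftL(bitwAnd(b[1], 0x1FL), 6L), bitwAnd(b[2], 0x3FL))
}

cp <- decode_utf8_2byte(as.raw(c(0xc2, 0xa3)))
sprintf("%04X", cp)  # "00A3"
```

Longer sequences (three and four bytes) follow the same pattern with more continuation bytes; they are left out of this sketch.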




> I would like a function "foo()" such that foo(xxx) would return, say,
> the string "00A3".


I don't know how to get that string, but as.character(charToRaw(x)) will 
put the bytes for x in strings, e.g.


as.character(charToRaw("£"))

gives

[1] "c2" "a3"

on a Mac.



> I have googled and otherwise searched around and have come up with
> nothing that seemed at all helpful to me.  If I am missing something
> obvious, please point me at it.
>
> (I have found a table on the web, which contains the information that I
> need, but it is only accessible "by eye" as far as I can discern.)
>
> Supplementary question:  Suppose I have the string "00A3" stored in
> an object called "yyy".  How do I put that string together with "\u"
> so as to obtain "£"?  I thought I could do
>
> xxx <- paste("\u",yyy,sep="")
>
> but R won't let me use "\u" "without hex digits".  How can I get around
> this?


The \u notation with a code point is handled by the R parser, so you 
need to parse that string, which means putting it in quotes first, e.g.


xxx <- eval(parse(text = paste0("'\\u", yyy, "'")))

That seems pretty excessive.  You'd probably be better off doing all of 
this in C instead...


Duncan Murdoch





Re: [R] Obtain the hex code for a given character.

2014-02-04 Thread Jim Lemon

On 02/05/2014 01:01 PM, Duncan Murdoch wrote:

> The \u notation with a code point is handled by the R parser, so you
> need to parse that string, which means putting it in quotes first, e.g.
>
> xxx <- eval(parse(text = paste0("'\\u", yyy, "'")))
>
> That seems pretty excessive. You'd probably be better off doing all of
> this in C instead...


Hi Rolf,
I almost got it in Linux with:

x <- "\u00A3"
paste("\\u",
 toupper(paste(as.character(charToRaw(x)),sep="",collapse="")),
 sep="",collapse="")
[1] "\\uC2A3"

But I couldn't get rid of the double backslash, so I must agree with 
Duncan. Also, I don't know how the "C2" gets in there.
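
On the two puzzles above: the "C2" is the lead byte of the two-byte UTF-8 encoding of "£", and the doubled backslash is only how print() displays a single stored backslash. A small sketch to check both (the variable name s is just for illustration):

```r
s <- paste0("\\u", "00A3")

nchar(s)                # 6: one backslash, "u" and four hex digits
cat(s, "\n")            # cat() shows the string as stored: \u00A3
identical(s, "\u00A3")  # FALSE: s is six ASCII characters, not the pound sign

# The bytes of the UTF-8 encoding, where the "C2" comes from
charToRaw(enc2utf8("\u00A3"))  # c2 a3
```

So pasting "\\u" in front of hex digits builds the text of an escape sequence, not the character it stands for; only the parser turns that text into a character.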


Jim



Re: [R] Obtain the hex code for a given character.

2014-02-04 Thread David Winsemius

On Feb 4, 2014, at 4:57 PM, Rolf Turner wrote:

> 
> 
> If I have a character such as "£" stored in an object called "xxx", how can I
> obtain the hex code representation of this character?  In this case I know 
> that the hex code is "\u00A3", but if I didn't, how would I find out?
> 
> I would like a function "foo()" such that foo(xxx) would return, say, the 
> string "00A3".
> 

Close:

> as.hexmode(utf8ToInt("£"))
[1] "a3"


> I have googled and otherwise searched around and have come up with nothing 
> that seemed at all helpful to me.  If I am missing something obvious, please 
> point me at it.
> 
> (I have found a table on the web, which contains the information that I need, 
> but it is only accessible "by eye" as far as I can discern.)
> 
> Supplementary question:  Suppose I have the string "00A3" stored in
> an object called "yyy".


> intToUtf8(as.hexmode('00A3'))
[1] "£"
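
The same conversion can be sketched with an explicit base-16 parse, which avoids eval(parse()) entirely. The helper name hex_to_char is hypothetical; strtoi() and intToUtf8() are base R:

```r
# Hypothetical helper: turn hex code point strings such as "00A3" into characters.
hex_to_char <- function(hex) {
  cp <- strtoi(hex, base = 16L)   # "00A3" -> 163
  stopifnot(!anyNA(cp))           # strtoi() returns NA for input it cannot parse
  intToUtf8(cp, multiple = TRUE)  # one string per code point
}

yyy <- "00A3"
xxx <- hex_to_char(yyy)
utf8ToInt(xxx)  # 163, i.e. 0xA3
```

With multiple = TRUE a vector of hex strings gives a character vector of the same length, rather than one concatenated string.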


>  How do I put that string together with "\u"
> so as to obtain "£"?  I thought I could do
> 
>   xxx <- paste("\u",yyy,sep="")
> 
> but R won't let me use "\u" "without hex digits".  How can I get around this?

David Winsemius
Alameda, CA, USA



Re: [R] Obtain the hex code for a given character.

2014-02-04 Thread David Winsemius

On Feb 4, 2014, at 8:56 PM, David Winsemius wrote:

> 
> On Feb 4, 2014, at 4:57 PM, Rolf Turner wrote:
> 
>> 
>> 
>> If I have a character such as "£" stored in an object called "xxx", how can I
>> obtain the hex code representation of this character?  In this case I know 
>> that the hex code is "\u00A3", but if I didn't, how would I find out?
>> 
>> I would like a function "foo()" such that foo(xxx) would return, say, the 
>> string "00A3".
>> 
> 
> Close:
> 
>> as.hexmode(utf8ToInt("£"))
> [1] "a3"
> 

Looking at the help page again I realized that there was a `format.hexmode` to 
deliver as requested:

> format(as.hexmode(utf8ToInt("£")), width=4, upper=TRUE)
[1] "00A3"
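
Wrapped up as the foo() asked for in the original question, this might look like the sketch below. The enc2utf8() call is an added assumption, there so that a Latin-1 string (as on Windows) is converted before utf8ToInt() sees it:

```r
# Sketch of the requested foo(): code point(s) of a string as 4-digit upper-case hex.
foo <- function(x) {
  stopifnot(is.character(x), length(x) == 1L)
  cp <- utf8ToInt(enc2utf8(x))  # one integer code point per character of x
  format(as.hexmode(cp), width = 4, upper.case = TRUE)
}

foo("\u00A3")  # "00A3"
```

Because utf8ToInt() returns one value per character, a longer string gives one hex string per character.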

-- 
David.


David Winsemius
Alameda, CA, USA



Re: [R] Obtain the hex code for a given character.

2014-02-05 Thread Duncan Murdoch

On 14-02-04 11:56 PM, David Winsemius wrote:


> On Feb 4, 2014, at 4:57 PM, Rolf Turner wrote:
>
>> If I have a character such as "£" stored in an object called "xxx", how can I obtain the
>> hex code representation of this character?  In this case I know that the hex code is "\u00A3", but
>> if I didn't, how would I find out?
>>
>> I would like a function "foo()" such that foo(xxx) would return, say, the string
>> "00A3".
>
> Close:
>
>> as.hexmode(utf8ToInt("£"))
> [1] "a3"


Nice one.  I have seen that function before, but I didn't remember it 
this time.  One part of the documentation makes it harder to find:  it 
talks about "UTF-8 code points", rather than "Unicode code points".

UTF-8 is the way the Unicode code point is encoded into bytes.
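
A short sketch of that distinction, assuming the platform's iconv() accepts the name "latin1": the code point stays the same while the stored bytes depend on the encoding.

```r
x <- enc2utf8("\u00A3")

utf8ToInt(x)  # 163 (0xA3): the Unicode code point, independent of encoding

charToRaw(x)                                        # c2 a3: the UTF-8 bytes
charToRaw(iconv(x, from = "UTF-8", to = "latin1"))  # a3: the Latin-1 byte
```

This matches the Mac and Windows results reported earlier in the thread.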

Duncan Murdoch









Re: [R] Obtain the hex code for a given character --- thanks.

2014-02-05 Thread Rolf Turner



And the winner is ... (drum roll) ... DAVID WINSEMIUS!!!

Thank you hugely David.  You have completely solved my problem.
The last bit with format.hexmode() so as to get "00A3" rather than
just "a3" was actually unnecessary; I could've lived with "a3".  But it 
was a nice bit of polish.


How you manage to find the relevant functions in the bewildering (to me) 
superabundance of functions is impressive as well as astonishing.


Thanks again.

cheers,

Rolf



Re: [R] Obtain the hex code for a given character --- thanks.

2014-02-05 Thread David Winsemius

On Feb 5, 2014, at 12:53 AM, Rolf Turner wrote:

> 
> 
> And the winner is ... (drum roll) ... DAVID WINSEMIUS!!!
> 
> Thank you hugely David.  You have completely solved my problem.
> The last bit with format.hexmode() so as to get "00A3" rather than
> just "a3" was actually unnecessary; I could've lived with "a3".  But it was a 
> nice bit of polish.
> 
> How you manage to find the relevant functions in the bewildering (to me) 
> superabundance of functions is impressive as well as astonishing.

I have in the past found `as.hexmode` (which I learned after needing as.octmode 
as part of my efforts to understand plotmath and using symbol fonts) to be very 
useful so I did not need to do anything more than "?as.hexmode". I got to the 
help page for `utf8ToInt` by way of the links on the ?Encodings page. I can 
never remember the names of all these functions either. I just follow links in 
the help system until I get what I need.

-- 
David.


David Winsemius
Alameda, CA, USA
