On 06/08/2013 08:34, Qiang Wang wrote:
Thanks for your detailed explanation. If I understand correctly, "ߟ"
is one of the byte sequences that CAN be interpreted as UTF-8. Others are
left as they are, such as "\xe4" and "\xac". So the following code will
show an error message, but it won't affect the use of x?
x <- "\xe4"
I have a question that may be off topic, but it has bothered me a lot and I
can't find the answer anywhere:
In R, how do I add a null character to a string? Even storing a single null
character does not seem possible:
x <- "\0"
The question arose from a web API that requires submitted
strings to contain a null character.
It is not possible. Character strings in R cannot contain nuls (not
nulls, sic). Use raw vectors instead.
This is documented, so time to read some manuals ....
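As a minimal sketch of the raw-vector approach suggested above, using only base R: the payload below is hypothetical (the field names "key" and "value" are made up for illustration), and how the bytes would be sent to the web API is not covered here.

```r
# Hypothetical payload: two fields separated by a nul byte (0x00).
payload <- c(charToRaw("key"), as.raw(0), charToRaw("value"))
print(payload)

# The nul byte is stored like any other byte in the raw vector.
nul_position <- which(payload == as.raw(0))
print(nul_position)

# Converting the bytes back into a single string fails because of the
# embedded nul, so the data has to stay a raw vector.
result <- tryCatch(rawToChar(payload), error = function(e) conditionMessage(e))
print(result)
```

A raw vector like this can be written to a binary connection with writeBin(), which is presumably the route to take when the receiving side expects the nul byte.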
On Tue, Aug 6, 2013 at 1:43 AM, Enrico Schumann <e...@enricoschumann.net> wrote:
On Mon, 05 Aug 2013, Qiang Wang <uns...@gmail.com> writes:
On Sat, Aug 3, 2013 at 3:49 PM, Enrico Schumann <e...@enricoschumann.net> wrote:
On Fri, 02 Aug 2013, Qiang Wang <uns...@gmail.com> writes:
Hi,
I'm struggling with encoding and decoding strings in R. I don't know why the
second example below fails. Thanks in advance for your help.
Succeeds:
s <- "saf"
x <- base64encode(s)
y <- base64decode(x, "character")
Fails:
s <- "safs"
x <- base64encode(s)
y <- base64decode(x, "character")
And the first example works for you?
require("base64enc")
s <- "saf"
x <- base64encode(s)
## Error in file(what, "rb") : cannot open the connection
## In addition: Warning message:
## In file(what, "rb") : cannot open file 'saf': No such file or directory
?base64encode says that its first argument is
"data to be encoded/decoded. For ‘base64encode’ it can be a raw
vector, text connection or file name. For ‘base64decode’ it can be
a string or a binary connection."
Try this:
rawToChar(base64decode(base64encode(charToRaw("saf"))))
## [1] "saf"
--
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net
Thanks for your reply!
Sorry, I did not make clear that I was using the base64encode and base64decode
functions provided by the "caTools" package. It seems that converting the
string to the raw type first still solves my problem.
My original problem actually is that I have a string:
secret <-
'5Kwug+Byrq+ilULMz3IBD5tquNt5CcdYi3XPc8jnKwtXvIgHw/vcSGU1VCIo4b/OfcRDm7uH359syfhWzXFrNg=='
It was claimed to be encoded in Base64. So I tried to decode it:
require("base64enc")
rawToChar(base64decode(secret))
Then, I got
"\xe4\xac.\x83\xe0r\xae\xaf\xa2\x95B\xcc\xcfr\001\017\x9bj\xb8\xdby\t\xc7X\x8bu\xcfs\xc8\xe7+\vW\xbc\x88\a\xc3\xfb\xdcHe5T\"(\xe1\xbf\xce}\xc4C\x9b\xbb\x87ߟl\xc9\xf8V\xcdqk6"
But what I expected to get is:
'\xe4\xac.\x83\xe0r\xae\xaf\xa2\x95B\xcc\xcfr\x01\x0f\x9bj\xb8\xdby\t\xc7X\x8bu\xcfs\xc8\xe7+\x0bW\xbc\x88\x07\xc3\xfb\xdcHe5T"(\xe1\xbf\xce}\xc4C\x9b\xbb\x87\xdf\x9fl\xc9\xf8V\xcdqk6'
Most of the result is correct, except for several characters near the
end.
I don't know where the problem is.
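Some of the visible differences between the two outputs above are only differences in escape notation: R prints certain control bytes with octal or named escapes ("\001", "\017", "\v", "\a"), while the expected output uses hex escapes ("\x01", "\x0f", "\x0b", "\x07"). A small base-R check, offered as a sketch, shows that each pair denotes the same byte:

```r
# Escapes as R prints them, and the hex escapes used in the expected output.
r_style   <- c("\001", "\017", "\v", "\a")
hex_style <- c("\x01", "\x0f", "\x0b", "\x07")

# Compare the underlying bytes rather than the printed form.
same_bytes <- mapply(
  function(a, b) identical(charToRaw(a), charToRaw(b)),
  r_style, hex_style
)
print(unname(same_bytes))
```

Comparing raw bytes (for example with identical() on raw vectors) sidesteps any question of how a string happens to be displayed.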
See the help page of 'rawToChar': the function transforms raw bytes into
characters. But, depending on your locale, one character may be more
than one byte. On my computer, with a UTF-8 locale (see my
'?sessionInfo' below),
rawToChar(base64decode(secret), TRUE)
gives me
## [1] "\xe4" "\xac" "." "\x83" "\xe0" "r" "\xae"
## [8] "\xaf" "\xa2" "\x95" "B" "\xcc" "\xcf" "r"
## [15] "\001" "\017" "\x9b" "j" "\xb8" "\xdb" "y"
## [22] "\t" "\xc7" "X" "\x8b" "u" "\xcf" "s"
## [29] "\xc8" "\xe7" "+" "\v" "W" "\xbc" "\x88"
## [36] "\a" "\xc3" "\xfb" "\xdc" "H" "e" "5"
## [43] "T" "\"" "(" "\xe1" "\xbf" "\xce" "}"
## [50] "\xc4" "C" "\x9b" "\xbb" "\x87" "\xdf" "\x9f"
## [57] "l" "\xc9" "\xf8" "V" "\xcd" "q" "k"
## [64] "6"
That is, every *single* byte is converted into character. For example:
rawToChar(base64decode(secret), TRUE)[55:56]
gives
## [1] "\xdf" "\x9f"
which probably is what you expected. But if I paste those two
characters together,
paste(rawToChar(base64decode(secret), TRUE)[55:56], collapse = "")
they will be shown like so:
## [1] "ߟ"
because this is how this byte pattern will be interpreted in UTF-8.
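To illustrate the point with base R only (a sketch; the variable names are made up here): the two bytes 0xdf 0x9f form one valid two-byte UTF-8 sequence, which encodes the code point U+07DF. Marking the string as UTF-8 changes how it is interpreted and displayed, not the bytes it contains.

```r
# The two bytes from positions 55 and 56 of the decoded secret.
bytes <- as.raw(c(0xdf, 0x9f))

# Build a string from them and mark it as UTF-8.
s <- rawToChar(bytes)
Encoding(s) <- "UTF-8"

# Interpreted as UTF-8, the two bytes are a single code point.
code_point <- utf8ToInt(s)
print(code_point)

# The string still occupies two bytes, and they are unchanged.
n_bytes <- nchar(s, type = "bytes")
unchanged <- identical(charToRaw(s), bytes)
print(n_bytes)
print(unchanged)
```

So the decoded data is presumably correct in both cases; only the printed representation of these two bytes differs.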
Abbreviated 'sessionInfo':
R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
--
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595