Re: [R] cannot base64decode string which is base64encode in R

Prof Brian Ripley Tue, 06 Aug 2013 00:49:55 -0700

On 06/08/2013 08:34, Qiang Wang wrote:

Thanks for your detailed explanation. If I understand correctly, "ߟ"
is a byte sequence that CAN be interpreted as UTF-8, while other bytes, such
as "\xe4" and "\xac", are left as they are. So the following code will
show an error message, but it won't affect the use of x?
x <- "\xe4"


I have a question that may be off topic, but it has bothered me a lot and I
can't find the answer anywhere:
In R, how do I add a null character to a string? Even storing a single null
character seems impossible:
x <- "\0". The question arose from a web API which requires submitted
strings to contain a null character.

It is not possible. Character strings in R cannot contain nuls (not nulls, sic). Use raw vectors instead.


This is documented, so time to read some manuals ....
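
For the web API case, the raw-vector route might look roughly like the following. This is only a minimal sketch in base R: the text "abc" and the temporary file are placeholders for illustration, not anything taken from the API in question.

```r
# A raw vector can hold a 0x00 byte, which a character string cannot.
payload <- c(charToRaw("abc"), as.raw(0))

# The bytes can be written to (and read back from) a binary
# connection unchanged; a temporary file stands in for the real target.
tmp <- tempfile()
writeBin(payload, tmp)
roundtrip <- readBin(tmp, what = "raw", n = length(payload))
unlink(tmp)
```

Whether a given HTTP client accepts a raw vector as a request body depends on that client, so that part is left out here.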



On Tue, Aug 6, 2013 at 1:43 AM, Enrico Schumann <e...@enricoschumann.net> wrote:

On Mon, 05 Aug 2013, Qiang Wang <uns...@gmail.com> writes:

On Sat, Aug 3, 2013 at 3:49 PM, Enrico Schumann <e...@enricoschumann.net> wrote:

On Fri, 02 Aug 2013, Qiang Wang <uns...@gmail.com> writes:

Hi,

I'm struggling with encoding/decoding strings in R. I don't know why the
second example below fails. Thanks in advance for your help.

succeed:
   s <- "saf"
   x <- base64encode(s)
   y <- base64decode(x, "character")

fail:
   s <- "safs"
   x <- base64encode(s)
   y <- base64decode(x, "character")


And the first example works for you?

   require("base64enc")
   s <- "saf"
   x <- base64encode(s)

## Error in file(what, "rb") : cannot open the connection
## In addition: Warning message:
## In file(what, "rb") : cannot open file 'saf': No such file or directory


?base64encode says that its first argument is

     "data to be encoded/decoded. For ‘base64encode’ it can be a raw
      vector, text connection or file name. For ‘base64decode’ it can be
      a string or a binary connection."

Try this:

   rawToChar(base64decode(base64encode(charToRaw("saf"))))

## [1] "saf"
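
If strings need to be encoded often, the conversion through raw vectors can be wrapped in two small helpers. This is only a sketch on top of the 'base64enc' package; the names b64_encode_string and b64_decode_string are made up for illustration and are not part of any package.

```r
library(base64enc)

# Encode the bytes of the string itself, so that the argument is not
# mistaken for a file name.
b64_encode_string <- function(s) base64encode(charToRaw(s))

# Decode to raw bytes, then glue the bytes back into one string.
# Only sensible when the decoded bytes really are text without nuls.
b64_decode_string <- function(b64) rawToChar(base64decode(b64))

enc <- b64_encode_string("safs")
dec <- b64_decode_string(enc)
```

For arbitrary binary data it is probably safer to stop at the raw vector returned by base64decode and not call rawToChar at all.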

--
Enrico Schumann
Lucerne, Switzerland

http://enricoschumann.net


Thanks for your reply!

Sorry I did not clarify that I was using the base64encode and base64decode
functions provided by the "caTools" package. It seems that converting the
string to the raw type first still solves my problem.

My original problem actually is that I have a string:
secret <- '5Kwug+Byrq+ilULMz3IBD5tquNt5CcdYi3XPc8jnKwtXvIgHw/vcSGU1VCIo4b/OfcRDm7uH359syfhWzXFrNg=='


It was claimed to be encoded in Base64. So I tried to decode it:

require("base64enc")
rawToChar(base64decode(secret))

Then, I got

"\xe4\xac.\x83\xe0r\xae\xaf\xa2\x95B\xcc\xcfr\001\017\x9bj\xb8\xdby\t\xc7X\x8bu\xcfs\xc8\xe7+\vW\xbc\x88\a\xc3\xfb\xdcHe5T\"(\xe1\xbf\xce}\xc4C\x9b\xbb\x87ߟl\xc9\xf8V\xcdqk6"


But what I expected to get is:

'\xe4\xac.\x83\xe0r\xae\xaf\xa2\x95B\xcc\xcfr\x01\x0f\x9bj\xb8\xdby\t\xc7X\x8bu\xcfs\xc8\xe7+\x0bW\xbc\x88\x07\xc3\xfb\xdcHe5T"(\xe1\xbf\xce}\xc4C\x9b\xbb\x87\xdf\x9fl\xc9\xf8V\xcdqk6'


Most of the result is correct, except for several characters near the end.
I don't know where the problem is.


See the help page of 'rawToChar': the function transforms raw bytes into
characters.  But, depending on your locale, one character may be more
than one byte.  On my computer, with a UTF-8 locale (see my
'sessionInfo' below),

   rawToChar(base64decode(secret), TRUE)

gives me

   ##  [1] "\xe4" "\xac" "."    "\x83" "\xe0" "r"    "\xae"
   ##  [8] "\xaf" "\xa2" "\x95" "B"    "\xcc" "\xcf" "r"
   ## [15] "\001" "\017" "\x9b" "j"    "\xb8" "\xdb" "y"
   ## [22] "\t"   "\xc7" "X"    "\x8b" "u"    "\xcf" "s"
   ## [29] "\xc8" "\xe7" "+"    "\v"   "W"    "\xbc" "\x88"
   ## [36] "\a"   "\xc3" "\xfb" "\xdc" "H"    "e"    "5"
   ## [43] "T"    "\""   "("    "\xe1" "\xbf" "\xce" "}"
   ## [50] "\xc4" "C"    "\x9b" "\xbb" "\x87" "\xdf" "\x9f"
   ## [57] "l"    "\xc9" "\xf8" "V"    "\xcd" "q"    "k"
   ## [64] "6"

That is, every *single* byte is converted into a character.  For example:

   rawToChar(base64decode(secret), TRUE)[55:56]

gives

   ## [1] "\xdf" "\x9f"

which probably is what you expected.  But if I paste those two
characters together,

   paste(rawToChar(base64decode(secret), TRUE)[55:56], collapse = "")

they will be shown like so:

   ## [1] "ߟ"

because this is how this byte pattern will be interpreted in UTF-8.
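
To check this without relying on how the console prints the string, one can look at the byte count and the code point directly. A small base R sketch follows; note that validUTF8() needs R >= 3.3.0, so it is newer than the R 3.0.1 used in this thread.

```r
# The two bytes in question, glued into one string.
x <- rawToChar(as.raw(c(0xdf, 0x9f)))
Encoding(x) <- "UTF-8"                # declare the bytes to be UTF-8

n_bytes <- nchar(x, type = "bytes")   # how many bytes are stored
n_chars <- nchar(x, type = "chars")   # how many characters after decoding
code_point <- utf8ToInt(x)            # Unicode code point of the result

# A lone byte such as 0xe4 is not a complete UTF-8 sequence, which is
# why it keeps being printed as the escape "\xe4".
valid <- validUTF8(c("\xe4", x))
```

Here the two bytes form a single character (code point 2015, i.e. U+07DF), while "\xe4" on its own is not valid UTF-8. Read as Windows-1252 instead, the same two bytes would be the two characters 'ß' and 'Ÿ'. The bytes themselves are the same in every case; only their display differs.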




Abbreviated 'sessionInfo':

R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_GB.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_GB.UTF-8
  [7] LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C



--
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net





______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
