Thanks for your detailed explanation. If I understand correctly, "ߟ"
belongs to those characters that CAN be interpreted as UTF-8. Other bytes,
such as "\xe4" and "\xac", are left as they are. So the following code will
show an error message, but it won't affect the use of x?
x <- "\xe4"
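
One way to check which bytes form valid UTF-8 might look like the sketch
below. It assumes R 3.3.0 or later (for validUTF8); the names 'single' and
'pair' are made up for illustration, and how the strings are printed still
depends on the locale.

```r
# Illustrative sketch; 'single' and 'pair' are made-up names.
single <- "\xe4"                            # one byte, not valid UTF-8 on its own
pair   <- rawToChar(as.raw(c(0xdf, 0x9f)))  # the two bytes from positions 55:56

validUTF8(single)   # FALSE
validUTF8(pair)     # TRUE

# The bytes themselves are stored unchanged in both cases.
charToRaw(single)
nchar(pair, type = "bytes")
```

If this is right, only the display differs; the underlying bytes are the
same as in the decoded raw vector.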

I have a question that may be off topic, but it has bothered me a lot and I
can't find the answer anywhere:
In R, how do I add a null character to a string? Even storing a single null
character does not seem possible:
x <- "\0"
The question arose from a web API that requires submitted strings to
contain a null character.
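
As far as I can tell, R character strings cannot hold a nul byte, so one
possible workaround is to keep the data as a raw vector. A minimal sketch,
assuming the API accepts raw bytes; 'payload' and the temporary file are
made up for illustration:

```r
# Build the bytes "abc", a nul byte, then "def" as a raw vector.
payload <- c(charToRaw("abc"), as.raw(0), charToRaw("def"))
length(payload)

# Converting back to a single string fails because of the embedded nul.
res <- tryCatch(rawToChar(payload), error = function(e) e)
inherits(res, "error")

# The raw vector can still be written and read back byte for byte.
tmp <- tempfile()
writeBin(payload, tmp)
back <- readBin(tmp, what = "raw", n = length(payload))
identical(back, payload)
unlink(tmp)
```

Whether a given HTTP client can send such a raw vector as the request body
is a separate question that this sketch does not answer.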


On Tue, Aug 6, 2013 at 1:43 AM, Enrico Schumann <e...@enricoschumann.net> wrote:

> On Mon, 05 Aug 2013, Qiang Wang <uns...@gmail.com> writes:
>
> >> On Sat, Aug 3, 2013 at 3:49 PM, Enrico Schumann <e...@enricoschumann.net> wrote:
> >>
> >>> On Fri, 02 Aug 2013, Qiang Wang <uns...@gmail.com> writes:
> >>>
> >>> > Hi,
> >>> >
> >>> > I'm struggling with encoding/decoding strings in R. I don't know why
> >>> > the second example below fails. Thanks in advance for your help.
> >>> > succeed: s <- "saf"; x <- base64encode(s); y <- base64decode(x, "character")
> >>> > fail: s <- "safs"; x <- base64encode(s); y <- base64decode(x, "character")
> >>> >
> >>>
> >>> And the first example works for you?
> >>>
> >>>   require("base64enc")
> >>>   s <- "saf"
> >>>   x <- base64encode(s)
> >>>
> >>> ## Error in file(what, "rb") : cannot open the connection
> >>> ## In addition: Warning message:
> >>> ## In file(what, "rb") : cannot open file 'saf': No such file or directory
> >>>
> >>> ?base64encode says that its first argument is
> >>>
> >>>     "data to be encoded/decoded. For ‘base64encode’ it can be a raw
> >>>      vector, text connection or file name. For ‘base64decode’ it can be
> >>>      a string or a binary connection."
> >>>
> >>> Try this:
> >>>
> >>>   rawToChar(base64decode(base64encode(charToRaw("saf"))))
> >>>
> >>> ## [1] "saf"
> >>>
> >>> --
> >>> Enrico Schumann
> >>> Lucerne, Switzerland
> >>> http://enricoschumann.net
> >>
> >
> > Thanks for your reply!
> >
> > Sorry, I did not clarify that I was using the base64encode and
> > base64decode functions provided by the "caTools" package. It seems that
> > if I convert the string to the raw type first, it still solves my problem.
> >
> > My original problem actually is that I have a string:
> > secret <- '5Kwug+Byrq+ilULMz3IBD5tquNt5CcdYi3XPc8jnKwtXvIgHw/vcSGU1VCIo4b/OfcRDm7uH359syfhWzXFrNg=='
> >
> > It was claimed to be encoded in Base64. So I tried to decode it:
> >
> > require("base64enc")
> > rawToChar(base64decode(secret))
> >
> > Then, I got
> >
> > "\xe4\xac.\x83\xe0r\xae\xaf\xa2\x95B\xcc\xcfr\001\017\x9bj\xb8\xdby\t\xc7X\x8bu\xcfs\xc8\xe7+\vW\xbc\x88\a\xc3\xfb\xdcHe5T\"(\xe1\xbf\xce}\xc4C\x9b\xbb\x87ߟl\xc9\xf8V\xcdqk6"
> >
> > But what I expected to get is:
> >
> > '\xe4\xac.\x83\xe0r\xae\xaf\xa2\x95B\xcc\xcfr\x01\x0f\x9bj\xb8\xdby\t\xc7X\x8bu\xcfs\xc8\xe7+\x0bW\xbc\x88\x07\xc3\xfb\xdcHe5T"(\xe1\xbf\xce}\xc4C\x9b\xbb\x87\xdf\x9fl\xc9\xf8V\xcdqk6'
> >
> > Most of the result is correct except for several characters near the
> > end. I don't know where the problem is.
> >
>
> See the help page of 'rawToChar': the function transforms raw bytes into
> characters.  But, depending on your locale, one character may be more
> than one byte.  On my computer, with a UTF-8 locale (see my
> '?sessionInfo' below),
>
>   rawToChar(base64decode(secret), TRUE)
>
> gives me
>
>   ##  [1] "\xe4" "\xac" "."    "\x83" "\xe0" "r"    "\xae"
>   ##  [8] "\xaf" "\xa2" "\x95" "B"    "\xcc" "\xcf" "r"
>   ## [15] "\001" "\017" "\x9b" "j"    "\xb8" "\xdb" "y"
>   ## [22] "\t"   "\xc7" "X"    "\x8b" "u"    "\xcf" "s"
>   ## [29] "\xc8" "\xe7" "+"    "\v"   "W"    "\xbc" "\x88"
>   ## [36] "\a"   "\xc3" "\xfb" "\xdc" "H"    "e"    "5"
>   ## [43] "T"    "\""   "("    "\xe1" "\xbf" "\xce" "}"
>   ## [50] "\xc4" "C"    "\x9b" "\xbb" "\x87" "\xdf" "\x9f"
>   ## [57] "l"    "\xc9" "\xf8" "V"    "\xcd" "q"    "k"
>   ## [64] "6"
>
> That is, every *single* byte is converted into a character.  For example:
>
>   rawToChar(base64decode(secret), TRUE)[55:56]
>
> gives
>
>   ## [1] "\xdf" "\x9f"
>
> which probably is what you expected.  But if I paste those two
> characters together,
>
>   paste(rawToChar(base64decode(secret), TRUE)[55:56], collapse = "")
>
> they will be shown like so:
>
>   ## [1] "ߟ"
>
> because this is how this byte pattern will be interpreted in UTF-8.
>
>
>
>
> Abbreviated 'sessionInfo':
>
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_GB.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_GB.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
>
>
> --
> Enrico Schumann
> Lucerne, Switzerland
> http://enricoschumann.net
>
