Thanks for your detailed explanation. If I understand correctly, "ß" is one of the characters that CAN be interpreted in UTF-8, while others, such as "\xe4" and "\xac", are left as they are. So the following code will show an error message, but that won't affect the use of x?

  x <- "\xe4"

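One way to check this is to compare the stored bytes with what R considers valid UTF-8. The lines below are only a minimal sketch in base R. They assume a version of R that provides validUTF8(), and the iconv() call assumes the byte was meant as Latin-1, which is a guess and not something stated in this thread.

```r
x <- "\xe4"                      # one byte, 0xE4, which is not valid UTF-8 on its own

stored <- charToRaw(x)           # the byte is kept unchanged inside the string
is_valid <- validUTF8(x)         # FALSE: a lone 0xE4 is an incomplete UTF-8 sequence

# ASSUMPTION: the byte was intended as Latin-1 ("a" with diaeresis).
# Under that assumption it can be re-encoded to a valid UTF-8 string.
y <- iconv(x, from = "latin1", to = "UTF-8")
reencoded <- charToRaw(y)        # two bytes: c3 a4
```

So the byte itself is not lost. What changes is only how it is displayed or interpreted, which is why working on the raw vector is usually the safer route for binary data.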
I have a question that may be off-topic, but it has bothered me a lot and I can't find the answer anywhere: in R, how do I add a null character to a string? Even storing a single null character does not seem possible:

  x <- "\0"

The question arose from a web API that requires submitted strings to contain a null character.

On Tue, Aug 6, 2013 at 1:43 AM, Enrico Schumann <e...@enricoschumann.net> wrote:

> On Mon, 05 Aug 2013, Qiang Wang <uns...@gmail.com> writes:
>
> >> On Sat, Aug 3, 2013 at 3:49 PM, Enrico Schumann <e...@enricoschumann.net> wrote:
> >>
> >>> On Fri, 02 Aug 2013, Qiang Wang <uns...@gmail.com> writes:
> >>>
> >>> > Hi,
> >>> >
> >>> > I'm struggling with encode/decode strings in R. Don't know why the second
> >>> > example below would fail. Thanks in advance for your help.
> >>> >
> >>> > succeed:
> >>> >   s <- "saf"
> >>> >   x <- base64encode(s)
> >>> >   y <- base64decode(x, "character")
> >>> >
> >>> > fail:
> >>> >   s <- "safs"
> >>> >   x <- base64encode(s)
> >>> >   y <- base64decode(x, "character")
> >>> >
> >>>
> >>> And the first example works for you?
> >>>
> >>>   require("base64enc")
> >>>   s <- "saf"
> >>>   x <- base64encode(s)
> >>>
> >>>   ## Error in file(what, "rb") : cannot open the connection
> >>>   ## In addition: Warning message:
> >>>   ## In file(what, "rb") : cannot open file 'saf': No such file or directory
> >>>
> >>> ?base64encode says that its first argument is
> >>>
> >>>   "data to be encoded/decoded. For ‘base64encode’ it can be a raw
> >>>   vector, text connection or file name. For ‘base64decode’ it can be
> >>>   a string or a binary connection."
> >>>
> >>> Try this:
> >>>
> >>>   rawToChar(base64decode(base64encode(charToRaw("saf"))))
> >>>
> >>>   ## [1] "saf"
> >>>
> >>> --
> >>> Enrico Schumann
> >>> Lucerne, Switzerland
> >>> http://enricoschumann.net
> >>
> >
> > Thanks for your reply!
> >
> > Sorry I did not clarify that I was using base64encode and base64decode
> > functions provide from "caTools" package.
> > It seems that if I convert the string to the raw type first, it still
> > solves my problem.
> >
> > My original problem actually is that I have a string:
> >
> >   secret <- '5Kwug+Byrq+ilULMz3IBD5tquNt5CcdYi3XPc8jnKwtXvIgHw/vcSGU1VCIo4b/OfcRDm7uH359syfhWzXFrNg=='
> >
> > It was claimed to be encoded in Base64. So I tried to decode it:
> >
> >   require("base64enc")
> >   rawToChar(base64decode(secret))
> >
> > Then, I got
> >
> >   "\xe4\xac.\x83\xe0r\xae\xaf\xa2\x95B\xcc\xcfr\001\017\x9bj\xb8\xdby\t\xc7X\x8bu\xcfs\xc8\xe7+\vW\xbc\x88\a\xc3\xfb\xdcHe5T\"(\xe1\xbf\xce}\xc4C\x9b\xbb\x87ßl\xc9\xf8V\xcdqk6"
> >
> > But what I suppose to get is:
> >
> >   '\xe4\xac.\x83\xe0r\xae\xaf\xa2\x95B\xcc\xcfr\x01\x0f\x9bj\xb8\xdby\t\xc7X\x8bu\xcfs\xc8\xe7+\x0bW\xbc\x88\x07\xc3\xfb\xdcHe5T"(\xe1\xbf\xce}\xc4C\x9b\xbb\x87\xdf\x9fl\xc9\xf8V\xcdqk6'
> >
> > Most part of the result is correct except several characters near the end.
> > I don't know where the problem is.
> >
>
> See the help page of 'rawToChar': the function transforms raw bytes into
> characters. But, depending on your locale, one character may be more
> than one byte. On my computer, with a UTF-8 locale (see my
> '?sessionInfo' below),
>
>   rawToChar(base64decode(secret), TRUE)
>
> gives me
>
>   ##  [1] "\xe4" "\xac" "."    "\x83" "\xe0" "r"    "\xae"
>   ##  [8] "\xaf" "\xa2" "\x95" "B"    "\xcc" "\xcf" "r"
>   ## [15] "\001" "\017" "\x9b" "j"    "\xb8" "\xdb" "y"
>   ## [22] "\t"   "\xc7" "X"    "\x8b" "u"    "\xcf" "s"
>   ## [29] "\xc8" "\xe7" "+"    "\v"   "W"    "\xbc" "\x88"
>   ## [36] "\a"   "\xc3" "\xfb" "\xdc" "H"    "e"    "5"
>   ## [43] "T"    "\""   "("    "\xe1" "\xbf" "\xce" "}"
>   ## [50] "\xc4" "C"    "\x9b" "\xbb" "\x87" "\xdf" "\x9f"
>   ## [57] "l"    "\xc9" "\xf8" "V"    "\xcd" "q"    "k"
>   ## [64] "6"
>
> That is, every *single* byte is converted into character. For example:
>
>   rawToChar(base64decode(secret), TRUE)[55:56]
>
> gives
>
>   ## [1] "\xdf" "\x9f"
>
> which probably is what you expected.
> But if I paste those two characters together,
>
>   paste(rawToChar(base64decode(secret), TRUE)[55:56], collapse = "")
>
> they will be shown like so:
>
>   ## [1] "ß"
>
> because this is how this byte pattern will be interpreted in UTF-8.
>
> Abbreviated 'sessionInfo':
>
>   R version 3.0.1 (2013-05-16)
>   Platform: x86_64-pc-linux-gnu (64-bit)
>
>   locale:
>    [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>    [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_GB.UTF-8
>    [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_GB.UTF-8
>    [7] LC_PAPER=C                 LC_NAME=C
>    [9] LC_ADDRESS=C               LC_TELEPHONE=C
>   [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> --
> Enrico Schumann
> Lucerne, Switzerland
> http://enricoschumann.net
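
The byte-versus-character point in the quoted reply, and the null-character question at the top of this message, can both be tried out with raw vectors in base R. This is only a sketch: the names pieces, joined, payload and embedded are made up for illustration, and whether a given web API client accepts a raw vector as a request body is an assumption that would need checking.

```r
# The two bytes from positions 55:56 of the decoded secret
bytes <- as.raw(c(0xdf, 0x9f))

pieces <- rawToChar(bytes, multiple = TRUE)  # two one-byte strings, "\xdf" and "\x9f"
joined <- rawToChar(bytes)                   # the same two bytes as a single string
Encoding(joined) <- "UTF-8"                  # declare how the bytes should be read

n_bytes <- nchar(joined, type = "bytes")     # counts bytes
n_chars <- nchar(joined, type = "chars")     # counts characters under UTF-8

# A null byte can be stored in a raw vector, but not inside an R string
payload <- c(charToRaw("abc"), as.raw(0x00))

# Converting bytes with an embedded null back to a string raises an error
embedded <- tryCatch(
  rawToChar(as.raw(c(0x61, 0x00, 0x62))),
  error = function(e) conditionMessage(e)
)
```

Keeping the data as a raw vector from end to end avoids both problems, since no byte is ever reinterpreted as part of a multi-byte character or rejected as a null.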
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.