On Thu, 26 Oct 2006, Henrik Bengtsson wrote:

> I'm observing the following on different platforms:
>
>> parse(text='"\\x7F"')
> expression("\177")
>> parse(text='"\\x80"')
> Error: invalid multibyte string

Yes. It's an invalid multibyte string. In UTF-8 a single byte is a valid character string only if it is below x80, so x7F is fine but x80 is not. In fact x80 is not the leading byte of any valid UTF-8 character.

You have to work out what the Unicode code point is for whatever character you were expecting to be x80 and convert that to UTF-8.

I'm surprised that one of your UTF-8 machines worked -- I don't think it should.

    -thomas

> ...
>> parse(text='"\\xFF"')
> Error: invalid multibyte string
>
> However,
>
> cat("\x7F\n\x80\n...\xFF\n")
>
> works. Using R --vanilla.
> SYSTEMS GIVING THE ERROR:
>> sessionInfo()
> R version 2.4.0 (2006-10-03)
> x86_64-unknown-linux-gnu
> locale:
> LC_CTYPE=en_AU.UTF-8;LC_NUMERIC=C;LC_TIME=en_AU.UTF-8;LC_COLLATE=en_AU.UTF-8;LC_MONETARY=en_AU.UTF-8;LC_MESSAGES=en_AU.UTF-8;LC_PAPER=en_AU.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_AU.UTF-8;LC_IDENTIFICATION=C
>
> R version 2.4.0 Patched (2006-10-03 r39576)
> i686-pc-linux-gnu
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>
>
> SYSTEMS OK:
> R version 2.4.0 Under development (unstable) (2006-07-23 r38687)
> x86_64-unknown-linux-gnu
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>
> R version 2.4.0 (2006-10-03)
> i386-pc-mingw32
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> R version 2.4.0 Patched (2006-10-10 r39600)
> i386-pc-mingw32
> locale:
> LC_COLLATE=English_Australia.1252;LC_CTYPE=English_Australia.1252;LC_MONETARY=English_Australia.1252;LC_NUMERIC=C;LC_TIME=English_Australia.1252
>
> Version 2.3.0 (2006-04-24)
> x86_64-unknown-linux-gnu
> locale: <not reported>
>
>
> All of the above have the following packages attached:
> [1] "methods" "stats" "graphics" "grDevices" "utils" "datasets"
> [7] "base"
>
> We identified this problem because R CMD check complained:
>
>> * checking package dependencies ... WARNING
>> Error in deparse(e[[2]]) : invalid multibyte string
>> Execution halted
>
> because we use "\xFF" (or "\377") in the source code to be used as a
> terminator in a vector buffer; "\0" can't be used for other reasons.
>
> Is the above a bug in R or one in my head?
>
> /H
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

Thomas Lumley
Assoc. Professor, Biostatistics
[EMAIL PROTECTED]
University of Washington, Seattle
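
The byte-level rule in the reply (a lone byte is valid UTF-8 only below x80, and the fix is to find the intended code point and encode that) can be checked outside R. The following is a minimal sketch in Python rather than R, using only Python's built-in codecs. The helper name `is_valid_utf8` is made up for illustration. The choice of Windows-1252 as the source encoding is an assumption, suggested only by the 1252 locales on the Windows machines listed in the thread; a different source encoding would give a different code point.

```python
# Sketch only: checks the UTF-8 rule from the reply with Python's built-in codec.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as UTF-8, False otherwise (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# A lone byte below 0x80 is a complete character; 0x80 and 0xFF are not.
single_byte_results = {b: is_valid_utf8(bytes([b])) for b in (0x7F, 0x80, 0xFF)}

# Assumption: the character meant by byte 0x80 came from Windows-1252.
# Decode it there to find the Unicode code point, then encode that as UTF-8.
intended = bytes([0x80]).decode("cp1252")
code_point = ord(intended)
utf8_bytes = intended.encode("utf-8")

print(single_byte_results)
print(hex(code_point), utf8_bytes)
```

Under that assumption the single byte x80 becomes a three-byte UTF-8 sequence, which is why the raw byte cannot simply be pasted into a string literal in a UTF-8 locale.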