By UTF-18 I meant UTF-16, obviously.

On Sun, Nov 10, 2013 at 8:41 PM, Sverre Stausland <stausland.john...@iln.uio.no> wrote:
> With respect to your comment (sorry, the e-mail you wrote that in
> didn't get to my inbox):
>
>>> I don't think so. In general, functions that convert to the native
>>> encoding break UTF-8 on Windows, because the native encoding is often
>>> Latin1 or some other encoding that doesn't cover all the characters in
>>> UTF-8.
>
> As I understand it, the native encoding in Windows is UTF-18, not Latin1:
> http://msdn.microsoft.com/en-us/library/dd374081.aspx
>
> And UTF-18 is a superset of UTF-8, isn't it?
>
> Sverre
>
> On Sun, Nov 10, 2013 at 1:49 PM, Duncan Murdoch
> <murdoch.dun...@gmail.com> wrote:
>> On 13-11-10 7:31 AM, Sverre Stausland wrote:
>>>
>>> My e-mail was intended as a typical "feature request", and I couldn't
>>> find any more suitable place for that than the r-devel mailing list. I
>>> am not a programmer, so I don't have the skills to write this into R's
>>> source code myself.
>>>
>>> The incentive is nevertheless clear enough. I believe a software
>>> program in 2013 which imports, manipulates, and exports text in
>>> various formats (text files, picture files, postscript files, etc.)
>>> would normally be expected to support UTF-8. It might not be trivial
>>> to implement as R is written now, but the expectation will still be
>>> there. So I still believe it would be a good idea if R soon would be
>>> able to support UTF-8.
>>
>>
>> R does support UTF-8. It all works smoothly in a UTF-8 locale, not so
>> smoothly if you have your computer set up to use a different 8 bit encoding.
>>
>>>
>>> I'm not quite able to piece together from the information you gave
>>> what the underlying issues are. What I read is:
>>> (1) Some R functions convert characters to the native encoding.
>>> (2) Windows did not support UTF-8 when R was first written.
>>> (3) Unix did not support UCS-2 when R was first written.
>>>
>>> I'm guessing here that the implications are:
>>> (1) R's write.table() converts characters to a native encoding.
>>> (2) The native encoding in Windows 7 is not UTF-8.
>>> (3) The native encoding in Unix systems is UTF-8.
>>
>>
>> You got it right for the first 4. Regarding (2) in your second list, that's
>> right, and in fact UTF-8 is not supported as a native encoding.
>> And point (3) is optional, though UTF-8 is the dominant encoding nowadays.
>>
>> The easiest solution is for you to switch to a Unix variant and set it up to
>> use UTF-8 as the native encoding.
>>
>> Next easiest would be for Microsoft to add UTF-8 as a code page.
>>
>> Most difficult would be for R to handle UTF-8 properly on systems with
>> limited support for it.
>>
>> We probably will add small changes that let you work around the Windows
>> problems, but they won't be very satisfactory to anyone. I don't think we
>> will make the big changes that would make R look like "a software program in
>> 2013", since it would be so much work, and there's such an easy workaround.
>>
>> Duncan Murdoch
>>
>>
>>> But this is just guesswork.
>>
>>
>>
>>>
>>> PS. A related issue:
>>>
>>> http://stackoverflow.com/questions/19881553/using-unicode-inside-rs-expression-command
>>>
>>> Sverre
>>>
>>
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
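
The encoding point discussed in this thread can be illustrated with a small sketch. It is written in Python rather than R, only so that it runs the same way regardless of the machine's locale, and it does not reproduce R's internal conversion code. The sample string and the helper names `round_trips` and `lossy_convert` are made up for the illustration. It suggests two things: UTF-8 and UTF-16 are different byte encodings of the same Unicode characters (so neither is a superset of the other), and converting to an 8-bit encoding such as Latin1 drops any character that encoding lacks.

```python
# Hypothetical sample text: "ø" (U+00F8) exists in Latin-1,
# but "ŋ" (U+014B) and "€" (U+20AC) do not.
text = "bl\u00f8t \u014b \u20ac"


def round_trips(s: str, encoding: str) -> bool:
    """Return True if s survives encode + decode in `encoding` unchanged."""
    try:
        return s.encode(encoding).decode(encoding) == s
    except UnicodeEncodeError:
        # The encoding has no representation for at least one character.
        return False


def lossy_convert(s: str, encoding: str) -> str:
    """Convert through `encoding`, replacing unrepresentable characters with '?'."""
    return s.encode(encoding, errors="replace").decode(encoding)


# UTF-8 and UTF-16 both cover all of Unicode; Latin-1 covers only U+0000..U+00FF.
results = {enc: round_trips(text, enc) for enc in ("utf-8", "utf-16", "latin-1")}

# The same character has different bytes in UTF-8 and UTF-16,
# so the two are alternative encodings, not subset and superset.
utf8_bytes = "\u00f8".encode("utf-8")        # two bytes: C3 B8
utf16_bytes = "\u00f8".encode("utf-16-le")   # two bytes: F8 00

if __name__ == "__main__":
    print(results)
    print(lossy_convert(text, "latin-1"))
    print(utf8_bytes.hex(), utf16_bytes.hex())
```

The `errors="replace"` path is only a stand-in for what a conversion to a limited native encoding can do to text; how R itself substitutes or escapes such characters may differ.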