Thanks for looking into this. A few notes regarding all the UTF encodings on Windows 10 ...
The default eol for write.csv (via write.table) is "\n" and always gives
as.raw (c (0x0d, 0x0a)), that is, <Carriage Return> <Line Feed> as adjacent
bytes. This is fine for UTF-8 but wrong for UTF-16 and UTF-32.

EXAMPLE: Using UTF-32 for exaggeration (note also that 3 nul bytes are
missing in the final CR+LF):

df <- data.frame (x = 1:2, y = 3:4)

$`UTF-32LE`$default.eol$raw
 [1] 22 00 00 00 78 00 00 00 22 00 00 00 2c 00 00 00 22 00 00 00 79 00 00 00 22
[26] 00 00 00 0d 0a 00 00 00 31 00 00 00 2c 00 00 00 33 00 00 00 0d 0a 00 00 00
[51] 32 00 00 00 2c 00 00 00 34 00 00 00 0d 0a 00 00 00

$`UTF-32BE`$default.eol$raw
 [1] 00 00 00 22 00 00 00 78 00 00 00 22 00 00 00 2c 00 00 00 22 00 00 00 79 00
[26] 00 00 22 00 00 00 0d 0a 00 00 00 31 00 00 00 2c 00 00 00 33 00 00 00 0d 0a
[51] 00 00 00 32 00 00 00 2c 00 00 00 34 00 00 00 0d 0a

(Nevertheless, Microsoft Excel 2013 tolerates these CSVs!)

One trick/solution is to use eol = "\r" (that is, <Carriage Return> only).

Regards -- Jack Kelley

------------------------------------------------------------------------------------

remove (list = objects())
print (sessionInfo())
cat ("##########################################################\n\n")

ENCODING <- c (
  "UTF-8",
  "UTF-16LE", "UTF-16BE", "UTF-16",
  "UTF-32LE", "UTF-32BE", "UTF-32"
)

df <- data.frame (x = 1:2, y = 3:4)

csv <- structure (lapply (ENCODING, function (encoding) {
  csv <- sprintf ("df_%s.csv", encoding)
  write.csv (df, csv, fileEncoding = encoding, row.names = FALSE)
  list (default.eol = list (
    csv = csv,
    raw = readBin (csv, "raw", 1000))
  )
}), .Names = ENCODING)

EOL <- c (LF = "\n", CR = "\r", "CR+LF" = "\r\n")

CSV <- structure (lapply (ENCODING, function (encoding) {
  structure (
    lapply (names (EOL), function (EOL.name) {
      csv <- sprintf ("df_%s_eol=%s.csv", encoding, EOL.name)
      write.csv (
        df, csv, fileEncoding = encoding, row.names = FALSE,
        eol = EOL [EOL.name]
      )
      list (csv = csv, raw = readBin (csv, "raw", 1000))
    }),
    .Names = names (EOL))
}), .Names = ENCODING)

print (csv)
print (CSV)

------------------------------------------------------------------------------------

-----Original Message-----
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
Sent: Tuesday, 2 May 2017 04:22
To: Jack Kelley <jack.kel...@bigpond.com>; r-devel@r-project.org
Subject: Re: [Rd] Any progress on write.csv fileEncoding for UTF-16 and UTF-32 ?

On 30/04/2017 12:23 PM, Duncan Murdoch wrote:
> No, I don't think anyone is working on this.
>
> There's a fairly simple workaround for the UTF-16 and UTF-32 iconv
> issues: don't attempt to produce character vectors, produce raw vectors
> instead. (The "toRaw" argument to iconv() asks for this.) Raw vectors
> can contain embedded nulls. Character vectors can't, because
> internally, R is using 8 bit C strings, and the nulls are string
> terminators.
>
> I don't know how difficult it would be to fix the write.table problems.

I've now taken a look, and it appears as if it's not too hard. I'll see
if I can work out a patch that I trust.

Duncan Murdoch

>
> Duncan Murdoch
>
> On 29/04/2017 7:53 PM, Jack Kelley wrote:
>> "R version 3.4.0 (2017-04-21)" on "x86_64-w64-mingw32" platform
>> ... [rest omitted]
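
A minimal sketch of the raw-vector workaround described in the quoted message (not Duncan's patch, and not a change to write.csv itself): the CSV text is built in memory, joined with an explicit CR+LF, converted with iconv (..., toRaw = TRUE), and written with writeBin(). The choice of UTF-32LE, the use of capture.output() to obtain the CSV lines, and the tempfile() output path are assumptions made for illustration. No byte-order mark is written.

```r
# Sketch only: encode the whole CSV text, terminators included, as raw bytes.
df <- data.frame(x = 1:2, y = 3:4)

# Render the CSV to a character vector, one element per line
# (assumption: capturing console output is acceptable for a small data frame).
lines <- capture.output(write.csv(df, row.names = FALSE))

# Join with an explicit CR+LF so the line terminator is part of the text
# that gets converted, rather than something added afterwards.
text <- paste0(paste(lines, collapse = "\r\n"), "\r\n")

# toRaw = TRUE makes iconv() return a list of raw vectors, which can hold
# the embedded nul bytes that UTF-16 and UTF-32 need.
bytes <- iconv(text, from = "UTF-8", to = "UTF-32LE", toRaw = TRUE)[[1]]

# Write the bytes as-is; writeBin() performs no newline translation.
out <- tempfile(fileext = ".csv")  # hypothetical output path
writeBin(bytes, out)

# Inspect the result the same way as the script above.
print(readBin(out, "raw", 1000))
```

Because the terminator is encoded together with the text, each CR and each LF should occupy a full 4-byte code unit in the output, unlike the 0d 0a 00 00 00 sequences in the dumps above.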