I wrote:
> Having slept on this, I think I agree that 'open-input-file' should
> auto-consume BOMs.
On the other hand, there's a nasty complication. Of course (open-input-file FILENAME) is just (open-file FILENAME "r"), so the auto-consuming logic should be in 'open-file'. So what should (open-file FILENAME "r+") do? The problem is that we don't know if the user will read or write first. If they write first, then they may reasonably assume that what they write will be put at the very beginning of the file, no?

Also, Unicode 6.2 section 2.6 table 2-4 says that BOMs are only allowed for the encoding schemes UTF-8, UTF-16, and UTF-32. They are *not* allowed for UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE.

Unicode 6.2 section 16.8 goes into more detail:

  For compatibility with versions of the Unicode Standard prior to
  Version 3.2, the code point U+FEFF has the word-joining semantics of
  zero width no-break space when it is not used as a BOM. [...]

  Where the byte order is explicitly specified, such as in UTF-16BE or
  UTF-16LE, then all U+FEFF characters -- even at the very beginning of
  the text -- are to be interpreted as zero width no-break spaces.
  Similarly, where Unicode text has known byte order, initial U+FEFF
  characters are not required, but for backward compatibility are to be
  interpreted as zero width no-break spaces. [...]

  Systems that use the byte order mark must recognize when an initial
  U+FEFF signals the byte order. In those cases, it is not part of the
  textual content and should be removed before processing, because
  otherwise it may be mistaken for a legitimate zero width no-break
  space. To represent an initial U+FEFF zero width no-break space in a
  UTF-16 file, use U+FEFF twice in a row. The first one is a byte order
  mark; the second one is the initial zero width no-break space. [...]

This will require some more research and thought.

Mark
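To make the quoted Unicode rules concrete, here is a minimal sketch in Python (not Guile code) of a read-side BOM policy keyed on the encoding scheme name. The function `decode_text` and its tables are hypothetical names introduced for illustration. It strips at most one initial BOM, and only for UTF-8, UTF-16 and UTF-32; for the explicit-byte-order schemes every U+FEFF is kept as a zero width no-break space. As an assumption of this sketch, a UTF-16 or UTF-32 stream with no BOM is decoded as big-endian. It does not attempt to answer the "r+" question above.

```python
import codecs

# Schemes for which a BOM is permitted (Unicode 6.2, Table 2-4), mapped to
# the (BOM bytes, byte-order-specific codec) pairs to check for.
_BOM_SCHEMES = {
    "UTF-8": [(codecs.BOM_UTF8, "utf-8")],
    "UTF-16": [(codecs.BOM_UTF16_BE, "utf-16-be"),
               (codecs.BOM_UTF16_LE, "utf-16-le")],
    "UTF-32": [(codecs.BOM_UTF32_BE, "utf-32-be"),
               (codecs.BOM_UTF32_LE, "utf-32-le")],
}

# Assumption of this sketch: with no BOM, UTF-16/UTF-32 are read big-endian.
_NO_BOM_DEFAULT = {"UTF-8": "utf-8", "UTF-16": "utf-16-be", "UTF-32": "utf-32-be"}

# Schemes with explicit byte order: U+FEFF is never treated as a BOM.
_EXPLICIT_ORDER = {
    "UTF-16BE": "utf-16-be", "UTF-16LE": "utf-16-le",
    "UTF-32BE": "utf-32-be", "UTF-32LE": "utf-32-le",
}


def decode_text(data: bytes, scheme: str) -> str:
    """Decode DATA in the Unicode encoding SCHEME, consuming at most one initial BOM."""
    scheme = scheme.upper()
    if scheme in _EXPLICIT_ORDER:
        # Every U+FEFF, even an initial one, is a zero width no-break space;
        # these codecs never strip it, so it stays in the decoded text.
        return data.decode(_EXPLICIT_ORDER[scheme])
    if scheme not in _BOM_SCHEMES:
        raise ValueError(f"unsupported encoding scheme: {scheme}")
    for bom, codec in _BOM_SCHEMES[scheme]:
        if data.startswith(bom):
            # An initial BOM signals the byte order and is not text: drop exactly one.
            return data[len(bom):].decode(codec)
    return data.decode(_NO_BOM_DEFAULT[scheme])
```

With this policy, the bytes FF FE FF FE 68 00 69 00 read as UTF-16 decode to "\ufeffhi" (the first U+FEFF is the BOM, the second is the initial zero width no-break space), whereas the same bytes read as UTF-16LE keep both U+FEFF characters.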