Re: binary at the front of CHANGES.txt

DM Smith Wed, 18 Jul 2007 04:16:58 -0700


On Jul 17, 2007, at 8:40 PM, Yonik Seeley wrote:

On 7/17/07, DM Smith <[EMAIL PROTECTED]> wrote:

According to the UTF-8 spec, \uFEFF is not a BOM. In UTF-8 the byte
order is always the same.


But there is a BOM for UTF-8 (even though there is no endian
component, it does serve as a marker indicating the text file is
Unicode text encoded in UTF-8).

http://unicode.org/faq/utf_bom.html#29


This is all rather academic at this point, as you have fixed the problem.

I stand corrected: \uFEFF (the code point) is the BOM for all UTF encodings, with its byte representation differing by encoding. But UTF-8 byte order is always the same, regardless of the presence of the BOM.
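To illustrate that the UTF-8 BOM is simply the code point U+FEFF stored as the bytes EF BB BF, here is a minimal Python sketch (the language and the sample bytes are my own choice, not anything from the thread). A plain UTF-8 decode keeps the BOM as a leading U+FEFF character, which is how a stray invisible character can end up at the front of decoded text, while Python's "utf-8-sig" codec drops it.

```python
# Made-up sample: a UTF-8 BOM (EF BB BF) followed by ordinary ASCII text.
raw = b"\xef\xbb\xbfsample text\n"

# Plain "utf-8" keeps the BOM as an ordinary U+FEFF character at index 0.
with_bom = raw.decode("utf-8")
print(hex(ord(with_bom[0])))  # 0xfeff

# "utf-8-sig" strips one leading BOM if present (and decodes normally if not).
without_bom = raw.decode("utf-8-sig")
print(repr(without_bom))
```

Either way the remaining bytes decode identically, since UTF-8 has no byte order for the BOM to signal.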

According to the Unicode 5.0 Standard book, Chapter 13, Section 13.6, the byte sequence of the BOM for UTF-8 is EF BB BF (3 bytes) and for UTF-16 it is FE FF or FF FE (2 bytes). It appears that the byte sequence is unique for each Unicode representation.
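One quick way to confirm those byte sequences is to encode the single code point U+FEFF in each form and compare the result with the BOM constants in Python's `codecs` module. This is a minimal sketch; it uses the explicit "-be"/"-le" codec names because Python's plain "utf-16" codec prepends a BOM of its own.

```python
import codecs

# Encode the lone code point U+FEFF in each encoding form to see its bytes.
forms = {
    "utf-8": "\ufeff".encode("utf-8"),
    "utf-16-be": "\ufeff".encode("utf-16-be"),
    "utf-16-le": "\ufeff".encode("utf-16-le"),
}

for name, data in forms.items():
    # bytes.hex(sep) needs Python 3.8+; prints e.g. "utf-8      EF BB BF"
    print(f"{name:10} {data.hex(' ').upper()}")
```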


See http://www.unicode.org/unicode/uni2book/ch13.pdf#BOM

I frequently see FE FF at the beginning of UTF-8 files. I have only seen MS editors add this, and it is wrong for UTF-8 files. I was assuming that this was the junk at the beginning of the file.
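When hunting for this kind of junk, it helps to compare a file's first bytes against the known BOM signatures. A rough Python sketch, assuming you only want to identify a leading BOM; `detect_bom` is a hypothetical helper, not a library function. It also covers the UTF-32 signatures (not discussed above), which are checked first because the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE one (FF FE).

```python
import codecs
from typing import Optional

# Known BOM signatures, longest first, so UTF-32LE wins over UTF-16LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding whose BOM prefixes `data`, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


# Example inputs: a UTF-8 BOM, a UTF-16BE BOM, and bytes with no BOM at all.
for sample in (b"\xef\xbb\xbfabc", b"\xfe\xff\x00A", b"\xc2\xbfabc"):
    print(sample[:4].hex(" ").upper(), "->", detect_bom(sample))
```

Reading the first four bytes of a file such as CHANGES.txt in binary mode and passing them to `detect_bom` would distinguish a genuine UTF-8 BOM, a misplaced FE FF, and something else entirely (for which it returns None).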

But the junk at the beginning of the file was C2 BF. I'm not at all sure what this would be.
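For what it's worth, C2 BF is itself a well-formed two-byte UTF-8 sequence, so it at least identifies a character, though not how it got there. A minimal Python sketch that decodes it by hand (the helper `decode_two_byte_utf8` is hypothetical, written only for illustration) and cross-checks the result against Python's own decoder:

```python
import unicodedata


def decode_two_byte_utf8(b0: int, b1: int) -> int:
    """Decode a 2-byte UTF-8 sequence (110xxxxx 10yyyyyy) to a code point."""
    if (b0 & 0b1110_0000) != 0b1100_0000 or (b1 & 0b1100_0000) != 0b1000_0000:
        raise ValueError("not a two-byte UTF-8 sequence")
    # Take the 5 payload bits of the lead byte and the 6 of the continuation byte.
    cp = ((b0 & 0b0001_1111) << 6) | (b1 & 0b0011_1111)
    if cp < 0x80:
        # Lead bytes C0 and C1 would produce overlong (invalid) encodings.
        raise ValueError("overlong encoding")
    return cp


cp = decode_two_byte_utf8(0xC2, 0xBF)
print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")  # U+00BF INVERTED QUESTION MARK
```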