"Terry Reedy" <tjre...@udel.edu> wrote in message news:hnjkuo$n1...@dough.gmane.org...
On 3/14/2010 4:40 PM, Guillermo wrote:
Adding the byte that some call a 'utf-8 bom' makes the file an invalid utf-8 file.

Not true.  From http://unicode.org/faq/utf_bom.html:

Q: When a BOM is used, is it only in 16-bit Unicode text?
A: No, a BOM can be used as a signature no matter how the Unicode text is transformed: UTF-16, UTF-8, UTF-7, etc. The exact bytes comprising the BOM will be whatever the Unicode character FEFF is converted into by that transformation format. In that form, the BOM serves to indicate both that it is a Unicode file, and which of the formats it is in. Examples:
Bytes         Encoding Form
00 00 FE FF   UTF-32, big-endian
FF FE 00 00   UTF-32, little-endian
FE FF         UTF-16, big-endian
FF FE         UTF-16, little-endian
EF BB BF      UTF-8
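
For anyone who wants to check this from Python, here is a minimal sketch (not from the FAQ) that sniffs the signatures in the table above using the BOM constants in the standard codecs module. The helper name sniff_bom and the sample string are just illustrative assumptions. It also shows that a file starting with EF BB BF still decodes as valid UTF-8: the plain "utf-8" codec keeps U+FEFF as the first character, while "utf-8-sig" strips it.

```python
import codecs

# Longer signatures go first: the UTF-32 little-endian BOM (FF FE 00 00)
# starts with the UTF-16 little-endian BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32, big-endian"),
    (codecs.BOM_UTF32_LE, "UTF-32, little-endian"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16, big-endian"),
    (codecs.BOM_UTF16_LE, "UTF-16, little-endian"),
]


def sniff_bom(data):
    """Return the encoding form named by a leading BOM, or None (hypothetical helper)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


# The UTF-8 signature is simply U+FEFF run through the UTF-8 transformation.
assert "\ufeff".encode("utf-8") == b"\xef\xbb\xbf" == codecs.BOM_UTF8

# Illustrative sample: a UTF-8 file body with the signature in front.
data = codecs.BOM_UTF8 + "héllo".encode("utf-8")

kind = sniff_bom(data)               # "UTF-8"
with_bom = data.decode("utf-8")      # decodes fine; U+FEFF is kept as the first character
without_bom = data.decode("utf-8-sig")  # same text with the signature removed
```

When reading files you do not control, opening them with encoding="utf-8-sig" handles both cases, since that codec accepts input with or without the signature.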

-Mark


