On Mon, 17 Jan 2011 14:19:13 -0800 (PST)
carlo <syseng...@gmail.com> wrote:
> Is it true UTF-8 does not have any "big-endian/little-endian" issue
> because of its encoding method?

Yes.

> And if it is true, why Mark (and
> everyone does) writes about UTF-8 with and without BOM some chapters
> later? What would be the BOM purpose then?

"BOM" in this case is a misnomer. For UTF-8, it is only used as a
marker (a magic number, if you like) to signal that a given text file
is UTF-8. The UTF-8 "BOM" does not say anything about byte order; and,
actually, it does not change with endianness.

(Note that it is not required to put a UTF-8 "BOM" at the beginning of
text files; it is just a hint that some tools use when
generating/reading UTF-8.)

> 2- If that were true, can you point me to some documentation about the
> math that, as Mark says, demonstrates this?

Math? UTF-8 is simply a byte-oriented (rather than word-oriented)
encoding. There is no math involved; it just works by construction.

Regards

Antoine.
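
For anyone who wants to check the points above at the interpreter, here
is a minimal sketch, assuming Python 3 and only the standard library.
The sample string and the variable names are illustrative choices, not
anything from Mark's book. It shows that UTF-8 has a single byte
sequence for a given text, that UTF-16 has two depending on byte order,
and that the UTF-8 "BOM" is a fixed three-byte signature.

```python
import codecs

# Sample text (an arbitrary choice): U+00E9 and U+20AC, which take
# two and three bytes respectively in UTF-8.
text = "\u00e9\u20ac"

# UTF-8 is a sequence of single bytes, so there is only one serialization.
utf8 = text.encode("utf-8")

# UTF-16 is built from 16-bit units, so the two byte orders differ.
utf16_le = text.encode("utf-16-le")
utf16_be = text.encode("utf-16-be")

# The "utf-8-sig" codec prepends the UTF-8 signature when encoding
# and strips it again when decoding.
with_sig = text.encode("utf-8-sig")
signature = with_sig[:3]
round_trip = with_sig.decode("utf-8-sig")

# Decoding with plain "utf-8" keeps the signature as U+FEFF.
kept = with_sig.decode("utf-8")

print("utf-8     :", utf8.hex())
print("utf-16-le :", utf16_le.hex())
print("utf-16-be :", utf16_be.hex())
print("signature :", signature.hex())
print("same as codecs.BOM_UTF8:", signature == codecs.BOM_UTF8)
```

The signature bytes are the same on every machine, which is why they
say nothing about byte order. A reader that does not expect the
signature sees it as a leading U+FEFF character, so decoding with
"utf-8-sig" is the safer choice when the input may or may not carry
one.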