Re: UTF-8 question from Dive into Python 3

Tim Harig Mon, 17 Jan 2011 14:54:46 -0800

On 2011-01-17, carlo <[email protected]> wrote:
> Is it true UTF-8 does not have any "big-endian/little-endian" issue
> because of its encoding method? And if it is true, why Mark (and
> everyone does) writes about UTF-8 with and without BOM some chapters
> later? What would be the BOM purpose then?


Yes, it is true.  The BOM simply identifies the encoding as UTF-8:

        http://unicode.org/faq/utf_bom.html#bom5
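
A small sketch of what that means in Python 3, using only the standard
codecs module and the built-in "utf-8" and "utf-8-sig" codecs (the sample
string is just an illustration):

```python
import codecs

# The UTF-8 BOM is U+FEFF encoded as UTF-8: always the same three bytes,
# so it works as a signature rather than as a byte-order indicator.
bom = "\ufeff".encode("utf-8")
assert bom == codecs.BOM_UTF8 == b"\xef\xbb\xbf"

text = "hello"
with_bom = text.encode("utf-8-sig")   # the encoder prepends the BOM
without_bom = text.encode("utf-8")    # plain UTF-8, no BOM

# "utf-8-sig" strips a leading BOM if present and also accepts input
# that has none; plain "utf-8" keeps the BOM as a U+FEFF character.
decoded_with = with_bom.decode("utf-8-sig")
decoded_without = without_bom.decode("utf-8-sig")
kept_bom = with_bom.decode("utf-8")
```

The payload bytes are identical either way; only the optional three-byte
prefix differs.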

> 2- If that were true, can you point me to some documentation about the
> math that, as Mark says, demonstrates this?

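It is true because UTF-8 is essentially an 8-bit encoding: its code unit
is a single byte, and once a character exhausts the addressable space of
the current byte, the encoding continues in the next one.  Since the bytes
are read and interpreted sequentially, with the most significant bits of
the code point first, their order is fixed: effectively big-endian, with
no little-endian variant to choose from.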
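
To make that concrete, here is a rough sketch comparing UTF-16 and UTF-8
for one character.  The helper encode_3byte is hypothetical, written only
for this illustration, and handles just the three-byte range
(U+0800..U+FFFF, surrogates aside):

```python
ch = "\u20ac"  # EURO SIGN, code point 0x20AC

# UTF-16 uses 16-bit units, so the two bytes of a unit can be stored in
# either order; that is why UTF-16 needs a BOM or an explicit BE/LE label.
utf16_be = ch.encode("utf-16-be")
utf16_le = ch.encode("utf-16-le")

# UTF-8 uses 8-bit units, so there is only one possible byte sequence.
utf8 = ch.encode("utf-8")


def encode_3byte(cp):
    """Hand-encode a code point in U+0800..U+FFFF as UTF-8 (illustration only)."""
    return bytes([
        0xE0 | (cp >> 12),          # lead byte 1110xxxx: top 4 bits
        0x80 | ((cp >> 6) & 0x3F),  # continuation 10xxxxxx: middle 6 bits
        0x80 | (cp & 0x3F),         # continuation 10xxxxxx: low 6 bits
    ])


assert utf16_be == utf16_le[::-1]       # same unit, bytes swapped
assert encode_3byte(0x20AC) == utf8     # the byte order is part of the format
```

The lead byte always carries the high bits and the continuation bytes
follow in a defined order, so a decoder never has to guess.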
-- 
http://mail.python.org/mailman/listinfo/python-list
