Re: UTF-8 question from Dive into Python 3

Tim Roberts Tue, 18 Jan 2011 23:28:17 -0800

Tim Harig <user...@ilthio.net> wrote:
>On 2011-01-17, carlo <syseng...@gmail.com> wrote:
>
>> 2- If that were true, can you point me to some documentation about the
>> math that, as Mark says, demonstrates this?
>
>It is true because UTF-8 is essentially an 8-bit encoding: once it
>exhausts the addressable space of the current byte, it moves to the
>next one.  Since the bytes are accessed and assessed sequentially, they
>must be in big-endian order.


You were doing excellently up to that last phrase.  Endianness only applies
when you treat a series of bytes as a larger entity.  That doesn't apply to
UTF-8.  None of the bytes is more "significant" than any other, so by
definition it is neither big-endian nor little-endian.
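
To make that concrete, here is a minimal sketch, assuming Python 3 and an
arbitrarily chosen sample character (the euro sign, U+20AC).  UTF-16 uses
2-byte code units, so it comes in two byte orders; UTF-8 uses 1-byte code
units, so there is only one possible byte sequence:

```python
# Sample character chosen only for illustration: EURO SIGN, U+20AC.
text = "\u20ac"

# UTF-16 code units are 2 bytes wide, so byte order matters.
utf16_be = text.encode("utf-16-be")
utf16_le = text.encode("utf-16-le")

# UTF-8 code units are 1 byte wide; there is nothing to reorder.
utf8 = text.encode("utf-8")

print(utf16_be.hex())  # 20ac
print(utf16_le.hex())  # ac20
print(utf8.hex())      # e282ac

# Reading the same two bytes as one larger integer is where
# endianness comes in.
print(hex(int.from_bytes(utf16_be, "big")))     # 0x20ac
print(hex(int.from_bytes(utf16_be, "little")))  # 0xac20
```

The UTF-8 output is the same on any machine, which is why UTF-8 needs no
byte-order mark to be decoded correctly.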
-- 
Tim Roberts, t...@probo.com
Providenza & Boekelheide, Inc.