On 12/6/2013 11:30 PM, Nobody wrote:
On Wed, 12 Jun 2013 14:23:49 +0300, Νικόλαος Κούρας wrote:

So, how many bytes does UTF-8 store for code points > 127?

U+0000..U+007F  1 byte
U+0080..U+07FF  2 bytes
U+0800..U+FFFF  3 bytes
>=U+10000       4 bytes
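
You can check that table yourself in Python 3; a quick sketch (the boundary code points below are just arbitrary picks from each range):

# Encode one code point from each range and count the bytes.
for cp in (0x41, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF):
    encoded = chr(cp).encode('utf-8')
    print('U+%04X -> %d byte(s): %r' % (cp, len(encoded), encoded))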

'U+' stands for a Unicode code point, which means a character, right?

How can you tell up to which character UTF-8 needs 1 byte, 2 bytes, or 3? (See the sketch just below.)
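
One way to see where the cut-offs fall is to read them straight off the ranges in the table; a small helper (just a sketch, the name utf8_length is made up here):

def utf8_length(codepoint):
    """Bytes UTF-8 needs for one code point, per the ranges above."""
    if codepoint <= 0x7F:
        return 1
    if codepoint <= 0x7FF:
        return 2
    if codepoint <= 0xFFFF:
        return 3
    return 4

print(utf8_length(ord('A')), utf8_length(0x3B1), utf8_length(0x20AC))  # 1 2 3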


And some of the bytes' bits are used to tell where a code point's representation stops, right? I mean, if we have a code point that needs 2 bytes to be stored, the high bit must be set to 1 to signify that this character's encoding stops at 2 bytes.
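
It is not quite a single high bit: in the first byte, the count of leading 1 bits gives the total length (110xxxxx means "two bytes in all"), and every continuation byte starts with 10. To see it concretely, here are the bits of a 2-byte sequence, using U+03B1 (Greek alpha) as an arbitrary example:

# Print each byte of the 2-byte UTF-8 encoding of U+03B1 in binary.
for byte in chr(0x3B1).encode('utf-8'):
    print(format(byte, '08b'))
# prints:
# 11001110
# 10110001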

I just know that 2^8 = 256, which at first look is 256 places, which means 256 positions to hold a code point, which in turn means a character.

We take the high bit out and then we have 2^7, which is enough positions for 0-127 standard ASCII. The high bit is set to '0' to signify that the char is encoded in 1 byte.
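
That part matches what the encoder actually produces; a tiny check for the 1-byte case:

# 'A' is U+0041: one byte, high bit 0.
b = 'A'.encode('utf-8')
print(len(b), format(b[0], '08b'))   # 1 01000001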

Please tell me whether I understood correctly so far.

But how about for 2, 3, or 4 bytes?

Am I saying it correctly?
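
For 2, 3, and 4 bytes the first byte starts with 110, 1110, and 11110 respectively, and every following byte starts with 10. A rough sketch of how a decoder could read the length off the first byte alone (the name utf8_sequence_length is made up, and this skips full validation):

def utf8_sequence_length(first_byte):
    if first_byte < 0b10000000:   # 0xxxxxxx: plain ASCII, 1 byte
        return 1
    if first_byte < 0b11000000:   # 10xxxxxx: continuation, not a start byte
        raise ValueError('continuation byte')
    if first_byte < 0b11100000:   # 110xxxxx: 2-byte sequence
        return 2
    if first_byte < 0b11110000:   # 1110xxxx: 3-byte sequence
        return 3
    return 4                      # 11110xxx: 4-byte sequence

for ch in ('A', chr(0x3B1), chr(0x20AC), chr(0x1F600)):
    first = ch.encode('utf-8')[0]
    print('U+%04X  %s  %d bytes' % (ord(ch), format(first, '08b'),
                                    utf8_sequence_length(first)))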


