>> (*) infact UTF8 also indicates the end of each character

> Up to a point.  The initial byte encodes the length and the top few
> bits, but the subsequent octets aren’t distinguishable as final in
> isolation.  0x80-0xBF can all be either medial or final.


So, the first high-bits are a directive that UTF-8 uses to know how many 
bytes each character is being represented as.

0-127 codepoints(characters) use 1 bit to signify they need 1 bit for 
storage and the rest 7 bits to actually store the character ?

while

128-256 codepoints(characters) use 2 bit to signify they need 2 bits for 
storage and the rest 14 bits to actually store the character ?

Isn't 14 bits way to many to store a character ? 
-- 
http://mail.python.org/mailman/listinfo/python-list

Reply via email to