On 12/6/2013 12:24 PM, Steven D'Aprano wrote:
On Wed, 12 Jun 2013 09:09:05 +0000, Νικόλαος Κούρας wrote:

Isn't 14 bits way too many to store a character?

No.

Unicode has 1114112 possible code points (0 through 1114111). (And in
Japan, they sometimes use TRON instead of Unicode, which has even more.)

If you list out all the combinations of 14 bits:

0000 0000 0000 00
0000 0000 0000 01
0000 0000 0000 10
0000 0000 0000 11
[...]
1111 1111 1111 10
1111 1111 1111 11

you will see that there are only 16384 (2**14) such values. You can't
fit over a million characters into just 16384 values.
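
The arithmetic is easy to check in the interpreter. This is only a
rough sketch for Python 3; the variable names are made up for
illustration and the highest code point is taken to be 0x10FFFF:

```python
# Count the distinct values that fit in 14 bits.
BITS = 14
values_in_14_bits = 2 ** BITS

# Unicode code points run from 0 through 0x10FFFF inclusive.
MAX_CODE_POINT = 0x10FFFF
code_point_count = MAX_CODE_POINT + 1

# Smallest number of bits that can represent the highest code point.
bits_needed = MAX_CODE_POINT.bit_length()

print(values_in_14_bits)   # 16384
print(code_point_count)    # 1114112
print(bits_needed)         # 21
```

So a fixed-width representation would need at least 21 bits per
code point, not 14.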



Thanks Steven,
So, how many bytes does UTF-8 use to store code points > 127?

For example, for code points 256, 1345, and 16474?
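
One way to check this directly, as a minimal sketch assuming Python 3
(where chr() accepts any code point); the helper name utf8_length is
made up for illustration:

```python
def utf8_length(code_point):
    """Return how many bytes UTF-8 uses to encode a single code point."""
    return len(chr(code_point).encode("utf-8"))


for cp in (256, 1345, 16474):
    encoded = chr(cp).encode("utf-8")
    # Show the code point, its UTF-8 byte count, and the bytes in hex.
    print(cp, utf8_length(cp), encoded.hex())

# 256 c480     -> 2 bytes
# 1345 d581    -> 2 bytes
# 16474 e4819a -> 3 bytes
```

In general UTF-8 uses 1 byte for code points up to 127, 2 bytes up to
2047, 3 bytes up to 65535, and 4 bytes beyond that.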
--
http://mail.python.org/mailman/listinfo/python-list
