Re: A few questions about encoding

Νικόλαος Κούρας Thu, 13 Jun 2013 00:48:49 -0700

On 13/6/2013 10:11 AM, Steven D'Aprano wrote:

  >>> chr(16474)
'䁚'


Some Chinese symbol.
So code-point '䁚' has a Unicode ordinal value of 16474, correct?


Correct.

Encoding this glyph's ordinal value to binary then gives us
the following bytes:

  >>> bin(16474).encode('utf-8')
b'0b100000001011010'


An observation here, which I would ask you to confirm as valid.

1. A code-point and the code-point's ordinal value are associated in a Unicode charset. They have a so-called 1:1 mapping.

So, I was under the impression that encoding the code-point into utf-8 was the same as encoding the code-point's ordinal value into utf-8.


That is why I tried:
bin(16474).encode('utf-8') instead of chr(16474).encode('utf-8')

So, now I believe they are two different things.

The code-point *is what actually* needs to be encoded and *not* its ordinal value.
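
A quick interpreter check seems to bear this out. This is only a minimal sketch, and the variable names are mine, chosen for illustration:

```python
# Encode the character itself, then encode the text of its ordinal in base 2.
char_bytes = chr(16474).encode('utf-8')
text_bytes = bin(16474).encode('utf-8')

print(char_bytes)  # b'\xe4\x81\x9a' -> the 3 UTF-8 bytes of the character
print(text_bytes)  # b'0b100000001011010' -> 17 ASCII bytes of a string
```

The two results differ in both content and length, so the two calls are clearly not encoding the same thing.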

The leading 0b is just syntax to tell you "this is base 2, not base 8
(0o) or base 10 or base 16 (0x)". Also, leading zero bits are dropped.

But bytes objects are represented with '\x' instead of the aforementioned '0x'. Why is that?
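
To make the question concrete, here is a small sketch of the two notations side by side. As far as I can tell, '\x' is an escape used inside a bytes (or string) literal, while '0x' is the prefix of an integer literal. The names below are only illustrative:

```python
data = chr(16474).encode('utf-8')  # a bytes object, displayed with \x escapes
first = data[0]                    # indexing a bytes object gives a plain int

print(data)                 # b'\xe4\x81\x9a'
print(first)                # 228
print(hex(first))           # 0xe4 (hex() returns a string with the 0x prefix)
print(0xe4 == first)        # True: 0x... is an integer literal
print(b'\xe4'[0] == first)  # True: \x.. is one byte inside a bytes literal
```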



> No! That creates a string from 16474 in base two:
> '0b100000001011010'

I disagree here.

16474 is a number in base 10. Doing bin(16474), we get the binary representation of the number 16474 and not a string.

Why do you say we receive a string when Python presents a binary number?

Then you encode the string '0b100000001011010' into UTF-8. There are 17
characters in this string, and they are all ASCII characters so they take
up 1 byte each, giving you bytes b'0b100000001011010' (in ASCII form).
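
The count can be checked directly. A minimal sketch, with illustrative names:

```python
s = bin(16474)               # '0b100000001011010'
encoded = s.encode('utf-8')

# Every character is below code point 128, so each one is a single UTF-8 byte.
all_ascii = all(ord(c) < 128 for c in s)

print(len(s))        # 17 characters
print(len(encoded))  # 17 bytes
print(all_ascii)     # True
```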


To me, 0b100000001011010 stands for a number in base 2, not a string.
Have I understood something wrong?
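
One way to settle this at the interpreter is to ask for the types. This is a sketch only, and the names are mine:

```python
as_text = bin(16474)           # what bin() hands back
as_number = 0b100000001011010  # the same digits typed as an integer literal

print(type(as_text).__name__)    # str
print(type(as_number).__name__)  # int
print(as_number == 16474)        # True
print(int(as_text, 2) == 16474)  # True: int() parses the string back into a number
```

The interpreter also shows the result of bin() inside quotes, which is how it displays a str.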


--
http://mail.python.org/mailman/listinfo/python-list
