On 7/14/2017 10:30 AM, Michael Torrie wrote:
On 07/14/2017 07:31 AM, Marko Rauhamaa wrote:
Of course, UTF-8 in a bytes object doesn't make the situation any
better, but does it make it any worse?


As it stands, we have

    è --[encode>-- Unicode --[reencode>-- UTF-8

Why is one encoding format better than the other?

All digital data are ultimately bits, usually collected together in groups of 8, called bytes. The point of Python 3 is that text should normally be instances of a text class, separate from the raw bytes class, with a defined internal encoding. The actual internal encoding is secondary, and it changed in 3.3.
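A minimal sketch of that separation, using the "è" from the quoted diagram. The variable names are only illustrative, and the sizes printed at the end are a CPython implementation detail (3.3 and later), not something the language guarantees:

```python
import sys

text = "è"                    # a str: one code point, U+00E8
data = text.encode("utf-8")   # a bytes object: the UTF-8 encoding of that code point

print(type(text).__name__, len(text))   # str 1
print(type(data).__name__, len(data))   # bytes 2
print(data)                             # b'\xc3\xa8'

# Decoding the bytes gives back an equal text object.
round_trip = data.decode("utf-8")

# On CPython 3.3+, the storage per character depends on the widest
# character in the string, yet len() still counts code points.
narrow = "a" * 1000
wide = "\U0001F600" * 1000
print(len(narrow), len(wide))
print(sys.getsizeof(narrow), sys.getsizeof(wide))
```

The user-visible behaviour of str (length, indexing, printing) is the same whichever internal layout is in use.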

Python ints are encoded as bytes, as are floats and everything else. When one prints a float, one certainly does not see a representation of the raw bytes in the float object. Instead, one sees a representation of the value it represents. There is a proposal to change the internal encoding of int, at least on 64-bit machines, which are now standard. However, because print(87987282738472387429748) prints 87987282738472387429748 and not the internal bytes, the change in the internal bytes will not affect the user view of ints.
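A rough illustration of value versus bytes. Note the assumptions: struct.pack shows the IEEE 754 byte pattern of the double, and int.to_bytes produces one possible byte serialization of the int; neither is a dump of the interpreter's actual in-memory object layout.

```python
import struct

x = 0.1
raw = struct.pack("<d", x)   # the 8 bytes of the double, little-endian
print(x)                     # 0.1
print(raw.hex())             # 9a9999999999b93f

n = 87987282738472387429748
print(n)                     # 87987282738472387429748

# One byte serialization of n (an assumption for illustration only;
# the interpreter stores ints in its own internal format).
n_bytes = n.to_bytes((n.bit_length() + 7) // 8, "little")
print(n_bytes.hex())

# Whatever the byte layout, the value that comes back is the same.
restored = int.from_bytes(n_bytes, "little")
```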

This is precisely the logic behind Google using UTF-8 for strings in Go,
rather than having some O(1) abstract type like Python has.  And many
other languages do the same.  The argument is that because of the very
issues that you mention, having O(1) lookup in a string isn't that
important, since looking up a particular index in a unicode string is
rarely the right thing to do, so UTF-8 is just fine as a native,
in-memory type.

Does Go use bytes for text, as most people did in Python 2, or a separate text string class that hides the internal encoding format and implementation? In other words, if you do the equivalent of print(s), where s is a text string with a mixture of Greek, Cyrillic, Hindi, Chinese, Japanese, and Korean characters, do you see the characters, or some representation of the internal bytes?
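For comparison, this is what the question looks like on the Python 3 side. The sample string is made up for illustration, the output assumes a terminal that can display these scripts, and none of this says anything about what Go does:

```python
# A text string mixing Greek, Cyrillic, Hindi, Chinese, Japanese, and Korean.
s = "αβγ абв हिन्दी 中文 日本語 한국어"

print(s)              # shows the characters themselves

encoded = s.encode("utf-8")
print(encoded)        # shows a bytes literal full of \x.. escapes

# The text object counts code points; the bytes object counts bytes.
print(len(s), len(encoded))
```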


--
Terry Jan Reedy

