On 2/1/23 3:59 AM, mutt...@dastardlyhq.com wrote:
> On Wed, 1 Feb 2023 11:59:25 +1300
> Greg Ewing <greg.ew...@canterbury.ac.nz> wrote:
>> On 31/01/23 10:24 pm, mutt...@dastardlyhq.com wrote:
>>> All languages have their ugly corners due to initial design mistakes and/or
>>> constraints. E.g. Java with the special behaviour of its String class, C++
>>> with the "=0" pure virtual declaration. But they don't dump them and make all old
>>> code suddenly cease to execute.
>> No, but it was decided that Python 3 would have to be backwards
>> incompatible, mainly to sort out the Unicode mess. Given that,
>> the opportunity was taken to clean up some other mistakes as well.
> Unicode is just a string of bytes. C supports it with a few extra library
> functions to get unicode length vs byte length and similar. It's really
> not that hard. Rewriting an entire language just to support that sounds a
> bit absurd to me but hey ho...

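No, a Unicode string is a sequence of code points (characters), each of which needs up to 21 bits, since code points run from U+0000 to U+10FFFF. UTF-8 is one representation of that sequence as bytes, but it isn't itself "Unicode".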
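To make the distinction concrete, here is a small Python 3 sketch (the sample string and the helper name lengths are my own, not from this thread) comparing the number of code points in a str with the number of bytes in its UTF-8 encoding, and checking that the top code point, U+10FFFF, needs 21 bits:

```python
def lengths(s: str) -> tuple[int, int]:
    """Return (number of code points, number of UTF-8 bytes) for s."""
    return len(s), len(s.encode("utf-8"))

# "naïve €😀" written with escapes so the source file encoding can't interfere:
# ï is U+00EF (2 UTF-8 bytes), € is U+20AC (3 bytes), 😀 is U+1F600 (4 bytes).
sample = "na\u00efve \u20ac\U0001F600"
print(lengths(sample))          # (8, 14)

# The highest code point Unicode defines is U+10FFFF, which takes 21 bits.
print((0x10FFFF).bit_length())  # 21
```

The "unicode length vs byte length" split that C needs extra library functions for is simply len(s) versus len(s.encode("utf-8")) here.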

The key fact is that a Python 3 string (str) is indexed not by the bytes of its UTF-8 encoding but by actual characters (code points).
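A minimal illustration (again my own example) of what indexing by characters means: indexing a str yields a one-character string, while indexing its UTF-8 bytes yields individual byte values, so the positions stop lining up after the first multi-byte character:

```python
s = "caf\u00e9!"        # "café!", with é as the single code point U+00E9
b = s.encode("utf-8")   # é becomes two bytes in UTF-8: 0xC3 0xA9

print(len(s), len(b))   # 5 6
print(s[3])             # é   (a whole character)
print(b[3])             # 195 (0xC3, only the first byte of é)
print(s[4])             # !
print(b[4])             # 169 (0xA9, the second byte of é, not '!')
```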

Python 3 (CPython 3.3 and later, per PEP 393) stores a string as a sequence of bytes if all of its characters are Latin-1, as a sequence of 16-bit words if they all fit in the BMP (Basic Multilingual Plane), and as a sequence of 32-bit words if any character lies outside the BMP.
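This is easy to observe with sys.getsizeof. The sketch below rests on assumptions: it relies on CPython's PEP 393 layout (other implementations such as PyPy store strings differently), and it infers the per-character width from the size difference between strings of n and n-1 copies of the same character. The helper name bytes_per_char is hypothetical.

```python
import sys

def bytes_per_char(ch: str, n: int = 100) -> int:
    """Estimate CPython's storage width per character for strings built from ch.

    Strings of n and n-1 copies of ch share the same internal kind, so the
    difference in their sizes is the width of one character.
    """
    return sys.getsizeof(ch * n) - sys.getsizeof(ch * (n - 1))

# ASCII, Latin-1, BMP and non-BMP samples; expected on CPython: 1, 1, 2, 4.
for ch in ("a", "\u00e9", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X}: {bytes_per_char(ch)} byte(s) per character")
```

Because the width is chosen from the largest code point in the string, a single emoji appended to an otherwise ASCII string makes every character in it take 4 bytes.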

--
Richard Damon

--
https://mail.python.org/mailman/listinfo/python-list
