Re: evaluation question

Richard Damon Wed, 01 Feb 2023 17:24:04 -0800

On 2/1/23 3:59 AM, [email protected] wrote:

On Wed, 1 Feb 2023 11:59:25 +1300
Greg Ewing <[email protected]> wrote:

On 31/01/23 10:24 pm, [email protected] wrote:

All languages have their ugly corners due to initial design mistakes and/or
constraints. E.g. Java with the special behaviour of its String class, C++
with "=0" pure virtual declaration. But they don't dump them and make all old
code suddenly cease to execute.

No, but it was decided that Python 3 would have to be backwards
incompatible, mainly to sort out the Unicode mess. Given that,
the opportunity was taken to clean up some other mistakes as well.

Unicode is just a string of bytes. C supports it with a few extra library
functions to get Unicode length vs byte length and similar. It's really
not that hard. Rewriting an entire language just to support that sounds a
bit absurd to me but hey ho...

No, Unicode is a string of 21-bit characters. UTF-8 is a representation that uses bytes, but isn't itself "Unicode".

The key fact is that a "String" variable is indexed not by bytes of the UTF-8 encoding, but by actual characters.
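
A minimal sketch of that difference in Python 3. The sample string and the variable names are only illustrative assumptions; the non-ASCII characters are written as escapes so the source stays plain ASCII.

```python
# "na\u00efve \u20ac5" is "naive" with a diaeresis on the i, a space,
# a euro sign, and the digit 5: eight characters in total.
s = "na\u00efve \u20ac5"

# Encoding to UTF-8 yields a bytes object; the i-diaeresis takes 2 bytes
# and the euro sign takes 3, so the byte length exceeds the character count.
encoded = s.encode("utf-8")

print(len(s))         # number of characters (code points)
print(len(encoded))   # number of bytes in the UTF-8 representation

# Indexing the str gives whole characters, regardless of encoded width.
print(s[6] == "\u20ac")

# Indexing the bytes gives raw byte values; the two bytes at positions 2-3
# together encode the single character at s[2].
print(encoded[2:4] == s[2].encode("utf-8"))
```

Indexing `s` never lands in the middle of a character, while slicing `encoded` at an arbitrary offset can.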

Python3 will store a string as either a sequence of bytes if the data is all Latin-1, as a sequence of 16-bit words if the data all fits in the BMP, and a sequence of 32-bit words if it has a value outside the BMP.
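
One rough way to observe this, assuming CPython 3.3 or later (where PEP 393 introduced this flexible representation): compare `sys.getsizeof` for two strings of different lengths built from the same character, so the fixed object header cancels out. The helper name `bytes_per_char` is hypothetical, and the result is an implementation detail rather than a language guarantee.

```python
import sys

def bytes_per_char(ch):
    """Estimate storage per character for strings made only of `ch`.

    Hypothetical helper: relies on CPython's sys.getsizeof and on the
    object header being the same size for both strings, so the
    difference reflects only the ten extra characters.
    """
    short = ch * 10
    longer = ch * 20
    return (sys.getsizeof(longer) - sys.getsizeof(short)) // 10

samples = {
    "ASCII 'a'": "a",
    "Latin-1 U+00E9": "\u00e9",
    "BMP U+20AC": "\u20ac",
    "non-BMP U+1F600": "\U0001F600",
}

for label, ch in samples.items():
    print(label, bytes_per_char(ch))
```

On CPython the widest character in a string decides the width used for the whole string, so a single non-BMP character makes every character in that string take 32 bits.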


--
Richard Damon

--
https://mail.python.org/mailman/listinfo/python-list
