On 06/30/2019 06:21 AM, Richard Damon wrote:
> On 6/30/19 4:00 AM, moi wrote:
>> Unfortunately not.
>>
>> The only thing Python succeeds to propose is a mechanism
>> which does the opposite of UTF-8 when it comes to handle
>> memory *and* - at the same time - which also does the opposite
>> of UTF-32 regarding performance.
I guess "moi" is banned from the mailing list for posting this kind of
rubbish, just like our other old unicode troll, as I see no trace of his
post on the list. Which is just as well. It's completely wrong. The
in-memory, internal byte encoding of unicode is irrelevant to the
programmer. In Python 3 we deal with unicode. Period. Any performance
issues he or our other unicode troll (perhaps the same person?) runs
into stem from not understanding the nature of immutable strings.

>> For some other reasons, this mechanism leads to buggy
>> code.

No it doesn't. Without any evidence to back him up, this is a complete
fabrication on Moi's part.

> My understanding was that the Python 3 'String' class always used a
> Unicode encoding (never a code-page encoding). If you indexed into a
> string you would get at each location the full code point value of that
> character. Now Unicode isn't just UTF-8 or UTF-32/UCS-4 or the like,
> those are just different ways to encode into memory/a stream Unicode
> code points. It may be that Python makes some awkward choices of how it
> wants to store the characters in memory, but to the programmer, it is
> just Unicode code points. If you specifically want something list a
> UTF-8 encoding, that is one of the usages of Bytes was.

That's correct. It doesn't matter what format Python chooses to use in
memory.

Some argue that O(1) indexing of a unicode string is not important,
because indexing by code point (a "character") is incorrect some or much
of the time: what is seen as a single character on the screen (a
grapheme cluster) is sometimes composed of more than one code point.
By that argument, using UTF-8 internally would be good enough, and
encoding to UTF-8 bytes would then be a no-op (fast).

--
https://mail.python.org/mailman/listinfo/python-list
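To make the points above concrete, here is a small sketch. The sample strings and the helper name `summarize` are my own, chosen only for illustration; nothing here comes from the earlier posts. It shows that indexing a str gives whole code points, that the byte-level encoding only appears once you ask for bytes, and that one visible character can still be two code points.

```python
import unicodedata


def summarize(text):
    """Illustrative helper: code point count and encoded sizes of a str."""
    return {
        "code_points": len(text),                      # len() counts code points
        "utf8_bytes": len(text.encode("utf-8")),       # 1 to 4 bytes per code point
        "utf32_bytes": len(text.encode("utf-32-le")),  # 4 bytes per code point, no BOM
    }


s = "na\u00efve \U0001F600"  # "naive" with a diaeresis, a space, and an emoji

# Indexing yields whole code points, whatever the in-memory layout is.
third = s[2]   # U+00EF
last = s[-1]   # U+1F600, a single item although UTF-8 needs 4 bytes for it

info = summarize(s)

# One on-screen character can still be several code points.
decomposed = "e\u0301"                               # 'e' + COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)  # precomposed U+00E9

print(info)
print(len(decomposed), len(composed))
```

The encoded sizes differ between UTF-8 and UTF-32, but the code point count and the results of indexing do not depend on either. The last two lines are the grapheme cluster case: both strings display as the same character, yet their lengths differ.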