On Tue, 17 Jul 2018 15:20:16 +0900, INADA Naoki wrote (replying to Marko):

> I still don't understand what's your original point. I think UTF-8 vs
> UTF-32 is totally different from Python 2 vs 3.
>
> For example, string in Rust and Swift (2010s languages!) are *valid*
> UTF-8. There are strong separation between byte array and string, even
> they use UTF-8. They looks similar to Python 3, not Python 2.
>
> And Python can use UTF-8 for internal encoding in the future. AFAIK,
> PyPy tries it now. After they succeeded, I want to try port it to
> CPython after we removed legacy Unicode APIs. (ref PEP 393)

I'm not sure about PyPy, but I'm fairly certain that MicroPython uses UTF-8.

I would be very interested to see the results of using UTF-8 in CPython. At the least, it would remove the need to keep a separate UTF-8 representation in the string object, as CPython does now. It might even be more compact, although a naive implementation would lose the ability to do constant time indexing into strings. That might be a tradeoff worth making, if indexing remained sufficiently fast.

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson