On Wed, Jun 4, 2014 at 8:38 PM, Paul Sokolovsky <pmis...@gmail.com> wrote:
> That's another reason why people don't like Unicode enforced upon them
> - all the talk about supporting all languages and scripts is demagogy
> and hypocrisy, given a choice, Unicode zealots would rather limit
> people to Latin script then give up on their arbitrarily chosen,
> one-among-thousands,
> soon-to-be-replaced-by-apples'-and-microsofts'-"exciting-new" encoding.

Wrong. I use and recommend Unicode, with UTF-8 for transmission, and I
do not ever want to limit people to Latin-1 or any other such subset.
Even though English is the only language I speak, I am *frequently*
using non-ASCII characters (e.g. when I discuss mathematics on a MUD),
and if I could be absolutely sure that everyone in the conversation
correctly comprehended Unicode, I could do this with a lot more
confidence. Unfortunately, the server I use just passes bytes in and
out, and some clients assume CP-1252, others assume Latin-1, and others
(including my Gypsum) try UTF-8 first and fall back on an eight-bit
encoding (currently CP-1252 because of the first group). But in an
ideal world, server and clients would all speak Unicode everywhere, and
transmit and receive UTF-8. This is not hypocrisy; this is the way to
work reliably.

> Once again, my claim is what MicroPython implements now is more correct
> - in a sense wider than technical - handling. We don't provide Unicode
> encoding support, because it's highly bloated, but let people use any
> encoding they like. That comes at some price, like length of strings in
> characters are not know to runtime, only in bytes, but quite a lot of
> applications can be written by having just that.

The current implementation is flat-out lying, actually. It claims that
it's storing Unicode codepoints (as per the Python spec) while actually
storing bytes, and then it transmits those bytes to the console etc
as-is. This is a bug. It needs to be fixed. The only question is, what
form will the fix take? Will it be PEP 393's flexible fixed-width
representation? UTF-8? UTF-16 (I hope not!)? A hybrid of Latin-1 where
possible and UTF-8 otherwise? But something has to be done.

ChrisA
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
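
To make the "try UTF-8 first and fall back on an eight-bit encoding" behaviour concrete, here is a minimal Python 3 sketch. It is only an illustration under stated assumptions: the function name `decode_line` and the sample strings are invented for this example, and it is not taken from Gypsum or any other client. It also shows the price mentioned in the quoted text, namely that the length in bytes and the length in characters differ once non-ASCII text is involved.

```python
def decode_line(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode bytes as UTF-8; if that fails, use an eight-bit fallback.

    Hypothetical helper for illustration only.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # CP-1252 leaves a few byte values undefined, so substitute
        # U+FFFD for those instead of raising a second error.
        return raw.decode(fallback, errors="replace")


# A client that sends UTF-8 and one that sends CP-1252.
utf8_bytes = "π ≈ 3.14".encode("utf-8")
cp1252_bytes = "café".encode("cp1252")  # b'caf\xe9', not valid UTF-8

text_a = decode_line(utf8_bytes)    # decoded on the first attempt
text_b = decode_line(cp1252_bytes)  # decoded via the fallback

# Bytes versus codepoints: the two lengths disagree for non-ASCII text.
chars = len(text_a)       # number of Unicode codepoints
octets = len(utf8_bytes)  # number of bytes on the wire
print(text_a, text_b, chars, octets)
```

The fallback is a heuristic: some byte sequences are valid in both encodings, so a CP-1252 sender can occasionally be misread as UTF-8. That ambiguity only goes away when every party agrees on UTF-8 for transmission.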