On 1/5/2014 9:23 AM, wxjmfa...@gmail.com wrote:

> My examples are ONLY ILLUSTRATING, this FSR
> is wrong by design,

Let me answer in a different way. If the FSR (the flexible string representation of PEP 393) is 'wrong by design', so are the alternatives. Hence the claim is, in itself, useless as a guide to choosing among them. The choices:

* Keep the previous complicated system of buggy narrow builds (which exposed each astral char as two UTF-16 surrogate code units) on some systems and space-wasting wide builds on other systems, with Python code potentially acting differently on the different builds. I am sure you agree that this is a bad design.

* Improve the dual-build system by debugging narrow builds. I proposed doing this (and gave Python code demonstrating the idea) by adding the complication of an auxiliary array holding the indexes of the astral (non-BMP) chars in each UTF-16 string. I suspect you would call this design 'wrong' also.

* Use the memory-wasting UTF-32 (wide) build on all systems. I know you do not consider this 'wrong', but come on. From an information-theoretic and coding viewpoint, it clearly is. Since the highest code point, U+10FFFF, needs only 21 bits, the top (4th) byte is *never* used. The 3rd byte is *almost never* used; only astral chars (U+10000 and above) need it. How often the 2nd byte is used ranges from common to almost never, depending on the user.

Memory waste is also time waste, as moving information-free 0 bytes takes the same time as moving informative bytes.

Here is the beginning of the rationale for the FSR (from http://www.python.org/dev/peps/pep-0393/ -- have you ever read it?).

"There are two classes of complaints about the current implementation of the unicode type: on systems only supporting UTF-16, users complain that non-BMP characters are not properly supported. On systems using UCS-4 internally (and also sometimes on systems using UCS-2), there is a complaint that Unicode strings take up too much memory - especially compared to Python 2.x, where the same code would often use ASCII strings...".

The memory waste was a reason to stick with 2.7, since the extra memory use could break code that worked in 2.x. By removing the waste, the FSR makes switching to Python 3 more feasible for some people. It was a response to real problems encountered by real people using Python, and it fixed both classes of complaint about the previous system.

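* Switch to the time-wasting UTF-8 for text storage, as some have done: finding the nth character then means scanning from the start of the string (or keeping yet another auxiliary index). This is different from using UTF-8 for text transmission, which I hope soon becomes the norm.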
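To make the storage side of these choices concrete, here is a minimal sketch assuming CPython 3.3 or later, where PEP 393 applies. The helper names are hypothetical. `fsr_bytes_per_char` infers the per-char width the FSR picked from `sys.getsizeof`, `utf16_code_units` counts UTF-16 code units (which is what `len()` reported on narrow builds), and the UTF-32 encoding shows which of its four bytes actually carry information.

```python
import sys

# Hypothetical helpers; CPython-specific (PEP 393, Python 3.3+).

def fsr_bytes_per_char(ch: str) -> int:
    """Bytes per char CPython uses for a string made only of `ch`.

    Appending one more char grows the object by exactly its per-char
    width, so the difference cancels out the version-dependent header.
    """
    return sys.getsizeof(ch * 1001) - sys.getsizeof(ch * 1000)


def utf16_code_units(s: str) -> int:
    """Length of `s` in UTF-16 code units (what len() gave on narrow builds)."""
    return len(s.encode("utf-16-le")) // 2


SAMPLES = {
    "ASCII": "a",
    "Latin-1": "\u00e9",      # e with acute accent
    "BMP": "\u20ac",          # euro sign
    "astral": "\U0001F600",   # emoji outside the BMP
}

for name, ch in SAMPLES.items():
    utf32 = ch.encode("utf-32-le")  # little-endian: utf32[3] is the top byte
    print(f"{name:8} U+{ord(ch):06X}  FSR width: {fsr_bytes_per_char(ch)}  "
          f"UTF-16 units: {utf16_code_units(ch)}  "
          f"UTF-32 bytes 3/4: {utf32[2]}/{utf32[3]}")
```

On CPython this should report FSR widths of 1, 1, 2 and 4, two UTF-16 units only for the astral char, and a top UTF-32 byte of 0 in every case, with the 3rd byte nonzero only for the astral char. The exact `sys.getsizeof` totals vary between versions, which is why the sketch compares sizes instead of reading them directly.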
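The indexing side can be sketched too. `AstralIndexedUTF16` is a hypothetical class illustrating the auxiliary-array idea described above (it is not the Python code mentioned there): it keeps UTF-16 code units plus a sorted list of the code-point indexes of astral chars, so `s[i]` finds its code unit with a binary search. `utf8_char_at` is a hypothetical helper showing why plain UTF-8 storage needs a scan from the start to find the nth char.

```python
from bisect import bisect_left


class AstralIndexedUTF16:
    """Hypothetical sketch: UTF-16 storage plus an index of astral chars."""

    def __init__(self, text: str) -> None:
        self.units = text.encode("utf-16-le")  # 2 bytes per code unit
        # Code-point positions of chars that need a surrogate pair.
        self.astral = [i for i, ch in enumerate(text) if ord(ch) > 0xFFFF]
        self.length = len(text)

    def __len__(self) -> int:
        return self.length  # code points, not code units

    def __getitem__(self, i: int) -> str:
        if i < 0:
            i += self.length
        if not 0 <= i < self.length:
            raise IndexError("index out of range")
        # Each astral char before position i takes one extra code unit.
        k = bisect_left(self.astral, i)
        unit = i + k
        width = 2 if k < len(self.astral) and self.astral[k] == i else 1
        start = 2 * unit
        return self.units[start:start + 2 * width].decode("utf-16-le")


def utf8_char_at(data: bytes, i: int) -> str:
    """Hypothetical helper: code point i of UTF-8 `data`, found in O(i) time."""
    if i < 0:
        raise IndexError("negative indexes not supported in this sketch")
    count = -1
    start = None
    for pos, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:  # not a continuation byte: a char starts here
            count += 1
            if count == i:
                start = pos
            elif count == i + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("index out of range")
    return data[start:].decode("utf-8")


text = "ab\U0001F600c\U0001F601d"
s = AstralIndexedUTF16(text)
print(len(s), len(s.units) // 2)  # → 6 8 (code points vs UTF-16 code units)
print(s[2] == "\U0001F600", utf8_char_at(text.encode("utf-8"), 5))  # → True d
```

The design trade-off shows up directly: the UTF-16 variant pays extra memory for the index list and O(log k) lookups (k being the number of astral chars), while the UTF-8 helper needs no extra memory but walks the bytes every time.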

--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list
