On 21 March 2013 06:40, jmfauth <wxjmfa...@gmail.com> wrote:

> ----
> [snip usual rant from jmf]


Franz, please pay no attention to jmf. He has become obsessed with a single
small performance regression in Python 3.3 string handling, one that is
confined to a very small domain and rarely shows up in practice (although,
as he has demonstrated, it is easy to create a microbenchmark that makes it
appear to be much worse than it is).

The regression is a consequence of the decision in Python 3.3 to
*correctly* support the full range of Unicode characters whilst also
reducing the required memory where possible. In the vast majority of cases
this is a performance *improvement*. It is only "optimised for the ascii
user" in the sense that in Python 3.3's string representation the ASCII
characters (indeed every code point up to U+00FF) only require 1 byte per
code point and hence can be stored in less memory than other Unicode code
points. The possible character widths are 1, 2 and 4 bytes.
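
As a rough illustration of those widths, here is a minimal sketch. It
assumes CPython 3.3 or later, since sys.getsizeof reports
implementation-specific sizes, and bytes_per_char is a hypothetical helper
name of mine, not anything in the standard library. It estimates the
per-character storage by comparing two strings of different lengths built
from the same character, so the fixed object header cancels out.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per character (CPython 3.3+ only).

    Both strings use the same internal width, so subtracting their
    sizes removes the fixed header and leaves n * width bytes.
    """
    longer = sys.getsizeof(ch * (2 * n))
    shorter = sys.getsizeof(ch * n)
    return (longer - shorter) // n

print(bytes_per_char("a"))           # ASCII letter
print(bytes_per_char("\u20ac"))      # EURO SIGN, outside the 1-byte range
print(bytes_per_char("\U0001F600"))  # code point outside the BMP
```

On a CPython build with the 3.3 string representation I would expect this
to report 1, 2 and 4 respectively.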

The actual regression occurs when concatenating/replacing/etc a character
into a string when that character is wider than any character currently in
the string. In this situation the new string needs to be widened (the
number of bytes used by every character increases), which is a much more
expensive operation than simply creating a new string (which is what would
happen if the character were the same size or smaller).
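
The widening can be observed indirectly through memory use. This is only a
sketch under the same assumption as before (CPython 3.3 or later), and
size_after_append is a hypothetical helper name. It compares the size of
the result when the appended character fits the existing width against the
size when it does not.

```python
import sys

def size_after_append(base, ch):
    """Return sys.getsizeof of the new string built by base + ch."""
    return sys.getsizeof(base + ch)

base = "a" * 1000  # every character fits in 1 byte

# Appending another 1-byte character keeps the narrow representation.
same_width = size_after_append(base, "b")

# Appending EURO SIGN forces every character in the result to 2 bytes.
widened = size_after_append(base, "\u20ac")

print(same_width, widened)
```

The second figure should come out at roughly double the first, because all
1000 existing characters are stored again at the wider size. The time cost
can be compared in the same way with the timeit module, although the
numbers will vary by machine and Python version.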

It has been acknowledged as a real regression, but he keeps hijacking every
thread where strings are mentioned to harp on about it. He has shown no
inclination to attempt to *fix* the regression and is rapidly coming to be
regarded as a troll by most participants in this list.

Tim Delaney
-- 
http://mail.python.org/mailman/listinfo/python-list
