On 21 March 2013 06:40, jmfauth <wxjmfa...@gmail.com> wrote:
> ----
> [snip usual rant from jmf]

Franz, please pay no attention to jmf. He has become obsessed with a single small performance regression in Python 3.3 that affects strings in a very narrow domain and rarely shows up in practice (although, as he has demonstrated, it is easy to create a microbenchmark that makes it appear much worse than it is).

The regression is a consequence of the decision in Python 3.3 to *correctly* support the full range of Unicode characters whilst also reducing the required memory where possible. In the vast majority of cases this is a performance *improvement*.

It is only "optimised for the ascii user" in the sense that the pre-existing ASCII characters occupy the lowest Unicode code points, so each fits in 1 byte and can be stored in less memory than most other code points. The possible character widths are 1, 2 and 4 bytes.

The actual regression occurs when an operation such as concatenation or replacement introduces a character that is wider than every character currently in the string. In this situation the new string needs to be widened (the number of bytes used by every character increases), which is a much more expensive operation than simply creating a new string (which is what would happen if the character were the same size or smaller).

It has been acknowledged as a real regression, but he keeps hijacking every thread where strings are mentioned to harp on about it. He has shown no inclination to attempt to *fix* the regression and is rapidly coming to be regarded as a troll by most participants in this list.

Tim Delaney
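
P.S. For anyone who wants to see the widening described above for themselves, here is a minimal sketch. It assumes CPython 3.3 or later, because sys.getsizeof() reports implementation-specific sizes and other interpreters may not support it for strings. bytes_per_char is a hypothetical helper name used only for illustration, not part of any library.

```python
import sys


def bytes_per_char(s):
    """Estimate how many bytes CPython uses per code point in s.

    Assumption: CPython 3.3+ (flexible string representation).
    Doubling the string leaves the fixed object header unchanged, so the
    growth in size divided by len(s) approximates the per-character width.
    """
    return (sys.getsizeof(s + s) - sys.getsizeof(s)) // len(s)


base = "a" * 1000                 # ASCII only
with_euro = base + "\u20ac"       # one character outside the 1-byte range
with_astral = base + "\U0001F600" # one character outside the 2-byte range

for label, s in [("ascii", base),
                 ("ascii + U+20AC", with_euro),
                 ("ascii + U+1F600", with_astral)]:
    print(label, len(s), "chars,", bytes_per_char(s), "byte(s) per char")
```

On CPython 3.3 or later this should report 1, 2 and 4 bytes per character respectively: adding a single wider character changes the storage width of all 1000 existing ones, which is the extra copying work that the microbenchmarks measure.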
-- http://mail.python.org/mailman/listinfo/python-list