On 2014-04-29 10:37, wxjmfa...@gmail.com wrote:
> >>> timeit.repeat("(x*1000 + y)[:-1]", setup="x = 'abc'; y = 'z'")
> [1.4027834829454946, 1.38714224331963, 1.3822586635296261]
> >>> timeit.repeat("(x*1000 + y)[:-1]", setup="x = 'abc'; y = '\u0fce'")
> [5.462776291480395, 5.4479432055423445, 5.447874284053398]
> >>>
> >>>
> >>> # more interesting
> >>> timeit.repeat("(x*1000 + y)[:-1]",\
> ...     setup="x = 'abc'.encode('utf-8'); y = '\u0fce'.encode('utf-8')")
> [1.3496489533188765, 1.328654286266783, 1.3300913977710707]
> >>>
While I dislike feeding the troll, what I see here is this: if Python always used the widest representation, every str manipulation in your test would take ~5.4 seconds on your machine. But Python notices that some of your strings *don't* require a full 32 bits per character and optimizes those operations, cutting about 75% of the processing time (wow... 4 bytes per char down to 1 byte per char, I wonder where that 75% savings comes from).

So rather than highlighting any *problem* with Python, your [mostly worthless, non-real-world] microbenchmarks show that Python's unicode implementation is awesome.

Still waiting to see an actual bug report, as mentioned on the other thread.
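For anyone who wants to poke at this directly, here's a minimal sketch (my own illustration, not from the quoted post) that estimates PEP 393's per-character storage with sys.getsizeof and re-runs the quoted timings; the helper name bytes_per_char is mine, and exact overheads vary by CPython build:

# Minimal sketch: observe PEP 393's flexible string storage (CPython 3.3+).
# sys.getsizeof reports the whole object, so per-character cost is
# estimated from the growth between two string lengths.
import sys
import timeit

def bytes_per_char(ch, n=1000):
    """Estimate storage per character for strings built from `ch`."""
    return (sys.getsizeof(ch * 2 * n) - sys.getsizeof(ch * n)) / n

print(bytes_per_char('z'))           # latin-1 range: ~1.0
print(bytes_per_char('\u0fce'))      # BMP, beyond latin-1: ~2.0
print(bytes_per_char('\U0001f600'))  # astral plane: ~4.0

# The quoted measurements, reproduced:
print(timeit.repeat("(x*1000 + y)[:-1]", setup="x = 'abc'; y = 'z'"))
print(timeit.repeat("(x*1000 + y)[:-1]", setup="x = 'abc'; y = '\u0fce'"))

On a typical CPython build this prints roughly 1.0, 2.0, and 4.0: note that U+0FCE actually lands in the 2-bytes-per-char tier, and only astral-plane characters force the full 4 bytes. Absolute timings will differ by machine, but the ratio between the two str runs should resemble the quoted ~4x.

-tkc

-- 
https://mail.python.org/mailman/listinfo/python-list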