On Tuesday, October 29, 2013 16:52:49 UTC+1, Tim Chase wrote:
> On 2013-10-29 08:38, wxjmfa...@gmail.com wrote:
> > >>> import timeit
> > >>> timeit.timeit("a = 'hundred'; 'x' in a")
> > 0.12621293837694095
> > >>> timeit.timeit("a = 'hundreij'; 'x' in a")
> > 0.26411553466961735
>
> That reads to me as "If things were purely UCS4 internally, Python
> would normally take 0.264... seconds to execute this test, but core
> devs managed to optimize a particular (lower 127 ASCII characters
> only) case so that it runs in less than half the time."
>
> Is this not what you intended to demonstrate? 'cuz that sounds
> like a pretty awesome optimization to me.
>
> -tkc
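[For readers following along: the quoted benchmark can be reproduced roughly as below. This is a sketch, not the original poster's exact code; 'hundre\u0153' is my own stand-in for a string containing a non-Latin-1 character, which under PEP 393 (the FSR) forces a wider internal representation, as sys.getsizeof makes visible.]

```python
# Sketch of the quoted benchmark (Python 3.3+, PEP 393 "FSR").
# 'hundre\u0153' is an assumed stand-in for the post's non-ASCII string.
import sys
import timeit

ascii_s = 'hundred'           # fits the compact 1-byte-per-char form
wide_s = 'hundre\u0153'       # contains œ, forcing 2 bytes per char

# Same character count, different memory footprint under the FSR.
print(len(ascii_s), sys.getsizeof(ascii_s))
print(len(wide_s), sys.getsizeof(wide_s))

# Membership test timing, as in the quoted example.
print(timeit.timeit("'x' in a", setup=f"a = {ascii_s!r}"))
print(timeit.timeit("'x' in a", setup=f"a = {wide_s!r}"))
```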
--------

That's very naive. In fact, what happens is just the opposite: the "best case" with the FSR is worse than the "worst case" without the FSR. And that is without counting the time this poor Python spends switching from one internal representation to another, nor the fact that the representation has to be tested every time. The more Unicode manipulation one applies, the more time it demands.

Two tasks come to mind: re and normalization. It is very interesting to observe what happens when one normalizes Latin text and polytonic Greek text, both with plenty of diacritics.

----

Something different, based on my previous example: what is a European user supposed to think when she/he sees she/he can be "penalized" by such an amount, simply for using non-ASCII characters in a product which is supposed to be "unicode compliant"?

jmf
--
https://mail.python.org/mailman/listinfo/python-list
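[The normalization comparison mentioned above can be sketched as follows. The sample strings are my own choices, not from the original post, and the timings will vary by CPython version and build; the sketch only shows how one would measure it.]

```python
# Sketch: timing Unicode normalization of Latin vs. polytonic Greek text,
# both carrying many combining marks / diacritics. Sample strings are
# illustrative stand-ins, not data from the original discussion.
import timeit
import unicodedata

samples = {
    "latin": "héllo wörld, çà et là, œuvre dûment révélée",
    "greek": "ἀρχὴ ἥμισυ παντός, ᾗ ᾄδει ὁ ἀοιδός",
}

for name, text in samples.items():
    # NFD decomposes precomposed characters into base + combining marks.
    t = timeit.timeit(lambda: unicodedata.normalize("NFD", text),
                      number=100_000)
    print(f"NFD {name}: {t:.3f}s")
```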