Marc-Andre Lemburg <m...@egenix.com> added the comment:

Antoine Pitrou wrote:
> New submission from Antoine Pitrou <pit...@free.fr>:
>
> This patch speeds up _PyUnicode_CONVERT_BYTES by unrolling its loop.
>
> Example micro-benchmark:
>
> ./python -m timeit -s "a='x'*10000;b='\u0102'*1000;c='\U00100000'" "a+b+c"
>
> -> before:
> 100000 loops, best of 3: 14.9 usec per loop
> -> after:
> 100000 loops, best of 3: 9.19 usec per loop
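For context, the idea behind such a patch is roughly the following. This is a minimal sketch only, with illustrative names and an assumed unroll factor of 4, not the actual CPython code:

    #include <stddef.h>
    #include <stdint.h>

    /* Sketch: widen a UCS1 (one byte per char) buffer into a UCS4
       (four bytes per char) buffer, copying four elements per
       iteration to reduce per-element loop overhead. */
    static void
    widen_ucs1_to_ucs4(const uint8_t *in, uint32_t *out, size_t n)
    {
        size_t i = 0;
        /* Unrolled main loop: covers n rounded down to a multiple of 4. */
        for (; i + 4 <= n; i += 4) {
            out[i]     = in[i];
            out[i + 1] = in[i + 1];
            out[i + 2] = in[i + 2];
            out[i + 3] = in[i + 3];
        }
        /* Tail loop: the remaining 0-3 elements. */
        for (; i < n; i++)
            out[i] = in[i];
    }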
Before going further with this, I'd suggest you have a look at your compiler settings. Such optimizations are normally performed by the compiler itself and don't need to be hand-coded in C, where they make maintenance harder.

The fact that Windows doesn't exhibit the same performance difference suggests that the optimizer there is not running at the same level or with the same feature set as on Linux. MSVC is at least as good at optimizing code as gcc, and often better.

I tested memchr() when writing those "naive" loops, and it turned out to be slower than the direct loops. memchr() is inlined by the compiler just like the direct loop, but the generated code for the direct version is often easier for the compiler to optimize, since it has more knowledge about the data types involved.
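Roughly, the two variants being compared look like this; a minimal sketch with made-up helper names, not the actual CPython code:

    #include <stddef.h>
    #include <string.h>

    /* Direct loop: the compiler sees the exact element type and the
       loop structure, so it is free to unroll or vectorize it. */
    static const unsigned char *
    find_byte_direct(const unsigned char *s, size_t n, unsigned char ch)
    {
        for (size_t i = 0; i < n; i++) {
            if (s[i] == ch)
                return s + i;
        }
        return NULL;
    }

    /* memchr() variant: usually inlined as well, but the compiler has
       less type information to work with than in the loop above. */
    static const unsigned char *
    find_byte_memchr(const unsigned char *s, size_t n, unsigned char ch)
    {
        return memchr(s, ch, n);
    }

Which of the two wins depends on the compiler and its settings, which is exactly why this is worth measuring on each platform rather than assuming.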