STINNER Victor added the comment: Oh, I forgot my benchmark results.
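The percentage in parentheses in each row of the results is the relative change from the "original" column to the "writer" column. Below is a minimal sketch of that arithmetic. It assumes the change is computed as (writer - original) / original and printed rounded to a whole percent. The names relative_change, format_row, ROWS and SIGNIFICANT, and the 10% cut-off, are hypothetical illustrations. They are not taken from decodebench.py or bench-diff.py.

```python
# Hypothetical sketch, not the actual bench-diff.py: recompute the percentage
# shown in parentheses from the "original" and "writer" columns.


def relative_change(original: float, writer: float) -> float:
    """Percentage change from the 'original' column to the 'writer' column."""
    return (writer - original) / original * 100.0


def format_row(codec: str, data: str, original: int, writer: int) -> str:
    """Render one row in the same shape as the tables: original (change) writer."""
    change = relative_change(original, writer)
    # "+.0f" keeps the sign of small changes, e.g. -0.23 prints as "-0".
    return f"{codec:<9} {data:<20} {original:>6} ({change:+.0f}%) {writer:>6}"


# A few rows copied from the results; the 10% cut-off is an arbitrary choice.
ROWS = [
    ("32-bit", "utf-8", r"'\u8000'*10000", 804, 694),
    ("32-bit", "utf-16le", r"'\U00010000'*10000", 872, 870),
    ("64-bit", "utf-8", r"'\u8000'*10000", 806, 1053),
    ("64-bit", "utf-32le", r"'\u0100'*10000", 2114, 2721),
]

# Rows whose change is at least 10% in either direction.
SIGNIFICANT = [
    (platform, codec, data)
    for platform, codec, data, original, writer in ROWS
    if abs(relative_change(original, writer)) >= 10.0
]

if __name__ == "__main__":
    for platform, codec, data, original, writer in ROWS:
        print(platform, format_row(codec, data, original, writer))
    print("significant:", SIGNIFICANT)
```

Running it prints the four rows with the same percentages as the tables: -14%, -0%, +31% and +29%. It then flags three of them (the two UTF-8 rows and the UTF-32-LE row) as changes of 10% or more.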
decodebench.py results on Linux 32 bits (Linux-3.2.0-32-generic-pae-i686-with-debian-wheezy-sid):

$ ./python bench-diff.py original writer
                           original         writer
ascii     'A'*10000            4109   (-3%)   3974
latin1    'A'*10000            3851   (-5%)   3644
latin1    '\x80'*10000        14832   (-3%)  14430
utf-8     'A'*10000            3747   (-4%)   3608
utf-8     '\x80'*10000          976   (-2%)    961
utf-8     '\u0100'*10000        974   (-2%)    959
utf-8     '\u8000'*10000        804  (-14%)    694
utf-8     '\U00010000'*10000    666   (-5%)    635
utf-16le  'A'*10000            4154   (-1%)   4117
utf-16le  '\x80'*10000         4055   (-2%)   3988
utf-16le  '\u0100'*10000       4047   (-2%)   3974
utf-16le  '\u8000'*10000        917   (-1%)    912
utf-16le  '\U00010000'*10000    872   (-0%)    870
utf-16be  'A'*10000            3218   (-1%)   3185
utf-16be  '\x80'*10000         3163   (-2%)   3114
utf-16be  '\u0100'*10000       2591   (-1%)   2556
utf-16be  '\u8000'*10000        979   (-1%)    974
utf-16be  '\U00010000'*10000    928   (-0%)    925
utf-32le  'A'*10000            1681  (+12%)   1885
utf-32le  '\x80'*10000         1697  (+10%)   1865
utf-32le  '\u0100'*10000       2224   (+1%)   2254
utf-32le  '\u8000'*10000       2224   (+2%)   2269
utf-32le  '\U00010000'*10000   2234   (+1%)   2260
utf-32be  'A'*10000            1685  (+11%)   1868
utf-32be  '\x80'*10000         1684  (+10%)   1860
utf-32be  '\u0100'*10000       2223   (+1%)   2253
utf-32be  '\u8000'*10000       2222   (+1%)   2255
utf-32be  '\U00010000'*10000   2243   (+1%)   2257

decodebench.py results on Linux 64 bits (Linux-3.4.9-2.fc16.x86_64-x86_64-with-fedora-16-Verne):

                           original         writer
ascii     'A'*10000           10043   (+1%)  10144
latin1    'A'*10000            8351   (-1%)   8258
latin1    '\x80'*10000        19184   (+2%)  19560
utf-8     'A'*10000            8083   (+5%)   8461
utf-8     '\x80'*10000          982   (+1%)    993
utf-8     '\u0100'*10000        984   (+1%)    992
utf-8     '\u8000'*10000        806  (+31%)   1053
utf-8     '\U00010000'*10000    639  (+12%)    718
utf-16le  'A'*10000            5547   (-2%)   5422
utf-16le  '\x80'*10000         5205   (+1%)   5271
utf-16le  '\u0100'*10000       4900   (-4%)   4695
utf-16le  '\u8000'*10000       1062   (+9%)   1154
utf-16le  '\U00010000'*10000   1040   (+4%)   1078
utf-16be  'A'*10000            5416   (-5%)   5157
utf-16be  '\x80'*10000         5077   (-1%)   5011
utf-16be  '\u0100'*10000       4261   (-1%)   4218
utf-16be  '\u8000'*10000       1146   (+0%)   1147
utf-16be  '\U00010000'*10000   1125   (-1%)   1119
utf-32le  'A'*10000            1743   (+8%)   1880
utf-32le  '\x80'*10000         1751   (+5%)   1842
utf-32le  '\u0100'*10000       2114  (+29%)   2721
utf-32le  '\u8000'*10000       2120  (+28%)   2718
utf-32le  '\U00010000'*10000   2065  (+30%)   2690
utf-32be  'A'*10000            1761   (+6%)   1860
utf-32be  '\x80'*10000         1749   (+6%)   1856
utf-32be  '\u0100'*10000       2101  (+29%)   2715
utf-32be  '\u8000'*10000       2083  (+30%)   2715
utf-32be  '\U00010000'*10000   2058  (+31%)   2689

Most significant changes:

* -14% to decode '\u8000'*10000 from UTF-8 on Linux 32 bits
* +31% to decode '\u8000'*10000 from UTF-8 on Linux 64 bits
* +28% to +31% to decode UCS-2 and UCS-4 characters from UTF-32 on Linux 64 bits

@Serhiy Storchaka: If you feel able to tune _PyUnicodeWriter to improve its performance, please open a new issue. I consider the performance changes acceptable, and I don't plan to work on this topic.

----------

Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16311>