New submission from Serhiy Storchaka <storch...@gmail.com>:

I propose a complex patch that significantly speeds up UTF-8 decoding. The decoder is now faster even than the decoder in 3.2 (except in a few unrealistic pathological cases).
The decoder code is also smaller and simpler (previously the decoding code was repeated in at least three places). As a side effect, ASCII decoding is now faster on some platforms (issue14419).

Related issues:
[issue4868] Faster utf-8 decoding
[issue13417] faster utf-8 decoding
[issue14419] Faster ascii decoding
[issue14624] Faster utf-16 decoder
[issue14625] Faster utf-32 decoder
[issue14654] Faster utf-8 decoding

Here are the benchmark results (numbers are speeds in MB/s). The percentage in parentheses is the change of the patched build relative to that column.

On 32-bit Linux, AMD Athlon 64 X2 4600+ @ 2.4GHz:

                                  3.2           3.3(vanilla)  patched
utf-8 'A'*10000                   1199 (+69%)   1721 (+18%)   2032
utf-8 'A'*9999+'\x80'             1189 (+25%)   996 (+49%)    1488
utf-8 'A'*9999+'\u0100'           1192 (-25%)   887 (+1%)     894
utf-8 'A'*9999+'\u8000'           1178 (-24%)   888 (+0%)     890
utf-8 'A'*9999+'\U00010000'       1177 (-29%)   872 (-4%)     837
utf-8 '\x80'*10000                220 (+74%)    172 (+122%)   382
utf-8 '\x80'+'A'*9999             1192 (+5%)    376 (+232%)   1250
utf-8 '\x80'*9999+'\u0100'        220 (+54%)    160 (+112%)   339
utf-8 '\x80'*9999+'\u8000'        220 (+54%)    160 (+112%)   339
utf-8 '\x80'*9999+'\U00010000'    221 (+49%)    176 (+88%)    330
utf-8 '\u0100'*10000              220 (+74%)    163 (+134%)   382
utf-8 '\u0100'+'A'*9999           1177 (+4%)    382 (+219%)   1220
utf-8 '\u0100'+'\x80'*9999        220 (+74%)    163 (+134%)   382
utf-8 '\u0100'*9999+'\u8000'      220 (+74%)    163 (+134%)   382
utf-8 '\u0100'*9999+'\U00010000'  220 (+50%)    180 (+83%)    330
utf-8 '\u8000'*10000              261 (+66%)    191 (+126%)   432
utf-8 '\u8000'+'A'*9999           1197 (+1%)    384 (+216%)   1212
utf-8 '\u8000'+'\x80'*9999        216 (+77%)    163 (+134%)   382
utf-8 '\u8000'+'\u0100'*9999      215 (+77%)    164 (+132%)   381
utf-8 '\u8000'*9999+'\U00010000'  261 (+46%)    201 (+89%)    380
utf-8 '\U00010000'*10000          248 (+44%)    198 (+80%)    357
utf-8 '\U00010000'+'A'*9999       1192 (-5%)    383 (+196%)   1135
utf-8 '\U00010000'+'\x80'*9999    220 (+73%)    180 (+111%)   380
utf-8 '\U00010000'+'\u0100'*9999  220 (+73%)    180 (+111%)   380
utf-8 '\U00010000'+'\u8000'*9999  261 (+54%)    201 (+100%)   403
ascii 'A'*10000                   233 (+971%)   1876 (+33%)   2496

On 32-bit Linux, Intel Atom N570 @ 1.66GHz:

                                  3.2           3.3(vanilla)  patched
utf-8 'A'*10000                   345 (+81%)    596 (+5%)     623
utf-8 'A'*9999+'\x80'             335 (+41%)    303 (+56%)    474
utf-8 'A'*9999+'\u0100'           336 (-23%)    123 (+110%)   258
utf-8 'A'*9999+'\u8000'           337 (-24%)    123 (+108%)   256
utf-8 'A'*9999+'\U00010000'       336 (-24%)    261 (-3%)     254
utf-8 '\x80'*10000                88 (+66%)     65 (+125%)    146
utf-8 '\x80'+'A'*9999             334 (+8%)     124 (+190%)   360
utf-8 '\x80'*9999+'\u0100'        88 (+43%)     65 (+94%)     126
utf-8 '\x80'*9999+'\u8000'        88 (+43%)     65 (+94%)     126
utf-8 '\x80'*9999+'\U00010000'    89 (+40%)     65 (+92%)     125
utf-8 '\u0100'*10000              88 (+85%)     65 (+151%)    163
utf-8 '\u0100'+'A'*9999           336 (+2%)     77 (+345%)    343
utf-8 '\u0100'+'\x80'*9999        88 (+86%)     65 (+152%)    164
utf-8 '\u0100'*9999+'\u8000'      88 (+86%)     65 (+152%)    164
utf-8 '\u0100'*9999+'\U00010000'  88 (+57%)     65 (+112%)    138
utf-8 '\u8000'*10000              98 (+79%)     69 (+154%)    175
utf-8 '\u8000'+'A'*9999           339 (+3%)     77 (+353%)    349
utf-8 '\u8000'+'\x80'*9999        89 (+84%)     66 (+148%)    164
utf-8 '\u8000'+'\u0100'*9999      88 (+86%)     65 (+152%)    164
utf-8 '\u8000'*9999+'\U00010000'  98 (+58%)     69 (+125%)    155
utf-8 '\U00010000'*10000          104 (+46%)    79 (+92%)     152
utf-8 '\U00010000'+'A'*9999       339 (-5%)     124 (+160%)   323
utf-8 '\U00010000'+'\x80'*9999    88 (+84%)     68 (+138%)    162
utf-8 '\U00010000'+'\u0100'*9999  88 (+83%)     68 (+137%)    161
utf-8 '\U00010000'+'\u8000'*9999  98 (+63%)     72 (+122%)    160
ascii 'A'*10000                   132 (+499%)   758 (+4%)     791

----------
components: Interpreter Core
files: decode_utf8_4.patch
keywords: patch
messages: 160103
nosy: Arfrever, haypo, jcea, loewis, pitrou, storchaka
priority: normal
severity: normal
status: open
title: Amazingly faster UTF-8 decoding
type: performance
versions: Python 3.3

Added file: http://bugs.python.org/file25484/decode_utf8_4.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue14738>
_______________________________________
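
For anyone who wants to collect numbers of the same kind, here is a minimal sketch of how decoding speed in MB/s and the percentages in parentheses could be computed. It is not the benchmark script used for the results above. The helper names decode_speed_mb_s and percent_change are hypothetical, and treating 1 MB as 10**6 bytes is an assumption, since the report does not say which definition it uses.

```python
import timeit


def decode_speed_mb_s(text, encoding="utf-8", repeat=5, number=1000):
    """Return the decoding speed of `text` in MB/s (assumes 1 MB = 10**6 bytes).

    Hypothetical helper: the text is encoded once, then the time of
    `number` decode calls is measured and the best of `repeat` runs is kept.
    """
    data = text.encode(encoding)
    best = min(timeit.repeat(lambda: data.decode(encoding),
                             repeat=repeat, number=number))
    return len(data) * number / best / 1e6


def percent_change(old, new):
    """Return the change from `old` to `new` as a rounded percentage."""
    return round((new / old - 1) * 100)


if __name__ == "__main__":
    # A few of the test strings from the tables above.
    samples = [
        ("'A'*10000", "A" * 10000),
        ("'A'*9999+'\\x80'", "A" * 9999 + "\x80"),
        ("'\\u8000'*10000", "\u8000" * 10000),
    ]
    for label, text in samples:
        print("utf-8 %-20s %6.0f MB/s" % (label, decode_speed_mb_s(text)))
    # Reproduces the "(+69%)" next to the 3.2 column in the first table row.
    print(percent_change(1199, 2032))
```

Running the same script under each interpreter build and feeding the per-build speeds to percent_change gives figures comparable in form to the tables, although absolute values depend on the machine and on timer noise.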