On Monday 29 August 2011 at 21:34:48, you wrote:
> >> Those haven't been ported to the new API yet. Consider, for example,
> >> d9821affc9ee. Before that change, I got 253 MB/s on the 4096-units
> >> read test; with it, I get 610 MB/s. The trunk gives me 488 MB/s, so
> >> this is a 25% speedup for PEP 393.
> >
> > If I understand correctly, performance now depends heavily on the
> > characters used? A pure ASCII string is faster than a string with
> > characters in the ISO-8859-1 charset?
>
> How did you infer that from the above paragraph? ASCII and Latin-1 are
> mostly identical in terms of performance - the ASCII decoder should be
> slightly slower than the Latin-1 decoder, since the ASCII decoder needs
> to check for errors, whereas the Latin-1 decoder will never be
> confronted with errors.
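The distinction quoted above is easy to check in Python itself: Latin-1 decoding can never fail, because every byte value 0x00-0xFF maps to a code point, whereas the ASCII decoder must validate each byte and reject anything >= 0x80.

```python
data = b'ab\xff'

# Latin-1 always succeeds: byte 0xFF maps directly to U+00FF.
assert data.decode('latin-1') == 'ab\xff'

# ASCII must error-check every byte; 0xFF is outside the 7-bit range.
try:
    data.decode('ascii')
except UnicodeDecodeError as exc:
    print(exc)
```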
I'm not comparing the ASCII and ISO-8859-1 decoders. I was asking whether
decoding b'abc' from ISO-8859-1 is faster than decoding b'ab\xff' from
ISO-8859-1, and if so: why?

Your patch replaces PyUnicode_New(size, 255) ... memcpy() with
PyUnicode_FromUCS1(). I don't understand how that makes Python faster:
PyUnicode_FromUCS1() first scans the input string for the maximum code
point.

I suppose the main difference is that the ISO-8859-1 input bytes are
stored as the string's UTF-8 representation (a shared pointer) if all
characters of the string are ASCII. In that case, encoding the string to
UTF-8 costs nothing; we already have the result. Am I correct?

Victor

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
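Victor's hypothesis can be sketched as a toy model in Python. The helper names and the returned (string, cached bytes) pair below are illustrative only, not CPython's actual C-level API; the point is the shape of the logic: scan for the maximum code point first, and if the input is pure ASCII, the bytes can double as the cached UTF-8 representation, since ASCII is byte-for-byte identical to its UTF-8 encoding.

```python
def max_code_point(data: bytes) -> int:
    # The up-front scan Victor mentions: find the widest code point so
    # the right internal representation can be chosen before copying.
    return max(data) if data else 0

def decode_latin1(data: bytes):
    # Hypothetical pure-Python model of the suspected fast path.
    maxchar = max_code_point(data)
    if maxchar < 0x80:
        # Pure ASCII: the UTF-8 encoding equals the Latin-1 input
        # byte-for-byte, so the input can be shared as the cached
        # UTF-8 form (a shared pointer, in PEP 393 terms).
        return data.decode('ascii'), data
    # Bytes >= 0x80 expand to two bytes in UTF-8, so nothing can be
    # shared; UTF-8 would have to be computed separately on demand.
    return data.decode('latin-1'), None

text, cached_utf8 = decode_latin1(b'abc')    # ASCII: UTF-8 is shared
text2, cached2 = decode_latin1(b'ab\xff')    # non-ASCII: no sharing
```

Under this model, encoding an all-ASCII string to UTF-8 is free, which is consistent with the speedup Victor is trying to explain.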