>> Those haven't been ported to the new API, yet. Consider, for example,
>> d9821affc9ee. Before that, I got 253 MB/s on the 4096 units read test;
>> with that change, I get 610 MB/s. The trunk gives me 488 MB/s, so this
>> is a 25% speedup for PEP 393.
>
> If I understand correctly, performance now depends heavily on the
> characters used? A pure ASCII string is faster than a string with
> characters from the ISO-8859-1 charset?
How did you infer that from the above paragraph??? ASCII and Latin-1 are
mostly identical in terms of performance - if anything, the ASCII decoder
should be slightly slower than the Latin-1 decoder, since the ASCII decoder
needs to check for errors, whereas the Latin-1 decoder can never be
confronted with one. What matters is

a) has the codec already been rewritten to use the new representation, or
   must it go through Py_UNICODE[] first, then requiring a second copy to
   the canonical form?

b) what is the cost of finding out the highest character - regardless of
   what that highest character turns out to be?

(A rough sketch of the decoder difference and of this scan is appended
below.)

> Is it also true for BMP characters vs non-BMP
> characters?

Well... if you are talking about the ASCII and Latin-1 codecs - neither of
these supports most BMP characters, let alone non-BMP characters. In
general, non-BMP characters are more expensive to process since they take
more space.

> Do these benchmark tools use only ASCII characters, or also some
> ISO-8859-1 characters?

See for yourself: iobench uses Latin-1, including non-ASCII characters,
but nothing outside Latin-1.

> Or, better, different Unicode ranges in different tests?

That's why I asked for a list of benchmarks to perform. I cannot run an
infinite number of benchmarks prior to adoption of the PEP.

Regards,
Martin
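
P.S. To make this a bit more concrete, here is a rough sketch in plain C -
not the actual CPython code, with illustrative names only - of why the
Latin-1 decoder can be an unconditional widening copy while the ASCII
decoder must check every byte, and of the extra pass needed to find the
highest character before the PEP 393 representation can be chosen.

/* Illustrative helpers only; the real CPython codecs are more involved
   (memcpy, word-at-a-time scanning, error handlers, resizing). */

#include <stddef.h>
#include <stdint.h>

/* ASCII decoding: every byte must be checked, since bytes >= 0x80 are
   errors.  Returns 0 on success, -1 on the first invalid byte. */
static int
decode_ascii(const unsigned char *in, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i++) {
        if (in[i] >= 0x80)
            return -1;
        out[i] = in[i];
    }
    return 0;
}

/* Latin-1 decoding: every byte value 0..255 is a valid character, so no
   check is needed - the loop degenerates into a plain copy. */
static void
decode_latin1(const unsigned char *in, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i];
}

/* Cost (b): one extra pass over the data to find the highest code point,
   which determines whether 1, 2 or 4 bytes per character are needed. */
static uint32_t
find_maxchar(const uint32_t *codepoints, size_t n)
{
    uint32_t maxchar = 0;
    for (size_t i = 0; i < n; i++)
        if (codepoints[i] > maxchar)
            maxchar = codepoints[i];
    return maxchar;
}

The exact code is beside the point; what matters is that the Latin-1 path
has no branch that can fail, and that the maxchar pass is paid no matter
what the maximum turns out to be.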