Paul Moore <p.f.moore <at> gmail.com> writes: > > > > As I pointed out, utf-8, utf-16 and latin1 decoders have already been optimized > > in py3k. For *pure ASCII* input, utf-8 decoding is blazingly fast (1GB/s here). > > The dataset for iobench isn't pure ASCII though, and that's why it's not as fast. > > Ah, thanks. Although you said your data was 95% ASCII, and you're > getting decode speeds of 250MB/s. That's 75% slowdown for 5% of the > data! Surely that's not right???
If you look at how utf-8 decoding is implemented (in unicodeobject.c), it's quite obvious why it is so :-) There is a (very) fast path for chunks of pure ASCII data, and (fast but not blazingly fast) fallback for non ASCII data. Please don't think of it as a slowdown... It's still much faster than 2.x, which manages 130MB/s on the same data. Regards Antoine. _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com