Paul Moore <p.f.moore <at> gmail.com> writes:
> > As I pointed out, utf-8, utf-16 and latin1 decoders have already been
> > optimized in py3k. For *pure ASCII* input, utf-8 decoding is blazingly
> > fast (1GB/s here). The dataset for iobench isn't pure ASCII though, and
> > that's why it's not as fast.
>
> Ah, thanks. Although you said your data was 95% ASCII, and you're
> getting decode speeds of 250MB/s. That's 75% slowdown for 5% of the
> data! Surely that's not right???

If you look at how utf-8 decoding is implemented (in unicodeobject.c), it's
quite obvious why it is so :-) There is a (very) fast path for chunks of pure
ASCII data, and a (fast but not blazingly fast) fallback for non-ASCII data.
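
If you want to see the gap between the two paths on your own machine, here is
a rough sketch of a micro-benchmark. It is not iobench: the helper names
(make_sample, decode_throughput) are made up for this example, the data is
synthetic ('a' with an occasional 'é'), and the numbers will vary with the
machine and the Python version.

```python
import time


def make_sample(n_chars, non_ascii_every=0):
    """Hypothetical helper: return UTF-8 bytes for n_chars characters.

    The text is 'a' repeated; if non_ascii_every > 0, every
    non_ascii_every-th character is 'é' (two bytes in UTF-8).
    non_ascii_every=20 gives text that is 95% ASCII by character count.
    """
    if non_ascii_every <= 0:
        text = "a" * n_chars
    else:
        block = "a" * (non_ascii_every - 1) + "\u00e9"
        text = (block * (n_chars // non_ascii_every + 1))[:n_chars]
    return text.encode("utf-8")


def decode_throughput(data, repeats=5):
    """Return the best observed decode throughput in MB/s (1 MB = 10**6 bytes)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        data.decode("utf-8")
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on very small inputs.
    return len(data) / max(best, 1e-9) / 1e6


if __name__ == "__main__":
    n = 10_000_000
    print("pure ASCII: %.0f MB/s" % decode_throughput(make_sample(n)))
    print("95%% ASCII:  %.0f MB/s" % decode_throughput(make_sample(n, 20)))
```

Evenly spaced non-ASCII characters are something of a worst case, since they
keep the runs of pure ASCII short; real text tends to cluster them.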

Please don't think of it as a slowdown... It's still much faster than 2.x, which
manages 130MB/s on the same data.

Regards

Antoine.

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev