Re: [Python-Dev] Unicode charmap decoders slow

Martin v. Löwis Tue, 04 Oct 2005 12:50:14 -0700

Walter Dörwald wrote:
> For charmap decoding we might be able to use an array (e.g. a tuple  
> (or an array.array?) of codepoints instead of dictionary.


This array would have to be sparse, of course. Using an array.array
would be more efficient, I guess - but we would need a C API for arrays
(to validate the type code, and to get ob_item).

> Or we could implement this array as a C array (i.e. gencodec.py would  
> generate C code).

For decoding, we would not get any better than array.array, except for
startup cost.

For encoding, having a C trie might give considerable speedup. _codecs
could offer an API to convert the current dictionaries into
lookup-efficient structures, and the conversion would be done when
importing the codec.

For the trie, two levels (higher and lower byte) would probably be
sufficient: I believe most encodings only use 2 "rows" (256 code
point blocks), very few more than three.

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Unicode charmap decoders slow

Reply via email to