I have written my fastcharmap decoder and encoder.  It's not meant to be
better than the patch and other changes to come in a future version of
Python, but it does work now with the current codecs.  Using Hye-Shik
Chang's benchmark, decoding is about 4.3x faster than the base, and
encoding is about 2x faster than the base (that's comparing the base and
the fast versions on my machine).  If fastcharmap would be useful, please
tell me where I should make it available, and any changes that are needed.
I would also need to write an installer (distutils I guess).

<http://georgeanelson.com/fastcharmap.d.tar.gz>

Fastcharmap is written in Python and Pyrex 0.9.3, and the .pyx file will
need to be compiled before use.  I used:

    pyrexc _fastcharmap.pyx
    gcc -c -fPIC -I/usr/include/python2.3/ _fastcharmap.c
    gcc -shared _fastcharmap.o -o _fastcharmap.so

To use it, hook each codec that should be sped up:

    import fastcharmap
    help(fastcharmap)
    fastcharmap.hook('name_of_codec')
    u = unicode('some text', 'name_of_codec')
    s = u.encode('name_of_codec')

No codecs were rewritten.  It took me a while to learn enough to do this
(Pyrex, more Python, some Python C API), and there were some surprises.
Hooking in is grosser than I would have liked.  I've only used it on Python
2.3 on FC3.  Still, it should work going forward, and, if the dicts are
replaced by something else, fastcharmap will know to leave everything
alone.  A few debugging print statements are still in it.
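
For anyone who wants the decode-side idea without unpacking the tarball, here is a minimal pure-Python sketch of that kind of hook. It is not the Pyrex code in fastcharmap. It assumes a codec module that exposes a `decoding_map` dict whose keys are byte values and whose values are code points or None, which is how the charmap codecs are laid out. `build_decode_table`, `hook_decoder`, `fast_decode` and the `fast_decode_table` attribute are hypothetical names, and the sketch is written for Python 3, where iterating over bytes yields ints. The table is only built when `decoding_map` is still a plain dict, so anything else is left alone.

```python
import types


def build_decode_table(decoding_map):
    """Flatten a charmap-style decoding dict into a 256-entry list.

    Keys are byte values; values are code points (int), ready-made
    strings, or None.  Bytes that are missing or map to None are
    stored as None, meaning "undefined".
    """
    table = [None] * 256
    for byte, target in decoding_map.items():
        if isinstance(target, int):
            target = chr(target)
        table[byte] = target
    return table


def hook_decoder(codec_module):
    """Attach a flat decode table, but only if decoding_map is a plain dict.

    Anything else (a proxy, a subclass, a missing map) is left alone.
    """
    dmap = getattr(codec_module, "decoding_map", None)
    if type(dmap) is not dict:
        return False
    codec_module.fast_decode_table = build_decode_table(dmap)
    return True


def fast_decode(codec_module, data):
    """Decode bytes with one list index per byte instead of a dict lookup."""
    table = codec_module.fast_decode_table
    out = []
    for pos, byte in enumerate(data):  # iterating bytes yields ints
        ch = table[byte]
        if ch is None:
            raise UnicodeDecodeError("charmap", bytes(data), pos, pos + 1,
                                     "character maps to <undefined>")
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # A stand-in for a codec module: identity map plus two changes.
    codec = types.SimpleNamespace(decoding_map={b: b for b in range(256)})
    codec.decoding_map[0x80] = 0x20AC  # byte 0x80 -> EURO SIGN
    codec.decoding_map[0x81] = None    # byte 0x81 is undefined
    if hook_decoder(codec):
        print(ascii(fast_decode(codec, b"A\x80")))  # prints 'A\u20ac'
```

The flat table is a snapshot: later writes to `decoding_map` are not seen, which is the objection raised in the quoted exchange below. In pure Python the loop itself is slow; the speedup only appears when the same table is walked in C.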


>At 8:36 AM +0200 10/5/05, Martin v. Löwis wrote:
>>Tony Nelson wrote:
> ...
>>> Encoding can be made fast using a simple hash table with external chaining.
>>> There are max 256 codepoints to encode, and they will normally be well
>>> distributed in their lower 8 bits.  Hash on the low 8 bits (just mask), and
>>> chain to an area with 256 entries.  Modest storage, normally short chains,
>>> therefore fast encoding.
>>
>>This is what is currently done: a hash map with 256 keys. You are
>>complaining about the performance of that algorithm. The issue of
>>external chaining is likely irrelevant: there likely are no collisions,
>>even though Python uses open addressing.
>
>I think I'm complaining about the implementation, though on decode, not
>encode.
>
>In any case, there are likely to be collisions in my scheme.  Over the
>next few days I will try to do it myself, but I will need to learn Pyrex,
>some of the Python C API, and more about Python to do it.
>
>
>>>>...I suggest instead just /caching/ the translation in C arrays stored
>>>>with the codec object.  The cache would be invalidated on any write to the
>>>>codec's mapping dictionary, and rebuilt the next time anything was
>>>>translated.  This would maintain the present semantics, work with current
>>>>codecs, and still provide the desired speed improvement.
>>
>>That is not implementable. You cannot catch writes to the dictionary.
>
>I should have been more clear.  I am thinking about using a proxy object
>in the codec's 'encoding_map' and 'decoding_map' slots, that will forward
>all the dictionary stuff.  The proxy will delete the cache on any call
>which changes the dictionary contents.  There are proxy classes and
>dictproxy (don't know how it's implemented yet) so it seems doable, at
>least as far as I've gotten so far.
>
>
>>> Note that this caching is done by new code added to the existing C
>>> functions (which, if I have it right, are in unicodeobject.c).  No
>>> architectural changes are made; no existing codecs need to be changed;
>>> everything will just work
>>
>>Please try to implement it. You will find that you cannot. I don't
>>see how regenerating/editing the codecs could be avoided.
>
>Will do!
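
To make the two ideas in the quoted exchange concrete, here is a minimal pure-Python sketch, under stated assumptions, of a proxy that forwards every mapping operation to a codec's `encoding_map` dict and drops a cached lookup table on any write. The cache is built as the 256-bucket table proposed above: indexed by the low 8 bits of the code point, with collisions chained in per-bucket lists. `CachingMapProxy` and `encode` are hypothetical names; this is not the fastcharmap code. It assumes the map's keys are int code points and its values are byte values or None.

```python
from collections.abc import MutableMapping


class CachingMapProxy(MutableMapping):
    """Stand in for a codec's encoding_map: forward every mapping operation
    to the wrapped dict and drop the cached table on any write."""

    def __init__(self, mapping):
        self._map = mapping      # wrapped, not copied
        self._cache = None       # built lazily by encode_table()

    def __getitem__(self, key):
        return self._map[key]

    def __iter__(self):
        return iter(self._map)

    def __len__(self):
        return len(self._map)

    # MutableMapping builds update(), pop(), popitem(), clear() and
    # setdefault() on these two methods, so they invalidate the cache too.
    def __setitem__(self, key, value):
        self._map[key] = value
        self._cache = None

    def __delitem__(self, key):
        del self._map[key]
        self._cache = None

    def encode_table(self):
        """256 buckets indexed by the low 8 bits of the code point; entries
        that collide are chained in the bucket's list."""
        if self._cache is None:
            buckets = [[] for _ in range(256)]
            for cp, byte in self._map.items():
                buckets[cp & 0xFF].append((cp, byte))
            self._cache = buckets
        return self._cache


def encode(proxy, text):
    """Encode text using the chained table; a miss or None is an error."""
    buckets = proxy.encode_table()
    out = bytearray()
    for pos, ch in enumerate(text):
        cp = ord(ch)
        byte = None
        for key, value in buckets[cp & 0xFF]:  # normally a short chain
            if key == cp:
                byte = value
                break
        if byte is None:
            raise UnicodeEncodeError("charmap", text, pos, pos + 1,
                                     "character maps to <undefined>")
        out.append(byte)
    return bytes(out)
```

For example, U+20AC and U+00AC share bucket 0xAC, so that bucket holds a chain of two entries, while most buckets hold one. The sketch only preserves the dict's semantics if every write goes through the proxy: code that kept a reference to the original dict can still modify it directly and leave the cache stale.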
____________________________________________________________________
TonyN.:'                       <mailto:[EMAIL PROTECTED]>
      '                              <http://www.georgeanelson.com/>
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev