On 04.10.2005 at 04:25, [EMAIL PROTECTED] wrote:

> As the OP suggests, decoding with a codec like mac-roman or iso8859-1 is very
> slow compared to encoding or decoding with utf-8. Here I'm working with 53k of
> data instead of 53 megs. (Note: this is a laptop, so it's possible that
> thermal or battery management features affected these numbers a bit, but by a
> factor of 3 at most)
>
> $ timeit.py -s "s='a'*53*1024; u=unicode(s)" "u.encode('utf-8')"
> 1000 loops, best of 3: 591 usec per loop
> $ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('utf-8')"
> 1000 loops, best of 3: 1.25 msec per loop
> $ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('mac-roman')"
> 100 loops, best of 3: 13.5 msec per loop
> $ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('iso8859-1')"
> 100 loops, best of 3: 13.6 msec per loop
>
> With utf-8 encoding as the baseline, we have
>     decode('utf-8')       2.1x as long
>     decode('mac-roman')  22.8x as long
>     decode('iso8859-1')  23.0x as long
>
> Perhaps this is an area that is ripe for optimization.

For charmap decoding we might be able to use an array (e.g. a tuple, or an array.array?) of code points instead of a dictionary. Or we could implement this array as a C array (i.e. gencodec.py would generate C code).

Bye,
   Walter Dörwald

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
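
To make the array idea concrete, here is a rough sketch of a charmap decoder driven by a 256-entry tuple of code points instead of a dictionary. It is written for current Python (bytes and str) rather than the 2.x interpreter used for the timings, and the names build_codepoint_tuple, decode_with_dict and decode_with_tuple are made up for this illustration; none of them exist in the codecs module or in gencodec.py. It only shows the data structure. Any real speedup would have to come from doing the lookup in the C decoder, which this sketch does not attempt.

```python
# Illustrative sketch only: these helpers are hypothetical and are not part
# of the codecs module or of gencodec.py.

def build_codepoint_tuple(codec_name):
    """Return a 256-entry tuple mapping each byte value to a code point.

    Bytes that the codec leaves undefined are stored as None.
    The table is derived from the existing codec, so this assumes a
    single-byte (charmap-style) codec such as mac-roman or iso8859-1.
    """
    codepoints = []
    for byte in range(256):
        try:
            codepoints.append(ord(bytes([byte]).decode(codec_name)))
        except UnicodeDecodeError:
            codepoints.append(None)
    return tuple(codepoints)


def decode_with_dict(data, mapping):
    """Dictionary-based lookup: one hash lookup per input byte."""
    return "".join(chr(mapping[byte]) for byte in data)


def decode_with_tuple(data, table):
    """Array-based lookup: the byte value is used directly as an index."""
    chars = []
    for position, byte in enumerate(data):
        codepoint = table[byte]
        if codepoint is None:
            raise ValueError(
                "undefined byte 0x%02x at position %d" % (byte, position)
            )
        chars.append(chr(codepoint))
    return "".join(chars)
```

For example, decode_with_tuple(b"abc", build_codepoint_tuple("iso8859-1")) should give the same result as b"abc".decode("iso8859-1"). A C version would index a static array of code points in the same way, which is what generating C code from gencodec.py would amount to.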