Marc-Andre Lemburg <m...@egenix.com> added the comment:

Alexander Belopolsky wrote:
> 
> Alexander Belopolsky <belopol...@users.sourceforge.net> added the comment:
> 
> On Thu, Feb 24, 2011 at 11:01 AM, Marc-Andre Lemburg
> <rep...@bugs.python.org> wrote:
> ..
>> On this ticker, we're discussing just one application area: that
>> of the builtin short cuts.
>>
> Fair enough.  I was hoping to close this ticket by simply committing
> the posted patch, but it looks like people want to do more.  I don't
> think we'll get measurable performance gains but may improve code
> understandability.
> 
>> To have more encoding name variants benefit from the optimization,
>> we might want to enhance that particular normalization function
>> to avoid having to compare against "utf8" and "utf-8" in the
>> encode/decode functions.
> 
> Which function are you talking about?
> 
> 1. normalize_encoding() in unicodeobject.c
> 2. normalizestring() in codecs.c

The first one, since that's being used by the shortcuts.

> The first is s.lower().replace('-', '_') and the second is

It does this: s.lower().replace('_', '-')

> s.lower().replace(' ', '_'). (Note space vs. dash difference.)
> 
> Why do we need both?  And why should they be different?

Because the first is specifically used for the shortcuts
(which can do more without breaking anything, since it's
only used internally) and the second prepares the encoding
names for lookup in the codec registry (which has a PEP100
defined behavior we cannot easily change).

----------
title: b'x'.decode('latin1') is much slower     than    b'x'.decode('latin-1') 
-> b'x'.decode('latin1') is much slower than     b'x'.decode('latin-1')

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11303>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to