Re: [Python-Dev] A codecs nit

2006-02-18 Thread M.-A. Lemburg
Barry Warsaw wrote:
> On Wed, 2006-02-15 at 22:07 +0100, M.-A. Lemburg wrote:
> 
>> Those are not pseudo-encodings, they are regular codecs.
>>
>> It's a common misunderstanding that codecs are only seen as serving
>> the purpose of converting between Unicode and strings.
>>
>> The codec system is deliberately designed to be general enough
>> to also work with many other types, e.g. it is easily possible to
>> write a codec that convert between the hex literal sequence you
>> have above to a list of ordinals:
> 
> Slightly off-topic, but one thing that's always bothered me about the
> current codecs implementation is that str.encode() (and friends)
> implicitly treats its argument as module, and imports it, even if the
> module doesn't live in the encodings package.  That seems like a mistake
> to me (and a potential security problem if the import has side-effects).

It was a mistake, yes, and thanks for bringing this up.

Codec packages should implement and register their own
codec search functions.

> I don't know whether at the very least restricting the imports to the
> encodings package would make sense or would break things.
> 
> >>> import sys
> >>> sys.modules['smtplib']
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> KeyError: 'smtplib'
> >>> ''.encode('smtplib')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> LookupError: unknown encoding: smtplib
> >>> sys.modules['smtplib']
> 
> 
> I can't see any reason for allowing any randomly importable module to
> act like an encoding.

The encodings package search function will try to import
the module and then check the module signature. If the
module fails to export the codec registration API, then
it raises the LookupError you see above.
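
A rough sketch of the lookup just described, written against a current
Python. This is not the actual encodings package code: the name
legacy_style_search is made up for illustration, and the sketch leaves
out the alias table, the cache and the lookup inside the encodings
package itself. It only shows why a failed lookup can still leave an
unrelated module imported. colorsys stands in for smtplib as a small
standard library module.

```python
import importlib
import sys


def legacy_style_search(name):
    """Hypothetical search function: import any module called `name`."""
    try:
        # Top-level import: this runs the module's code as a side effect.
        module = importlib.import_module(name)
    except ImportError:
        return None
    # "Module signature" check: a codec module exports getregentry().
    getregentry = getattr(module, "getregentry", None)
    if getregentry is None:
        # Not a codec module. A search function returns None here and
        # the codec registry turns that into a LookupError.
        return None
    return getregentry()


sys.modules.pop("colorsys", None)          # start from a clean slate
result = legacy_style_search("colorsys")   # colorsys is not a codec
imported_anyway = "colorsys" in sys.modules
```

The lookup fails (result is None), yet the module has been imported
and stays in sys.modules.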

At the time, it was nice to be able to write codec
packages as Python packages and have them readily usable
by just putting the package on the sys.path.

This was a side-effect of the way the encodings search
function worked. The original design idea was to have
all 3rd party codecs register themselves with the
codec registry. However, this implies that the application
using the codecs would have to run the registration
code at least once. Since the encodings package search
function provided a more convenient way, this was used
by most codec package programmers.

In Py 2.5 we'll change that. The encodings package search
function will only allow codecs in that package to be
imported. All other codec packages will have to provide
their own search function and register this with the
codecs registry.
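
A minimal sketch of what "provide their own search function and
register it" could look like for a third-party package, written
against a current Python. The codec name x_demo_latin1 and the
functions _encode, _decode and demo_search are hypothetical. The codec
simply delegates to the built-in Latin-1 functions so the example
stays short. codecs.register() and codecs.CodecInfo are the standard
library API (CodecInfo arrived in 2.5; earlier search functions
returned a plain 4-tuple).

```python
import codecs


def _encode(text, errors="strict"):
    # Delegate to an existing codec; a real package would do its own work.
    return codecs.latin_1_encode(text, errors)


def _decode(data, errors="strict"):
    return codecs.latin_1_decode(data, errors)


def demo_search(name):
    """Search function for the hypothetical 'x_demo_latin1' codec."""
    if name != "x_demo_latin1":
        # Unknown name: return None so other search functions get a turn.
        return None
    return codecs.CodecInfo(name="x_demo_latin1",
                            encode=_encode,
                            decode=_decode)


# The application (or the package's __init__) runs this once.
codecs.register(demo_search)

encoded = "caf\u00e9".encode("x_demo_latin1")
decoded = encoded.decode("x_demo_latin1")
```

With this approach nothing is imported by name at lookup time. The
package decides which names it answers for.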

The big question is: what to do about 2.3 and 2.4 - adding
the same patch will cause serious breakage, since popular
codec packages such as Tamito's Japanese package rely
on the existing behavior.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Feb 18 2006)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] A codecs nit

2006-02-21 Thread Barry Warsaw
On Sat, 2006-02-18 at 14:44 +0100, M.-A. Lemburg wrote:

> In Py 2.5 we'll change that. The encodings package search
> function will only allow codecs in that package to be
> imported. All other codec packages will have to provide
> their own search function and register this with the
> codecs registry.

My weekend experimentation used the imp functions to constrain the
module search path to encodings.__path__, but I'm not sure that's much
better than prepending 'encodings.' on the module name and letting
__import__() do its thing.
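
The "prepend 'encodings.'" variant could be sketched roughly as
follows, on a current Python. The function name restricted_search is
hypothetical, and this is only an illustration of the idea, not a
proposed patch. colorsys stands in for an arbitrary importable module
outside the package.

```python
import importlib
import sys


def restricted_search(name):
    """Hypothetical search function that only looks inside `encodings`."""
    if not name or "." in name:
        # Refuse dotted names so the lookup cannot leave the package.
        return None
    try:
        # Absolute import of a submodule of the encodings package only.
        module = importlib.import_module("encodings." + name)
    except ImportError:
        return None
    getregentry = getattr(module, "getregentry", None)
    return getregentry() if getregentry is not None else None


sys.modules.pop("colorsys", None)
outside = restricted_search("colorsys")   # not in encodings: never imported
inside = restricted_search("utf_8")       # encodings.utf_8 is a real codec
```

Here a top-level module such as colorsys is never imported, while
codecs that live in the encodings package are still found.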

> The big question is: what to do about 2.3 and 2.4 - adding
> the same patch will cause serious breakage, since popular
> codec packages such as Tamito's Japanese package rely
> on the existing behavior.

FWIW, Mailman has had to do a bunch of special case loading of the 3rd
party Japanese and Korean codecs for older Pythons, and the email
package also has to do special tests for e.g. euc-jp before it'll do the
Asian codec tests.  I think most of the latter is unnecessary for 2.4
and beyond, and I suspect that the former is also unnecessary for 2.4
and beyond.  It's probably still necessary for 2.3.

IIUC, there are still people who prefer Tamito's package over the
built-in Japanese codecs in 2.4, but I don't understand all the details.
My preference would be to backport the fix to 2.4 but not worry about
2.3 since there are no plans to ever release a 2.3.6 AFAIK.

-Barry





[Python-Dev] A codecs nit (was Re: bytes.from_hex())

2006-02-15 Thread Barry Warsaw
On Wed, 2006-02-15 at 22:07 +0100, M.-A. Lemburg wrote:

> Those are not pseudo-encodings, they are regular codecs.
> 
> It's a common misunderstanding that codecs are only seen as serving
> the purpose of converting between Unicode and strings.
> 
> The codec system is deliberately designed to be general enough
> to also work with many other types, e.g. it is easily possible to
> write a codec that convert between the hex literal sequence you
> have above to a list of ordinals:

Slightly off-topic, but one thing that's always bothered me about the
current codecs implementation is that str.encode() (and friends)
implicitly treats its argument as a module, and imports it, even if the
module doesn't live in the encodings package.  That seems like a mistake
to me (and a potential security problem if the import has side-effects).
I don't know whether at the very least restricting the imports to the
encodings package would make sense or would break things.

>>> import sys
>>> sys.modules['smtplib']
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: 'smtplib'
>>> ''.encode('smtplib')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
LookupError: unknown encoding: smtplib
>>> sys.modules['smtplib']


I can't see any reason for allowing any randomly importable module to
act like an encoding.
-Barry


