Re: [Python-Dev] email package status in 3.X

P.J. Eby Sun, 20 Jun 2010 11:43:08 -0700

At 10:57 AM 6/20/2010 -0700, Guido van Rossum wrote:

> The problem comes exactly where you find it: when *porting* existing
> code that uses aforementioned ways to alleviate the pain, you find
> that the hacks no longer work and a properly layered design is needed
> that clearly distinguishes between which variables contain bytes and
> which text.

Actually, I would say that it's more that (in the network protocol case) we *have* bytes, some of which we would like to *treat* as text, yet do not wish to constantly convert back and forth to full-blown unicode -- especially since the protocols themselves designate ASCII or latin-1 at the transport layer (sometimes with odder encodings above, but these already have to be explicitly dealt with by existing code).

While reading over this thread, I'm wondering whether at least my (WSGI-related) problems in this area would be solved by the availability of a type (say "bstr") that was simply a wrapper providing string-like behavior over an underlying bytes, bytearray, or memoryview, and that would produce objects of compatible type when combined with strings (by encoding the strings to match).

Then, I could wrap bytes with it to pass them to string operations, and then feed them back into everything else. The bstr type ideally would be directly compatible with bytes I/O, or at least have a .bytes attribute that would be.
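
To make the idea concrete, here is a minimal pure-Python sketch of such a wrapper. Treat it as an illustration under assumptions, not a design: the name `bstr` and the `.bytes` attribute come from the post, but the fixed latin-1 encoding, the `_coerce` helper, and the small set of methods are hypothetical. Because it decodes in `__str__`, it also does not achieve the O(1) behavior the proposal is really after.

```python
class bstr:
    """Hypothetical string-like wrapper over latin-1 bytes (a sketch, not a real type)."""

    ENCODING = "latin-1"  # assumption: one single-byte encoding, all 256 values valid

    def __init__(self, data):
        if isinstance(data, bstr):
            data = data.bytes
        elif isinstance(data, str):
            # Encode text to match, as the proposal describes
            data = data.encode(self.ENCODING)
        elif not isinstance(data, (bytes, bytearray, memoryview)):
            raise TypeError("bstr() needs str, bytes, bytearray or memoryview")
        # Keep an immutable bytes copy; this is what bytes I/O would consume
        self.bytes = bytes(data)

    def _coerce(self, other):
        """Return `other` as bytes, or NotImplemented for unsupported types."""
        if isinstance(other, bstr):
            return other.bytes
        if isinstance(other, str):
            return other.encode(self.ENCODING)
        if isinstance(other, (bytes, bytearray, memoryview)):
            return bytes(other)
        return NotImplemented

    def __add__(self, other):
        other = self._coerce(other)
        if other is NotImplemented:
            return NotImplemented
        return bstr(self.bytes + other)

    def __radd__(self, other):
        # Handles "text" + bstr(...) and b"raw" + bstr(...)
        other = self._coerce(other)
        if other is NotImplemented:
            return NotImplemented
        return bstr(other + self.bytes)

    def __eq__(self, other):
        other = self._coerce(other)
        if other is NotImplemented:
            return NotImplemented
        return self.bytes == other

    # Equal to both str and bytes, so no consistent hash exists: stay unhashable
    __hash__ = None

    def __len__(self):
        return len(self.bytes)

    def startswith(self, prefix):
        return self.bytes.startswith(self._coerce(prefix))

    def __str__(self):
        # O(n) decode: a pure-Python sketch cannot give str an O(1) view
        return self.bytes.decode(self.ENCODING)

    def __repr__(self):
        return f"bstr({self.bytes!r})"
```

Mixed operations in either order return a `bstr`, so its `.bytes` can be handed straight back to bytes I/O; a usable version would of course need far more of the `str` method surface.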

It seems like that would reduce WSGI porting issues quite a bit, since it would mostly consist of throwing extra bstr() calls in where things are breaking, and maybe grabbing the .bytes attribute for I/O.

This approach would still be explicit as to what types you're working with, but would not require O(n) *conversions* at every interaction boundary. It would be limited, of course, to single-byte encodings with all characters (0-255) valid.
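
The "all characters (0-255) valid" restriction is what makes the bytes-to-text mapping lossless. A quick check (the helper name `roundtrips` is illustrative only) shows that latin-1 maps every byte value to a character and back, while ASCII and UTF-8 do not:

```python
def roundtrips(encoding):
    """True if all 256 byte values survive a decode/encode round trip unchanged."""
    data = bytes(range(256))
    try:
        return data.decode(encoding).encode(encoding) == data
    except UnicodeError:
        # Some byte value has no character in this encoding
        return False


for enc in ("latin-1", "ascii", "utf-8"):
    print(enc, roundtrips(enc))
# latin-1 True
# ascii False
# utf-8 False
```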

OTOH, maybe there should just be a bytestrings module with bytestrings.ascii and bytestrings.latin1, and between the two that should cover the network protocol needs quite well.
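
One way to picture such a module is a factory that builds one wrapper type per encoding. This is only a sketch: no `bytestrings` module exists, so it is stood in for by a `types.SimpleNamespace`, and `_make_bytestring` and `ByteString` are hypothetical names. The ASCII variant validates eagerly, so bytes at or above 0x80 are rejected up front.

```python
import types


def _make_bytestring(encoding):
    """Build a hypothetical bytes-backed string type bound to one encoding."""

    class ByteString:
        def __init__(self, data):
            if isinstance(data, ByteString):
                data = data.bytes
            elif isinstance(data, str):
                data = data.encode(encoding)  # UnicodeEncodeError if not representable
            self.bytes = bytes(data)
            self.bytes.decode(encoding)  # validate eagerly: ASCII rejects bytes >= 0x80

        def __add__(self, other):
            # Combining with text encodes the text to match
            if isinstance(other, ByteString):
                other = other.bytes
            elif isinstance(other, str):
                other = other.encode(encoding)
            elif not isinstance(other, (bytes, bytearray)):
                return NotImplemented
            return ByteString(self.bytes + bytes(other))

        def __str__(self):
            return self.bytes.decode(encoding)

    return ByteString


# Stand-in for a hypothetical `bytestrings` module
bytestrings = types.SimpleNamespace(
    ascii=_make_bytestring("ascii"),
    latin1=_make_bytestring("latin-1"),
)
```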

Actually, if the Python 3 str() constructor could do O(1) conversion for the latin-1 case (i.e., just wrapped the underlying bytes), I would just put "bstr = lambda x: str(x, 'latin-1')" at the top of my programs and have roughly the same effect.
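
For comparison, this is the lambda stopgap as it behaves in Python 3 today. The `to_bytes` helper and the sample header line are hypothetical additions for illustration; each conversion copies the data, which is exactly the O(n) cost the proposal would like to avoid.

```python
# The stopgap from the post: decode as latin-1 (copies the data in Python 3)
bstr = lambda x: str(x, 'latin-1')


def to_bytes(text):
    """Hypothetical inverse helper: map latin-1 text back to the original bytes."""
    return text.encode('latin-1')


raw = b"X-Author: Andr\xe9\r\n"  # a header line as it might arrive from a socket
name, _, value = bstr(raw).rstrip("\r\n").partition(": ")
print(name)             # X-Author
print(to_bytes(value))  # b'Andr\xe9'
```

Because latin-1 is lossless for every byte value, `to_bytes(bstr(raw))` always gives back `raw` exactly.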

This idea is still a bit half-baked, but a more baked version might be just the ticket for porting stuff that used str to work with bytes in 2.x, if only because writing, e.g.:


     newurl = bstr(urljoin(bstr(base), 'subdir'))

seems so much saner than writing *this* everywhere:

     newurl = bytes(urljoin(str(base, 'latin-1'), 'subdir'), 'latin-1')
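
Spelled out as runnable Python 3, the explicit round trip looks like this; the base URL is made up for illustration. The return trip has to encode (`.encode('latin-1')` or `bytes(..., 'latin-1')`), because `str(x, 'latin-1')` raises `TypeError` when `x` is already a `str`.

```python
from urllib.parse import urljoin

base = b"http://example.com/app/"  # e.g. a URL taken from raw request bytes

# Decode coming in, encode going out: two O(n) conversions for one join
newurl = urljoin(str(base, 'latin-1'), 'subdir').encode('latin-1')
print(newurl)  # b'http://example.com/app/subdir'
```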

It is perhaps a bit late to propose this idea, since ideally we would also want to use it in 2.x to aid porting. But I'm curious if any other people here experiencing byte/unicode woes in relation to network protocols would find this a solution to their chief frustration (i.e., that the stdlib now often insists on strings where bytes were effectively usable before, and thus one must do conversions both coming and going).


_______________________________________________
Python-Dev mailing list
http://mail.python.org/mailman/listinfo/python-dev
