On Thu, 16 Sep 2010 09:52:48 -0400, Barry Warsaw <ba...@python.org> wrote:
> On Sep 16, 2010, at 11:28 PM, Nick Coghlan wrote:
> >There are some APIs that should be able to handle bytes *or* strings,
> >but the current use of string literals in their implementation means
> >that bytes don't work. This turns out to be a PITA for some networking
> >related code which really wants to be working with raw bytes (e.g.
> >URLs coming off the wire).
>
> Note that email has exactly the same problem. A general solution -- even if
> embodied in *well documented* best-practices and convention -- would really
> help make the stdlib work consistently, and I bet third party libraries too.

Allowing bytes-in -> bytes-out where possible would definitely be a help
(and Guido has endorsed this, IIUC), but some care has to be taken to
understand the API contract of the method in question before blindly
applying it. Are you "merely" allowing bytes to be processed as ASCII
strings, or does processing the bytes *correctly* imply that you are
converting from an ASCII encoding of text in order to process it? In
Python 2, the latter might not generate unicode yet still produce a
correct result most of the time, but a big point of Python 3 is to
eliminate that "most of the time", so we need to be careful not to
reintroduce it. This was all covered in the thread Nick refers to; I just
want to emphasize that one needs to look at the API contract carefully
before making it polymorphic (in Guido's sense of the term).

If the way to do this is well-documented best practices, we first have to
figure out what those best practices are. To do that we have to write
some real-world code. I'm trying one approach in email6: Bytes and String
subclasses, where the subclasses have an attribute named 'lit' derived
from a utility module that does this:

    literals = dict(
        empty = '',
        colon = ':',
        newline = '\n',
        space = ' ',
        tab = '\t',
        fws = ' \t',
        headersep = ': ',
        )

    class _string_literals:
        pass
    class _bytes_literals:
        pass

    for name, value in literals.items():
        setattr(_string_literals, name, value)
        setattr(_bytes_literals, name, bytes(value, 'ASCII'))
    del literals, name, value

And the subclasses do:

    class BytesHeader(BaseHeader):
        lit = email.utils._bytes_literals

    class StringHeader(BaseHeader):
        lit = email.utils._string_literals

And then BaseHeader uses self.lit.colon, etc., when manipulating strings.
It also has to use slice notation rather than indexing when looking at
individual characters, which is a PITA but not terrible.

I'm not saying this is the best approach, since this is all experimental
code at the moment, but it is *an* approach...

--
R. David Murray                                      www.bitdance.com