On Feb 14, 2006, at 12:20 AM, Phillip J. Eby wrote:
> bytes(map(ord, str_or_unicode))
>
> In other words, without an encoding, bytes() should simply treat str and
> unicode objects *as if they were a sequence of integers*, and produce an
> error when an integer is out of range. This is a logical and consistent
> interpretation in the absence of an encoding, because in that case you
> don't care about the encoding - it's just raw data.
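
As a rough illustration of the quoted proposal, here is how it would behave in present-day Python 3. This is an assumption made only for demonstration: the thread predates Python 3, and `bytes_from_ordinals` is a hypothetical helper name, not something proposed in the thread.

```python
def bytes_from_ordinals(text):
    """Hypothetical helper: treat each character as an integer, no encoding."""
    # bytes() accepts an iterable of ints and rejects anything outside 0..255.
    return bytes(map(ord, text))


# Code points below 256 pass straight through as single bytes.
assert bytes_from_ordinals("abc") == b"abc"
assert bytes_from_ordinals("\u00e9") == b"\xe9"

# A code point above 255 (here U+20AC, the euro sign) is out of range.
try:
    bytes_from_ordinals("\u20ac")
except ValueError as exc:
    print("rejected:", exc)
```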
If you're talking about "raw data", then make bytes(unicodestring) produce what buffer(unicodestring) currently does -- something completely and utterly worthless. :) [It depends on how you compiled Python and what endianness your system has.]

There really is no case where you don't care about the encoding... there is always a specific desired output encoding, and you have to think about what encoding that is. The argument that latin-1 is a sensible default just because you can convert to latin-1 by chopping off the upper 3 bytes of a unicode character's ordinal position is not convincing; you're still doing an encoding operation, it just happens to be computationally easy.

That Jython programs have to pretend that unicode strings are an appropriate way to store bytes, and thus often have to do fake "latin-1" conversions which are really no such thing, doesn't make a convincing argument either. Using unicode strings to store bytes read from or written to a socket is really just broken.

Actually having any default encoding at all is IMO a poor idea, but as Python has one at the moment (ascii), might as well keep using it for consistency until it's eliminated (sys.setdefaultencoding('undefined') is my friend.)

James

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
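
To make the endianness and latin-1 points above concrete, here is a minimal Python 3 sketch. Python 3 is an assumption (the thread concerns Python 2), and `candidate_encodings` is a name invented for the example. The UTF-16 and UTF-32 codecs merely stand in for the build- and platform-dependent in-memory layouts mentioned in the message.

```python
def candidate_encodings(text):
    """Return the bytes produced for `text` under several explicit encodings."""
    names = ["latin-1", "utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"]
    return {name: text.encode(name) for name in names}


results = candidate_encodings("\u00e9")  # U+00E9, ordinal 233
for name, data in results.items():
    print(f"{name:10} {data!r}")

# For code points below 256, the latin-1 byte equals the ordinal,
# but that is still a choice of encoding among several.
assert results["latin-1"] == bytes([ord("\u00e9")])

# The same character yields different bytes depending on width and byte order.
assert results["utf-16-le"] != results["utf-16-be"]
assert len(results["utf-32-le"]) == 2 * len(results["utf-16-le"])
```

Every entry in `results` is a legitimate byte representation of the same one-character string, which is why some encoding always has to be chosen.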