At 12:03 AM 2/14/2006 +0100, M.-A. Lemburg wrote:
>The conversion from Unicode to bytes is different in this
>respect, since you are converting from a "bigger" type to
>a "smaller" one. Choosing latin-1 as default for this
>conversion would give you all 8 bits, instead of just 7
>bits that ASCII provides.
I was just pointing out that since byte strings are bytes by definition, simply putting those bytes in a bytes() object doesn't alter the existing encoding. So, using latin-1 when converting a str to bytes actually seems like the One Obvious Way to do it.

I'm so accustomed to being wary of encoding issues that the idea doesn't *feel* right at first - I keep going, "but you can't know what encoding those bytes are". Then I go, duh, that's the point. If you convert str->bytes, there's no conversion and no interpretation - neither the str nor the bytes object knows its encoding, and that's okay. So str(bytes_object) (in 2.x) should also just turn it back into a normal bytestring.

In fact, the 'encoding' argument seems useless in the case of str objects, and it seems it should default to latin-1 for unicode objects. The only use I see for having an encoding for a 'str' would be to allow confirming that the input string is in fact valid for that encoding. So, "bytes(some_str, 'ascii')" would be an assertion that some_str must be valid ASCII.

> > So, it sounds like making the encoding default to latin-1 would be a
> > reasonably safe approach in both 2.x and 3.x.
>
>Reasonable for bytes(): yes. In general: no.

Right, I was only talking about bytes(). For 3.0, the type formerly known as "str" won't exist, so only the Unicode part will be relevant then.
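To make the point concrete, here is a small sketch in modern Python 3 (the thread predates 3.0, so .encode()/.decode() stand in for the bytes() constructor under discussion). It shows the two properties argued above: latin-1 maps code points 0-255 one-to-one onto bytes 0-255, so the round-trip is lossless and interpretation-free, while choosing 'ascii' acts as a validity assertion on the input:

```python
# latin-1 maps code points 0-255 one-to-one onto byte values 0-255,
# so bytes -> str -> bytes is lossless for every possible byte.
data = bytes(range(256))
assert data.decode('latin-1').encode('latin-1') == data

# Passing 'ascii' instead behaves like an assertion that the input
# is valid ASCII: non-ASCII input is rejected with UnicodeEncodeError.
ok = 'hello'.encode('ascii')   # valid ASCII, succeeds
rejected = False
try:
    'caf\xe9'.encode('ascii')  # code point > 127, not valid ASCII
except UnicodeEncodeError:
    rejected = True
assert rejected
```

Nothing in the latin-1 round-trip inspects or "fixes" the data; it is a pure reinterpretation of the same 256 values, which is exactly why it works as a no-op default.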