On 3/8/2012 2:58 PM, Prasad, Ramit wrote:
     Right. The real problem is that Python 2.7 doesn't have distinct
"str" and "bytes" types.  type(bytes()) returns <type 'str'>.
"str" is assumed to be ASCII 0..127, but that's not enforced.
"bytes" and "str" should have been distinct types, but
that would have broken much old code.  If they were distinct, then
constructors could distinguish between string type conversion
(which requires no encoding information) and byte stream decoding.

     So it's possible to get junk characters in a "str", and they
won't convert to Unicode.  I've had this happen with databases which
were supposed to be ASCII, but occasionally a non-ASCII character
would slip through.
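
A minimal sketch of that failure, assuming a made-up field value
(the name "raw" and the helper "to_text" are hypothetical, not taken
from any real database layer). It behaves the same way under 2.7 and 3.x:

```python
# Hypothetical value standing in for a database field that was
# supposed to be ASCII. 0xE9 is outside the ASCII range 0..127.
raw = b"caf\xe9"


def to_text(data):
    """Decode a byte string as ASCII, reporting the first bad byte."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first undecodable byte.
        return "cannot decode byte at offset %d" % exc.start


clean = to_text(b"cafe")   # pure ASCII, decodes without trouble
dirty = to_text(raw)       # the stray 0xE9 makes the decode fail
```

Catching the error at the point where the data enters the program
at least shows which byte is bad, instead of failing later in some
unrelated conversion.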

> bytes and str are just aliases for each other.

   That's true in Python 2.7, but not in 3.x.  From 2.6 forward,
"bytes" and "str" were slowly being separated.  See PEP 358.
Some of the problems in Python 2.7 come from this ambiguity.
Logically, "unicode" of "str" should be a simple type conversion
from ASCII to Unicode, while "unicode" of "bytes" should
require an encoding.  But because of the bytes/str ambiguity
in Python 2.6/2.7, the behavior couldn't be type-based.
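
For comparison, here is a rough sketch of the type-based behavior
as it exists in Python 3, where the two types are distinct. This
runs under 3.x only; the variable names are just for illustration:

```python
data = b"abc"   # bytes: a byte stream with no inherent encoding
text = "abc"    # str: already Unicode text

# The two types are no longer aliases.
distinct = bytes is not str

# str of str is a plain type conversion; no encoding is involved.
same = str(text)

# Turning bytes into text is a decode, so an encoding is supplied.
decoded = str(data, "ascii")

# Supplying an encoding for something that is already text is
# rejected, because there is nothing to decode.
try:
    str(text, "ascii")
    text_decode_allowed = True
except TypeError:
    text_decode_allowed = False
```

Because the constructor can tell the two argument types apart, the
conversion case and the decoding case no longer get mixed up.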

                                John Nagle
