> Right. The real problem is that Python 2.7 doesn't have distinct
> "str" and "bytes" types. type(bytes()) returns <type 'str'>.
> "str" is assumed to be ASCII 0..127, but that's not enforced.
> "bytes" and "str" should have been distinct types, but
> that would have broken much old code. If they were distinct, then
> constructors could distinguish between string type conversion
> (which requires no encoding information) and byte stream decoding.
>
> So it's possible to get junk characters in a "str", and they
> won't convert to Unicode. I've had this happen with databases which
> were supposed to be ASCII, but occasionally a non-ASCII character
> would slip through.
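The failure described above is easy to reproduce. What follows is only a minimal sketch: the sample value and the helper name to_text are made up for illustration, and the b"" literal is used so the same lines run on 2.6+ and on 3.x.

```python
# Hypothetical sample: a "supposed to be ASCII" database value that
# contains one stray Latin-1 byte (0xE9, e-acute).
raw = b"caf\xe9"


def to_text(data, encoding="ascii"):
    """Decode a byte string; return (text, error) instead of raising."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# A strict ASCII decode fails on the first byte outside 0..127.
text, err = to_text(raw)

# Decoding succeeds once the real encoding is named (latin-1 is a guess here).
latin1_text, _ = to_text(raw, "latin-1")

# The "replace" error handler keeps going and marks the bad byte with U+FFFD.
lossy = raw.decode("ascii", "replace")
```

The exception's start attribute gives the offset of the offending byte, which helps when tracking down which database row let a non-ASCII character slip through.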
bytes and str are just aliases for each other.

>>> id( bytes )
505366496
>>> id( str )
505366496
>>> type( bytes )
<type 'type'>
>>> type( str )
<type 'type'>
>>> bytes == str
True
>>> bytes is str
True

And I do not think they were ever intended to be just ASCII, because chr() accepts any integer in range(256) (0 through 255) and returns a str.

Ramit

Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology

> -----Original Message-----
> From: python-list-bounces+ramit.prasad=jpmorgan....@python.org On Behalf Of John Nagle
> Sent: Thursday, March 08, 2012 4:24 PM
> To: python-list@python.org
> Subject: Re: "Decoding unicode is not supported" in unusual situation
>
> On 3/7/2012 6:18 PM, Ben Finney wrote:
> > Steven D'Aprano<steve+comp.lang.pyt...@pearwood.info> writes:
> >
> >> On Thu, 08 Mar 2012 08:48:58 +1100, Ben Finney wrote:
> >>> I think that's a Python bug. If the latter succeeds as a no-op, the
> >>> former should also succeed as a no-op. Neither should ever get any
> >>> errors when ‘s’ is a ‘unicode’ object already.
> >>
> >> No. The semantics of the unicode function (technically: a type
> >> constructor) are well-defined, and there are two distinct behaviours:
>
> This is all different in Python 3.x, where "str" is Unicode and
> "bytes" really are a distinct type.
>
> John Nagle
> --
> http://mail.python.org/mailman/listinfo/python-list