>     Right. The real problem is that Python 2.7 doesn't have distinct
> "str" and "bytes" types.  type(bytes()) returns <type 'str'>
> "str" is assumed to be ASCII 0..127, but that's not enforced.
> "bytes" and "str" should have been distinct types, but
> that would have broken much old code.  If they were distinct, then
> constructors could distinguish between string type conversion
> (which requires no encoding information) and byte stream decoding.
> 
>     So it's possible to get junk characters in a "str", and they
> won't convert to Unicode.  I've had this happen with databases which
> were supposed to be ASCII, but occasionally a non-ASCII character
> would slip through.
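Here is a minimal Python 3 sketch of the "junk characters" problem. The sample record is invented for illustration: a value that is supposed to be ASCII but contains one stray 0xE9 byte. Strict ASCII decoding rejects it. Decoding with errors="replace", or with the encoding the data actually uses (assumed here to be Latin-1), succeeds.

```python
# Python 3 sketch. The sample bytes are hypothetical: a record that is
# "supposed" to be ASCII but contains one stray 0xE9 byte ('é' in Latin-1).
raw = b"caf\xe9 latte"

# Strict ASCII decoding (the default errors="strict") rejects the stray byte.
try:
    raw.decode("ascii")
    strict_ok = True
except UnicodeDecodeError as exc:
    strict_ok = False
    print("strict decode failed:", exc.reason)  # ordinal not in range(128)

# errors="replace" keeps going and substitutes U+FFFD for the bad byte.
replaced = raw.decode("ascii", errors="replace")

# If the source really is Latin-1 (an assumption), decoding with that codec
# always succeeds, because Latin-1 maps every byte 0-255 to a code point.
as_latin1 = raw.decode("latin-1")

# ascii() escapes non-ASCII characters so printing works on any console.
print(ascii(replaced))   # 'caf\ufffd latte'
print(ascii(as_latin1))  # 'caf\xe9 latte'
```

Which recovery is right depends on whether you know the source's real encoding. "replace" loses information, while guessing Latin-1 never fails but can silently produce the wrong characters.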

In Python 2.7, bytes is just an alias for str:

>>> id( bytes )
505366496
>>> id( str )
505366496
>>> type( bytes )
<type 'type'>
>>> type( str )
<type 'type'>
>>> bytes == str 
True
>>> bytes is str
True
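A small check along the same lines can run unchanged on either major version. It is a sketch, not an exhaustive test. On Python 2.6/2.7 both results should be True. On Python 3, where the types are distinct, both should be False.

```python
import sys

# Python 2.6/2.7: `bytes` is just another name for `str`.
# Python 3: they are two distinct types.
aliased = bytes is str

# A byte literal equals a text literal only when both are the same type.
same_value = (b"abc" == "abc")

# Single-argument print(...) behaves the same on Python 2 and 3.
print("Python %d: bytes is str -> %s, b'abc' == 'abc' -> %s"
      % (sys.version_info[0], aliased, same_value))
```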


And I do not think they were ever intended to hold only
ASCII, because chr() accepts any value from 0 to 255 and
returns a str.
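For comparison, here is a Python 3 sketch. There, chr() returns a Unicode str, and the byte-level counterpart of Python 2's chr() is bytes([n]), which accepts exactly the values 0 to 255. Half of those values lie outside ASCII.

```python
# Python 3 sketch: bytes([n]) / bytes(iterable) accept exactly 0-255.
all_bytes = bytes(range(256))
print(len(all_bytes))   # 256

# Iterating over a bytes object yields ints; values above 127 are non-ASCII.
non_ascii = [b for b in all_bytes if b > 127]
print(len(non_ascii))   # 128

# Anything outside 0-255 is rejected with ValueError.
try:
    bytes([256])
    rejected = False
except ValueError:
    rejected = True
print(rejected)         # True
```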


Ramit


Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology
712 Main Street | Houston, TX 77002
work phone: 713 - 216 - 5423

--


> -----Original Message-----
> From: python-list-bounces+ramit.prasad=jpmorgan....@python.org
> [mailto:python-list-bounces+ramit.prasad=jpmorgan....@python.org] On Behalf
> Of John Nagle
> Sent: Thursday, March 08, 2012 4:24 PM
> To: python-list@python.org
> Subject: Re: "Decoding unicode is not supported" in unusual situation
> 
> On 3/7/2012 6:18 PM, Ben Finney wrote:
> > Steven D'Aprano<steve+comp.lang.pyt...@pearwood.info>  writes:
> >
> >> On Thu, 08 Mar 2012 08:48:58 +1100, Ben Finney wrote:
> >>> I think that's a Python bug. If the latter succeeds as a no-op, the
> >>> former should also succeed as a no-op. Neither should ever get any
> >>> errors when ‘s’ is a ‘unicode’ object already.
> >>
> >> No. The semantics of the unicode function (technically: a type
> >> constructor) are well-defined, and there are two distinct behaviours:
> 
> 
>     This is all different in Python 3.x, where "str" is Unicode and
> "bytes" really are a distinct type.
> 
>                               John Nagle
> --
> http://mail.python.org/mailman/listinfo/python-list
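John's point about Python 3 shows up in its str() constructor, which keeps two distinct behaviours. This is a hedged illustration, not a restatement of anyone's argument above. With one argument, str() converts without decoding. With an encoding, it decodes bytes. Asking it to decode something that is already text raises TypeError, the Python 3 counterpart of the error in this thread's subject line.

```python
# Python 3 sketch: str() either converts or decodes, depending on its arguments.
data = b"abc"

converted = str(data)          # one argument: plain conversion, no decoding -> "b'abc'"
decoded = str(data, "ascii")   # with an encoding: decode the bytes -> "abc"

# Passing an encoding with an object that is already text is an error.
try:
    str("abc", "ascii")
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)   # decoding str is not supported

print(repr(converted), repr(decoded), raised)
```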

