On Fri, 08 Oct 2010 15:31:27 +0200, Hallvard B Furuseth wrote:

> Arnaud Delobelle writes:
>> Hallvard B Furuseth <h.b.furus...@usit.uio.no> writes:
>>> I've been playing a bit with Python 3.2a2, and frankly its charset
>>> handling looks _less_ safe than in Python 2. (...)
>>> With 2.<late> conversion Unicode <-> string, the equivalent operation
>>> did not silently produce garbage: it raised UnicodeError instead.
>>> With old raw Python strings that was not a problem in applications
>>> which did not need to convert any charsets; with Python 3 they can
>>> break.
>>>
>>> I really wish bytes.__str__ would at least by default fail.
>>
>> I think you misunderstand the purpose of str(). It is to provide a
>> (unicode) string representation of an object and has nothing to do with
>> converting it to unicode:
>
> That's not the point - the point is that for 2.* code which _uses_ str
> vs unicode, the equivalent 3.* code uses str vs bytes. Yet not in the
> same way - a 2.* 'str' will sometimes be 3.* bytes, sometimes str. So
> upgraded old code will have to expect both str and bytes.
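For concreteness, here is a minimal Python 3 sketch of the failure mode
described above; the format-string scenario is an illustrative assumption
of mine (the sample bytes match Steven's example below):

    data = b'abc\xff'                  # e.g. raw bytes read from a socket
    message = "received: %s" % data    # no exception: str(data) is used
    print(message)                     # prints: received: b'abc\xff'

    # In Python 2, the nearest equivalent mixing of unicode and bytes,
    # u"received: %s" % 'abc\xff', raised UnicodeDecodeError (a subclass
    # of UnicodeError) for non-ASCII bytes instead of silently embedding
    # the repr of the bytes in the result.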
I'm sorry, this makes no sense to me. I've read it repeatedly, and I
still don't understand what you're trying to say.

> In 2.*, str<->unicode conversion failed or produced the equivalent
> character/byte data. Yes, there could be charset problems if the
> defaults were set up wrong, but that's a smaller problem than in 3.*.
> In 3.*, the bytes->str conversion always _silently_ produces garbage.

So you say, but I don't see it. Why is this garbage?

>>> b = b'abc\xff'
>>> str(b)
"b'abc\\xff'"

That's what I would expect from the str() function called with a bytes
argument. Since decoding bytes requires a codec, which you haven't
given, it can only return a string representation of the bytes. If you
want to decode bytes into a string, you need to specify a codec:

>>> str(b, 'latin-1')
'abcÿ'
>>> b.decode('latin-1')
'abcÿ'

-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list
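As an addendum to the examples above: an explicit decode with the default
strict error handling already gives the fail-loudly behaviour Hallvard
asked for. The 'ascii' codec below is an illustrative choice, not
something prescribed in the thread:

    b = b'abc\xff'

    try:
        b.decode('ascii')            # errors='strict' is the default
    except UnicodeDecodeError as exc:
        print(exc)                   # 'ascii' codec can't decode byte 0xff in position 3: ...

    # 'latin-1', by contrast, maps all 256 byte values and never raises:
    print(b.decode('latin-1'))       # prints: abcÿ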