Thanks for the answers.  That clears things up quite a bit.

What if your source file is saved as UTF-8?  Do you then have a proper
UTF-8 byte string, but the problem is that none of the standard Python
library methods know how to interpret it as UTF-8?

Well, the decode method knows how to decode those bytes into a `unicode`
object if you call it with 'utf-8' as the argument.

OK, good to know.
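
So, just to be sure I have the model right, a minimal Python 2 sketch
(assuming the source file itself is saved as UTF-8 and carries the usual
coding declaration):

    # -*- coding: utf-8 -*-
    # Python 2: the literal is a plain str holding raw UTF-8 bytes;
    # .decode('utf-8') turns it into a unicode object.
    s = 'naïve'
    u = s.decode('utf-8')
    print type(s), len(s)    # <type 'str'> 6  -- counts bytes
    print type(u), len(u)    # <type 'unicode'> 5  -- counts characters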

4. In Python 3.0, this silliness goes away, because all strings are
Unicode by default.

Yes and no. The problem just shifts: at some point you run into similar
trouble, only in the other direction. Data enters the program as bytes
and must leave it as bytes again, so you have to deal with encodings at
those points.

Yes, but that's still much better than having to litter your code with 'u' prefixes and .decode calls and so on. If I'm using a UTF-8-savvy text editor (as we all should be doing in the 21st century!), and type "foo = '2π'", I should get a string containing a '2' and a pi character, and all the text operations (like counting characters, etc.) should Just Work.
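
To make that concrete, a small Python 3 sketch of that same pi example
(again assuming the source file is saved as UTF-8):

    # Python 3: the literal is already a str of characters, not bytes.
    foo = '2π'
    print(len(foo))       # 2 -- counts characters, no u'' prefix needed
    print(foo.upper())    # 2Π -- text operations work on characters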

When I read and write files or sockets or whatever, of course I'll have to think about what encoding the text should be... but internal to my own source code, I shouldn't have to.
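
Something along these lines is all the boundary handling I'd expect to
need (a Python 3 sketch; the filename is just a placeholder):

    # Python 3: text is str inside the program; bytes appear only at the
    # edges, where an encoding has to be chosen.
    with open("notes.txt", "w", encoding="utf-8") as f:
        f.write("2π ≈ 6.28318\n")           # str -> bytes on the way out

    with open("notes.txt", "rb") as f:
        raw = f.read()                       # the raw bytes back

    print(raw)                  # b'2\xcf\x80 \xe2\x89\x88 6.28318\n'
    print(raw.decode("utf-8"))  # 2π ≈ 6.28318 -- decode on the way in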

I understand the need for a transition strategy, which is what we have in 2.x, and that's working well enough. But I'll be glad when it's over. :)

Cheers,
- Joe

