2011/6/11 Sérgio Monteiro Basto <sergi...@sapo.pt>:
> ok after thinking about this, this problem exist because Python want be
> smart with ttys

The *anomaly* (not problem) exists because Python has a way of being told a target encoding. If two parties agree on an encoding, they can send characters to each other. I had this discussion at work a while ago; my boss was talking about being "binary-safe" (which really meant "8-bit safe"), while I was saying that we should support, verify, and demand properly-formed UTF-8.

The main significance is that agreeing on an encoding means we can change the encoding any time it's convenient, without having to document that we've changed the data - because we haven't. I can take the number "twelve thousand three hundred and forty-five" and render that as a string of decimal digits as "12345", or as hexadecimal digits as "3039", but I haven't changed the number. If you know that I'm giving you a string of decimal digits, and I give you "12345", you will get the same number at the far side.

Python has agreed with stdout that it will send it characters encoded in UTF-8. Having made that agreement, Python and stdout can happily communicate in characters, not bytes. You don't need to explicitly encode your characters into bytes - and in fact, this would be a very bad thing to do, because you don't know _what_ encoding stdout is using. If it's expecting UTF-16, you'll get a whole lot of rubbish if you send it UTF-8 - but it'll look fine if you send it Unicode.

Chris Angelico
--
http://mail.python.org/mailman/listinfo/python-list
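
For anyone who wants to poke at the number/encoding analogy, here is a minimal sketch. It assumes Python 3 (where str is already Unicode text), and the helper name write_text is made up for illustration; the in-memory stream only stands in for a stdout that has been told its encoding.

```python
import io

# The same number rendered two ways; reading each back with the agreed
# base recovers the identical value.
n = 12345
as_decimal = str(n)        # decimal digits
as_hex = format(n, "x")    # hexadecimal digits
assert int(as_decimal, 10) == int(as_hex, 16) == n

# The same characters encoded two ways; decoding with the agreed encoding
# recovers the identical text, even though the bytes differ.
text = "Sérgio"
as_utf8 = text.encode("utf-8")
as_utf16 = text.encode("utf-16")
assert as_utf8 != as_utf16
assert as_utf8.decode("utf-8") == as_utf16.decode("utf-16") == text

# Disagreeing on the encoding is what produces rubbish.
mismatch = as_utf8.decode("utf-16-le", errors="replace")
assert mismatch != text


def write_text(text, encoding):
    """Hypothetical helper: write characters to a text stream that has
    been told its encoding, and return the bytes it actually emitted."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)     # we hand over characters, not bytes
    stream.flush()
    stream.detach()        # keep the underlying buffer open
    return raw.getvalue()


# The caller's code is identical; only the agreed encoding differs.
utf8_bytes = write_text(text, "utf-8")
utf16_bytes = write_text(text, "utf-16")
```

The point of the last part is that the caller never encodes anything by hand: the stream does it, using whatever encoding was agreed, and the text survives either way.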