On Sun, 2008-12-14 at 11:16 +0100, Piotr Sobolewski wrote:
> Marc 'BlackJack' Rintsch wrote:
>
> > I'd make that first line:
> > sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
> >
> > Why is it even more cumbersome to execute that line *once* instead
> > encoding at every ``print`` statement?
>
> Oh, maybe it's not cumbersome, but a little bit strange - but sure, I can
> get used to it.
>
> My main problem is that when I use some language I want to use it the way it
> is supposed to be used. Usually doing like that saves many problems.
> Especially in Python, where there is one official way to do any elementary
> task. And I just want to know what is the normal, official way of printing
> unicode strings. I mean, the question is not "how can I print the unicode
> string" but "how the creators of the language suppose me to print the
> unicode string". I couldn't find an answer to this question in docs, so I
> hope somebody here knows it.
>
> So, is it _the_ python way of printing unicode?
>

The "right way" to print a unicode string is to encode it in the encoding
that is appropriate for your needs (which may or may not be UTF-8), and then
print it. In terms of your three examples, that means the first and third are
correct and the second is incorrect. The second one breaks when writing to a
file, so don't use it.

Both the first and third behave the way I suggest. The first
(print u'foo'.encode('utf-8')) is less cumbersome if you do it once, but the
third (rebinding sys.stdout using codecs.getwriter) is less cumbersome if
you'll be doing a lot of printing on stdout. In the end they are the same
method, but one of them introduces another layer of abstraction. If you'll
have more than two print statements that need a non-ASCII encoding, I'd
recommend the third, as it rapidly becomes less cumbersome the more you print.

That said, you should also consider whether you want to rebind sys.stdout at
all. It makes your print statements less verbose, but the name sys.stdout no
longer refers to the basic stdout. What if you want to print using UTF-8 for a
while, but then need to switch to another encoding later? If you've used a new
name, you can still refer back to the original sys.stdout.

Right:

    import codecs, sys

    my_out = codecs.getwriter('utf-8')(sys.stdout)
    print >> my_out, u"Stuff"
    # 'cp500' is one of Python's EBCDIC code pages
    my_out = codecs.getwriter('cp500')(sys.stdout)
    print >> my_out, u"Stuff"

Wrong:

    import codecs, sys

    sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
    print u"Stuff"
    sys.stdout = codecs.getwriter('cp500')(sys.stdout)
    # Now sys.stdout is getting encoded twice, and you'll probably
    # get garbage (or an encoding error) out. :(
    print u"Stuff"

Though I guess this is why the OP is doing:

    sys.stdout = codecs.getwriter('utf-8')(sys.__stdout__)

That avoids the problem because the writer always wraps the original file
object rather than a previous wrapper. sys.__stdout__ is still in its original
state.

Carry on, then.

Cheers,
Cliff
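
P.S. In case it helps, here is a minimal sketch of the "use a new name" approach in a form where you can inspect the bytes that come out. It is only an illustration under some assumptions of mine: an io.BytesIO object stands in for the raw stdout stream, the names raw, utf8_out and latin1_out are made up for the example, and latin-1 is just a convenient second encoding. It uses write() rather than the print statement so that it also runs on newer Pythons.

```python
import codecs
import io

# Stand-in for the raw byte stream behind stdout (an assumption, so the
# bytes can be inspected). In a real script you would wrap sys.__stdout__
# on Python 2, or sys.stdout.buffer on Python 3.
raw = io.BytesIO()

# Two independent writers over the same untouched stream. Neither one
# wraps the other, so nothing gets encoded twice.
utf8_out = codecs.getwriter('utf-8')(raw)
latin1_out = codecs.getwriter('latin-1')(raw)

utf8_out.write(u'caf\xe9\n')    # e-acute is written as the two bytes C3 A9
latin1_out.write(u'caf\xe9\n')  # e-acute is written as the single byte E9

written = raw.getvalue()
```

Because each writer is built on the original stream, switching encodings is just a matter of picking a different writer; the underlying stream never has to be restored.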