Re: the official way of printing unicode strings
On Sun, 14 Dec 2008 06:48:19 +0100, Piotr Sobolewski wrote:

> Then I tried it this way:
>
>     sys.stdout = codecs.getwriter("utf-8")(sys.__stdout__)
>     s = u"Stanisław Lem"
>     print s
>
> This works but is even more cumbersome.
>
> So, my question is: what is the official, recommended Python way?

I'd make that first line:

    sys.stdout = codecs.getwriter('utf-8')(sys.stdout)

Why is it even more cumbersome to execute that line *once* instead of
encoding at every ``print`` statement?

Ciao,
    Marc 'BlackJack' Rintsch
-- 
http://mail.python.org/mailman/listinfo/python-list
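A minimal sketch of the wrap-once idea, under stated assumptions: an in-memory io.BytesIO (named raw here, a hypothetical stand-in) takes the place of sys.stdout so the encoded bytes can be inspected, and the text is written with .write() rather than the Python 2 print statement so the snippet also runs on Python 3. Only codecs.getwriter comes from the thread.

```python
import codecs
import io

# Stand-in for a byte-oriented stdout (an assumption for illustration;
# in the thread the wrapped stream is sys.stdout itself).
raw = io.BytesIO()

# Wrap the byte stream once; every later write is encoded as UTF-8.
out = codecs.getwriter("utf-8")(raw)

# No per-call .encode() is needed after the one-time wrapping.
out.write(u"Stanis\u0142aw Lem\n")
out.write(u"Solaris\n")

# The underlying stream received UTF-8 bytes.
encoded = raw.getvalue()
```

The single getwriter call replaces an explicit .encode('utf-8') at each output site, which is the trade-off Marc is pointing at.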
Re: the official way of printing unicode strings
Marc 'BlackJack' Rintsch wrote:

> I'd make that first line:
>
>     sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
>
> Why is it even more cumbersome to execute that line *once* instead of
> encoding at every ``print`` statement?

Oh, maybe it's not cumbersome, just a little bit strange. But sure, I
can get used to it.

My main problem is that when I use a language, I want to use it the way
it is supposed to be used. Doing so usually avoids many problems,
especially in Python, where there is one official way to do any
elementary task. I just want to know the normal, official way of
printing unicode strings. The question is not how I *can* print a
unicode string, but how the creators of the language intend me to print
it. I couldn't find an answer to this question in the docs, so I hope
somebody here knows it.

So, is it _the_ Python way of printing unicode?
Re: the official way of printing unicode strings
On Sun, 2008-12-14 at 11:16 +0100, Piotr Sobolewski wrote:

> Marc 'BlackJack' Rintsch wrote:
>
> > I'd make that first line:
> >
> >     sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
> >
> > Why is it even more cumbersome to execute that line *once* instead
> > of encoding at every ``print`` statement?
>
> Oh, maybe it's not cumbersome, just a little bit strange. But sure, I
> can get used to it.
>
> My main problem is that when I use a language, I want to use it the
> way it is supposed to be used. Doing so usually avoids many problems,
> especially in Python, where there is one official way to do any
> elementary task. I just want to know the normal, official way of
> printing unicode strings. The question is not how I *can* print a
> unicode string, but how the creators of the language intend me to
> print it. I couldn't find an answer to this question in the docs, so
> I hope somebody here knows it.
>
> So, is it _the_ Python way of printing unicode?

The right way to print a unicode string is to encode it in the encoding
that is appropriate for your needs (which may or may not be UTF-8), and
then to print it. In terms of your three examples, this means that the
first and third are correct, and the second is incorrect. The second
one breaks when writing to a file, so don't use it.

Both the first and third behave in the way that I suggest. The first
(print u'foo'.encode('utf-8')) is less cumbersome if you do it once,
but the third (rebinding sys.stdout using codecs.getwriter) is less
cumbersome if you'll be doing a lot of printing on stdout. In the end,
they are the same method, but one of them introduces another layer of
abstraction. If you'll be using more than two print statements that
need a non-ASCII encoding, I'd recommend the third, as it rapidly
becomes less cumbersome the more you print.

That said, you should also consider whether you want to rebind
sys.stdout at all. It makes your print statements less verbose, but it
also loses your reference to the basic stdout. What if you want to
print using UTF-8 for a while, but then need to switch to another
encoding later? If you've used a new name, you can still refer back to
the original sys.stdout.

Right:

    my_out = codecs.getwriter('utf-8')(sys.stdout)
    print >> my_out, u"Stuff"
    # 'cp037' is one of Python's EBCDIC code pages
    my_out = codecs.getwriter('cp037')(sys.stdout)
    print >> my_out, u"Stuff"

Wrong:

    sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
    print u"Stuff"
    sys.stdout = codecs.getwriter('cp037')(sys.stdout)
    # Now sys.stdout is getting encoded twice, and you'll probably
    # get garbage out. :(
    print u"Stuff"

Though I guess this is why the OP is doing:

    sys.stdout = codecs.getwriter('utf-8')(sys.__stdout__)

That avoids the problem by not rebinding the original file object:
sys.__stdout__ is still in its original state.

Carry on, then.

Cheers,
Cliff
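The "use a new name" approach can be sketched as follows. This is an illustration under assumptions, not Cliff's code: an io.BytesIO called raw stands in for the untouched stdout, the names utf8_out and latin2_out are hypothetical, ISO-8859-2 is chosen only because it can represent the sample text, and .write() is used so the snippet runs on Python 3 as well.

```python
import codecs
import io

# Hypothetical stand-in for the original, unwrapped byte stream.
raw = io.BytesIO()

# Two independent writers over the same underlying stream. Neither
# replaces `raw`, so the original stream stays reachable and no writer
# ever wraps another writer.
utf8_out = codecs.getwriter("utf-8")(raw)
latin2_out = codecs.getwriter("iso-8859-2")(raw)

text = u"Stanis\u0142aw Lem"

utf8_out.write(text)
first_len = len(raw.getvalue())  # bytes produced by the UTF-8 writer
latin2_out.write(text)

data = raw.getvalue()
utf8_part = data[:first_len]
latin2_part = data[first_len:]
```

Each chunk is encoded exactly once, by the writer it was sent through, which is the property that stacking one writer on top of another gives up.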
Re: the official way of printing unicode strings
Piotr Sobolewski nie_dzi...@gazeta.pl writes:

> in Python (contrary to Perl, for instance) there is one way to do
> common tasks.

More accurately: the ideal is that there should be only one *obvious*
way to do things. Other ways may also exist.

> Could somebody explain to me what the official Python way of
> printing unicode strings is?

Try these:

<URL:http://effbot.org/zone/unicode-objects.htm>
<URL:http://www.reportlab.com/i18n/python_unicode_tutorial.html>
<URL:http://www.amk.ca/python/howto/unicode>

If you want something more official, try the PEP that introduced
Unicode objects, PEP 100:
<URL:http://www.python.org/dev/peps/pep-0100/>.

> I tried to do it this way:
>
>     s = u"Stanisław Lem"
>     print s.encode('utf-8')
>
> This works, but is very cumbersome.

Nevertheless, that says everything that needs to be said: you've
defined a Unicode text object, and you've printed it specifying which
character encoding to use.

When dealing with text, the reality is that there is *always* an
encoding at the point where program objects must interface to or from a
device, such as a file, a keyboard, or a display. There is *no*
sensible default encoding, except for the increasingly inadequate 7-bit
ASCII.

<URL:http://www.joelonsoftware.com/articles/Unicode.html>

Since there is no sensible default, Python needs to be explicitly told
at some point which encoding to use.

> Then I tried it this way:
>
>     s = u"Stanisław Lem"
>     print s
>
> This breaks when I redirect the output of my program to some file,
> like that:
>
>     $ example.py > log

How does it “break”? What behaviour did you expect, and what behaviour
did you get instead?

-- 
 \    “I hope that after I die, people will say of me: ‘That guy sure |
  `\                       owed me a lot of money’.” -- Jack Handey |
_o__)                                                                 |
Ben Finney
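Ben's point that an encoding is always involved at the boundary, and that 7-bit ASCII is not an adequate default, can be illustrated with a short sketch. The variable names are hypothetical and the choice of UTF-16-LE as the second encoding is arbitrary. It seems likely, though the thread does not confirm it, that the redirected case on Python 2 fails in the same way as the ASCII attempt below, because no terminal encoding is available and ASCII is used as the fallback.

```python
# The same text becomes different byte sequences under different
# encodings, so some encoding has to be chosen before bytes reach
# a file or terminal.
text = u"Stanis\u0142aw Lem"

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

# 7-bit ASCII cannot represent U+0142, so encoding fails outright.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```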
Re: the official way of printing unicode strings
> My main problem is that when I use a language, I want to use it the
> way it is supposed to be used. Doing so usually avoids many problems,
> especially in Python, where there is one official way to do any
> elementary task. I just want to know the normal, official way of
> printing unicode strings. The question is not how I *can* print a
> unicode string, but how the creators of the language intend me to
> print it. I couldn't find an answer to this question in the docs, so
> I hope somebody here knows it.
>
> So, is it _the_ Python way of printing unicode?

The official way to write Unicode strings into a file is not to do
that. Explicit is better than implicit: always explicitly pick an
encoding, and encode the Unicode string to that encoding. Doing so is
possible in any of the forms that you have shown.

Now, Python does not mandate any choice of encoding. The right way to
encode data is in the encoding that readers of your data expect it in.

For printing to the terminal, it is clear what the encoding needs to be
(namely, the one that is used by the terminal), hence Python chooses
that one when printing to the terminal. When printing to a file, the
application needs to make a choice.

If you have no idea what encoding to use, your best choice is the one
returned by locale.getpreferredencoding(). This is the encoding that
the user is most likely to expect.

Regards,
Martin
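A hedged sketch of that fallback, assuming a temporary file as the output target and errors="replace" so the example does not fail on locales whose encoding cannot represent the sample text. Both of those choices, and the variable names, are illustrative assumptions rather than part of Martin's advice. Only locale.getpreferredencoding() comes from the message.

```python
import io
import locale
import os
import tempfile

# The encoding the user's environment is most likely to expect.
encoding = locale.getpreferredencoding()

text = u"Stanis\u0142aw Lem"

# Write to a throwaway file (hypothetical target), naming the
# encoding explicitly instead of relying on an implicit default.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with io.open(path, "w", encoding=encoding, errors="replace") as f:
        f.write(text)
    with io.open(path, "rb") as f:
        written = f.read()
finally:
    os.remove(path)

# What an explicit encode with the same settings would produce.
expected = text.encode(encoding, "replace")
```

Whether to replace unencodable characters or let the error surface is an application decision. The sketch replaces them only to stay runnable in any locale.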
the official way of printing unicode strings
Hello,

in Python (contrary to Perl, for instance) there is one way to do
common tasks. Could somebody explain to me what the official Python way
of printing unicode strings is?

I tried to do it this way:

    s = u"Stanisław Lem"
    print s.encode('utf-8')

This works, but is very cumbersome.

Then I tried it this way:

    s = u"Stanisław Lem"
    print s

This breaks when I redirect the output of my program to some file, like
that:

    $ example.py > log

Then I tried it this way:

    import codecs
    import sys

    sys.stdout = codecs.getwriter("utf-8")(sys.__stdout__)
    s = u"Stanisław Lem"
    print s

This works but is even more cumbersome.

So, my question is: what is the official, recommended Python way?