Re: the stupid encoding problem to stdout
2011/6/14 Sérgio Monteiro Basto :
> And see, I can send ascii and utf-8 to utf-8 output and never have problems,
> but if I send ascii and utf-8 to ascii files sometimes got encode errors.

If something fits inside 7-bit ASCII, it is by definition valid UTF-8. This is not a coincidence.

Those hours you've spent grokking this are not wasted, if you now have a comprehension of characters vs encodings. More people in the world need to understand that difference! :)

Chris Angelico
--
http://mail.python.org/mailman/listinfo/python-list
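The claim that 7-bit ASCII is a subset of UTF-8 can be checked directly. This is a minimal sketch, not part of the original message; the sample strings and variable names are only for illustration, and the u'' literals are used so the same lines should run on Python 2.7 and Python 3.

```python
# Pure-ASCII text produces identical bytes under both codecs.
plain = u"mocambique"
assert plain.encode("ascii") == plain.encode("utf-8")

# Any byte string that is valid ASCII also decodes as UTF-8, to the same text.
raw = b"hello, world"
assert raw.decode("ascii") == raw.decode("utf-8")

# A non-ASCII character still encodes as UTF-8, but not as ASCII.
accented = u"mo\u00e7ambique"
utf8_bytes = accented.encode("utf-8")  # the c-cedilla becomes two bytes, 0xC3 0xA7
try:
    accented.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

This is why sending ASCII text to a UTF-8 destination never fails, while sending non-ASCII text to an ASCII destination does.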
Re: the stupid encoding problem to stdout
Ian Kelly wrote:
> If you want your output to behave that way, then all you have to do is
> specify that with an explicit encode step.

OK.

>> If we want we change default for whatever we want, but without this
>> "default change" Python should not change his behavior depending on
>> output. yeah I prefer strange output for a different platform, to a
>> decode errors.
>
> Sorry, I disagree. If your program is going to fail, it's better that
> it fail noisily (with an error) than silently (with no notice that
> anything is wrong).

Hi, OK, a short summary. I got the solution, which is setting the environment variable PYTHONIOENCODING=utf-8. If that were the default on modern GNU/Linux, it would have saved me a lot of time.

My practical problem is simple: I write a script that I run in a shell for testing, and that logs to a file when used with a configuration. Everything runs well in a shell and sometimes (later) fails when logging to a file, with "UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position". So to work in both cases (tty and files), I filled all my code with .encode('utf-8') calls as a workaround, when what I wanted all along was PYTHONIOENCODING=utf-8.

I have everything in UTF-8: the database is in UTF-8, I code in UTF-8, my OS is in UTF-8. In the last 3 years or so of learning Python I lost many, many hours trying to understand this problem.

And see, I can send ascii and utf-8 to utf-8 output and never have problems, but if I send ascii and utf-8 to ascii files I sometimes get encode errors.

So please consider, at least on Linux, making UTF-8 the default encoding (because we have fewer problems), or making it clearer that redirecting to a file is different from writing to a tty and that the problem is with files, which default to ASCII. Or make the default of PYTHONIOENCODING depend on the LANG environment variable.

Anyway, many thanks for your time and for helping me out. I don't know how things work in Python 3; are the defaults UTF-8 in Python 3?

Thanks,
--
Sérgio M. B.
Re: the stupid encoding problem to stdout
2011/6/11 Sérgio Monteiro Basto :
> ok after thinking about this, this problem exist because Python want be
> smart with ttys

The *anomaly* (not problem) exists because Python has a way of being told a target encoding. If two parties agree on an encoding, they can send characters to each other. I had this discussion at work a while ago; my boss was talking about being "binary-safe" (which really meant "8-bit safe"), while I was saying that we should support, verify, and demand properly-formed UTF-8.

The main significance is that agreeing on an encoding means we can change the encoding any time it's convenient, without having to document that we've changed the data - because we haven't. I can take the number "twelve thousand three hundred and forty-five" and render that as a string of decimal digits as "12345", or as hexadecimal digits as "3039", but I haven't changed the number. If you know that I'm giving you a string of decimal digits, and I give you "12345", you will get the same number at the far side.

Python has agreed with stdout that it will send it characters encoded in UTF-8. Having made that agreement, Python and stdout can happily communicate in characters, not bytes. You don't need to explicitly encode your characters into bytes - and in fact, this would be a very bad thing to do, because you don't know _what_ encoding stdout is using. If it's expecting UTF-16, you'll get a whole lot of rubbish if you send it UTF-8 - but it'll look fine if you send it Unicode.

Chris Angelico
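The analogy between a number and its digit strings can be made concrete. The following is only an illustrative sketch with made-up variable names: the same text is rendered as two different byte sequences, and both decode back to the same characters as long as sender and receiver agree on the encoding.

```python
text = u"mo\u00e7ambique"

# Two different byte representations of the same characters.
as_utf8 = text.encode("utf-8")
as_utf16 = text.encode("utf-16")

# The bytes differ, just as the digit strings "12345" and "3039" differ...
bytes_differ = as_utf8 != as_utf16

# ...but decoding each with the agreed encoding gives back the same text,
# just as both digit strings denote the same number.
same_text = as_utf8.decode("utf-8") == as_utf16.decode("utf-16") == text
same_number = int("12345", 10) == int("3039", 16)

# Decoding with the wrong agreement yields rubbish instead of the text.
# errors="replace" is used only so the mismatch shows up as a value
# rather than as an exception.
garbled = as_utf8.decode("utf-16", errors="replace")
```

The last line is the "whole lot of rubbish" case: UTF-8 bytes read by a party that was expecting UTF-16.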
Re: the stupid encoding problem to stdout
2011/6/10 Sérgio Monteiro Basto :
> ok after thinking about this, this problem exist because Python want be
> smart with ttys, which is in my point of view is wrong, should not encode to
> utf-8, because tty is in utf-8. Python should always encode to the same
> thing. If the default is ascii, should always encode to ascii.
> yeah should send to tty in ascii, if I send my code to a guy in windows
> which use tty with cp1000whatever , shouldn't give decoding errors and
> should send in ascii .

You can't have your cake and eat it too. If Python needs to output a string in ascii, and that string can't be represented in ascii, then raising an exception is the only reasonable thing to do. You seem to be suggesting that Python should do an implicit output.encode('ascii', 'replace') on all Unicode output, which might be okay for a TTY, but you wouldn't want that for file output; it would allow Python to silently create garbage data.

And what if you send your code to somebody with a UTF-16 terminal? You try to output ASCII to that, and you're just going to get complete garbage.

If you want your output to behave that way, then all you have to do is specify that with an explicit encode step.

> If we want we change default for whatever we want, but without this "default
> change" Python should not change his behavior depending on output.
> yeah I prefer strange output for a different platform, to a decode errors.

Sorry, I disagree. If your program is going to fail, it's better that it fail noisily (with an error) than silently (with no notice that anything is wrong).
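The difference between failing noisily and failing silently can be seen by comparing the default (strict) error handling with the 'replace' handler mentioned above. A small sketch with illustrative variable names:

```python
text = u"mo\u00e7ambique"

# Strict encoding (the default) fails noisily when ASCII cannot
# represent a character.
try:
    text.encode("ascii")
    failed_loudly = False
except UnicodeEncodeError:
    failed_loudly = True

# 'replace' never fails, but the original character is lost for good.
lossy = text.encode("ascii", "replace")  # the c-cedilla becomes "?"
round_trip = lossy.decode("ascii")       # no longer equal to the original text
```

Once the "?" has been written to a file there is no way to recover which character it stood for, which is the silent garbage the message warns about.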
Re: the stupid encoding problem to stdout
Ben Finney wrote:
>> > What should it decode to, then?
>>
>> UTF-8, as in tty
>
> But when you explicitly redirect to a file, it's not going to a TTY.
> It's going to a file whose encoding isn't known unless you specify it.

ok after thinking about this, this problem exist because Python want be smart with ttys, which is in my point of view is wrong, should not encode to utf-8, because tty is in utf-8. Python should always encode to the same thing. If the default is ascii, should always encode to ascii.

yeah should send to tty in ascii, if I send my code to a guy in windows which use tty with cp1000whatever , shouldn't give decoding errors and should send in ascii .

If we want we change default for whatever we want, but without this "default change" Python should not change his behavior depending on output.

yeah I prefer strange output for a different platform, to a decode errors. And I have /usr/bin/iconv .

Thanks for attention, sorry about my very limited English.
--
Sérgio M. B.
Re: the stupid encoding problem to stdout
On 09/06/2011 04:18, Sérgio Monteiro Basto wrote:
> hi,
> cat test.py
> #!/usr/bin/env python
> #-*- coding: utf-8 -*-
> u = u'moçambique'
> print u.encode("utf-8")
> print u
>
> chmod +x test.py
> ./test.py
> moçambique
> moçambique

The following tries to encode before printing. If you pass an object that is already UTF-8 encoded, it just prints it; if not, it encodes it. All the "print" statements go through MyPrint.write.

    #!/usr/bin/env python
    #-*- coding: utf-8 -*-

    import sys

    class MyPrint(object):
        def __init__(self):
            # Keep the real stdout and install this object in its place.
            self.old_stdout=sys.stdout
            sys.stdout=self
        def write(self,text):
            try:
                encoded=text.encode("utf8")
            except UnicodeDecodeError:
                # In Python 2, a byte string that already holds non-ASCII
                # data ends up here; pass it through unchanged.
                encoded=text
            self.old_stdout.write(encoded)

    MyPrint()

    u = u'moçambique'
    print u.encode("utf-8")
    print u

TEST:

    $ ./test.py
    moçambique
    moçambique
    $ ./test.py > test.txt
    $ cat test.txt
    moçambique
    moçambique

By the way, my code will not help with error messages. I think that errors are printed by sys.stderr.write. So if you want to do

    raise "moçambique"

you should think about adding stderr to the class MyPrint.

If you know French, I strongly recommend "Comprendre les erreurs unicode" by Victor Stinner:
http://dl.afpy.org/pycon-fr-09/Comprendre_les_erreurs_unicode.pdf

Have a nice day
Laurent
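A commonly used alternative to a hand-written wrapper class is the standard library's codecs.getwriter, which wraps a byte stream so that text written to it is encoded on the way out. The sketch below is not Laurent's code. It uses io.BytesIO as a stand-in for the real output stream so the result can be inspected; in a real script the wrapped stream would be sys.stdout on Python 2 or sys.stdout.buffer on Python 3.

```python
import codecs
import io

# Stand-in for a byte-oriented stdout (a redirected file, for example).
byte_stream = io.BytesIO()

# codecs.getwriter returns a StreamWriter class; instantiating it wraps the
# byte stream so that text written to it is encoded as UTF-8.
writer = codecs.getwriter("utf-8")(byte_stream)
writer.write(u"mo\u00e7ambique\n")

# The underlying stream now holds the UTF-8 bytes.
written = byte_stream.getvalue()
```

Unlike the MyPrint class, this wrapper expects text rather than already-encoded byte strings, so it suits code that keeps everything as unicode until the moment of output.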
Re: the stupid encoding problem to stdout
Ben Finney wrote:
>> >> Exactly the opposite , if python don't know the encoding should not
>> >> try decode to ASCII.
>
> Are you advocating that Python should refuse to write characters unless
> the encoding is specified? I could sympathise with that, but currently
> that's not what Python does; instead it defaults to the ASCII codec.

could be a solution ;) or a smarter default based on LANG for example (as many GNU does).

--
Sérgio M. B.
Re: the stupid encoding problem to stdout
Sérgio Monteiro Basto writes:
> Nobody wrote:
> >> Exactly the opposite , if python don't know the encoding should not
> >> try decode to ASCII.

Are you advocating that Python should refuse to write characters unless the encoding is specified? I could sympathise with that, but currently that's not what Python does; instead it defaults to the ASCII codec.

> > What should it decode to, then?
>
> UTF-8, as in tty

But when you explicitly redirect to a file, it's *not* going to a TTY. It's going to a file whose encoding isn't known unless you specify it.

--
Ben Finney
Re: the stupid encoding problem to stdout
Mark Tolonen wrote:
> "Sérgio Monteiro Basto" wrote in message
> news:4df137a7$0$30580$a729d...@news.telepac.pt...
>
>> How I change sys.stdout.encoding always to UTF-8 ? at least have a
>> consistent sys.stdout.encoding
>
> There is an environment variable that can force Python I/O to be a specific
> encoding:
>
> PYTHONIOENCODING=utf-8

Excellent thanks , double thanks.

BTW: should be set by default on a utf-8 systems like Fedora, Ubuntu, Debian , Redhat, and all Linuxs. For sure I will put this on startup of my systems.

> -Mark

--
Sérgio M. B.
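The effect of PYTHONIOENCODING can be observed by starting a child interpreter whose stdout is a pipe rather than a terminal, which is the situation that caused the original error. This is a sketch under the assumption that sys.executable points at a usable Python interpreter; the variable names are illustrative.

```python
import os
import subprocess
import sys

# Child program: report the encoding Python chose for its stdout.
child_code = "import sys; print(sys.stdout.encoding)"

# Copy the current environment and force the I/O encoding for the child only.
env = dict(os.environ, PYTHONIOENCODING="utf-8")

# The child's stdout is a pipe here, not a terminal.
reported = subprocess.check_output([sys.executable, "-c", child_code], env=env)
reported = reported.decode("ascii").strip()
```

Setting the variable for a single command (rather than system-wide) keeps the choice explicit for the scripts that need it.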
Re: the stupid encoding problem to stdout
Nobody wrote:
>> Exactly the opposite , if python don't know the encoding should not try
>> decode to ASCII.
>
> What should it decode to, then?

UTF-8, as in tty, how I change this default ?

> You can't write characters to a stream, only bytes.

ok got the point .

Thanks,
Re: the stupid encoding problem to stdout
"Sérgio Monteiro Basto" wrote in message news:4df137a7$0$30580$a729d...@news.telepac.pt...
> How I change sys.stdout.encoding always to UTF-8 ? at least have a
> consistent sys.stdout.encoding

There is an environment variable that can force Python I/O to be a specific encoding:

PYTHONIOENCODING=utf-8

-Mark
Re: the stupid encoding problem to stdout
On 6/9/2011 5:46 PM, Nobody wrote:
> On Thu, 09 Jun 2011 22:14:17 +0100, Sérgio Monteiro Basto wrote:
>
>> Exactly the opposite , if python don't know the encoding should not try
>> decode to ASCII.
>
> What should it decode to, then?
>
> You can't write characters to a stream, only bytes.
>
>> I want python don't care about encoding terminal and send characters as they
>> are or for a file .
>
> You can't write characters to a stream, only bytes.

Character representations are for people; byte representations are for computers.

--
Terry Jan Reedy
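The statement that a stream only carries bytes can be demonstrated with an in-memory binary stream. In this sketch io.BytesIO stands in for a file or pipe; the variable names are illustrative.

```python
import io

# A binary stream, standing in for a file or pipe at the operating-system level.
stream = io.BytesIO()

text = u"mo\u00e7ambique"

# Writing text directly is rejected: the stream only accepts bytes.
try:
    stream.write(text)
    accepted_text = True
except TypeError:
    accepted_text = False

# Choosing an encoding turns the characters into bytes the stream can take.
stream.write(text.encode("utf-8"))
stored = stream.getvalue()
```

Whenever text appears to be written "as is", some layer has chosen an encoding on the program's behalf; the question in this thread is only which one.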
Re: the stupid encoding problem to stdout
Sérgio Monteiro Basto writes:
> Ben Finney wrote:
> > In this case your shell has no preference for the encoding (since
> > you're redirecting output to a file).
>
> How I say to python that I want that write in utf-8 to files ?

You already did:

> > In the first print statement you specify the encoding UTF-8, which
> > is capable of encoding the characters.

If you want UTF-8 on the byte stream for a file, specify it when opening the file, or when reading or writing the file.

--
Ben Finney
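Specifying the encoding when opening the file might look like the following. This is a minimal sketch: the path is a hypothetical file in a temporary directory, and io.open is used because it accepts an encoding argument on both Python 2.7 and Python 3.

```python
import io
import os
import tempfile

text = u"mo\u00e7ambique\n"

# Hypothetical output path inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "output.txt")

# The encoding is stated once, at the point where text becomes bytes.
with io.open(path, "w", encoding="utf-8") as out:
    out.write(text)

# Reading the raw bytes shows the UTF-8 encoding on disk.
with io.open(path, "rb") as raw:
    on_disk = raw.read()

# Reading with the same encoding gives the original text back.
with io.open(path, "r", encoding="utf-8") as src:
    read_back = src.read()
```

With the encoding fixed at open time, the rest of the program can work with unicode text and never call .encode() by hand.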
Re: the stupid encoding problem to stdout
On Thu, 09 Jun 2011 22:14:17 +0100, Sérgio Monteiro Basto wrote:
> Exactly the opposite , if python don't know the encoding should not try
> decode to ASCII.

What should it decode to, then?

You can't write characters to a stream, only bytes.

> I want python don't care about encoding terminal and send characters as they
> are or for a file .

You can't write characters to a stream, only bytes.
Re: the stupid encoding problem to stdout
Ben Finney wrote:
> Sérgio Monteiro Basto writes:
>
>> ./test.py
>> moçambique
>> moçambique
>
> In this case your terminal is reporting its encoding to Python, and it's
> capable of taking the UTF-8 data that you send to it in both cases.
>
>> ./test.py > output.txt
>> Traceback (most recent call last):
>>   File "./test.py", line 5, in <module>
>>     print u
>> UnicodeEncodeError: 'ascii' codec can't encode character
>> u'\xe7' in position 2: ordinal not in range(128)
>
> In this case your shell has no preference for the encoding (since you're
> redirecting output to a file).

How I say to python that I want that write in utf-8 to files ?

> In the first print statement you specify the encoding UTF-8, which is
> capable of encoding the characters.
>
> In the second print statement you haven't specified any encoding, so the
> default ASCII encoding is used.
>
> Moral of the tale: Make sure an encoding is specified whenever data
> steps between bytes and characters.
>
>> Don't seems logic, when send things to a file the beaviour change.
>
> They're different files, which have been opened with different
> encodings. If you want a different encoding, you need to specify that.
Re: the stupid encoding problem to stdout
Benjamin Kaplan wrote:
> 2011/6/8 Sérgio Monteiro Basto :
>> hi,
>> cat test.py
>> #!/usr/bin/env python
>> #-*- coding: utf-8 -*-
>> u = u'moçambique'
>> print u.encode("utf-8")
>> print u
>>
>> chmod +x test.py
>> ./test.py
>> moçambique
>> moçambique
>>
>> ./test.py > output.txt
>> Traceback (most recent call last):
>>   File "./test.py", line 5, in <module>
>>     print u
>> UnicodeEncodeError: 'ascii' codec can't encode character
>> u'\xe7' in position 2: ordinal not in range(128)
>>
>> in python 2.7
>> how I explain to python to send the same thing to stdout and
>> the file output.txt ?
>>
>> Don't seems logic, when send things to a file the beaviour
>> change.
>>
>> Thanks,
>> Sérgio M. B.
>
> That's not a terminal vs file thing. It's a "file that declares its
> encoding" vs a "file that doesn't declare its encoding" thing. Your
> terminal declares that it is UTF-8. So when you print a Unicode string
> to your terminal, Python knows that it's supposed to turn it into
> UTF-8. When you pipe the output to a file, that file doesn't declare
> an encoding. So rather than guess which encoding you want, Python
> defaults to the lowest common denominator: ASCII. If you want
> something to be a particular encoding, you have to encode it yourself.

Exactly the opposite , if python don't know the encoding should not try decode to ASCII.

> You have a couple of choices on how to make it work:
> 1) Play dumb and always encode as UTF-8. This would look really weird
> if someone tried running your program in a terminal with a CP437
> encoding (like cmd.exe on at least the US version of Windows), but it
> would never crash.

I want python don't care about encoding terminal and send characters as they are or for a file .

> 2) Check sys.stdout.encoding. If it's ascii, then encode your unicode
> string in the unicode-escape encoding, which substitutes the escape
> sequence in for all non-ASCII characters.

How I change sys.stdout.encoding always to UTF-8 ? at least have a consistent sys.stdout.encoding

> 3) Check to see if sys.stdout.isatty() and have different behavior for
> terminals vs files. If you're on a terminal that doesn't declare its
> encoding, encoding it as UTF-8 probably won't help. If you're writing
> to a file, that might be what you want to do.

Thanks,
Re: the stupid encoding problem to stdout
2011/6/8 Sérgio Monteiro Basto :
> hi,
> cat test.py
> #!/usr/bin/env python
> #-*- coding: utf-8 -*-
> u = u'moçambique'
> print u.encode("utf-8")
> print u
>
> chmod +x test.py
> ./test.py
> moçambique
> moçambique
>
> ./test.py > output.txt
> Traceback (most recent call last):
>   File "./test.py", line 5, in <module>
>     print u
> UnicodeEncodeError: 'ascii' codec can't encode character
> u'\xe7' in position 2: ordinal not in range(128)
>
> in python 2.7
> how I explain to python to send the same thing to stdout and
> the file output.txt ?
>
> Don't seems logic, when send things to a file the beaviour
> change.
>
> Thanks,
> Sérgio M. B.

That's not a terminal vs file thing. It's a "file that declares its encoding" vs a "file that doesn't declare its encoding" thing. Your terminal declares that it is UTF-8. So when you print a Unicode string to your terminal, Python knows that it's supposed to turn it into UTF-8. When you pipe the output to a file, that file doesn't declare an encoding. So rather than guess which encoding you want, Python defaults to the lowest common denominator: ASCII. If you want something to be a particular encoding, you have to encode it yourself.

You have a couple of choices on how to make it work:

1) Play dumb and always encode as UTF-8. This would look really weird if someone tried running your program in a terminal with a CP437 encoding (like cmd.exe on at least the US version of Windows), but it would never crash.

2) Check sys.stdout.encoding. If it's ascii, then encode your unicode string in the unicode-escape encoding, which substitutes the escape sequence in for all non-ASCII characters.

3) Check to see if sys.stdout.isatty() and have different behavior for terminals vs files. If you're on a terminal that doesn't declare its encoding, encoding it as UTF-8 probably won't help. If you're writing to a file, that might be what you want to do.
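The three choices above could be sketched roughly as follows. This is one possible reading, not Benjamin's code: the function names are hypothetical, each function returns bytes instead of printing so the result can be inspected, and option 2 uses the 'backslashreplace' error handler as a stand-in for an escape codec.

```python
import codecs

def option1_always_utf8(text):
    # Option 1: ignore the destination and always produce UTF-8 bytes.
    return text.encode("utf-8")

def option2_escape_if_ascii(text, encoding):
    # Option 2: if the stream reports ASCII, or reports nothing at all,
    # substitute escape sequences for the non-ASCII characters.
    if encoding is None or codecs.lookup(encoding).name == "ascii":
        return text.encode("ascii", "backslashreplace")
    return text.encode(encoding)

def option3_by_destination(text, stream):
    # Option 3: a terminal gets its declared encoding; anything else
    # (a file or a pipe) gets UTF-8.
    declared = getattr(stream, "encoding", None)
    if stream.isatty() and declared:
        return text.encode(declared)
    return text.encode("utf-8")
```

A caller would pass sys.stdout.encoding to the second function and sys.stdout itself to the third; which option is appropriate depends on whether readable output or lossless output matters more to the script.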
Re: the stupid encoding problem to stdout
Sérgio Monteiro Basto writes:
> ./test.py
> moçambique
> moçambique

In this case your terminal is reporting its encoding to Python, and it's capable of taking the UTF-8 data that you send to it in both cases.

> ./test.py > output.txt
> Traceback (most recent call last):
>   File "./test.py", line 5, in <module>
>     print u
> UnicodeEncodeError: 'ascii' codec can't encode character
> u'\xe7' in position 2: ordinal not in range(128)

In this case your shell has no preference for the encoding (since you're redirecting output to a file).

In the first print statement you specify the encoding UTF-8, which is capable of encoding the characters.

In the second print statement you haven't specified any encoding, so the default ASCII encoding is used.

Moral of the tale: Make sure an encoding is specified whenever data steps between bytes and characters.

> Don't seems logic, when send things to a file the beaviour change.

They're different files, which have been opened with different encodings. If you want a different encoding, you need to specify that.

--
Ben Finney
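The two print statements can be mimicked without any redirection by doing the encoding step by hand. The sketch below (illustrative names only) shows that the explicit UTF-8 step succeeds, and that the ASCII step fails with an exception carrying the same details as the traceback: the codec, the position, and the offending character.

```python
text = u"mo\u00e7ambique"

# Explicit encoding, as in the first print statement: this always works.
explicit = text.encode("utf-8")

# What effectively happens for the second print when the stream's
# encoding is ASCII.
try:
    text.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

# The exception records the codec, the offending position and the input text.
details = (error.encoding, error.start, error.object[error.start])
```

Position 2 is the third character of the string, the c-cedilla, matching "u'\xe7' in position 2" in the reported error.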
the stupid encoding problem to stdout
hi,

cat test.py
#!/usr/bin/env python
#-*- coding: utf-8 -*-
u = u'moçambique'
print u.encode("utf-8")
print u

chmod +x test.py
./test.py
moçambique
moçambique

./test.py > output.txt
Traceback (most recent call last):
  File "./test.py", line 5, in <module>
    print u
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 2: ordinal not in range(128)

in python 2.7
how I explain to python to send the same thing to stdout and the file output.txt ?

Don't seems logic, when send things to a file the beaviour change.

Thanks,
Sérgio M. B.