Re: (A Possible Solution) Re: preferred way to set encoding for print
On Sep 16, 2009, at 12:39 PM, ~flow wrote:
> so: how can i tell python, in a configuration or using a setting in
> sitecustomize.py, or similar, to use utf-8 as a default encoding?

[snip Stdout_writer_with_ncrs solution]

This should work:

    import io
    import sys

    sys.stdout = io.TextIOWrapper(sys.stdout.buffer,
                                  encoding=sys.stdout.encoding,
                                  errors='xmlcharrefreplace')

http://mail.python.org/pipermail/python-list/2009-August/725100.html

-Miles
--
http://mail.python.org/mailman/listinfo/python-list
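As a rough illustration of the wrapper suggested above, the sketch below applies the same `io.TextIOWrapper(..., errors='xmlcharrefreplace')` call to an in-memory buffer instead of the real `sys.stdout.buffer`, so the resulting bytes can be inspected. The helper name `make_ncr_writer` and the `cp1252` target encoding are assumptions made for the demonstration, not part of the original post.

```python
import io


def make_ncr_writer(binary_stream, encoding):
    """Wrap a binary stream so that characters the encoding cannot
    represent are written as &#NNNN; references instead of raising."""
    return io.TextIOWrapper(
        binary_stream,
        encoding=encoding,
        errors="xmlcharrefreplace",
        line_buffering=True,  # flush whenever a newline is written
    )


# An in-memory buffer stands in for sys.stdout.buffer here.
demo_buffer = io.BytesIO()
demo_out = make_ncr_writer(demo_buffer, "cp1252")
print("Hello \u5000\u5001", file=demo_out)
demo_out.flush()
demo_bytes = demo_buffer.getvalue()
print(demo_bytes)
```

For the real stream the equivalent call would be `sys.stdout = make_ncr_writer(sys.stdout.buffer, sys.stdout.encoding)`. On Python 3.7 and later, `sys.stdout.reconfigure(errors="xmlcharrefreplace")` should achieve much the same effect without replacing the stream object.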
Re: (A Possible Solution) Re: preferred way to set encoding for print
"~flow" wrote in message
news:643ca91c-b81c-483c-a8af-65c93b593...@r33g2000vbp.googlegroups.com...
> On Sep 16, 7:16 am, "Mark Tolonen" wrote:
> > Setting PYTHONIOENCODING overrides the encoding used for
> > stdin/stdout/stderr (See the Python help for details), but if your
> > terminal doesn't support the encoding that won't help.
>
> [snip]
>
> what has changed in python is that they now somehow find out about the
> terminal's encoding, and then put that encoding into place and defend
> it with teeth and claws. it is simply not easy to take control of that
> setting.

A couple more tips. PYTHONIOENCODING takes an optional error handler:

    C:\>set PYTHONIOENCODING=cp437:xmlcharrefreplace
    C:\>python
    Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print('Hello \u5000\u5001')
    Hello &#20480;&#20481;

You can also write directly to stdout with byte strings (note: my terminal doesn't support UTF-8, but there is no error):

    >>> import sys
    >>> sys.stdout.buffer.write('\u5000'.encode('utf8'))
    sÇÇ3

-Mark
Re: preferred way to set encoding for print
Mark Tolonen wrote:
> ('utf-8')`, but that has no effect in py3.0.1. also, i cannot set

Even if not relevant to your immediate problem, if you can, upgrade to 3.1, with its many important bug fixes.
(A Possible Solution) Re: preferred way to set encoding for print
On Sep 16, 7:16 am, "Mark Tolonen" wrote:
> Setting PYTHONIOENCODING overrides the encoding used for stdin/stdout/stderr
> (See the Python help for details), but if your terminal doesn't support the
> encoding that won't help.

thx for these two tips. of course, it was a bit misleading of me to complain that a cp850 terminal can't display chinese characters from python; it cannot do it all, of course.

i've gone on to experiment. what i do not want is for python to stop execution when an encoding error occurs on printing and perhaps logging. so far, i used to do this by convincing python to use utf-8 in any and all cases, and then live with the amount of garbage that appears on screen when using cp850 and cp1252 terminals.

what has changed in python is that they now somehow find out about the terminal's encoding, and then put that encoding into place and defend it with teeth and claws. it is simply not easy to take control of that setting.

this is in itself unfortunate; i believe that users should have a right to determine what to do in case of stdout encoding problems. these are a little different from i-wrote-to-that-file-and-boom experiences. *there* the encoding exception is fully warranted, and could be easily fixed by allowing a less-than-strict encoding mode. but print is different, and of all situations where encoding errors can occur, this is the hardest to take hold of. and much more so in python3, it seems, than in python2.

printing to the screen is often purely meta-informative in nature, a side-effect e.g. of a webserver really doing web pages. i don't want to bring my entire system down just because some output into some terminal in the back orifice produced some amount of garbage. maybe only a single chinese character amongst thousands of done-this-done-that red tape.

i think web browsers are a good example here.
i don't know whether it was a good idea to let clients reassemble broken web pages in an order as they see fit, but the policy to just output broken encoding character instances instead of terminating the browser process with a lengthy stacktrace was probably somehow good for the popularity of the web as we know it.

my current patch looks like this:

    import sys

    class Stdout_writer_with_ncrs( object ):

        def write( self, p ):
            """See to it that all write encodings are done using numerical
            character references (NCRs), which circumvents Python’s default
            behavior of raising an exception whenever it encounters an
            unrepresentable character while printing."""
            enc = sys.__stdout__.encoding
            # make sure we are dealing with text
            p = p if isinstance( p, str ) else str( p )
            # replace unencodable characters with NCRs, then return to str
            p = p.encode( enc, 'xmlcharrefreplace' ).decode( enc )
            sys.__stdout__.write( p )

    sys.stdout = Stdout_writer_with_ncrs()

this method picks up anything to be printed, makes sure it is a text, and then encodes it to the terminal encoding using numerical character references (NCRs), then decodes it again, since the underlying wrapper class wants to do encodings itself and refuses bytes in place of strings to be sent. (again, this is not nice: an array of byte values sent to the print method is a clear request to send exactly those bytes, verbatim, one by one, to the terminal. no mucking around with my bytes, pls! maybe i can implement that in the code above, too.)

of course, this simplistic scaffold will break if anyone uses sys.stdout for anything other than calling sys.stdout.write(), but so far it has worked fine despite being a defective, tiny shim. maybe inheriting from sys.stdout.__class__ would help.

> "_wolf" wrote in message
> news:22991c72-d00f-45cd-9bf7-0b80fc431...@k26g2000vbp.googlegroups.com...
>
> > hi folks,
> >
> > i am doing my first steps in the wonderful world of python 3.
> >
> > some things are good.
> > some things have to be relearned.
> > some things drive me crazy.
> >
> > sadly, i'm working on a windows box. which, in germany, entails that
> > python thinks it to be a good idea to take cp1252 as the default
> > encoding.
> >
> > so just coz i got my box in germany means i can never print out a
> > chinese character? say what?
> >
> > i have no troubles with people configuring their python installation
> > to use any encoding in the world, but wouldn't it have been less of a
> > surprise to just assume utf-8 for any file in/output? after all, it is
> > already the default for python source files as far as i understand.
> > someone might think they're clever to sniff into the system and make
> > the somewhat educated guess that this dude's using cp1252 for his
> > files. but they would be wrong.
> >
> > so: how can i tell python, in a configuration or using a setting in
> > sitecustomize.py, or similar, to use utf-8 as a default encoding?
> > there used to be a trick to say `reload(sys);sys.setdefaultencoding('utf-8')`,
> > but that has no effect in py3.0.1. also, i cannot set
> > `sys.stdout.encoding`; is there a way to re-open that stream with a
> > different encoding?
> >
> > in all, i belie
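The two loose ends mentioned in the message above (passing bytes through verbatim, and not breaking callers that use more than `write()`) could be approached roughly as follows. This is only a sketch under stated assumptions: the class name `NcrStdout` is hypothetical, it uses delegation via `__getattr__` rather than the inheritance the post muses about, and it assumes the wrapped stream exposes `.encoding` and `.buffer` the way an `io.TextIOWrapper` does.

```python
import io


class NcrStdout:
    """Hypothetical shim: forward writes to a text stream, replacing
    characters its encoding cannot represent with NCRs."""

    def __init__(self, stream):
        self._stream = stream

    def write(self, data):
        if isinstance(data, (bytes, bytearray)):
            # Bytes go to the underlying binary buffer untouched; flush the
            # text layer first so the output order is preserved.
            self._stream.flush()
            return self._stream.buffer.write(data)
        text = data if isinstance(data, str) else str(data)
        enc = self._stream.encoding
        safe = text.encode(enc, "xmlcharrefreplace").decode(enc)
        self._stream.write(safe)
        return len(text)

    def __getattr__(self, name):
        # Everything else (flush, encoding, isatty, fileno, ...) is
        # delegated to the wrapped stream.
        return getattr(self._stream, name)


# Demonstration against an in-memory cp850 "terminal".
backing = io.BytesIO()
shim = NcrStdout(io.TextIOWrapper(backing, encoding="cp850"))
print("caf\u00e9 \u5000", file=shim)   # e-acute exists in cp850, U+5000 does not
shim.write(b"\xe5\x80\x80")            # raw bytes are written as-is
shim.flush()
shim_bytes = backing.getvalue()
print(shim_bytes)
```

Installing it would look like `sys.stdout = NcrStdout(sys.__stdout__)`. Delegation was chosen here because a wrapper built this way keeps working whatever concrete class `sys.stdout` happens to be; whether that is preferable to subclassing is a judgment call.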
Re: preferred way to set encoding for print
"_wolf" wrote in message
news:22991c72-d00f-45cd-9bf7-0b80fc431...@k26g2000vbp.googlegroups.com...
> hi folks,
>
> i am doing my first steps in the wonderful world of python 3.
>
> some things are good.
> some things have to be relearned.
> some things drive me crazy.
>
> sadly, i'm working on a windows box. which, in germany, entails that
> python thinks it to be a good idea to take cp1252 as the default
> encoding.
>
> so just coz i got my box in germany means i can never print out a
> chinese character? say what?
>
> i have no troubles with people configuring their python installation
> to use any encoding in the world, but wouldn't it have been less of a
> surprise to just assume utf-8 for any file in/output? after all, it is
> already the default for python source files as far as i understand.
> someone might think they're clever to sniff into the system and make
> the somewhat educated guess that this dude's using cp1252 for his
> files. but they would be wrong.
>
> so: how can i tell python, in a configuration or using a setting in
> sitecustomize.py, or similar, to use utf-8 as a default encoding?
> there used to be a trick to say `reload(sys);sys.setdefaultencoding('utf-8')`,
> but that has no effect in py3.0.1. also, i cannot set
> `sys.stdout.encoding`; is there a way to re-open that stream with a
> different encoding?
>
> in all, i believe it is quite unsettling to me to see that, on my py3
> installation,
>
>     sys.getdefaultencoding() == 'utf-8'
>     sys.stdout.encoding == 'cp1252'
>     locale.getlocale() == (None, None)
>     locale.getdefaultlocale() == ('de_DE', 'cp1252')
>
> which to me makes as much sense as a blackcurrant tart thrown into
> space. worse,
>
>     locale.setlocale( locale.LC_ALL, locale.getdefaultlocale() )
>
> results in
>
>     locale.Error: unsupported locale setting
>
> this bloody thing doesn't accept its *own* output. attempts to feed
> that locale beast with anything but the empty string or 'C' were all
> doomed. it would take a very patient and eloquent person to explain
> that in a credible fashion to me. my word for this is, 'broken'.
>
> i would very much like to rid myself of these considerations. just say
> it's all utf-8, wash'n'go. my attempts at changing python's mind using
> the locale module have failed so far. otherwise, i for one don't want
> to touch that locale thing with a very long pole. as far as i can see,
> it does not work as documented. the platform dependencies are also a
> clear OFF LIMITS sign to me.
>
> any suggestions?

What specifically do you want to do? I work with Chinese all the time on a U.S. Windows system. Do you want to print Chinese characters in a console window? In a Python IDE?

FYI, I don't use the locale module for much at all.

I can't type or print Chinese to a console window unless I change Control Panel, Regional and Language Options, Advanced Tab, Language for non-Unicode Programs to a Chinese selection (and reboot). Then the default sys.stdout.encoding is something like cp936. The Pythonwin IDE in the latest version of pywin32, however, supports UTF-8 in its interactive window and displays Chinese fine.

Setting PYTHONIOENCODING overrides the encoding used for stdin/stdout/stderr (see the Python help for details), but if your terminal doesn't support the encoding, that won't help.

Let me know what you're trying to do.

-Mark
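To see what PYTHONIOENCODING actually changes, one option is to launch a child interpreter with the variable set and capture the bytes it writes. The sketch below does that; the helper name `child_stdout_bytes` and the sample text are assumptions for illustration. The `encoding:errorhandler` form used in the second call is the variable's documented syntax. Note that this only shows which bytes Python emits; whether a given terminal renders them correctly is a separate matter, as the message above points out.

```python
import os
import subprocess
import sys


def child_stdout_bytes(code, io_encoding):
    """Run `code` in a child interpreter with PYTHONIOENCODING set and
    return the raw bytes it wrote to stdout."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    done = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return done.stdout


# The child prints two Chinese characters (U+4E2D, U+6587).
CHILD_CODE = r"print('\u4e2d\u6587')"

as_utf8 = child_stdout_bytes(CHILD_CODE, "utf-8")
as_ncrs = child_stdout_bytes(CHILD_CODE, "cp437:xmlcharrefreplace")
print(as_utf8)
print(as_ncrs)
```

The first call yields the UTF-8 encoding of the two characters; the second yields numeric character references, because cp437 cannot represent them and the `xmlcharrefreplace` handler steps in instead of raising `UnicodeEncodeError`.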
preferred way to set encoding for print
hi folks,

i am doing my first steps in the wonderful world of python 3.

some things are good.
some things have to be relearned.
some things drive me crazy.

sadly, i'm working on a windows box. which, in germany, entails that python thinks it to be a good idea to take cp1252 as the default encoding.

so just coz i got my box in germany means i can never print out a chinese character? say what?

i have no troubles with people configuring their python installation to use any encoding in the world, but wouldn't it have been less of a surprise to just assume utf-8 for any file in/output? after all, it is already the default for python source files as far as i understand. someone might think they're clever to sniff into the system and make the somewhat educated guess that this dude's using cp1252 for his files. but they would be wrong.

so: how can i tell python, in a configuration or using a setting in sitecustomize.py, or similar, to use utf-8 as a default encoding? there used to be a trick to say `reload(sys);sys.setdefaultencoding('utf-8')`, but that has no effect in py3.0.1. also, i cannot set `sys.stdout.encoding`; is there a way to re-open that stream with a different encoding?

in all, i believe it is quite unsettling to me to see that, on my py3 installation,

    sys.getdefaultencoding() == 'utf-8'
    sys.stdout.encoding == 'cp1252'
    locale.getlocale() == (None, None)
    locale.getdefaultlocale() == ('de_DE', 'cp1252')

which to me makes as much sense as a blackcurrant tart thrown into space. worse,

    locale.setlocale( locale.LC_ALL, locale.getdefaultlocale() )

results in

    locale.Error: unsupported locale setting

this bloody thing doesn't accept its *own* output. attempts to feed that locale beast with anything but the empty string or 'C' were all doomed. it would take a very patient and eloquent person to explain that in a credible fashion to me. my word for this is, 'broken'.

i would very much like to rid myself of these considerations. just say it's all utf-8, wash'n'go. my attempts at changing python's mind using the locale module have failed so far. otherwise, i for one don't want to touch that locale thing with a very long pole. as far as i can see, it does not work as documented. the platform dependencies are also a clear OFF LIMITS sign to me.

any suggestions?

cheers, ~flow
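The values listed in the post come from different layers, which may explain part of the apparent inconsistency: in Python 3, `sys.getdefaultencoding()` is the codec used for implicit str/bytes conversion and is always `'utf-8'`, while `sys.stdout.encoding` is chosen per stream from the terminal or locale. A small diagnostic along the following lines can show what a given installation reports. The function name `encoding_report` is an assumption, and `locale.getpreferredencoding(False)` is used here in place of `locale.getdefaultlocale()`, which newer Python versions deprecate.

```python
import locale
import sys


def encoding_report():
    """Collect the encoding-related settings Python currently sees."""
    try:
        current_locale = locale.getlocale()
    except ValueError:
        # Some platforms report locale names Python cannot parse.
        current_locale = None
    return {
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        "sys.getfilesystemencoding()": sys.getfilesystemencoding(),
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
        "locale.getlocale()": current_locale,
    }


report = encoding_report()
for name, value in report.items():
    print(f"{name:36} {value!r}")
```

Running the same script once in a console and once with output redirected to a file is a quick way to see which of these values depend on where stdout is connected.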