Re: (A Possible Solution) Re: preferred way to set encoding for print

2009-09-16 Thread Miles Kaufmann

On Sep 16, 2009, at 12:39 PM, ~flow wrote:

> so: how can i tell python, in a configuration or using a setting in
> sitecustomize.py, or similar, to use utf-8 as a default encoding?

[snip Stdout_writer_with_ncrs solution]

This should work:

import io
import sys

sys.stdout = io.TextIOWrapper(sys.stdout.buffer,
                              encoding=sys.stdout.encoding,
                              errors='xmlcharrefreplace')
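The re-wrapping trick can be tried on an in-memory stand-in instead of the real sys.stdout. A minimal sketch, assuming a cp850 "console" simulated with io.BytesIO (the names raw, console and lenient are illustrative only):

```python
import io

# Stand-in for a cp850 console: a strict text layer over an in-memory buffer.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp850")

# Re-wrap the same byte buffer with a lenient error handler, as the reply
# does with sys.stdout.buffer. `console` stays referenced on purpose: a
# garbage-collected TextIOWrapper closes the buffer it wraps.
lenient = io.TextIOWrapper(console.buffer,
                           encoding=console.encoding,
                           errors="xmlcharrefreplace")
lenient.write("Gr\u00fc\u00dfe \u5000")  # ü and ß exist in cp850, U+5000 does not
lenient.flush()
print(raw.getvalue())
```

On Python 3.7 and later, sys.stdout.reconfigure(errors='xmlcharrefreplace') should achieve the same in place, without building a second wrapper.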

http://mail.python.org/pipermail/python-list/2009-August/725100.html

-Miles

--
http://mail.python.org/mailman/listinfo/python-list


Re: (A Possible Solution) Re: preferred way to set encoding for print

2009-09-16 Thread Mark Tolonen


"~flow"  wrote in message 
news:643ca91c-b81c-483c-a8af-65c93b593...@r33g2000vbp.googlegroups.com...

> On Sep 16, 7:16 am, "Mark Tolonen"  wrote:
>> Setting PYTHONIOENCODING overrides the encoding used for stdin/stdout/stderr
>> (See the Python help for details), but if your terminal doesn't support the
>> encoding that won't help.
>
> [snip]
>
> what has changed in python is that they now somehow find out about the
> terminal's encoding, and then put that encoding into place and defend
> it with teeth and claws. it is simply not easy to take control of that
> setting.


A couple more tips. PYTHONIOENCODING takes an optional error handler:

C:\>set PYTHONIOENCODING=cp437:xmlcharrefreplace
C:\>python
Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print('Hello \u5000\u5001')
Hello &#20480;&#20481;
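To check the encoding:errorhandler form without a cp437 console, one can start a child interpreter with the variable set and look at the raw bytes it writes to a pipe. A minimal sketch, assuming Python 3.7+ for subprocess.run(..., capture_output=True); the helper name run_with_ioencoding is hypothetical:

```python
import os
import subprocess
import sys

def run_with_ioencoding(setting, source):
    """Run `source` in a child interpreter with PYTHONIOENCODING=setting
    and return the raw bytes it wrote to its stdout pipe."""
    env = dict(os.environ, PYTHONIOENCODING=setting)
    result = subprocess.run([sys.executable, "-c", source],
                            env=env, capture_output=True, check=True)
    return result.stdout

# The doubled backslashes keep the command line ASCII-only; the child's own
# string literal turns them into U+5000 and U+5001.
out = run_with_ioencoding("cp437:xmlcharrefreplace",
                          "print('Hello \\u5000\\u5001')")
print(out)
```

The two characters have no cp437 form, so the child emits them as decimal numerical character references instead of raising UnicodeEncodeError.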

You can also write directly to stdout with byte strings (note: my terminal
doesn't support UTF-8, but there is no error):

>>> import sys
>>> sys.stdout.buffer.write('\u5000'.encode('utf8'))
σÇÇ3
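What reaches the console above is the three UTF-8 bytes of U+5000 shown in the console's code page, immediately followed by 3, the byte count returned by write() and echoed by the interactive prompt. A sketch that reproduces this without a real console, assuming an in-memory cp437 stand-in built from io.BytesIO:

```python
import io

# Hypothetical cp437 "console": a text layer over an in-memory byte buffer.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp437")

# Bypass the text layer and hand the UTF-8 bytes straight to the buffer,
# as sys.stdout.buffer.write() does in the session above.
count = console.buffer.write("\u5000".encode("utf8"))

# Reading those bytes back as cp437 shows what a cp437 console displays.
shown = raw.getvalue().decode("cp437")
print(ascii(shown), count)  # ascii() keeps this print safe on any console
```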

-Mark




Re: preferred way to set encoding for print

2009-09-16 Thread Terry Reedy

Mark Tolonen wrote:


> there used to be a trick to say `reload(sys);sys.setdefaultencoding('utf-8')`,
> but that has no effect in py3.0.1.


Even if not relevant to your immediate problem, if you can, upgrade to 
3.1, with its many important bug fixes.




(A Possible Solution) Re: preferred way to set encoding for print

2009-09-16 Thread ~flow
On Sep 16, 7:16 am, "Mark Tolonen"  wrote:
> Setting PYTHONIOENCODING overrides the encoding used for stdin/stdout/stderr
> (See the Python help for details), but if your terminal doesn't support the
> encoding that won't help.

thx for these two tips. of course, it was a bit misleading of me to
complain that a cp850 terminal can't display chinese characters from
python; it cannot do that at all, of course.

i've gone on to experiment. what i do not want is for python to stop
execution when an encoding error occurs on printing and perhaps
logging. so far, i have avoided this by convincing python to use utf-8
in any and all cases, and then living with the amount of garbage that
appears on screen when using cp850 and cp1252 terminals.

what has changed in python is that they now somehow find out about the
terminal's encoding, and then put that encoding into place and defend
it with teeth and claws. it is simply not easy to take control of that
setting.

this is in itself unfortunate; i believe that users should have a
right to determine what to do in case of stdout encoding problems.
these are a little different from i-wrote-to-that-file-and-boom
experiences. *there* the encoding exception is fully warranted, and
could be easily fixed by allowing a less-than-strict encoding mode.
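For files, open() already offers such a mode through its errors argument. A small sketch, assuming a temporary file (the name log.txt is hypothetical) and cp1252 as the file encoding:

```python
import os
import tempfile

text = "Hello \u5000"  # U+5000 has no cp1252 representation

results = {}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "log.txt")  # hypothetical file name
    for handler in ("strict", "replace", "xmlcharrefreplace"):
        try:
            with open(path, "w", encoding="cp1252", errors=handler) as f:
                f.write(text)
            with open(path, "rb") as f:
                results[handler] = f.read()
        except UnicodeEncodeError:
            # strict mode: the i-wrote-to-that-file-and-boom case
            results[handler] = "UnicodeEncodeError"

print(results)
```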

but print is different, and of all situations where encoding errors
can occur, this is the hardest to get hold of, and it seems much more
so in python3 than in python2.

printing to the screen is often purely meta-informative in nature, a
side effect of, e.g., a webserver whose real job is serving web pages.
i don't want to bring my entire system down just because some output
into some terminal in the back orifice produced some amount of garbage,
maybe only a single chinese character amongst thousands of lines of
done-this, done-that red tape.

i think web browsers are a good example here. i don't know whether it
was a good idea to let clients reassemble broken web pages as they see
fit, but the policy of just displaying broken characters instead of
terminating the browser process with a lengthy stacktrace was probably
good for the popularity of the web as we know it.

my current patch looks like this:

import sys

class Stdout_writer_with_ncrs( object ):

  def write( self, p ):
    """Encode everything written using numerical character references
    (NCRs); this circumvents Python’s default behavior of raising an
    exception whenever it encounters an unrepresentable character
    while printing."""
    enc = sys.__stdout__.encoding
    # make sure we have text, not some other object
    p = p if isinstance( p, str ) else str( p )
    # unencodable characters become NCRs such as &#20480;; decode again
    # because sys.__stdout__ only accepts str, not bytes
    p = p.encode( enc, 'xmlcharrefreplace' ).decode( enc )
    sys.__stdout__.write( p )

sys.stdout = Stdout_writer_with_ncrs()

this method picks up anything to be printed, makes sure it is text,
and then encodes it to the terminal encoding using numerical character
references (NCRs). it then decodes it again, since the underlying wrapper
class wants to do the encoding itself and refuses bytes in place of
strings (again, this is not nice: an array of byte values sent to the
print method is a clear request to send exactly those bytes, verbatim,
one by one, to the terminal. no mucking around with my bytes, pls!
maybe i can implement that in the code above, too.)
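Note that print() itself calls str() on its arguments, so print(b'abc') hands the writer the text "b'abc'", never the bytes; a byte pass-through can therefore only serve code that calls write() directly. A sketch of such a writer, assuming the target is a text stream exposing a .buffer (the class name NcrBytesWriter is hypothetical), tested on an in-memory cp850 stand-in:

```python
import io

class NcrBytesWriter:
    """Hypothetical variant of Stdout_writer_with_ncrs: text is written
    with numerical character references, bytes go out verbatim."""

    def __init__(self, target):
        # target: a text stream exposing its byte layer as .buffer
        self.target = target

    def write(self, p):
        if isinstance(p, (bytes, bytearray)):
            # flush pending text first so text and bytes stay in order
            self.target.flush()
            return self.target.buffer.write(p)
        enc = self.target.encoding
        p = p if isinstance(p, str) else str(p)
        p = p.encode(enc, "xmlcharrefreplace").decode(enc)
        return self.target.write(p)

    def flush(self):
        self.target.flush()

raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp850")  # stand-in for a cp850 terminal
w = NcrBytesWriter(console)
w.write("caf\u00e9 \u5000 ")  # text: é fits cp850, U+5000 becomes an NCR
w.write(b"\xe5\x80\x80")      # bytes: passed through untouched
w.flush()
print(raw.getvalue())
```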

of course, this simplistic scaffold will break if anyone uses
sys.stdout for anything but sys.stdout.write(), but so far it
has worked fine despite being a defective, tiny shim. maybe
inheriting from sys.stdout.__class__ would help.
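Instead of inheriting from sys.stdout.__class__, a wrapper can also forward every attribute it does not define itself to the original stream via __getattr__, so that flush(), encoding, isatty() and friends keep working. This delegation approach is a swapped-in alternative, not what the post does; a sketch with the hypothetical class name NcrStdout, tested against an in-memory cp1252 stream:

```python
import io

class NcrStdout:
    """Hypothetical stdout wrapper: write() turns unencodable characters
    into numerical character references; all other attributes are
    forwarded to the wrapped stream."""

    def __init__(self, stream):
        self._stream = stream

    def write(self, text):
        enc = self._stream.encoding
        safe = str(text).encode(enc, "xmlcharrefreplace").decode(enc)
        return self._stream.write(safe)

    def __getattr__(self, name):
        # Only reached for names NcrStdout does not define (flush, isatty, ...).
        return getattr(self._stream, name)

raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp1252")  # stand-in for a cp1252 terminal
out = NcrStdout(console)
print("Hello \u5000", file=out, end="")
out.flush()  # forwarded to console.flush()
print(raw.getvalue())
```

For real use, the assignment would presumably be sys.stdout = NcrStdout(sys.__stdout__).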


Re: preferred way to set encoding for print

2009-09-15 Thread Mark Tolonen


"_wolf"  wrote in message 
news:22991c72-d00f-45cd-9bf7-0b80fc431...@k26g2000vbp.googlegroups.com...

> hi folks,
>
> [snip]
>
> so: how can i tell python, in a configuration or using a setting in
> sitecustomize.py, or similar, to use utf-8 as a default encoding?
>
> [snip]
>
> any suggestions?


What specifically do you want to do?  I work with Chinese all the time on a 
U.S. Windows system.  Do you want to print Chinese characters in a console 
window?  In a Python IDE?  FYI, I don't use the locale module for much at 
all.


I can't type or print Chinese to a console window unless I change Control 
Panel, Regional and Language Options, Advanced Tab, Language for non-Unicode 
Programs to a Chinese selection (and reboot).  Then the default 
sys.stdout.encoding is something like cp936.


The Pythonwin IDE in the latest version of pywin32, however, supports UTF-8 
in its interactive window and displays Chinese fine.


Setting PYTHONIOENCODING overrides the encoding used for stdin/stdout/stderr 
(See the Python help for details), but if your terminal doesn't support the 
encoding that won't help.
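A quick way to see the override take effect, assuming Python 3.7+ and a child interpreter started via subprocess (so its output goes to a pipe rather than a console):

```python
import os
import subprocess
import sys

# Child program: report its stdout encoding, then print U+4E2D (a Chinese character).
CHILD = "import sys; print(sys.stdout.encoding); print('\\u4e2d')"

env = dict(os.environ, PYTHONIOENCODING="utf-8")
result = subprocess.run([sys.executable, "-c", CHILD],
                        env=env, capture_output=True, check=True)
lines = result.stdout.splitlines()
print(lines)
```

Whether those UTF-8 bytes then look right still depends on whatever displays them, which is the caveat above.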


Let me know what you're trying to do.

-Mark




preferred way to set encoding for print

2009-09-15 Thread _wolf
hi folks,

i am doing my first steps in the wonderful world of python 3.

some things are good.
some things have to be relearned.
some things drive me crazy.

sadly, i'm working on a windows box. which, in germany, entails that
python thinks it to be a good idea to take cp1252 as the default
encoding.

so just coz i got my box in germany means i can never print out a
chinese character? say what?

i have no troubles with people configuring their python installation
to use any encoding in the world, but wouldn't it have been less of a
surprise to just assume utf-8 for any file in/output? after all, it is
already the default for python source files as far as i understand.
someone might think they're clever to sniff into the system and make
the somewhat educated guess that this dude's using cp1252 for his
files. but they would be wrong.

so: how can i tell python, in a configuration or using a setting in
sitecustomize.py, or similar, to use utf-8 as a default encoding?
there used to be a trick to say `reload(sys);sys.setdefaultencoding('utf-8')`,
but that has no effect in py3.0.1. also, i cannot set
`sys.stdout.encoding`; is there a way to re-open that stream with a
different encoding?

all in all, i find it quite unsettling to see that, on my py3
installation,

sys.getdefaultencoding() == 'utf-8'
sys.stdout.encoding == 'cp1252'
locale.getlocale() == (None, None)
locale.getdefaultlocale() == ('de_DE', 'cp1252')

which to me makes as much sense as a blackcurrant tart thrown into
space. worse,

locale.setlocale( locale.LC_ALL, locale.getdefaultlocale() )

results in

locale.Error: unsupported locale setting

this bloody thing doesn't accept its *own* output. attempts to feed
that locale beast with anything but the empty string or 'C' were all
doomed. it would take a very patient and eloquent person to explain
that in a credible fashion to me. my word for this is, 'broken'.
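The values compared above come from different places, and Python 3 no longer offers sys.setdefaultencoding at all. A small sketch that reports them side by side, assuming Python 3.6+ (the helper encoding_report is hypothetical):

```python
import locale
import sys

def encoding_report():
    """Hypothetical helper: gather the separate encoding settings side by side."""
    return {
        # Default string encoding of the interpreter; 'utf-8' in every Python 3.
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        # Locale-derived codec that open() uses when no encoding= is given.
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
        # Codec print() uses; None if sys.stdout was replaced by an object without it.
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
        # The Python 2 reload(sys) trick has nothing to call in Python 3.
        "has sys.setdefaultencoding": hasattr(sys, "setdefaultencoding"),
    }

for name, value in encoding_report().items():
    print(f"{name:36} {value}")
```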

i would very much like to rid myself of these considerations. just say
it's all utf-8, wash'n'go.

my attempts at changing python's mind using the locale module have
failed so far. otherwise, i for one don't want to touch that locale
thing with a very long pole. as far as i can see, it does not work as
documented. the platform dependencies are also a clear OFF LIMITS sign
to me.

any suggestions?

cheers,

~flow
