Re: Unicode stdin/stdout

2013-11-18 Thread Robin Becker

On 15/11/2013 18:16, random...@fastmail.us wrote:

Of course, the real solution to this issue is to replace sys.stdout on
windows with an object that can handle Unicode directly with the
WriteConsoleW function - the problem there is that it will break code
that expects to be able to use sys.stdout.buffer for binary I/O. I also
wasn't able to get the analogous stdin replacement class to work with
input() in my attempts.


I started to use this on my windows installation


#c:\python33\lib\site-packages\sitecustomize.py
import sys, codecs
sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
sys.stderr = codecs.getwriter("utf-8")(sys.stderr.detach())

which makes them able to write any Unicode; after many years I am quite used to 
garbage appearing in the Windows console.


Unfortunately the above doesn't carry over into virtual environments, but I assume a 
hacked site.py could do that.

--
Robin Becker

--
https://mail.python.org/mailman/listinfo/python-list


Re: Unicode stdin/stdout

2013-11-18 Thread Robin Becker

On 18/11/2013 11:47, Robin Becker wrote:
...

#c:\python33\lib\site-packages\sitecustomize.py
import sys, codecs
sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
sys.stderr = codecs.getwriter("utf-8")(sys.stderr.detach())


it seems that the above needs extra attributes to make some distutils logging work, 
etc.; so now I'm using a sitecustomize.py containing


import sys, codecs
sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
sys.stdout.encoding = 'utf8'
sys.stderr = codecs.getwriter("utf-8")(sys.stderr.detach())
sys.stderr.encoding = 'utf8'

--
Robin Becker



Re: [Python-ideas] Unicode stdin/stdout

2013-11-18 Thread Nick Coghlan
On 18 Nov 2013 22:36, Robin Becker ro...@reportlab.com wrote:

 On 18/11/2013 11:47, Robin Becker wrote:
 ...

 #c:\python33\lib\site-packages\sitecustomize.py
 import sys, codecs
 sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
 sys.stderr = codecs.getwriter("utf-8")(sys.stderr.detach())

 
 it seems that the above needs extra stuff to make some distutils logging
work etc etc; so now I'm using sitecustomize.py containing

 import sys, codecs
 sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
 sys.stdout.encoding = 'utf8'
 sys.stderr = codecs.getwriter("utf-8")(sys.stderr.detach())
 sys.stderr.encoding = 'utf8'

Note that calling detach() on the standard streams isn't officially
supported, since it breaks the shadow streams saved in sys.__stderr__, etc.
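A small illustration of the failure mode, using an in-memory stream as a
stand-in for the real console streams (the BytesIO here is just a placeholder):

```python
import io

# Stand-in for sys.stdout: a text wrapper over an in-memory byte buffer.
wrapper = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
buffer = wrapper.detach()  # analogous to sys.stdout.detach()

# The original wrapper object is now unusable; this is what happens to
# sys.__stdout__ when it is the same object as the detached sys.stdout.
try:
    wrapper.write("hello")
except ValueError:
    print("writing via the detached wrapper raises ValueError")
```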

Cheers,
Nick.


 --
 Robin Becker

 ___
 Python-ideas mailing list
 python-id...@python.org
 https://mail.python.org/mailman/listinfo/python-ideas


Re: [Python-ideas] Unicode stdin/stdout

2013-11-18 Thread Victor Stinner
Why do you need to force the UTF-8 encoding? Your locale is not
correctly configured?

It's better to set PYTHONIOENCODING rather than replacing
sys.stdout/stderr at runtime.

There is an open issue to add a TextIOWrapper.set_encoding() method:
http://bugs.python.org/issue15216
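For example, a quick way to check the effect from Python itself (plain
Python, nothing Windows-specific assumed):

```python
import os
import subprocess
import sys

# Run a child interpreter with PYTHONIOENCODING set; its standard
# streams then use UTF-8 regardless of the console code page / locale.
env = dict(os.environ, PYTHONIOENCODING="utf-8")
out = subprocess.check_output(
    [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
    env=env,
)
print(out.decode().strip())  # prints: utf-8
```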

Victor


Re: Unicode stdin/stdout

2013-11-18 Thread Robin Becker

On 18/11/2013 15:25, Victor Stinner wrote:

Why do you need to force the UTF-8 encoding? Your locale is not
correctly configured?

It's better to set PYTHONIOENCODING rather than replacing
sys.stdout/stderr at runtime.

There is an open issue to add a TextIOWrapper.set_encoding() method:
http://bugs.python.org/issue15216

Victor

Well, reportlab deals with all sorts of character sets and languages; if I put in a 
quick print to try to debug something, I prefer that it produce some output rather 
than raise an error of its own. In the real world it's not always possible to 
know what the output contains (especially in error cases), so any restriction on 
the allowed textual output is a bit constraining.


The utf-8 codec should allow any Unicode text to be encoded; rendering is 
another issue, and I expect some garbage when things are going wrong.


I think you are right and I should use PYTHONIOENCODING to set this up. With the 
codec-writer approach I think it's harder to get interactive behaviour working 
properly (the output seems to be buffered differently). My attempts to make 
Windows XP use code page 65001 everywhere have been fairly catastrophic, e.g. 
non-booting :(

--
Robin Becker



Re: [Python-ideas] Unicode stdin/stdout

2013-11-18 Thread random832
On Mon, Nov 18, 2013, at 7:33, Robin Becker wrote:
 UTF-8 stuff

This doesn't really solve the issue I was referring to, which is that
windows _console_ (i.e. not redirected file or pipe) I/O can only
support unicode via wide character (UTF-16) I/O with a special function,
not via using byte-based I/O with the normal write function.


Re: [Python-ideas] Unicode stdin/stdout

2013-11-18 Thread Andrew Barnert
From: random...@fastmail.us



 On Mon, Nov 18, 2013, at 7:33, Robin Becker wrote:
  UTF-8 stuff
 
 This doesn't really solve the issue I was referring to, which is that
 windows _console_ (i.e. not redirected file or pipe) I/O can only
 support unicode via wide character (UTF-16) I/O with a special function,
 not via using byte-based I/O with the normal write function.


The problem is that Windows 16-bit I/O doesn't fit into the usual io module 
hierarchy. Not because it uses an encoding of UTF-16 (although anyone familiar 
with ReadConsoleW/WriteConsoleW from other languages may be a bit confused that 
Python's lowest-level wrappers around them deal in byte counts instead of WCHAR 
counts), but because you have to use HANDLEs instead of fds. So, there are 
going to be some compromises and some complexity.

One possibility is to use as much of the io hierarchy as possible, but not try 
to make it flexible enough to be reusable for arbitrary HANDLEs: Add 
WindowsFileIO and WindowsConsoleIO classes that implement RawIOBase with a 
native HANDLE and ReadFile/WriteFile and ReadConsoleW/WriteConsoleW 
respectively. Both work in terms of bytes (which means WindowsConsoleIO.read 
has to //2 its argument, and write has to *2 the result). You also need a 
create_windows_io function that wraps a HANDLE by calling GetConsoleMode and 
constructing a WindowsConsoleIO or WindowsFileIO as appropriate, then creates a 
BufferedReader/Writer around that, then constructs a TextIOWrapper with UTF-16 
or the default encoding around that. At startup, you just do that for the three 
GetStdHandle handles, and that's your stdin, stdout, and stderr.
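A rough, untested sketch of the WindowsConsoleIO half of that design (the class
and parameter names are assumptions, only the write path is shown, and the
ctypes call is of course only reachable on Windows):

```python
import ctypes
import io

class WindowsConsoleIO(io.RawIOBase):
    """Sketch: raw IO over a Windows console HANDLE via WriteConsoleW.

    Follows the design described above: the class deals in bytes even
    though the console counts WCHARs. Error handling, read(), and the
    GetConsoleMode dispatch are omitted.
    """

    def __init__(self, handle):
        self._handle = handle  # a HANDLE, e.g. from GetStdHandle

    def writable(self):
        return True

    def write(self, b):
        # b is UTF-16-LE bytes produced by the TextIOWrapper layer,
        # so the console sees len(b) // 2 characters.
        data = bytes(b)
        nchars = len(data) // 2
        written = ctypes.c_ulong(0)
        # Windows-only call; ctypes.windll does not exist elsewhere.
        ctypes.windll.kernel32.WriteConsoleW(
            self._handle, data, nchars, ctypes.byref(written), None)
        # RawIOBase.write reports bytes consumed, hence the *2.
        return written.value * 2
```

The create_windows_io function would then wrap an instance of this in a
BufferedWriter and a TextIOWrapper with encoding='utf-16-le', per the
GetConsoleMode dispatch described above.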

Besides not being reusable enough for people who want to wrap HANDLEs from 
other libraries or attach to new consoles from Python, it's not clear what 
fileno() should return. You could fake it and return the MSVCRT fds that 
correspond to the same files as the HANDLEs, but it's possible to end up with 
one redirected and not the other (e.g., if you detach the console), and I'm not 
sure what happens if you mix and match the two. A more correct solution would 
be to call _open_osfhandle on the HANDLE (and then keep track of the fact that 
os.close closes the HANDLE, or leave it up to the user to deal with bad handle 
errors?), but I'm not sure that's any better in practice. Also, should a 
console HANDLE use _O_WTEXT for its fd (in which case the user has to know that 
he has a _O_WTEXT handle even though there's no way to see that from Python), 
or not (in which case he's mixing 8-bit and 16-bit I/O on the same file)?

It might be reasonable to just not expose fileno(); most code that wants the 
fileno() for stdin is just going to do something Unix-y that's not going to 
work anyway (select it, tcsetattr it, pass it over a socket to another file, …).

A different approach would be to reuse as _little_ of io as possible, instead 
of as much: Windows stdin/stdout/stderr could each be custom TextIOBase 
implementations that work straight on HANDLEs and don't even support buffer (or 
detach), much less fileno. That exposes even less functionality to users, of 
course. It also means we need a parallel implementation of all the buffering 
logic. (On the other hand, it also leaves the door open to expose some Windows 
functionality, like async ReadFileEx/WriteFileEx, in a way that would be very 
hard through the normal layers…)


It shouldn't be too hard to write most of these via an extension module or 
ctypes to experiment with it. As long as you're careful not to mix 
winsys.stdout and sys.stdout (the module could even set sys.stdin, sys.stdout, 
sys.stderr=stdin, stdout, stderr at import time, or just del them, for a bit of 
protection), it should work.

It might be worth implementing a few different designs to play with, and 
putting them through their paces with some modules and scripts that do 
different things with stdio (including running the scripts with cmd.exe 
redirected I/O and with subprocess PIPEs) to see which ones have problems or 
limitations that are hard to foresee in advance.

If you have a design that you think sounds good, and are willing to experiment 
the hell out of it, and don't know how to get started but would be willing to 
debug and finish a mostly-written/almost-working implementation, I could slap 
something together with ctypes to get you started.


Unicode stdin/stdout (was: Re: python 3.3 repr)

2013-11-15 Thread random832
Of course, the real solution to this issue is to replace sys.stdout on
windows with an object that can handle Unicode directly with the
WriteConsoleW function - the problem there is that it will break code
that expects to be able to use sys.stdout.buffer for binary I/O. I also
wasn't able to get the analogous stdin replacement class to work with
input() in my attempts.


Re: Smarter way to do this? Unicode + stdin, stdout

2006-12-17 Thread Martin v. Löwis
BenjaMinster schrieb:
 I want to read and write unicode on stdin and stdout.  I can't seem to
 find any way to force sys.stdin.encoding and sys.stdout.encoding to be
 utf-8, so I've got the following workaround:

What operating system are you using? Why do you want to do this?
Python attempts to determine the encoding of your terminal (if
sys.stdout is a terminal), and set sys.stdout.encoding accordingly.

Regards,
Martin


Smarter way to do this? Unicode + stdin, stdout

2006-12-16 Thread BenjaMinster
I want to read and write unicode on stdin and stdout.  I can't seem to
find any way to force sys.stdin.encoding and sys.stdout.encoding to be
utf-8, so I've got the following workaround:

import codecs, sys
out = codecs.getwriter("utf-8")(sys.stdout)

def tricky(): return sys.stdin.readline().decode("utf-8")[:-1]

That is, I wrap sys.stdout in a utf-8 writer, and I read utf-8 input
from stdin as bytes, then decode it to
unicode and chop off the newline. Wrapping stdin with
codecs.getreader() doesn't work for some reason: when I call readline()
on the resulting wrapped stream, it won't return until I send it Ctrl+D
twice! Then stdin is closed, which is of course not helpful.
