[issue23901] Force console stdout to use UTF8 on Windows

2017-03-26 Thread Martin Panter

Martin Panter added the comment:

This seems to be discussing the same sort of stuff that ended up with the Issue 
1602 implementation.

--
nosy: +martin.panter
resolution:  -> duplicate
stage:  -> resolved
status: open -> closed
superseder:  -> windows console doesn't print or input Unicode

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23901] Force console stdout to use UTF8 on Windows

2015-04-18 Thread STINNER Victor

STINNER Victor added the comment:

If sys.stdout is modified, it must be carefully tested in various scenarios:

- Windows console, default config
- Windows console, TrueType font
- PowerShell: see #21927; it looks like PowerShell has its own set of Unicode 
issues
- Redirect output into a file
- etc.
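
A quick way to tell some of these scenarios apart while testing is to look at what sys.stdout reports about itself. This is only a minimal sketch: describe_stdout is a hypothetical helper name, and it cannot distinguish console fonts or PowerShell from cmd.exe, only an interactive console from redirected output.

```python
import sys


def describe_stdout(stream=None):
    """Report basic facts about a text stream (hypothetical helper)."""
    stream = sys.stdout if stream is None else stream
    return {
        # True for an interactive console, False when redirected to a file.
        "isatty": stream.isatty(),
        # Encoding the text layer uses when writing to this stream, if any.
        "encoding": getattr(stream, "encoding", None),
        "platform": sys.platform,
    }


if __name__ == "__main__":
    # Write to stderr so the report is visible even if stdout is redirected.
    print(describe_stdout(), file=sys.stderr)
```

Running the script once directly and once with output redirected to a file shows the difference in the "isatty" field, and often in "encoding" as well.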

Very good articles by Michael S. Kaplan on Windows stdout/console:
- Conventional wisdom is retarded, aka What the @#%* is _O_U16TEXT?
  http://www.siao2.com/2008/03/18/8306597.aspx
- Myth busting in the console
  http://www.siao2.com/2010/10/07/10072032.aspx
- Cunningly conquering communicated console caveats. Comprende, mon Capitán?
  http://www.siao2.com/2010/05/07/10008232.aspx

See also the fwide() function.

Good luck...

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23901
___



[issue23901] Force console stdout to use UTF8 on Windows

2015-04-13 Thread Paul Moore

Paul Moore added the comment:

My proof-of-concept attempt to use _O_U8TEXT resulted in some very bizarre 
behaviour - odd buffering of the interactive interpreter output and what appear 
to be Chinese characters being displayed for normal (ASCII) interactions.

I suspect there is some oddity around how _O_U8TEXT works. The approach looks 
too fragile to pursue. I'll look further into the RawIOBase option.

--




[issue23901] Force console stdout to use UTF8 on Windows

2015-04-09 Thread Paul Moore

Paul Moore added the comment:

Doh. That latter approach (a RawIOBase implementation) is *precisely* what 
win_unicode_console does for stdout (using utf16le rather than utf8 as that's 
the native Windows encoding used by WriteConsole). So (a) yes it would work, 
and (b) it has already been demonstrated in the wild that the approach is viable.

(Actually, a C implementation of this approach might be a better way of 
implementing this anyway, rather than relying on a relatively obscure C runtime 
feature).

--




[issue23901] Force console stdout to use UTF8 on Windows

2015-04-09 Thread Paul Moore

New submission from Paul Moore:

Console code page issues are a consistent source of problems on Windows. It 
would be nice, given that the Windows console has Unicode support, if Python 
could write the full range of Unicode to the console by default.

The MSVC runtime appears to have a flag, _O_U8TEXT, that can be set via 
_setmode() and that enables Unicode mode (see 
https://msdn.microsoft.com/en-us/library/tw4k6df8%28v=vs.100%29.aspx?f=255&MSPPError=-2147217396,
 in particular the second example). It seems as if Python could set U8TEXT mode 
on sys.stdout on startup (assuming it's a console) and set the encoding to 
UTF8, and then Unicode output would just work.

I don't have code that implements this yet, but if I can get my head round the 
IO stack and the Python startup code, I'll give it a go.

Steve - any comments on whether this might work? I've never seen any real-world 
code using U8TEXT, which makes me wonder if it's reliable (doing 
msvcrt.setmode(sys.stdout.fileno(), 0x40000) in Python 3.4 causes Python to 
crash, which is worrying, but it works in 3.5...).

--
assignee: paul.moore
components: Windows
messages: 240354
nosy: paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: Force console stdout to use UTF8 on Windows
versions: Python 3.5




[issue23901] Force console stdout to use UTF8 on Windows

2015-04-09 Thread Paul Moore

Paul Moore added the comment:

Generally, my understanding is that the console does pretty badly at supporting 
Unicode via the normal WriteFile APIs and the code page support (mainly 
because support for the UTF8 code page is rubbish). But the WriteConsole API 
does, I believe, have pretty solid Unicode support (it's what PowerShell uses, 
for example). Typically, attempts to support Unicode for Python console output 
(e.g., win_unicode_console on PyPI) deal with this by making a file-like object 
that calls WriteConsole under the hood, and replaces sys.stdout with this. The 
problem with this approach is that it isn't a normal text stream object 
(there's no underlying raw bytes buffer), so the result isn't seamless 
(although win_unicode_console is pretty good).

What I noticed is that the C runtime supports an _O_U8TEXT mode for console 
file descriptors, at the (bytes) write() level. So that could be seamlessly 
integrated into the bytes IO layer of the Python IO stack.

As far as I can tell from the description, the way it works is to treat a block 
of bytes written via write() as a UTF8 string, decode it to Unicode and write 
it to the console via WriteConsole(). (I haven't checked the CRT source, but 
that seems like the most likely implementation).

Code speaks louder than words, obviously, and I do intend to produce a trial 
implementation. But that'll take a bit of time because I need to understand how 
the IO stack hangs together first.

An alternative approach would be a RawIOBase implementation that wrote bytes to 
the console by (re-)decoding them from UTF8 and using WriteConsole, then 
wrapping that in the usual buffered IO and text IO layers (with the text IO 
layer using UTF8 encoding). That may well be implementable in pure Python, and 
make a good prototype implementation. Hmm...
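
A rough pure-Python sketch of that layering, to make the idea concrete. The names Utf8ConsoleRaw and make_console_stream are hypothetical, and write_text is a stand-in callable: on Windows it would wrap WriteConsole, but here it is injected so the sketch runs anywhere.

```python
import codecs
import io


class Utf8ConsoleRaw(io.RawIOBase):
    """Raw byte sink that re-decodes UTF-8 and passes text to a writer.

    write_text is a hypothetical callable taking a str; a real version
    would call WriteConsole on the console handle instead.
    """

    def __init__(self, write_text):
        super().__init__()
        self._write_text = write_text
        # An incremental decoder holds back a multi-byte sequence that is
        # split across two write() calls until the remaining bytes arrive.
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")

    def writable(self):
        return True

    def write(self, b):
        data = bytes(b)
        text = self._decoder.decode(data)
        if text:
            self._write_text(text)
        # Report every byte as consumed, including any held by the decoder.
        return len(data)


def make_console_stream(write_text):
    """Wrap the raw sink in the usual buffered and text IO layers."""
    raw = Utf8ConsoleRaw(write_text)
    buffered = io.BufferedWriter(raw)
    return io.TextIOWrapper(buffered, encoding="utf-8", line_buffering=True)


# Usage: collect what would have gone to the console.
chunks = []
stream = make_console_stream(chunks.append)
stream.write("caf\u00e9 \u2713\n")
stream.flush()
```

Because the result is an ordinary TextIOWrapper, it still has a .buffer attribute for code that writes bytes, which is the seamlessness the file-like-object approach lacks.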

--




[issue23901] Force console stdout to use UTF8 on Windows

2015-04-09 Thread R. David Murray

R. David Murray added the comment:

There are a lot of issues in this tracker (for some definition of a lot) that 
indicate that the console does *not* support Unicode.  So if you are writing 
utf-8 I wouldn't expect this to work.  (If it were an API taking Unicode 
directly, that might be a different story).  But the amount I know about 
Windows is pretty small, so I sure hope you are right.

--
nosy: +r.david.murray




[issue23901] Force console stdout to use UTF8 on Windows

2015-04-09 Thread STINNER Victor

STINNER Victor added the comment:

> There are a lot of issues in this tracker (for some definition of a lot) that 
> indicate that the console does *not* support unicode.

The main issue is #1602.

--
nosy: +haypo
