[issue12632] Python 3 doesn't support cp65001 as the OEM code page
Bruce Ferris added the comment:

The PYTHONIOENCODING=utf-8 setting works great if I have code page 65001 set. I haven't done a complete console functionality check with that setting, but thanks for the input -- it solves the problem I'm currently experiencing.

I do wonder, however, if switching to that setting should happen automatically if it's not specified and the current Windows code page is 65001. That would solve the problem unless, of course, PYTHONIOENCODING has side-effects elsewhere that would cause other problems. On the other hand, if it does have side-effects elsewhere, then it's not the answer I'm looking for.

--
___ Python tracker <http://bugs.python.org/issue12632> ___
___ Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
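For illustration only, the suggestion in this comment (default to UTF-8 when PYTHONIOENCODING is unset and the active code page is 65001) could be sketched as below. This is not CPython's startup logic; `choose_stdio_encoding` and its parameters are hypothetical names, and a PYTHONIOENCODING value is passed through unparsed.

```python
def choose_stdio_encoding(environ, active_code_page):
    """Pick an encoding name for the standard streams.

    Hypothetical helper: `environ` is any mapping of environment
    variables and `active_code_page` is the integer shown by `chcp`.
    """
    explicit = environ.get("PYTHONIOENCODING")
    if explicit:
        # An explicit setting always wins; it is returned unparsed.
        return explicit
    if active_code_page == 65001:
        # Code page 65001 is Microsoft's identifier for UTF-8.
        return "utf-8"
    # Otherwise use the conventional "cpNNN" codec name.
    return "cp%d" % active_code_page
```

With this rule an explicit PYTHONIOENCODING keeps its current meaning, so the automatic default only applies where startup would otherwise fail on cp65001.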
[issue12632] Windows GPF with Code Page 65001
Bruce Ferris added the comment:

Victor, thanks for replying. I've had a quick read of everything that went on for issue #1602.

I think there's some misunderstanding of what I'm saying here. Maybe this will help clear it up...

D:\>chcp
Active code page: 850

D:\>chcp 65001
Active code page: 65001

D:\>python27\python
Python 2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> ^Z

D:\>python31\python
Fatal Python error: Py_Initialize: can't initialize sys standard streams
LookupError: unknown encoding: cp65001

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

D:\>chcp 850
Active code page: 850

D:\>python31\python
Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> ^Z

D:\>

You see, I'm NOT trying to output any Unicode or UTF-8 characters. All I'm trying to do is run different versions of Python on the same machine from the command line. Some code inside Python now "breaks" if Python 3.1 is started with Code Page 65001.

I fully understand that the change between Python 2.7 and 3.1 was probably due to trying to fix issue #1602 (or some other related issue). But, as a side-effect of that "fix", if you now start Python 3.1 (and maybe later versions) with the code page set to 65001, it refuses to work, which it didn't use to do.

Evidently, Python now uses the code page as an encoding lookup, which it didn't do in 2.7. So, there's another compatibility issue introduced.

Setting my cmd.exe code page to 65001 shouldn't mean a thing to Python if it can't associate it with an encoding. It could, at least, just switch to 7-bit ASCII and carry on. That would be better than failing! That's my whole point.

If Python wants to do some tweaking with code pages to get its job done, that's fine by me, as long as it doesn't "break" and it restores whatever code page I had set when I started it.

It's not down to a UTF-8 issue; it's about a compatibility issue introduced sometime in the last year or so as a side-effect of trying to resolve a UTF-8 issue, probably #1602.

That's all!
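The "switch to 7-bit ASCII and carry on" idea from this comment can be sketched as follows. This is an illustration under stated assumptions, not what the interpreter does at startup: `resolve_console_encoding` is a hypothetical name, and only the standard `codecs.lookup` call is real API.

```python
import codecs


def resolve_console_encoding(code_page, fallback="ascii"):
    """Map a Windows code page number to a usable codec name.

    Hypothetical helper: if no codec is registered for "cpNNN",
    return `fallback` instead of letting LookupError propagate.
    """
    name = "cp%d" % code_page
    try:
        # codecs.lookup raises LookupError for unknown encodings.
        return codecs.lookup(name).name
    except LookupError:
        return fallback
```

Whether `cp65001` itself resolves depends on the Python version and platform, so the sketch makes no claim about that particular value; the point is only that an unknown code page degrades to ASCII rather than aborting.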
[issue12632] Windows GPF with Code Page 65001
Bruce Ferris added the comment:

I disagree with the statement "it's not really a GPF since it calls Abort". Consider the following cmd.exe session...

Microsoft Windows [Version 6.0.6002]
Copyright (c) 2006 Microsoft Corporation. All rights reserved.

D:\>chcp 65001
Active code page: 65001

D:\>python >t.txt
Fatal Python error: Py_Initialize: can't initialize sys standard streams
LookupError: unknown encoding: cp65001

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

D:\>type t.txt

D:\>dir t.txt
 Volume in drive D is DATA
 Volume Serial Number is 2E61-626C

 Directory of D:\

25/07/2011 06:10 PM 0 t.txt
 1 File(s) 0 bytes
 0 Dir(s) 16,768,655,360 bytes free

D:\>

This means that, even if it was "intentional", from a programmatic point of view the Python process in this case leaves no indication other than transient bytes in the transient cmd.exe console buffer. There is no way of redirecting the output and examining it.

I strongly disagree with the statement "(If it were a true segfault-like error, there would be no message from Python itself.)" The "no message from Python itself" case is shown above.

My application handles code page 65001 just fine, no problems. If it attempts to use the Windows WriteConsole function and that fails, it tries using WriteFile instead. So, when my application fails and output is redirected, it produces output. But Python 3.1 doesn't.

See the following Microsoft MSDN link; it states the WriteConsole point explicitly...

http://msdn.microsoft.com/en-us/library/ms687401%28v=VS.85%29.aspx

So, if Python doesn't like Code Page 65001, for whatever reason, it can simply save it on startup and change it to whatever makes it happy. Then, upon Python exit (including Abort), change it back to 65001 before calling Abort.

I'm sorry, but the following is "easy" in my book...

1) At Startup... Call GetConsoleOutputCP and save that somewhere. If the code page is 65001, change it to something that doesn't cause problems by calling SetConsoleOutputCP.

2) On Write... If WriteConsole fails, try calling WriteFile instead.

3) At Abort or Exit... Call SetConsoleOutputCP to set it back to whatever it was on Startup.

I don't care if your app (Python) can display UTF-8 on Microsoft's cmd.exe console or if it can't. All I'm trying to do is point out a bit of misbehaviour that CAN be easily changed and will make your product more resilient.

I don't know the details of how Python deals with character encoding and, quite honestly, I shouldn't need to since it's not my product. However, I DO know how I handle a similar scenario in my own app.

Microsoft made it complicated, not me. But I can "easily" get around the problem using the above scenario. If Python can't do it just as "easily", then it tells me more about Python's implementation and the people behind Python than it tells me about Microsoft and the people behind Windows.

Don't get me wrong, I love Python as a tool for solving certain classes of problems and, please, keep up the good work. It's appreciated.
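Steps 1 and 3 above (save the output code page, switch away from 65001, restore on exit) can be sketched in Python with `ctypes`. This is a minimal sketch under stated assumptions, not a patch for the interpreter: `temporary_output_code_page` and `_win32_console_api` are hypothetical names, only the Win32 calls GetConsoleOutputCP and SetConsoleOutputCP are real, and the WriteConsole/WriteFile fallback of step 2 is not covered.

```python
import contextlib
import sys


def _win32_console_api():
    """Return (get_cp, set_cp) bound to kernel32, or None off Windows."""
    if sys.platform != "win32":
        return None
    import ctypes
    kernel32 = ctypes.windll.kernel32
    return kernel32.GetConsoleOutputCP, kernel32.SetConsoleOutputCP


@contextlib.contextmanager
def temporary_output_code_page(fallback_cp, problem_cp=65001, api=None):
    """Switch away from `problem_cp` for the duration of the block.

    `api` is an optional (get_cp, set_cp) pair, which allows the logic
    to be exercised without a real console. Yields the saved code page,
    or None when no console API is available.
    """
    if api is None:
        api = _win32_console_api()
    if api is None:
        yield None
        return
    get_cp, set_cp = api
    saved = get_cp()                       # step 1: remember the code page
    changed = saved == problem_cp and bool(set_cp(fallback_cp))
    try:
        yield saved
    finally:
        if changed:
            set_cp(saved)                  # step 3: restore it on the way out
```

Note that a `finally` clause only covers normal exit and Python-level exceptions; a hard abort of the process would bypass it, so restoring "before calling Abort" would need to happen in the abort path itself.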
[issue12632] Windows GPF with Code Page 65001
Bruce Ferris added the comment:

I use code page 65001 because 1) it displays the UTF-8 characters in my text files with "echo " on the command line, 2) that's Microsoft's "official" (whatever that means) code page for UTF-8, and 3) it works in cmd.exe.

Setting aside why I use it, it IS used by some, and Python shouldn't GPF for ANY reason if it can be easily fixed. Right?

Essentially, 65001 makes Microsoft's console output behave properly (at least with the limited characters in Lucida Console), so I would think Python should consider not blowing up when it's set.

To be honest, I just happened to have it set to 65001 to get the output from another program to look right and just happened to run Python to do some quick unrelated calculations. Imagine my surprise when Python blew up, especially when all I did was run it. It's not like I asked it to do any UTF-8 or anything!

Anyway, as far as I understand, any GPF is a potential back door. So, it needs closing.
[issue12632] Windows GPF with Code Page 65001
Changes by Bruce Ferris:

--
type: -> crash
[issue12632] Windows GPF with Code Page 65001
New submission from Bruce Ferris:

The following scenario GPFs on Windows Vista using cmd.exe...

D:\>python
Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> ^Z

D:\>chcp 65001
Active code page: 65001

D:\>python
Fatal Python error: Py_Initialize: can't initialize sys standard streams
LookupError: unknown encoding: cp65001

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

D:\>

This is a bit surprising since Code Page 65001 IS the official Microsoft UTF-8 Code Page. Please see...

http://msdn.microsoft.com/en-us/library/dd317756%28v=vs.85%29.aspx

--
components: Unicode
messages: 141067
nosy: bferris57
priority: normal
severity: normal
status: open
title: Windows GPF with Code Page 65001
versions: Python 3.1