Re: [Tutor] myown.getfilesystemencoding()
On Wed, Sep 04, 2013 at 05:39:10AM -0700, Albert-Jan Roskam wrote:

> Wow, thanks for looking all this up. Thanks also to other people who
> replied. It's not really desirable that an IDE adds confusion to an
> area that's already confusing to begin with.

Well, naturally it isn't desirable to add confusion, but I think that
when dealing with IDEs it is unavoidable. The whole point of an IDE is
that it is an *integrated* environment, which implies that the
environment that Python runs in is not the same as the one unintegrated
Python would be running in.

> But given that chcp returns cp850 on my windows system (commandline),
> wouldn't it be more descriptive if sys.getfilesystemencoding()
> returned 'cp850'?

I cannot comment on the gory details of Windows file system encodings,
except to say the sooner Windows moves to UTF-8 everywhere like the rest
of the civilized world, the better.

--
Steven
___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] myown.getfilesystemencoding()
On Wed, Sep 4, 2013 at 8:39 AM, Albert-Jan Roskam wrote:

> But given that chcp returns cp850 on my windows system (commandline),
> wouldn't it be more descriptive if sys.getfilesystemencoding()
> returned 'cp850'?

The common file systems (NTFS, FAT32, UDF, exFAT) support Unicode
filenames. The console also uses Unicode, but proper display depends on
the current font. The cmd shell encodes to the current codepage when
redirecting output from an internal command, unless it was started with
/U to force Unicode (e.g. cmd /U /c dir > files.txt). For subprocess,
run cmd.exe explicitly with /U (i.e. don't use shell=True), and decode
the output as UTF-16. Also, some utilities, such as tree.com, display
Unicode fine but always use the OEM code page when output is redirected
to a file or pipe (i.e. changing the console code page won't help).

> In other words: In the code below, isn't line [1] an obfuscated
> version of line [2]? Both versions return only question marks on my
> system.
>
> # Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
> import ctypes
>
> ords = [3629, 3633, 3585, 3625, 3619, 3652, 3607, 3618]
> u = "".join([unichr(i) for i in ords])
> print u.encode("mbcs")  # [1]
>
> # cp850 is what chcp returns on my Windows system
> print u.encode("cp850", "replace")  # [2]
>
> thai_latin_cp = "cp874"
> cp_ = int(thai_latin_cp[2:])
> ctypes.windll.kernel32.SetConsoleCP(cp_)
> ctypes.windll.kernel32.SetConsoleOutputCP(cp_)
> print u.encode("cp874", "replace")

"mbcs" is the ANSI codepage (1252), not the OEM codepage (850) nor the
current codepage. Neither 1252 nor 850 supports Thai characters.

It would be better to compare an OEM box drawing character:

    >>> from unicodedata import name
    >>> u = u'\u2500'
    >>> name(u)
    'BOX DRAWINGS LIGHT HORIZONTAL'
    >>> name(u.encode('850', 'replace').decode('850'))
    'BOX DRAWINGS LIGHT HORIZONTAL'
    >>> name(u.encode('mbcs', 'replace').decode('mbcs'))
    'HYPHEN-MINUS'

> ctypes.windll.kernel32.SetConsoleCP() and SetConsoleOutputCP seem
> useful. Can these functions be used to correctly display the Thai
> characters on my western European Windows version? (last block of
> code is an attempt) Or is that not possible altogether?

If stdout is a console, a write eventually ends up at WriteConsoleA(),
which decodes to the console's native Unicode based on the current
output codepage. If you're using codepage 874 and the current font
supports Thai characters, it should display fine. It's also possible to
write a Unicode string directly by calling WriteConsoleW with ctypes.
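
The comparison above can be reproduced without a Windows machine. The
following is a minimal sketch, assuming Python 3 rather than the 2.7 used
in the thread, and using the portable cp850, cp1252 and cp874 codecs in
place of "mbcs". The helper name survives() is made up for illustration.
Note that the real "mbcs" codec on Windows may substitute a best-fit
character instead of '?', as the HYPHEN-MINUS result above suggests, so
this only shows which code pages can represent which characters.

```python
# Python 3 sketch (the thread itself uses Python 2.7). Only portable
# codecs are used, so this does not depend on the Windows "mbcs" codec.
from unicodedata import name

# The Thai code points from the question.
ords = [3629, 3633, 3585, 3625, 3619, 3652, 3607, 3618]
thai = "".join(chr(i) for i in ords)

# The box drawing character from the reply.
box = "\u2500"


def survives(text, codec):
    """Return True if text round-trips through codec without loss."""
    return text.encode(codec, "replace").decode(codec) == text


print(name(box))
for codec in ("cp850", "cp1252", "cp874"):
    print(codec, "thai:", survives(thai, codec), "box:", survives(box, codec))
```

Encoding with "replace" turns every unrepresentable character into '?',
which matches the question marks reported for cp850 in the original
question.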
Re: [Tutor] myown.getfilesystemencoding()
- Original Message -
> From: eryksun
> To: Oscar Benjamin; Albert-Jan Roskam
> Cc: Python Mailing List
> Sent: Sunday, September 1, 2013 7:30 AM
> Subject: Re: [Tutor] myown.getfilesystemencoding()
>
> On Sat, Aug 31, 2013 at 9:16 AM, Oscar Benjamin wrote:
>> Spyder has both an internal interpreter and an external interpreter.
>> One is the same interpreter process that runs the Spyder GUI. The
>> other is run in a subprocess which keeps the GUI safe but reduces your
>> ability to inspect the workspace data via the GUI. So presumably
>> Albert means the "external" interpreter here.
>
> I installed Spyder on Windows to look into this. It's using Qt
> QProcess to run the external interpreter in a child process.
> sys.stdin.isatty() confirms it's not a tty, and Process Explorer
> confirms that all 3 standard I/O handles (from msvcrt.get_osfhandle())
> are pipes.
>
> The file encoding is None for piped standard I/O, so printing unicode
> falls back to the default encoding. Normally this is ASCII in 2.x, but
> Spyder uses sitecustomize to set the default encoding based on the
> default locale. It also sets the hidden console's codepage:
>
>     if os.name == 'nt': # Windows platforms
>
>         # Setting console encoding (otherwise Python does not
>         # recognize encoding)
>         try:
>             import locale, ctypes
>             _t, _cp = locale.getdefaultlocale('LANG')
>             try:
>                 _cp = int(_cp[2:])
>                 ctypes.windll.kernel32.SetConsoleCP(_cp)
>                 ctypes.windll.kernel32.SetConsoleOutputCP(_cp)
>             except (ValueError, TypeError):
>                 # Code page number in locale is not valid
>                 pass
>         except ImportError:
>             pass
>
> http://code.google.com/p/spyderlib/source/browse/spyderlib/widgets/externalshell/sitecustomize.py?name=v2.2.0#74
>
> Probably this was added for a good reason, but I don't grok the point.
> Python isn't interested in the hidden console window at this stage,
> and the standard handles are all pipes. I didn't notice any difference
> with these lines commented out, running with Python 2.7.5. YMMV
>
> There's a design flaw here since sys.stdin.encoding is used by the
> parser in single-input mode. With it set to None, Unicode literals
> entered in the REPL will be incorrectly parsed if they use non-ASCII
> byte values. For example, given the input is Windows 1252, then u'€'
> will be parsed as u'\x80' (i.e. PAD, a C1 Control code).
>
> Here's an alternative to messing with the default encoding -- at least
> for the new version of Spyder that doesn't have to support 2.5. Python
> 2.6+ checks for the PYTHONIOENCODING environment variable. This
> overrides the encoding/errors values in Py_InitializeEx():
>
> http://hg.python.org/cpython/file/70274d53c1dd/Python/pythonrun.c#l265
>
> You can test setting PYTHONIOENCODING without restarting Spyder. Just
> bring up Spyder's "Internal Console" and set
> os.environ['PYTHONIOENCODING']. The change applies to new interpreters
> started from the "Interpreters" menu. Spyder could set this itself in
> the environment that gets passed to the QProcess object.

Wow, thanks for looking all this up. Thanks also to other people who
replied. It's not really desirable that an IDE adds confusion to an area
that's already confusing to begin with. But given that chcp returns
cp850 on my windows system (commandline), wouldn't it be more
descriptive if sys.getfilesystemencoding() returned 'cp850'?

In other words: In the code below, isn't line [1] an obfuscated version
of line [2]? Both versions return only question marks on my system.

# Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
import ctypes

ords = [3629, 3633, 3585, 3625, 3619, 3652, 3607, 3618]
u = "".join([unichr(i) for i in ords])
print u.encode("mbcs")  # [1]

# cp850 is what chcp returns on my Windows system
print u.encode("cp850", "replace")  # [2]

thai_latin_cp = "cp874"
cp_ = int(thai_latin_cp[2:])
ctypes.windll.kernel32.SetConsoleCP(cp_)
ctypes.windll.kernel32.SetConsoleOutputCP(cp_)
print u.encode("cp874", "replace")

ctypes.windll.kernel32.SetConsoleCP() and SetConsoleOutputCP seem
useful. Can these functions be used to correctly display the Thai
characters on my western European Windows version? (last block of code
is an attempt) Or is that not possible altogether?

Best wishes,
Albert-Jan
Re: [Tutor] myown.getfilesystemencoding()
On Sat, Aug 31, 2013 at 9:16 AM, Oscar Benjamin wrote:
> Spyder has both an internal interpreter and an external interpreter.
> One is the same interpreter process that runs the Spyder GUI. The
> other is run in a subprocess which keeps the GUI safe but reduces your
> ability to inspect the workspace data via the GUI. So presumably
> Albert means the "external" interpreter here.

I installed Spyder on Windows to look into this. It's using Qt
QProcess to run the external interpreter in a child process.
sys.stdin.isatty() confirms it's not a tty, and Process Explorer
confirms that all 3 standard I/O handles (from msvcrt.get_osfhandle())
are pipes.

The file encoding is None for piped standard I/O, so printing unicode
falls back to the default encoding. Normally this is ASCII in 2.x, but
Spyder uses sitecustomize to set the default encoding based on the
default locale. It also sets the hidden console's codepage:

    if os.name == 'nt': # Windows platforms

        # Setting console encoding (otherwise Python does not
        # recognize encoding)
        try:
            import locale, ctypes
            _t, _cp = locale.getdefaultlocale('LANG')
            try:
                _cp = int(_cp[2:])
                ctypes.windll.kernel32.SetConsoleCP(_cp)
                ctypes.windll.kernel32.SetConsoleOutputCP(_cp)
            except (ValueError, TypeError):
                # Code page number in locale is not valid
                pass
        except ImportError:
            pass

http://code.google.com/p/spyderlib/source/browse/spyderlib/widgets/externalshell/sitecustomize.py?name=v2.2.0#74

Probably this was added for a good reason, but I don't grok the point.
Python isn't interested in the hidden console window at this stage,
and the standard handles are all pipes. I didn't notice any difference
with these lines commented out, running with Python 2.7.5. YMMV

There's a design flaw here since sys.stdin.encoding is used by the
parser in single-input mode. With it set to None, Unicode literals
entered in the REPL will be incorrectly parsed if they use non-ASCII
byte values. For example, given the input is Windows 1252, then u'€'
will be parsed as u'\x80' (i.e. PAD, a C1 Control code).

Here's an alternative to messing with the default encoding -- at least
for the new version of Spyder that doesn't have to support 2.5. Python
2.6+ checks for the PYTHONIOENCODING environment variable. This
overrides the encoding/errors values in Py_InitializeEx():

http://hg.python.org/cpython/file/70274d53c1dd/Python/pythonrun.c#l265

You can test setting PYTHONIOENCODING without restarting Spyder. Just
bring up Spyder's "Internal Console" and set
os.environ['PYTHONIOENCODING']. The change applies to new interpreters
started from the "Interpreters" menu. Spyder could set this itself in
the environment that gets passed to the QProcess object.
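
The effect of PYTHONIOENCODING on a child interpreter with piped standard
I/O can be checked outside Spyder. The following is a minimal sketch,
assuming Python 3 (the thread discusses 2.6+/2.7, where the variable also
applies, but where the unset case reports None rather than a locale
encoding). It uses subprocess in place of QProcess, and the helper name
child_stdout_encoding() is made up for illustration.

```python
# Python 3 sketch: start a child interpreter whose stdout is a pipe and
# ask it which encoding it chose for sys.stdout.
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding=None):
    """Return sys.stdout.encoding as seen by a child with piped stdout.

    If io_encoding is given, it is passed to the child through the
    PYTHONIOENCODING environment variable; otherwise the variable is
    removed from the child's environment.
    """
    env = dict(os.environ)  # copy, so the parent environment is untouched
    env.pop("PYTHONIOENCODING", None)
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    return out.decode("ascii").strip()


print("default :", child_stdout_encoding())
print("override:", child_stdout_encoding("utf-8"))
```

Passing a modified copy of the environment to the child is the same idea
as setting the variable in the environment handed to the QProcess object.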
Re: [Tutor] myown.getfilesystemencoding()
On 30 August 2013 17:39, eryksun wrote:
> On Fri, Aug 30, 2013 at 11:04 AM, Albert-Jan Roskam wrote:
>
>> the function returns 850 (codepage 850) when I run it via the command
>> prompt, but 1252 (cp1252) when I run it in my IDE (Spyder).
>
> Maybe Spyder communicates with python.exe as a subprocess in a hidden
> console, with the console's codepage set to 1252. You can use ctypes
> to check windll.kernel32.GetConsoleCP(). If a console is attached,
> this will return a nonzero value.

Spyder has both an internal interpreter and an external interpreter.
One is the same interpreter process that runs the Spyder GUI. The
other is run in a subprocess which keeps the GUI safe but reduces your
ability to inspect the workspace data via the GUI. So presumably
Albert means the "external" interpreter here.

Also Spyder has the option to use ipython as the shell for (I think)
either interpreter and ipython does a lot of weirdness to stdin/stdout
etc. (according to the complaints of the Spyder author when users
asked for ipython support).

Oscar
Re: [Tutor] myown.getfilesystemencoding()
On Fri, Aug 30, 2013 at 11:04 AM, Albert-Jan Roskam wrote:

> In Windows, sys.getfilesystemencoding() returns 'mbcs' (multibyte code
> system), which doesn't say very much imho.

Why aren't you using Unicode for the filename? The native encoding for
NTFS is UTF-16, and CPython 2.x uses _wfopen() if you pass it a Unicode
filename:

http://hg.python.org/cpython/file/70274d53c1dd/Objects/fileobject.c#l357
http://msdn.microsoft.com/en-us/library/yeby3zcb(v=vs.90)

Anyway, the "mbcs" codec uses mbcs_encode() and mbcs_decode() from the
codecs module. In CPython 2.x, these call PyUnicode_EncodeMBCS() and
PyUnicode_DecodeMBCS(), which in turn call the Windows API functions
WideCharToMultiByte() and MultiByteToWideChar() for the CP_ACP (ANSI)
codepage. This is a system defined encoding, such as Windows 1252.

> So I wrote the function below, which returns the codepage as reported
> by the windows chcp command.

chcp.com is a console application. It's calling GetConsoleCP(), which
simply returns the current code page of the attached console (running
the command creates a new console if there isn't one to inherit from
the parent). This isn't the function you want.

There's already a Python function that returns the default ANSI
codepage:

    >>> import locale
    >>> locale.getpreferredencoding()
    'cp1252'

You can also use ctypes to call the Windows API directly, and then
convert the integer to a string:

    >>> from ctypes import windll
    >>> str(windll.kernel32.GetACP())
    '1252'

> the function returns 850 (codepage 850) when I run it via the command
> prompt, but 1252 (cp1252) when I run it in my IDE (Spyder).

Maybe Spyder communicates with python.exe as a subprocess in a hidden
console, with the console's codepage set to 1252. You can use ctypes
to check windll.kernel32.GetConsoleCP(). If a console is attached,
this will return a nonzero value.
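
The different code pages mentioned above (ANSI, OEM, console) can be
listed side by side. The following is a minimal sketch, assuming Python
3; the function name codepage_report() and the dictionary keys are made
up for illustration. The Windows API calls are only attempted on Windows,
so elsewhere the report contains just the two portable entries. Be aware
that on Python 3.6 and later sys.getfilesystemencoding() reports 'utf-8'
on Windows by default rather than 'mbcs'.

```python
# Python 3 sketch: collect the encodings and code pages discussed above.
import locale
import sys


def codepage_report():
    """Return a dict of encodings; Windows-only entries are added only
    when running on Windows."""
    report = {
        "filesystem": sys.getfilesystemencoding(),
        "preferred": locale.getpreferredencoding(),
    }
    if sys.platform == "win32":
        from ctypes import windll  # only available on Windows

        kernel32 = windll.kernel32
        report["ansi (GetACP)"] = kernel32.GetACP()
        report["oem (GetOEMCP)"] = kernel32.GetOEMCP()
        # Both console calls return 0 when no console is attached.
        report["console input (GetConsoleCP)"] = kernel32.GetConsoleCP()
        report["console output (GetConsoleOutputCP)"] = (
            kernel32.GetConsoleOutputCP()
        )
    return report


for key, value in sorted(codepage_report().items()):
    print(key, "=", value)
```

Running this once from a command prompt and once from inside an IDE
should make a difference like the 850 versus 1252 one described in the
question visible in the console entries.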
Re: [Tutor] myown.getfilesystemencoding()
On 2013-08-30 08:04, Albert-Jan Roskam wrote:

> In Windows, sys.getfilesystemencoding() returns 'mbcs' (multibyte code
> system), which doesn't say very much imho.

Well, what's the problem you have with mbcs being the output here? On
NT, mbcs is the encoding that should be used to convert Unicode to a
bytestring that is equivalent when used as a path name, after all.
[Tutor] myown.getfilesystemencoding()
In Windows, sys.getfilesystemencoding() returns 'mbcs' (multibyte code
system), which doesn't say very much imho. So I wrote the function
below, which returns the codepage as reported by the windows chcp
command. I noticed that the function returns 850 (codepage 850) when I
run it via the command prompt, but 1252 (cp1252) when I run it in my IDE
(Spyder). Any idea why? Is it a good idea anyway to make this function
(no, probably, because Python devs are smart people ;-)

#!python.exe
#Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
import subprocess, re, sys

def getfilesystemencoding():
    if sys.platform.startswith("win"):
        proc = subprocess.Popen("chcp", shell=True, stdout=subprocess.PIPE)
        m = re.search(": (?P<codepage>\d+)", proc.communicate()[0])
        if m:
            return m.group("codepage")
        return sys.getfilesystemencoding()
    return sys.getfilesystemencoding()

print getfilesystemencoding()

Regards,
Albert-Jan

~~
All right, but apart from the sanitation, the medicine, education, wine,
public order, irrigation, roads, a fresh water system, and public
health, what have the Romans ever done for us?
~~
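
For reference, the parsing step of the function above can be separated
from the call to chcp so that it is testable on any platform. The
following is a minimal sketch, assuming Python 3; the names
parse_chcp_output() and console_codepage() are made up for illustration,
and the sample string "Active code page: 850" assumes an English-language
Windows. As the replies in this thread point out, the result is the
console code page, not the file system encoding.

```python
# Python 3 sketch of the chcp-based lookup from the original post.
import re
import subprocess
import sys

# Assumption: the code page number is the last run of digits in chcp's
# output, whatever the language of the surrounding text.
_CODEPAGE_RE = re.compile(r"(?P<codepage>\d+)\D*$")


def parse_chcp_output(text):
    """Return e.g. 'cp850' for 'Active code page: 850', or None."""
    m = _CODEPAGE_RE.search(text.strip())
    return "cp" + m.group("codepage") if m else None


def console_codepage():
    """Return the console code page on Windows, or None elsewhere or if
    chcp could not be run."""
    if not sys.platform.startswith("win"):
        return None
    try:
        raw = subprocess.check_output("chcp", shell=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    # Only the ASCII digits matter; localised text that cannot be
    # decoded as ASCII is dropped.
    return parse_chcp_output(raw.decode("ascii", "ignore"))


print(parse_chcp_output("Active code page: 850"))
print(console_codepage())
```

Returning the name with a "cp" prefix is a design choice made here so the
value can be passed straight to str.encode(); the original function
returns the bare number.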