Re: [Tutor] myown.getfilesystemencoding()

2013-09-04 Thread Steven D'Aprano
On Wed, Sep 04, 2013 at 05:39:10AM -0700, Albert-Jan Roskam wrote:

> Wow, thanks for looking all this up. Thanks also to other people who
> replied. It's not really desirable that an IDE adds confusion to an
> area that's already confusing to begin with.

Well, naturally it isn't desirable to add confusion, but I think that 
when dealing with IDEs it is unavoidable. The whole point of an IDE is 
that it is an *integrated* environment, which implies that the 
environment that Python runs in is not the same as unintegrated Python 
would be running in.


> But given that chcp
> returns cp850 on my Windows system (command line), wouldn't it be more
> descriptive if sys.getfilesystemencoding() returned 'cp850'?

I cannot comment on the gory details of Windows file system encodings, 
except to say the sooner Windows moves to UTF-8 everywhere like the rest 
of the civilized world, the better.


-- 
Steven
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] myown.getfilesystemencoding()

2013-09-04 Thread eryksun
On Wed, Sep 4, 2013 at 8:39 AM, Albert-Jan Roskam  wrote:
> But given that chcp returns cp850 on my Windows system (command line),
> wouldn't it be more descriptive if sys.getfilesystemencoding()
> returned 'cp850'?

The common file systems (NTFS, FAT32, UDF, exFAT) support Unicode
filenames. The console also uses Unicode, but proper display depends
on the current font.

The cmd shell encodes to the current codepage when redirecting output
from an internal command, unless it was started with /U to force
Unicode (e.g. cmd /U /c dir > files.txt). For subprocess, run cmd.exe
explicitly with /U (i.e. don't use shell=True), and decode the output
as UTF-16. Also, some utilities, such as tree.com, display Unicode
fine but always use the OEM code page when output is redirected to a
file or pipe (i.e. changing the console code page won't help).
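A minimal sketch of the /U approach, in Python 3 syntax (the thread itself uses 2.7). The helper names decode_cmd_unicode and run_cmd_unicode are hypothetical, and the sketch assumes, as described above, that cmd.exe started with /U writes the redirected output of internal commands as UTF-16LE with no byte order mark.

```python
import subprocess
import sys


def decode_cmd_unicode(raw):
    # cmd /U writes redirected output of internal commands as UTF-16LE;
    # this assumes no byte order mark precedes it.
    return raw.decode("utf-16-le")


def run_cmd_unicode(command):
    # Start cmd.exe explicitly with /U (no shell=True) so that the internal
    # command's output, captured through a pipe, is UTF-16 rather than
    # bytes in the current code page.
    if sys.platform != "win32":
        raise OSError("cmd.exe is only available on Windows")
    raw = subprocess.check_output(["cmd.exe", "/U", "/c", command])
    return decode_cmd_unicode(raw)
```

On Windows, run_cmd_unicode("dir") would then return the directory listing as a str, including names outside the console code page.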

> In other words: In the code below, isn't line [1] an obfuscated version of
> line [2]? Both versions return only question marks on my system.
>
> # Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
> import ctypes
>
> ords = [3629, 3633, 3585, 3625, 3619, 3652, 3607, 3618]
> u = "".join([unichr(i) for i in ords])
> print u.encode("mbcs") # [1]
>
> #cp850 is what chcp returns on my Windows system
> print u.encode("cp850", "replace") # [2]
>
> thai_latin_cp = "cp874"
> cp_ = int(thai_latin_cp[2:])
> ctypes.windll.kernel32.SetConsoleCP(cp_)
> ctypes.windll.kernel32.SetConsoleOutputCP(cp_)
> print u.encode("cp874", "replace")

"mbcs" is the ANSI codepage (1252), not the OEM codepage (850) nor the
current codepage. Neither supports Thai characters. It would be better
to compare an OEM box drawing character:

>>> from unicodedata import name
>>> u = u'\u2500'
>>> name(u)
'BOX DRAWINGS LIGHT HORIZONTAL'

>>> name(u.encode('850', 'replace').decode('850'))
'BOX DRAWINGS LIGHT HORIZONTAL'

>>> name(u.encode('mbcs', 'replace').decode('mbcs'))
'HYPHEN-MINUS'
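A cross-platform version of the same comparison, as a sketch in Python 3 syntax. It substitutes Python's cp1252 codec for "mbcs", assuming 1252 is the ANSI code page as above. That codec does no "best fit" mapping, so in this sketch the box character comes back as '?' rather than the '-' shown above. It also checks Albert's Thai string against cp850, cp1252 and cp874. The helper name round_trip is hypothetical.

```python
from unicodedata import name

# Thai code points from Albert's example.
THAI = "".join(chr(i) for i in [3629, 3633, 3585, 3625, 3619, 3652, 3607, 3618])
BOX = "\u2500"  # BOX DRAWINGS LIGHT HORIZONTAL


def round_trip(text, codepage):
    # Encode to the code page, replacing unmappable characters with '?',
    # then decode again to see what survived.
    return text.encode(codepage, "replace").decode(codepage)


for cp in ("cp850", "cp1252", "cp874"):
    thai_ok = round_trip(THAI, cp) == THAI
    print(cp, "Thai survives:", thai_ok, "| box char ->", name(round_trip(BOX, cp)))
```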

> ctypes.windll.kernel32.SetConsoleCP() and SetConsoleOutputCP seem useful.
> Can these functions be used to correctly display the Thai characters on
> my western European Windows version? (last block of code is an attempt)
> Or is that not possible altogether?

If stdout is a console, a write eventually ends up at WriteConsoleA(),
which decodes to the console's native Unicode based on the current
output codepage. If you're using codepage 874 and the current font
supports Thai characters, it should display fine. It's also possible
to write a Unicode string directly by calling WriteConsoleW with
ctypes.
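A hedged sketch of the WriteConsoleW route, in Python 3 syntax. The function names are hypothetical. It assumes stdout is an actual console: WriteConsoleW fails on a pipe or file, in which case the function returns False so the caller can fall back to sys.stdout. Note that the length argument counts UTF-16 code units, not Python characters.

```python
import ctypes
import sys

STD_OUTPUT_HANDLE = -11  # Windows constant for the standard output handle


def utf16_code_units(text):
    # WriteConsoleW counts UTF-16 code units, and characters outside the
    # Basic Multilingual Plane take two units each.
    return len(text.encode("utf-16-le")) // 2


def write_console_unicode(text):
    """Write text with WriteConsoleW; return False if that isn't possible."""
    if sys.platform != "win32":
        return False
    from ctypes import wintypes  # imported lazily so the module loads anywhere

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.WriteConsoleW.argtypes = [
        wintypes.HANDLE, wintypes.LPCWSTR, wintypes.DWORD,
        ctypes.POINTER(wintypes.DWORD), wintypes.LPVOID,
    ]
    kernel32.WriteConsoleW.restype = wintypes.BOOL

    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
    written = wintypes.DWORD(0)
    # Fails (returns 0) when stdout is a pipe or file rather than a console.
    ok = kernel32.WriteConsoleW(handle, text, utf16_code_units(text),
                                ctypes.byref(written), None)
    return bool(ok)


if __name__ == "__main__":
    thai = "\u0e2d\u0e31\u0e01\u0e29\u0e23\u0e44\u0e17\u0e22\n"
    if not write_console_unicode(thai):
        print("stdout is not a Windows console; use sys.stdout instead")
```

Whether the Thai text is legible still depends on the console font, as noted above.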


Re: [Tutor] myown.getfilesystemencoding()

2013-09-04 Thread Albert-Jan Roskam



Wow, thanks for looking all this up. Thanks also to other people who replied.
It's not really desirable that an IDE adds confusion to an area that's already
confusing to begin with. But given that chcp returns cp850 on my Windows system
(command line), wouldn't it be more descriptive if sys.getfilesystemencoding()
returned 'cp850'?

In other words: In the code below, isn't line [1] an obfuscated version of line 
[2]? Both versions return only question marks on my system.

# Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
import ctypes

ords = [3629, 3633, 3585, 3625, 3619, 3652, 3607, 3618]
u = "".join([unichr(i) for i in ords])
print u.encode("mbcs") # [1] 

#cp850 is what chcp returns on my Windows system
print u.encode("cp850", "replace") # [2] 

thai_latin_cp = "cp874"
cp_ = int(thai_latin_cp[2:])
ctypes.windll.kernel32.SetConsoleCP(cp_)
ctypes.windll.kernel32.SetConsoleOutputCP(cp_)
print u.encode("cp874", "replace")

ctypes.windll.kernel32.SetConsoleCP() and SetConsoleOutputCP seem useful. Can 
these functions be used to correctly display the Thai characters on my western 
European Windows version? (last block of code is an attempt) Or is that not 
possible altogether?

Best wishes,
Albert-Jan



Re: [Tutor] myown.getfilesystemencoding()

2013-08-31 Thread eryksun
On Sat, Aug 31, 2013 at 9:16 AM, Oscar Benjamin
 wrote:
> Spyder has both an internal interpreter and an external interpreter.
> One is the same interpreter process that runs the Spyder GUI. The
> other is run in a subprocess which keeps the GUI safe but reduces your
> ability to inspect the workspace data via the GUI. So presumably
> Albert means the "external" interpreter here.

I installed Spyder on Windows to look into this. It's using Qt
QProcess to run the external interpreter in a child process.
sys.stdin.isatty() confirms it's not a tty, and Process Explorer
confirms that all 3 standard I/O handles (from msvcrt.get_osfhandle())
are pipes.
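The same check can be reproduced without Spyder in a sketch like the following (Python 3 syntax). It uses subprocess in place of Qt's QProcess, which is an assumption for illustration. The child interpreter is started with all three standard handles connected to pipes and reports whether each one is a tty.

```python
import subprocess
import sys

CHILD = (
    "import sys; "
    "print(sys.stdin.isatty(), sys.stdout.isatty(), sys.stderr.isatty())"
)


def child_tty_status():
    # All three standard handles are pipes, as with Spyder's external interpreter.
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.stdout.split()


print(child_tty_status())
```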

The file encoding is None for piped standard I/O, so printing unicode
falls back to the default encoding. Normally this is ASCII in 2.x, but
Spyder uses sitecustomize to set the default encoding based on the
default locale. It also sets the hidden console's codepage:

    if os.name == 'nt': # Windows platforms

        # Setting console encoding (otherwise Python does not
        # recognize encoding)
        try:
            import locale, ctypes
            _t, _cp = locale.getdefaultlocale('LANG')
            try:
                _cp = int(_cp[2:])
                ctypes.windll.kernel32.SetConsoleCP(_cp)
                ctypes.windll.kernel32.SetConsoleOutputCP(_cp)
            except (ValueError, TypeError):
                # Code page number in locale is not valid
                pass
        except ImportError:
            pass

http://code.google.com/p/spyderlib/source/browse/spyderlib/widgets/externalshell/sitecustomize.py?name=v2.2.0#74

Probably this was added for a good reason, but I don't grok the point.
Python isn't interested in the hidden console window at this stage,
and the standard handles are all pipes. I didn't notice any difference
with these lines commented out, running with Python 2.7.5. YMMV

There's a design flaw here since sys.stdin.encoding is used by the
parser in single-input mode. With it set to None, Unicode literals
entered in the REPL will be incorrectly parsed if they use non-ASCII
byte values. For example, given the input is Windows 1252, then u'€'
will be parsed as u'\x80' (i.e. PAD, a C1 Control code).
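The mis-parse can be simulated in Python 3 syntax. This sketch assumes, as the example above implies, that without a known input encoding each byte of a u'' literal becomes the code point of the same value (effectively Latin-1). The helper parse_literal_bytes is hypothetical.

```python
def parse_literal_bytes(raw, input_encoding=None):
    # With no known input encoding, treat each byte as the code point of the
    # same value (Latin-1); otherwise decode with the real input encoding.
    return raw.decode(input_encoding or "latin-1")


typed = "\u20ac".encode("cp1252")             # euro sign typed on a cp1252 console
wrong = parse_literal_bytes(typed)            # encoding unknown (None)
right = parse_literal_bytes(typed, "cp1252")  # encoding known
print(ascii(wrong), ascii(right))
```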

Here's an alternative to messing with the default encoding -- at least
for the new version of Spyder that doesn't have to support 2.5. Python
2.6+ checks for the PYTHONIOENCODING environment variable. This
overrides the encoding/errors values in Py_InitializeEx():

http://hg.python.org/cpython/file/70274d53c1dd/Python/pythonrun.c#l265

You can test setting PYTHONIOENCODING without restarting Spyder. Just
bring up Spyder's "Internal Console" and set
os.environ['PYTHONIOENCODING']. The change applies to new interpreters
started from the "Interpreters" menu. Spyder could set this itself in
the environment that gets passed to the QProcess object.
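A sketch of that suggestion in Python 3 syntax. It uses subprocess in place of QProcess, which is an assumption because Spyder's launcher code isn't shown here. The sketch copies the parent environment, sets PYTHONIOENCODING to "encoding:errors", and asks the child interpreter what it picked up. The cp1252 value is only an example.

```python
import os
import subprocess
import sys

CHILD = "import sys; print(sys.stdout.encoding, sys.stdout.errors)"


def child_stdio_settings(io_encoding):
    env = os.environ.copy()
    env["PYTHONIOENCODING"] = io_encoding  # "encoding" or "encoding:errors"
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE, env=env, universal_newlines=True,
    )
    return proc.stdout.split()


print(child_stdio_settings("cp1252:replace"))
```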


Re: [Tutor] myown.getfilesystemencoding()

2013-08-31 Thread Oscar Benjamin
On 30 August 2013 17:39, eryksun  wrote:
> On Fri, Aug 30, 2013 at 11:04 AM, Albert-Jan Roskam  wrote:
>
>> the function returns 850 (codepage 850) when I run it via the command prompt,
>> but 1252 (cp1252) when I run it in my IDE (Spyder).
>
> Maybe Spyder communicates with python.exe as a subprocess in a hidden
> console, with the console's codepage set to 1252. You can use ctypes
> to check windll.kernel32.GetConsoleCP(). If a console is attached,
> this will return a nonzero value.

Spyder has both an internal interpreter and an external interpreter.
One is the same interpreter process that runs the Spyder GUI. The
other is run in a subprocess which keeps the GUI safe but reduces your
ability to inspect the workspace data via the GUI. So presumably
Albert means the "external" interpreter here. Also Spyder has the
option to use ipython as the shell for (I think) either interpreter
and ipython does a lot of weirdness to stdin/stdout etc. (according to
the complaints of the Spyder author when users asked for ipython
support).


Oscar


Re: [Tutor] myown.getfilesystemencoding()

2013-08-30 Thread eryksun
On Fri, Aug 30, 2013 at 11:04 AM, Albert-Jan Roskam  wrote:
> In Windows, sys.getfilesystemencoding() returns 'mbcs' (multibyte character
> set), which doesn't say very much imho.

Why aren't you using Unicode for the filename? The native encoding for
NTFS is UTF-16, and CPython 2.x uses _wfopen() if you pass it a
Unicode filename:

http://hg.python.org/cpython/file/70274d53c1dd/Objects/fileobject.c#l357
http://msdn.microsoft.com/en-us/library/yeby3zcb(v=vs.90)
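In Python 3 terms (an illustration, not the 2.x code path discussed here), the equivalent is to pass str paths throughout. The following minimal sketch creates and lists a file with a Thai name in a temporary directory. It assumes the underlying file system can store that name, which NTFS can. The helper name is hypothetical.

```python
import os
import tempfile

THAI_NAME = "\u0e2d\u0e31\u0e01\u0e29\u0e23\u0e44\u0e17\u0e22.txt"


def roundtrip_unicode_filename(name):
    # Create a file using a str (Unicode) path and list the directory with a
    # str argument, so the names come back as str as well.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, name)
        with open(path, "w", encoding="utf-8") as f:
            f.write("ok")
        return name in os.listdir(tmp)


print(roundtrip_unicode_filename(THAI_NAME))
```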

Anyway, the "mbcs" codec uses mbcs_encode() and mbcs_decode() from the
codecs module. In CPython 2.x, these call PyUnicode_EncodeMBCS() and
PyUnicode_DecodeMBCS(), which in turn call the Windows API functions
WideCharToMultiByte() and MultiByteToWideChar() for the CP_ACP (ANSI)
codepage. This is a system defined encoding, such as Windows 1252.

> So I wrote the function below, which returns the codepage as reported by
> the windows chcp command.

chcp.com is a console application. It's calling GetConsoleCP(), which
simply returns the current code page of the attached console (running
the command creates a new console if there isn't one to inherit from
the parent). This isn't the function you want. There's already a
Python function that returns the default ANSI codepage:

>>> import locale
>>> locale.getpreferredencoding()
'cp1252'

You can also use ctypes to call the Windows API directly, and then
convert the integer to a string:

>>> from ctypes import windll
>>> str(windll.kernel32.GetACP())
'1252'
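Both approaches can be wrapped in one hypothetical helper, sketched in Python 3 syntax. On Windows it calls GetACP(). Elsewhere it falls back to locale.getpreferredencoding(False), which only approximates the "ANSI code page" idea on other systems.

```python
import locale
import sys


def ansi_codepage_name():
    if sys.platform == "win32":
        import ctypes
        # GetACP() returns the ANSI code page number, e.g. 1252.
        return "cp%d" % ctypes.windll.kernel32.GetACP()
    # Not Windows: use the locale's preferred encoding instead.
    return locale.getpreferredencoding(False)


print(ansi_codepage_name())
```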

> the function returns 850 (codepage 850) when I run it via the command prompt,
> but 1252 (cp1252) when I run it in my IDE (Spyder).

Maybe Spyder communicates with python.exe as a subprocess in a hidden
console, with the console's codepage set to 1252. You can use ctypes
to check windll.kernel32.GetConsoleCP(). If a console is attached,
this will return a nonzero value.
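A minimal sketch of that check in Python 3 syntax. The name console_codepages is hypothetical. It returns the input and output code pages of the attached console, or None when no console is attached or when not running on Windows.

```python
import sys


def console_codepages():
    if sys.platform != "win32":
        return None
    import ctypes
    kernel32 = ctypes.windll.kernel32
    in_cp = kernel32.GetConsoleCP()  # 0 if no console is attached
    out_cp = kernel32.GetConsoleOutputCP()
    if in_cp == 0:
        return None
    return in_cp, out_cp


print(console_codepages())
```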


Re: [Tutor] myown.getfilesystemencoding()

2013-08-30 Thread Chris Down
On 2013-08-30 08:04, Albert-Jan Roskam wrote:
> In Windows, sys.getfilesystemencoding() returns 'mbcs' (multibyte character
> set), which doesn't say very much imho.

Well, what's the problem you have with mbcs being the output here? On NT, mbcs
is the encoding that should be used to convert Unicode to a bytestring that is
equivalent when used as a path name, after all.
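In Python 3, this conversion is exposed as os.fsencode() and os.fsdecode(). Both use sys.getfilesystemencoding() plus an error handler rather than a hard-coded codec. Since Python 3.6 the file system encoding on Windows is UTF-8 rather than mbcs (PEP 529), so the bytes depend on the version and platform. A small round-trip sketch follows. The file name is only an example.

```python
import os
import sys

name = "\u0e2d\u0e31\u0e01\u0e29\u0e23\u0e44\u0e17\u0e22.txt"
raw = os.fsencode(name)   # bytes in the file system encoding
back = os.fsdecode(raw)   # and back to str
print(sys.getfilesystemencoding(), raw, back == name)
```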




[Tutor] myown.getfilesystemencoding()

2013-08-30 Thread Albert-Jan Roskam
In Windows, sys.getfilesystemencoding() returns 'mbcs' (multibyte character set),
which doesn't say very much imho.
So I wrote the function below, which returns the codepage as reported by the
Windows chcp command. I noticed that the function returns 850 (codepage 850)
when I run it via the command prompt, but 1252 (cp1252) when I run it in my
IDE (Spyder). Any idea why? Is it a good idea anyway to make this function?
(Probably not, because Python devs are smart people ;-)
 
#!python.exe
# Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32

import subprocess, re, sys

def getfilesystemencoding():
    if sys.platform.startswith("win"):
        # chcp prints something like "Active code page: 850"
        proc = subprocess.Popen("chcp", shell=True, stdout=subprocess.PIPE)
        m = re.search(r": (?P<codepage>\d+)", proc.communicate()[0])
        if m:
            return m.group("codepage")
    return sys.getfilesystemencoding()

print getfilesystemencoding()
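For comparison, a Python 3 sketch that isolates the parsing step so it can be tried without running chcp. The sample strings are assumptions about chcp's output format: an English line and a localized variant ending in a period. The function names are hypothetical. As eryksun points out in his reply, this number is the console's code page, not the file system encoding.

```python
import re
import subprocess
import sys

CODEPAGE_RE = re.compile(r":\s*(?P<codepage>\d+)")


def parse_chcp_output(text):
    # chcp prints something like "Active code page: 850".
    m = CODEPAGE_RE.search(text)
    return int(m.group("codepage")) if m else None


def console_codepage_via_chcp():
    if not sys.platform.startswith("win"):
        return None
    # Decode leniently; only the ASCII digits matter here.
    out = subprocess.check_output("chcp", shell=True).decode("ascii", "replace")
    return parse_chcp_output(out)


print(parse_chcp_output("Active code page: 850"))
```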

Regards,
Albert-Jan


~~
All right, but apart from the sanitation, the medicine, education, wine, public
order, irrigation, roads, a fresh water system, and public health, what have
the Romans ever done for us?
~~ 