[issue10114] compile() doesn't support the PEP 383 (surrogates)

2010-10-18 Thread STINNER Victor

STINNER Victor  added the comment:

The buildbots are green again (#10123 is closed). I ported the fix to Python 3.1 
(r85716). Closing this issue.

--
resolution:  -> fixed
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10114] compile() doesn't support the PEP 383 (surrogates)

2010-10-16 Thread STINNER Victor

STINNER Victor  added the comment:

I created #10123 for the test_doctest regression.

--




[issue10114] compile() doesn't support the PEP 383 (surrogates)

2010-10-16 Thread STINNER Victor

STINNER Victor  added the comment:

Committed to 3.2 (r85569+r85570). I will wait for the buildbots before porting the 
patch to 3.1 and closing the issue. There is already a regression on the Gentoo 
buildbot with the ascii locale encoding, test_doctest test_zipimport_support:

http://www.python.org/dev/buildbot/all/builders/AMD64%20Gentoo%20Wide%203.x/builds/106

--




[issue10114] compile() doesn't support the PEP 383 (surrogates)

2010-10-16 Thread STINNER Victor

STINNER Victor  added the comment:

Oh, I just realized that Python 3.1.2 (the latest Python 3.1 release) was released 
on the 21st of March, whereas r82063 (the commit for #6543) was made on the 17th of 
June. So the encoding change has not been released yet.

--




[issue10114] compile() doesn't support the PEP 383 (surrogates)

2010-10-16 Thread STINNER Victor

STINNER Victor  added the comment:

> Here is a new patch [code_encoding.patch] implementing this idea:
> - Use filesystem encoding (and surrogateescape) to encode/decode
> paths in compile() and the parser, instead of utf-8 in strict mode
> (...)
> The patch restores the situation before #6543.

About Python 3.1 compatibility: Python 3.1 doesn't support non-ascii paths with 
a locale encoding other than utf-8 (see issue #8611), so the patch doesn't change 
anything for Python 3.1 (such paths don't work anyway, with either utf-8 or the 
filesystem encoding).

--




[issue10114] compile() doesn't support the PEP 383 (surrogates)

2010-10-15 Thread STINNER Victor

Changes by STINNER Victor :


Removed file: http://bugs.python.org/file19243/compile_surrogates.patch




[issue10114] compile() doesn't support the PEP 383 (surrogates)

2010-10-15 Thread STINNER Victor

STINNER Victor  added the comment:

I removed [compile_surrogates.patch] because it creates filenames that cannot be 
encoded to the filesystem encoding. E.g. compile('', '\udcc3\udca9', 'exec').co_filename 
gives 'é' even if the filesystem encoding is 'ascii'.
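
The problem can be sketched in pure Python. This is only an illustration, not the patch itself: it assumes the removed patch amounted to encoding the name to utf-8 with surrogateescape and reading the bytes back as utf-8, and all variable names are made up for the example.

```python
# Illustration only: assumes the removed patch was equivalent to
# "encode to utf-8 with surrogateescape, then decode as utf-8".
escaped_name = '\udcc3\udca9'            # bytes C3 A9 escaped per PEP 383

raw = escaped_name.encode('utf-8', 'surrogateescape')
decoded = raw.decode('utf-8')            # the two bytes form a valid utf-8 'é'

# 'é' cannot be encoded when the filesystem encoding is ascii ...
try:
    decoded.encode('ascii', 'surrogateescape')
    decoded_fits_ascii = True
except UnicodeEncodeError:
    decoded_fits_ascii = False

# ... whereas the original escaped name still maps back to the same bytes.
ascii_bytes = escaped_name.encode('ascii', 'surrogateescape')

print(raw, decoded_fits_ascii, ascii_bytes)
```

The escaped form survives an ascii filesystem encoding; the "prettier" decoded form does not.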

--




[issue10114] compile() doesn't support the PEP 383 (surrogates)

2010-10-15 Thread STINNER Victor

STINNER Victor  added the comment:

> I do not see what filesystem encodings, or any other encoding 
> to bytes should really have to do with the [code.co_filename].

The co_filename attribute is used to display the traceback: Python opens the 
related file, reads the source code line and displays it. On Windows, co_filename 
is used directly because Windows accepts unicode filenames. But on other 
OSes, the filename has to be encoded to the filesystem encoding.

If your filesystem encoding is 'ascii' (e.g. C locale) and co_filename is a 
non-ascii filename (e.g. 'test_é.py'), encoding co_filename raises a 
UnicodeEncodeError. You can test it simply by using os.fsencode():

$ ./python 
Python 3.2a3+ (py3k:85551:85553M, Oct 16 2010, 00:54:03) 
>>> import sys; sys.getfilesystemencoding()
'utf-8'
>>> import os; os.fsencode('é')
b'\xc3\xa9'

$ LANG= ./python 
Python 3.2a3+ (py3k:85551:85553M, Oct 16 2010, 00:54:03) 
>>> import sys; sys.getfilesystemencoding()
'ascii'
>>> import os; os.fsencode('\xe9')
...
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' ...

In other words, co_filename should be encodable to the filesystem encoding 
(os.fsencode(co_filename) should not raise an error).
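
The same check can be written without changing the locale. This is a minimal sketch: is_encodable() is a hypothetical helper (not part of Python), and its explicit encoding argument stands in for sys.getfilesystemencoding(), on the assumption that the name is encoded with the surrogateescape error handler.

```python
def is_encodable(filename: str, encoding: str) -> bool:
    """Hypothetical helper: can filename be encoded to `encoding`
    with the surrogateescape error handler?"""
    try:
        filename.encode(encoding, 'surrogateescape')
    except UnicodeEncodeError:
        return False
    return True


# 'ascii' plays the role of the C locale, 'utf-8' of a utf-8 locale.
print(is_encodable('test_\xe9.py', 'utf-8'))
print(is_encodable('test_\xe9.py', 'ascii'))
print(is_encodable('abc\udc80', 'ascii'))
```

A name holding escaped bytes ('abc\udc80') is encodable under ascii, while a name holding a real 'é' is not.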

--




[issue10114] compile() doesn't support the PEP 383 (surrogates)

2010-10-15 Thread Terry J. Reedy

Terry J. Reedy  added the comment:

Pardon my ignorance, but given that code.co_filename is a string attribute 
given as a string, which is to say, unicode in 3.x, I do not see what 
filesystem encodings, or any other encoding to bytes should really have to do 
with the attribute. I actually would have expected compile to take your example 
argument 'abc\uDC80' and paste it onto the code object unchanged. The only 
issue to me is whether any string should be allowed or only legal-unicode 
strings. Anything else would seem like a 2.x holdover.

If PyBytes_AS_STRING (macro version of PyBytes_AsString) is the implementation 
of str(bytes_object) (as I would guess from the doc), then as I read your 
patch, it will produce rather strange 'filenames'.
>>> str('abc\uDC80'.encode("utf-8", "surrogateescape"))
"b'abc\\x80'"
always wrapped in b'...'.

If not that, what does it do (with no decoding specified)?
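
To make the distinction concrete at the Python level, here is a small sketch contrasting str() on a bytes object with an explicit decode. It says nothing about what the C patch does; the variable names are only for illustration.

```python
name = 'abc\udc80'
raw = name.encode('utf-8', 'surrogateescape')

# str() on bytes with no encoding argument returns the repr, wrapped in b'...'.
as_repr = str(raw)

# An explicit decode with the same error handler restores the original string.
round_trip = raw.decode('utf-8', 'surrogateescape')

print(as_repr)
print(round_trip == name)
```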

--




[issue10114] compile() doesn't support the PEP 383 (surrogates)

2010-10-15 Thread STINNER Victor

STINNER Victor  added the comment:

> All filenames should use the filesystem encoding in Python.

Here is a new patch [code_encoding.patch] implementing this idea:

 - Use the filesystem encoding (with surrogateescape) to encode/decode paths in 
compile() and the parser, instead of utf-8 in strict mode
 - Ensure that the co_filename attribute can be used as a filename (e.g. so that 
it does not raise UnicodeEncodeError on Linux)
 - The compile() builtin supports bytes filenames
 - _Py_FindSourceFile() (traceback.c) encodes the paths of sys.path to the 
filesystem encoding, as find_module() (import.c) does
 - PyRun_SimpleFileExFlags() sets the __file__ attribute using the filesystem 
encoding

The patch restores the situation before #6543.
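
The idea behind the first item can be sketched in Python. This is not code from the patch (which is in C): decode_path() and encode_path() are hypothetical helpers, and the fs_encoding parameter stands in for the real filesystem encoding so that the example does not depend on the locale.

```python
def decode_path(raw: bytes, fs_encoding: str) -> str:
    """Hypothetical helper: bytes path -> str, escaping undecodable bytes."""
    return raw.decode(fs_encoding, 'surrogateescape')


def encode_path(path: str, fs_encoding: str) -> bytes:
    """Hypothetical helper: str path -> bytes, restoring escaped bytes."""
    return path.encode(fs_encoding, 'surrogateescape')


raw_name = b'test_\xc3\xa9.py'
results = {}
for enc in ('ascii', 'utf-8'):
    text = decode_path(raw_name, enc)
    # The decoded form differs per encoding, but the bytes always round-trip.
    results[enc] = (text, encode_path(text, enc))

for enc, (text, raw_again) in results.items():
    print(enc, ascii(text), raw_again == raw_name)
```

Because decoding and encoding use the same encoding and error handler, a name obtained from the filesystem can always be turned back into the bytes it came from.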

--
Added file: http://bugs.python.org/file19246/code_encoding.patch




[issue10114] compile() doesn't support the PEP 383 (surrogates)

2010-10-15 Thread STINNER Victor

STINNER Victor  added the comment:

See also #9713.

--




[issue10114] compile() doesn't support the PEP 383 (surrogates)

2010-10-15 Thread STINNER Victor

STINNER Victor  added the comment:

#6543 changed the code->co_filename encoding from filesystem 
encoding+surrogateescape to utf-8+strict.

With my patch, compile('', '\udcc3\udca9', 'exec').co_filename gives 'é' 
regardless of the filesystem encoding. But 'é' cannot be used with all 
filesystem encodings: e.g. with the ascii locale encoding (C locale), using it 
raises an error.

I now think that it was a bad idea to use utf-8 instead of the filesystem 
encoding. All filenames should use the filesystem encoding in Python.

--




[issue10114] compile() doesn't support the PEP 383 (surrogates)

2010-10-15 Thread Terry J. Reedy

Terry J. Reedy  added the comment:

I think the title is slightly misleading. As I read the patch, the issue is 
that PyArg_ParseTupleAndKeywords requires that string args to C functions be 
valid Unicode strings (and that it does this by trying to encode to utf-8). 
Your patch subverts this by redefining filename to be a generic object, with a 
looser custom-coded test. It is not clear to me that filename, out of all 
string args to builtins, should be excepted this way. It seems to me that any 
real filename should be real unicode and there is no need for fake names that 
are not.

--
nosy: +terry.reedy




[issue10114] compile() doesn't support the PEP 383 (surrogates)

2010-10-15 Thread Antoine Pitrou

Changes by Antoine Pitrou :


--
nosy: +benjamin.peterson




[issue10114] compile() doesn't support the PEP 383 (surrogates)

2010-10-15 Thread STINNER Victor

New submission from STINNER Victor :

Example:

$ ./python
Python 3.2a3+ (py3k, Oct 15 2010, 14:31:59) 
>>> compile('', 'abc\uDC80', 'exec')
...
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in position 3: surrogates not allowed

The attached patch manually encodes the filename to utf-8 with surrogateescape.

I found this problem while testing Python with an ASCII locale encoding (LANG=C 
./python Lib/test/regrtest.py). Example:

  $ LANG=C ./python -m base64 -e setup.py 
  ...
  UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc3' ...
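
The underlying codec behaviour can be reproduced without compile(). A minimal sketch (the variable names are only for illustration): the strict utf-8 codec rejects a lone surrogate, while the surrogateescape handler from PEP 383 maps it back to the byte it stands for.

```python
filename = 'abc\udc80'

# Strict mode is the default error handler and rejects the lone surrogate.
try:
    filename.encode('utf-8')
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = exc

# surrogateescape turns U+DC80 back into the raw byte 0x80.
escaped = filename.encode('utf-8', 'surrogateescape')

print(type(strict_error).__name__)
print(escaped)
```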

--
components: Interpreter Core, Unicode
files: compile_surrogates.patch
keywords: patch
messages: 118762
nosy: haypo
priority: normal
severity: normal
status: open
title: compile() doesn't support the PEP 383 (surrogates)
versions: Python 3.2
Added file: http://bugs.python.org/file19243/compile_surrogates.patch
