[issue8242] Support surrogates in import ; install Python in a non-ASCII directory

2010-04-20 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


--
dependencies: +utf8, backslashreplace and surrogates

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8242
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



2010-04-20 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


--
dependencies: +subprocess: surrogates of the error message (Python 
implementation on non-Windows)


2010-04-20 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


--
dependencies: +bz2: support surrogates in filename, and bytes/bytearray filename


2010-04-19 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Updated patch:
 - Some parts have been applied in other issues
 - Remove assert(PyBytes_Check(x)): support PyByteArray type
 - use PyErr_Format() instead of sprintf+PyErr_SetString in tokenizer.c
 - don't convert the message to bytes and then back to unicode in err_input(): 
keep the unicode object

--
Added file: http://bugs.python.org/file17002/surrogates-7.patch


2010-04-19 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


Removed file: http://bugs.python.org/file16919/surrogates-6.patch


2010-04-19 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

$ diffstat ~/surrogates-7.patch
 Doc/library/tarfile.rst     |   15
 Include/moduleobject.h      |    1
 Lib/platform.py             |   12
 Lib/subprocess.py           |    2
 Lib/tarfile.py              |   14
 Lib/test/regrtest.py        |    5
 Lib/test/test_import.py     |    5
 Lib/test/test_reprlib.py    |    4
 Lib/test/test_subprocess.py |    4
 Lib/test/test_tarfile.py    |    4
 Lib/test/test_urllib.py     |    8
 Lib/test/test_urllib2.py    |    4
 Lib/test/test_xml_etree.py  |    6
 Lib/traceback.py            |   10
 Lib/unittest/runner.py      |    4
 Modules/_ctypes/callproc.c  |   12
 Modules/_ssl.c              |   10
 Modules/_tkinter.c          |    6
 Modules/getpath.c           |  100
 Modules/main.c              |   46
 Modules/posixmodule.c       |   18
 Modules/pyexpat.c           |   11
 Modules/zipimport.c         |  210
 Objects/codeobject.c        |    7
 Objects/exceptions.c        |   49
 Objects/fileobject.c        |    6
 Objects/moduleobject.c      |   22
 Objects/unicodeobject.c     |   22
 Parser/tokenizer.c          |   18
 Python/_warnings.c          |   26
 Python/ast.c                |   10
 Python/bltinmodule.c        |   33
 Python/ceval.c              |    4
 Python/compile.c            |   12
 Python/errors.c             |    4
 Python/import.c             |   88
 Python/pythonrun.c          |   39
 Python/traceback.c          |   39
 38 files changed, 625 insertions(+), 265 deletions(-)

--


2010-04-19 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

I haven't reviewed the patch in detail yet, but it seems to me that it fixes 
independent issues. -1000 on that. One problem, one bug report in the tracker, 
one commit.

If this issue is about the import machinery no longer working when there is a 
non-ASCII character in the path, then why the heck does it touch 
posixmodule.c?

As for modules that have non-ASCII characters in their module name: this is, 
again, an unrelated issue (ISTM), so if you want to deal with it, please create 
a new issue.

--


2010-04-18 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


--
dependencies: +tarfile: use surrogates for undecode fields


2010-04-18 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

I committed the platform.py patch as r80166 (trunk) and r80167 (py3k), but 
quickly reverted it because the patch broke the Python bootstrap on trunk. The 
patch could still be applied, but only on py3k and with more tests (ensuring 
that it doesn't break the bootstrap on any OS) :-)

--


2010-04-15 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


--
dependencies: +os.system() doesn't support surrogates nor bytes


2010-04-15 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


--
dependencies: +pickle is unable to encode unicode surrogates


2010-04-13 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


--
dependencies: +test_xmlrpc fails with non-ascii path


2010-04-13 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


--
dependencies: +os.execvpe() doesn't support surrogates in env


2010-04-13 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

New version of the patch: all tests pass except 3 (test_ftplib, 
test_pep3120, test_traceback).

--
Added file: http://bugs.python.org/file16919/surrogates-6.patch


2010-04-13 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


--
dependencies: +subprocess: support undecodable current working directory on 
POSIX OS


2010-04-13 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


--
dependencies: +ctypes.dlopen() doesn't support surrogates


2010-04-13 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


Removed file: http://bugs.python.org/file16897/surrogates-5.patch


2010-03-27 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
nosy: +loewis


2010-03-26 Thread STINNER Victor

New submission from STINNER Victor victor.stin...@haypocalc.com:

If the full path to the python3 binary contains a non-ASCII character and the 
file system encoding is ASCII, Python fails with:
---
Could not find platform independent libraries prefix
Could not find platform dependent libraries exec_prefix
Consider setting $PYTHONHOME to prefix[:exec_prefix]
Fatal Python error: Py_Initialize: can't initialize sys standard streams
ImportError: No module named encodings.utf_8
Abandon
---

The file system encoding is set to ASCII if no locale is configured (e.g. LANG=C).

The problem is that the command line arguments, especially argv[0], are stored 
as wchar_t* strings that use surrogates to store undecodable bytes.
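A minimal Python illustration of that surrogate storage, offered as a sketch under stated assumptions: it uses the standard "surrogateescape" error handler (PEP 383), and the example path is made up. Each undecodable byte 0x80..0xFF becomes a lone surrogate U+DC80..U+DCFF. Encoding with the same handler gives the original bytes back. A strict encode fails, which is the failure mode that surrogate-unaware code paths run into.

```python
# Made-up argv[0]: a path containing the UTF-8 bytes of "é" (0xC3 0xA9),
# decoded with the ASCII codec as it would be under LANG=C.
raw = b"/home/user/\xc3\xa9/bin/python3"

# Undecodable bytes 0x80..0xFF are mapped to lone surrogates U+DC80..U+DCFF.
decoded = raw.decode("ascii", "surrogateescape")
print(ascii(decoded))  # '/home/user/\udcc3\udca9/bin/python3'

# Encoding back with the same error handler restores the original bytes.
assert decoded.encode("ascii", "surrogateescape") == raw

# A strict encode, as done by code that is not aware of surrogates, fails.
try:
    decoded.encode("ascii")
except UnicodeEncodeError as exc:
    print("strict encode failed:", exc.reason)
```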

The attached patch fixes calculate_path() and the import functions to support 
surrogates. Details:

 * Initialize Py_FileSystemDefaultEncoding earlier in Py_InitializeEx(), 
because its value is required to encode unicode strings containing surrogates 
to bytes
 * Rename char2wchar() to _Py_char2wchar() (the function is no longer static), 
and add a _Py_wchar2char() function
 * Escape surrogates (reimplementing the surrogateescape error handler) in the 
calculate_path() subfunctions (_wstat, _wgetcwd, _Py_wreadlink)
 * Use the surrogateescape error handler in find_module(), NullImporter_init() 
and zipimporter_init()
 * Write a fast path (I don't know the right term: is it a hack?) for the utf-8 
encoding with the surrogateescape error handler in PyUnicode_AsEncodedObject() 
and PyUnicode_AsEncodedString(): it is required because these functions are 
called before the codecs module is initialized
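For the ASCII case only, the surrogate escaping that the patch reimplements in C can be sketched in pure Python. The function names char2wchar_ascii() and wchar2char_ascii() are hypothetical stand-ins chosen for this sketch. They loosely mirror the roles of _Py_char2wchar() and _Py_wchar2char(). This is not the patch's code, only the surrogateescape convention it follows.

```python
def char2wchar_ascii(raw: bytes) -> str:
    """Hypothetical sketch: decode bytes as ASCII, storing each byte >= 0x80
    as a lone surrogate U+DC80..U+DCFF (the surrogateescape convention)."""
    return "".join(chr(b) if b < 0x80 else chr(0xDC00 + b) for b in raw)


def wchar2char_ascii(text: str) -> bytes:
    """Hypothetical sketch: encode text as ASCII, turning escaped surrogates
    back into the original bytes; any other non-ASCII character is an error."""
    out = bytearray()
    for index, ch in enumerate(text):
        code = ord(ch)
        if code < 0x80:
            out.append(code)
        elif 0xDC80 <= code <= 0xDCFF:
            # Undo the escaping: U+DCxx becomes the original byte 0xxx.
            out.append(code - 0xDC00)
        else:
            raise UnicodeEncodeError(
                "ascii", text, index, index + 1,
                "not ASCII and not an escaped byte")
    return bytes(out)


# Round-trip a made-up argv[0] containing the UTF-8 bytes of "é".
argv0 = b"/opt/caf\xc3\xa9/bin/python3"
wide = char2wchar_ascii(argv0)

# Same result as the built-in handler, and the bytes come back unchanged.
assert wide == argv0.decode("ascii", "surrogateescape")
assert wchar2char_ascii(wide) == argv0
```

A locale-aware C version would also have to handle multibyte encodings. This sketch deliberately covers only ASCII.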

The patch is a work in progress: there are still some FIXMEs (places where I 
don't know whether the string should be encoded/decoded using surrogates or not).

I only tested ASCII and UTF-8 file system encodings. I don't know if we can 
support more encodings. Python has only a few built-in encodings. The other 
encodings are implemented in Python: we have to import them, but we need the 
codec to import a module, so...

I don't think that Windows is affected by this issue, because it has a better 
API for unicode filenames and command line arguments, and most of the patched 
functions are surrounded by #ifndef WINDOWS ... #endif

--
components: Unicode
files: surrogates_bootstrap-4.patch
keywords: patch
messages: 101815
nosy: haypo
severity: normal
status: open
title: Support surrogates in import ; install Python in a non-ASCII directory
versions: Python 3.1, Python 3.2
Added file: http://bugs.python.org/file16671/surrogates_bootstrap-4.patch


2010-03-26 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

If I understood correctly, my patch is also required to import a module having 
a non-ASCII full path if the file system encoding is ASCII.

--


2010-03-26 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

> Initialize Py_FileSystemDefaultEncoding earlier in Py_InitializeEx(),
> because its value is required to encode unicode using surrogates to bytes

Oh, it doesn't work: get_codeset() returns NULL because the codec registry is 
empty when get_codeset() is called (with my patch).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8242
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com