[issue8242] Support surrogates in import ; install Python in a non-ASCII directory
Changes by STINNER Victor victor.stin...@haypocalc.com: -- dependencies: +utf8, backslashreplace and surrogates ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8242 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8242] Support surrogates in import ; install Python in a non-ASCII directory
Changes by STINNER Victor victor.stin...@haypocalc.com: -- dependencies: +subprocess: surrogates of the error message (Python implementation on non-Windows)
[issue8242] Support surrogates in import ; install Python in a non-ASCII directory
Changes by STINNER Victor victor.stin...@haypocalc.com: -- dependencies: +bz2: support surrogates in filename, and bytes/bytearray filename
[issue8242] Support surrogates in import ; install Python in a non-ASCII directory
STINNER Victor victor.stin...@haypocalc.com added the comment:

Updated patch:
- Some parts have been applied in other issues
- Remove assert(PyBytes_Check(x)): support the PyByteArray type
- Use PyErr_Format() instead of sprintf+PyErr_SetString in tokenizer.c
- Don't convert the message to bytes and then back to unicode in err_input(): keep the unicode object

--
Added file: http://bugs.python.org/file17002/surrogates-7.patch
[issue8242] Support surrogates in import ; install Python in a non-ASCII directory
Changes by STINNER Victor victor.stin...@haypocalc.com: Removed file: http://bugs.python.org/file16919/surrogates-6.patch
[issue8242] Support surrogates in import ; install Python in a non-ASCII directory
STINNER Victor victor.stin...@haypocalc.com added the comment:

$ diffstat ~/surrogates-7.patch
 Doc/library/tarfile.rst     |  15 +--
 Include/moduleobject.h      |   1
 Lib/platform.py             |  12 +-
 Lib/subprocess.py           |   2
 Lib/tarfile.py              |  14 --
 Lib/test/regrtest.py        |   5 -
 Lib/test/test_import.py     |   5 +
 Lib/test/test_reprlib.py    |   4
 Lib/test/test_subprocess.py |   4
 Lib/test/test_tarfile.py    |   4
 Lib/test/test_urllib.py     |   8 +
 Lib/test/test_urllib2.py    |   4
 Lib/test/test_xml_etree.py  |   6 +
 Lib/traceback.py            |  10 +-
 Lib/unittest/runner.py      |   4
 Modules/_ctypes/callproc.c  |  12 +-
 Modules/_ssl.c              |  10 +-
 Modules/_tkinter.c          |   6 -
 Modules/getpath.c           | 100 ++--
 Modules/main.c              |  46 +
 Modules/posixmodule.c       |  18 ++-
 Modules/pyexpat.c           |  11 +-
 Modules/zipimport.c         | 210
 Objects/codeobject.c        |   7 +
 Objects/exceptions.c        |  49 ++
 Objects/fileobject.c        |   6 -
 Objects/moduleobject.c      |  22 +++-
 Objects/unicodeobject.c     |  22 +++-
 Parser/tokenizer.c          |  18 ++-
 Python/_warnings.c          |  26 -
 Python/ast.c                |  10 +-
 Python/bltinmodule.c        |  33 --
 Python/ceval.c              |   4
 Python/compile.c            |  12 ++
 Python/errors.c             |   4
 Python/import.c             |  88 --
 Python/pythonrun.c          |  39
 Python/traceback.c          |  39 ++--
 38 files changed, 625 insertions(+), 265 deletions(-)
[issue8242] Support surrogates in import ; install Python in a non-ASCII directory
Martin v. Löwis mar...@v.loewis.de added the comment:

I haven't reviewed the patch in detail yet, but it seems to me that it fixes independent issues. -1000 on that. One problem, one bug report in the tracker, one commit.

If this issue is about the import machinery not working anymore if there is a non-ASCII character in the path, then why the heck does it touch posixmodule.c?

As for modules that have non-ASCII characters in their module name: this is, again, an unrelated issue (ISTM), so if you want to deal with it, please create a new issue.
[issue8242] Support surrogates in import ; install Python in a non-ASCII directory
Changes by STINNER Victor victor.stin...@haypocalc.com: -- dependencies: +tarfile: use surrogates for undecode fields
[issue8242] Support surrogates in import ; install Python in a non-ASCII directory
STINNER Victor victor.stin...@haypocalc.com added the comment: I committed the platform.py patch as r80166 (trunk) and r80167 (py3k), but quickly reverted it because the patch on trunk broke the Python bootstrap. The patch might be applied, but only on py3k and with more tests (to ensure that it doesn't break the bootstrap on any OS) :-)
[issue8242] Support surrogates in import ; install Python in a non-ASCII directory
Changes by STINNER Victor victor.stin...@haypocalc.com: -- dependencies: +os.system() doesn't support surrogates nor bytes
[issue8242] Support surrogates in import ; install Python in a non-ASCII directory
Changes by STINNER Victor victor.stin...@haypocalc.com: -- dependencies: +pickle is unable to encode unicode surrogates
[issue8242] Support surrogates in import ; install Python in a non-ASCII directory
Changes by STINNER Victor victor.stin...@haypocalc.com: -- dependencies: +test_xmlrpc fails with non-ascii path
[issue8242] Support surrogates in import ; install Python in a non-ASCII directory
Changes by STINNER Victor victor.stin...@haypocalc.com: -- dependencies: +os.execvpe() doesn't support surrogates in env
[issue8242] Support surrogates in import ; install Python in a non-ASCII directory
STINNER Victor victor.stin...@haypocalc.com added the comment: New version of the patch: all tests pass except 3 (test_ftplib, test_pep3120, test_traceback). -- Added file: http://bugs.python.org/file16919/surrogates-6.patch
[issue8242] Support surrogates in import ; install Python in a non-ASCII directory
Changes by STINNER Victor victor.stin...@haypocalc.com: -- dependencies: +subprocess: support undecodable current working directory on POSIX OS
[issue8242] Support surrogates in import ; install Python in a non-ASCII directory
Changes by STINNER Victor victor.stin...@haypocalc.com: -- dependencies: +ctypes.dlopen() doesn't support surrogates
[issue8242] Support surrogates in import ; install Python in a non-ASCII directory
Changes by STINNER Victor victor.stin...@haypocalc.com: Removed file: http://bugs.python.org/file16897/surrogates-5.patch
[issue8242] Support surrogates in import ; install Python in a non-ASCII directory
Changes by Antoine Pitrou pit...@free.fr: -- nosy: +loewis
[issue8242] Support surrogates in import ; install Python in a non-ASCII directory
New submission from STINNER Victor victor.stin...@haypocalc.com:

If the full path to the python3 binary contains a non-ASCII character and the file system encoding is ASCII, Python fails with:

---
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: Py_Initialize: can't initialize sys standard streams
ImportError: No module named encodings.utf_8
Abandon
---

The file system encoding is set to ASCII if there is no locale (e.g. LANG=C). The problem is that the command line arguments, especially argv[0], are stored as wchar_t* strings that use surrogates to represent undecodable bytes. The attached patch fixes calculate_path() and the import functions to support surrogates.

Details:

* Initialize Py_FileSystemDefaultEncoding earlier in Py_InitializeEx(), because its value is required to encode unicode strings containing surrogates to bytes
* Rename char2wchar() to _Py_char2wchar(), which is no longer static, and create the function _Py_wchar2char()
* Escape surrogates (reimplement the surrogateescape decoder) in the calculate_path() subfunctions (_wstat, _wgetcwd, _Py_wreadlink)
* Use the surrogateescape error handler in find_module(), NullImporter_init() and zipimporter_init()
* Write a fast path (I don't know the right term: is it a hack?) for the utf-8 encoding with the surrogateescape error handler in PyUnicode_AsEncodedObject() and PyUnicode_AsEncodedString(): required because these functions are called before the codecs module is initialized

The patch is a work in progress: there are some FIXMEs (I don't know if the string should be encoded/decoded using surrogates or not). I only tested the ASCII and UTF-8 file system encodings. I don't know if we can support more encodings. Python has few builtin encodings. Other encodings are implemented in Python: we have to import them, but we need the codec to import a module, so...

I don't think that Windows is affected by this issue because it has a better API for unicode filenames and command line arguments, and most patched functions are surrounded by #ifndef WINDOWS ... #endif

--
components: Unicode
files: surrogates_bootstrap-4.patch
keywords: patch
messages: 101815
nosy: haypo
severity: normal
status: open
title: Support surrogates in import ; install Python in a non-ASCII directory
versions: Python 3.1, Python 3.2
Added file: http://bugs.python.org/file16671/surrogates_bootstrap-4.patch
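For readers unfamiliar with the mechanism the report relies on, here is a minimal Python-level sketch of how the surrogateescape error handler carries undecodable bytes through a str and back. It is an illustration only, not the C code from the patch: the path and the helper names (decode_path, encode_path, has_undecodable_bytes) are hypothetical, and the ASCII encoding stands in for the file system encoding seen with LANG=C.

```python
# Illustration only: helper names and the sample path are hypothetical,
# not taken from the patch.

def decode_path(raw: bytes, encoding: str = "ascii") -> str:
    """Decode a byte path; each undecodable byte 0xNN becomes U+DCNN."""
    return raw.decode(encoding, "surrogateescape")


def encode_path(path: str, encoding: str = "ascii") -> bytes:
    """Encode back to bytes; lone surrogates U+DCNN become byte 0xNN again."""
    return path.encode(encoding, "surrogateescape")


def has_undecodable_bytes(path: str) -> bool:
    """True if the str carries bytes escaped by surrogateescape."""
    return any(0xDC80 <= ord(ch) <= 0xDCFF for ch in path)


if __name__ == "__main__":
    # Hypothetical install directory containing UTF-8 encoded "é".
    raw = b"/home/caf\xc3\xa9/bin/python3"
    path = decode_path(raw)
    print(ascii(path))
    print(encode_path(path) == raw)
```

The round trip only works when the same encoding and error handler are used in both directions, which is presumably why the file system encoding has to be known before the first path is encoded.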
[issue8242] Support surrogates in import ; install Python in a non-ASCII directory
STINNER Victor victor.stin...@haypocalc.com added the comment: If I understood correctly, my patch is also required to import a module having a non-ASCII full path if the file system encoding is ASCII.
[issue8242] Support surrogates in import ; install Python in a non-ASCII directory
STINNER Victor victor.stin...@haypocalc.com added the comment:

> Initialize Py_FileSystemDefaultEncoding earlier in Py_InitializeEx(), because its value is required to encode unicode using surrogates to bytes

Oh, it doesn't work: get_codeset() returns NULL, because the codec registry is empty when get_codeset() is called (with my patch).
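The failure described above can be pictured with a Python-level analogy, assuming (as the comment implies) that get_codeset() resolves the locale's codeset name through the codec registry. The function normalize_codeset below is hypothetical and is not the C implementation; it only shows that name resolution depends on a registry lookup and yields nothing when the lookup cannot succeed.

```python
import codecs
from typing import Optional


def normalize_codeset(name: str) -> Optional[str]:
    """Hypothetical stand-in for the lookup step: return the canonical
    codec name, or None when the registry has no matching codec."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        # Analogous to the NULL result reported in the comment above.
        return None


if __name__ == "__main__":
    # "ANSI_X3.4-1968" is the codeset name glibc reports for the C locale.
    for name in ("ANSI_X3.4-1968", "UTF-8", "no-such-codec-xyz"):
        print(name, "->", normalize_codeset(name))
```

In a running interpreter the registry is already populated, so the failing case has to be simulated with an unknown name; during early initialization every lookup would fail in the same way.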