New submission from STINNER Victor <victor.stin...@haypocalc.com>:

If the full path to the python3 binary contains a non-ASCII character and the file system encoding is ASCII, Python fails with:
---
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: Py_Initialize: can't initialize sys standard streams
ImportError: No module named encodings.utf_8
Abandon
---
The file system encoding is set to ASCII if there is no locale (e.g. LANG=C). The problem is that the command line arguments, especially argv[0], are stored in a wchar_t* string that uses surrogates to represent undecodable bytes.

The attached patch fixes calculate_path() and the import functions to support surrogates. Details:

 * Initialize Py_FileSystemDefaultEncoding earlier in Py_InitializeEx(), because its value is required to encode unicode using surrogates to bytes
 * Rename char2wchar() to _Py_char2wchar() (the function is no longer static) and create the function _Py_wchar2char()
 * Escape surrogates (reimplement the surrogateescape decoder) in the calculate_path() subfunctions (_wstat, _wgetcwd, _Py_wreadlink)
 * Use the surrogateescape error handler in find_module(), NullImporter_init() and zipimporter_init()
 * Write a "fastpath" (I don't know the right term: is it a hack?) for the utf-8 encoding with the surrogateescape error handler in PyUnicode_AsEncodedObject() and PyUnicode_AsEncodedString(): required because these functions are called before the codecs module is initialized

The patch is a work in progress: there are some FIXMEs (I don't know if the string should be encoded/decoded using surrogates or not).

I only tested the ASCII and UTF-8 file system encodings. I don't know if we can support more encodings. Python has few builtin encodings. Other encodings are implemented in Python: we have to import them, but we need the codec to import a module, so...

I don't think that Windows is affected by this issue because it has a better API for unicode filenames and command line arguments, and most patched functions are surrounded by #ifndef WINDOWS ...
#endif

----------
components: Unicode
files: surrogates_bootstrap-4.patch
keywords: patch
messages: 101815
nosy: haypo
severity: normal
status: open
title: Support surrogates in import ; install Python in a non-ASCII directory
versions: Python 3.1, Python 3.2

Added file: http://bugs.python.org/file16671/surrogates_bootstrap-4.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8242>
_______________________________________
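
For readers unfamiliar with the surrogate mechanism the report relies on, here is a minimal Python-level sketch of it. It is not taken from the patch, which works at the C level. The path below is hypothetical, and the sketch only assumes the standard surrogateescape error handler available since Python 3.1.

```python
# Hypothetical install path containing one non-ASCII byte (0xE9).
raw_path = b"/opt/caf\xe9/bin/python3"

# Strict ASCII decoding fails on the undecodable byte.
try:
    raw_path.decode("ascii")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN.
decoded = raw_path.decode("ascii", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_trip = decoded.encode("ascii", "surrogateescape")

# Without the handler, the lone surrogate cannot be encoded, even to UTF-8.
try:
    decoded.encode("utf-8")
    utf8_strict_ok = True
except UnicodeEncodeError:
    utf8_strict_ok = False
```

The last step suggests why code that turns such a path back into bytes has to use the surrogateescape handler as well: a strict encoder rejects the escaped character.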