Re: Unicode entries on sys.path
Thomas Heller wrote:
> Martin v. Löwis [EMAIL PROTECTED] writes:
>> Thomas Heller wrote:
>>> How should these patches be approached?

>> Please have a look as to how posixmodule.c and fileobject.c deal with this issue.

>>> On windows, it would probably be easiest to use the MS generic text routines: _tcslen instead of strlen, for example, and to rely on the _UNICODE preprocessor symbol to map this function to strlen or wcslen.

>> No. This fails for two reasons:

>> 1. We don't compile Python with _UNICODE, and never will do so. This macro is only a mechanism to simplify porting code from ANSI APIs to Unicode APIs, so you don't have to reformulate all the API calls. For new code, it is better to use the Unicode APIs directly if you plan to use them.

>> 2. On Win9x, the Unicode APIs don't work (*). So you need to choose at run-time whether you want to use wide or narrow API. Unless a) we ship two binaries in the future, one for W9x, one for NT+ (I hope this won't happen), or b) we drop support for W9x. I'm in favour of doing so sooner or later, but perhaps not for Python 2.5.

> I wasn't asking about the *W functions, I'm asking about string/unicode handling in Python source files. Looking into Python/import.c, wouldn't it be required to change the signature of a lot of functions to receive PyObject* arguments, instead of char* ? For example, find_module should change from

>     static struct filedescr *find_module(char *, char *, PyObject *,
>                                          char *, size_t, FILE **, PyObject **);

> to

>     static struct filedescr *find_module(char *, char *, PyObject *,
>                                          PyObject **, FILE **, PyObject **);

> where the fourth argument would now be either a PyString or PyUnicode object pointer?

>> (*) Can somebody please report whether the *W file APIs fail on W9x because the entry points are not there (so you can't even run the binary), or because they fail with an error when called?

> I always thought that the *W apis would not be there in win98, but it seems that is wrong. Fortunately, how could Python, which links to the FindFirstFileW exported function for example, run on win98 otherwise...

Normally I would have thought this would require using the Microsoft Layer for Unicode (unicows.dll).

According to MSDN 9x already does have a handful of unicode APIs. FindFirstFile does not seem to be one of them - unless the list on http://msdn.microsoft.com/library/default.asp?url=/library/en-us/mslu/winprog/other_existing_unicode_support.asp is bogus (?).

--
Vincent Wehren

> Thomas

-- http://mail.python.org/mailman/listinfo/python-list
Re: Unicode entries on sys.path
vincent wehren wrote:
> FindFirstFile does not seem to be one of them - unless the list on http://msdn.microsoft.com/library/default.asp?url=/library/en-us/mslu/winprog/other_existing_unicode_support.asp is bogus (?).

It might perhaps be misleading: I think the entry points are there, but calling the functions will always fail.

Regards,
Martin
Re: Unicode entries on sys.path
vincent wehren wrote:
> Normally I would have thought this would require using the Microsoft Layer for Unicode (unicows.dll).

If Python is going to use unicows.dll, it might want to use libunicows for compatibility with mingw etc.: http://libunicows.sourceforge.net/

--
JanC

Be strict when sending and tolerant when receiving.
RFC 1958 - Architectural Principles of the Internet - section 3.9
Re: Unicode entries on sys.path
Martin v. Löwis [EMAIL PROTECTED] writes:
> Thomas Heller wrote:
>> How should these patches be approached?

> Please have a look as to how posixmodule.c and fileobject.c deal with this issue.

>> On windows, it would probably be easiest to use the MS generic text routines: _tcslen instead of strlen, for example, and to rely on the _UNICODE preprocessor symbol to map this function to strlen or wcslen.

> No. This fails for two reasons:

> 1. We don't compile Python with _UNICODE, and never will do so. This macro is only a mechanism to simplify porting code from ANSI APIs to Unicode APIs, so you don't have to reformulate all the API calls. For new code, it is better to use the Unicode APIs directly if you plan to use them.

> 2. On Win9x, the Unicode APIs don't work (*). So you need to choose at run-time whether you want to use wide or narrow API. Unless a) we ship two binaries in the future, one for W9x, one for NT+ (I hope this won't happen), or b) we drop support for W9x. I'm in favour of doing so sooner or later, but perhaps not for Python 2.5.

I wasn't asking about the *W functions, I'm asking about string/unicode handling in Python source files. Looking into Python/import.c, wouldn't it be required to change the signature of a lot of functions to receive PyObject* arguments, instead of char* ? For example, find_module should change from

    static struct filedescr *find_module(char *, char *, PyObject *,
                                         char *, size_t, FILE **, PyObject **);

to

    static struct filedescr *find_module(char *, char *, PyObject *,
                                         PyObject **, FILE **, PyObject **);

where the fourth argument would now be either a PyString or PyUnicode object pointer?

> (*) Can somebody please report whether the *W file APIs fail on W9x because the entry points are not there (so you can't even run the binary), or because they fail with an error when called?

I always thought that the *W apis would not be there in win98, but it seems that is wrong. Fortunately, how could Python, which links to the FindFirstFileW exported function for example, run on win98 otherwise...

Thomas
Re: Unicode entries on sys.path
Thomas Heller wrote:
> I wasn't asking about the *W functions, I'm asking about string/unicode handling in Python source files. Looking into Python/import.c, wouldn't it be required to change the signature of a lot of functions to receive PyObject* arguments, instead of char* ?

Yes, that would be one solution. Another solution would be to provide an additional Py_UNICODE*, and to allow that pointer to be NULL. Most systems would ignore that pointer (and it would be NULL most of the time), except on NT+, which would use the Py_UNICODE* if available, and the char* otherwise.

> I always thought that the *W apis would not be there in win98, but it seems that is wrong. Fortunately, how could Python, which links to the FindFirstFileW exported function for example, run on win98 otherwise...

Thanks, that is convincing.

Regards,
Martin
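Martin's second option, an optional wide string carried next to the narrow one, can be sketched roughly as follows. This is only an illustration in Python 3, not the C code of import.c: the names `PathArg` and `select_path` and the `wide_api_available` flag are hypothetical, and `None` stands in for a NULL `Py_UNICODE*`.

```python
from typing import NamedTuple, Optional, Union


class PathArg(NamedTuple):
    """A path carried in two forms, mirroring a char* plus an optional Py_UNICODE*."""
    narrow: bytes               # always present, like the char* argument
    wide: Optional[str] = None  # None plays the role of a NULL Py_UNICODE*


def select_path(arg: PathArg, wide_api_available: bool) -> Union[str, bytes]:
    """Pick the form to hand to the file API.

    The wide form is used only when it was supplied and the platform's
    wide API works (NT+ in the thread); otherwise fall back to narrow.
    """
    if wide_api_available and arg.wide is not None:
        return arg.wide
    return arg.narrow


if __name__ == "__main__":
    # Hypothetical entry: the narrow form has already lost the Japanese characters.
    arg = PathArg(narrow=b"???xx", wide="\u5b66\u6821\u30c7xx")
    print(ascii(select_path(arg, wide_api_available=True)))
    print(select_path(arg, wide_api_available=False))
    print(select_path(PathArg(narrow=b"lib"), wide_api_available=True))
```

The point of the design is that callers which never see Unicode paths keep passing only the narrow form, so most call sites would not need to change.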
Re: Unicode entries on sys.path
On Thu, 23 Dec 2004 19:24:58 +0100, "Martin v. Löwis" [EMAIL PROTECTED] wrote:
> Thomas Heller wrote:
>> It seems that Python itself converts unicode entries in sys.path to normal strings using windows default conversion rules - is this a problem that I can fix by changing some regional setting on my machine?

> You can set the system code page on the third tab on the XP regional settings (character set for non-unicode applications). This, of course, assumes that there is a character set that supports all directories in sys.path. If you have Japanese characters on sys.path, you certainly need to set the system locale to Japanese (is that CP932?). Changing this setting requires a reboot.

>> Hm, maybe more a windows question than a python question...

> The real question here is: why does Python not support arbitrary Unicode strings on sys.path? It could, in principle, at least on Windows NT+ (and also on OSX). Patches are welcome.

What about removable drives? And mountable multiple file system types? Maybe some collections of potentially homogenous file system references such as sys.path need to be virtualized to carry relevant file system encoding and protocol info etc. That could cover synthetic or compressed info sources too, IWT.

Homogeneous package representation could be a similar problem, I guess.

Regards,
Bengt Richter
Re: Unicode entries on sys.path
Martin v. Löwis [EMAIL PROTECTED] writes:
> Thomas Heller wrote:
>> It seems that Python itself converts unicode entries in sys.path to normal strings using windows default conversion rules - is this a problem that I can fix by changing some regional setting on my machine?

> You can set the system code page on the third tab on the XP regional settings (character set for non-unicode applications). This, of course, assumes that there is a character set that supports all directories in sys.path. If you have Japanese characters on sys.path, you certainly need to set the system locale to Japanese (is that CP932?). Changing this setting requires a reboot.

>> Hm, maybe more a windows question than a python question...

> The real question here is: why does Python not support arbitrary Unicode strings on sys.path? It could, in principle, at least on Windows NT+ (and also on OSX). Patches are welcome.

How should these patches be approached? On windows, it would probably be easiest to use the MS generic text routines: _tcslen instead of strlen, for example, and to rely on the _UNICODE preprocessor symbol to map this function to strlen or wcslen. Is there a similar thing in the non-windows world?

Thomas
Re: Unicode entries on sys.path
Thomas Heller wrote:
> How should these patches be approached?

Please have a look as to how posixmodule.c and fileobject.c deal with this issue.

> On windows, it would probably be easiest to use the MS generic text routines: _tcslen instead of strlen, for example, and to rely on the _UNICODE preprocessor symbol to map this function to strlen or wcslen.

No. This fails for two reasons:

1. We don't compile Python with _UNICODE, and never will do so. This macro is only a mechanism to simplify porting code from ANSI APIs to Unicode APIs, so you don't have to reformulate all the API calls. For new code, it is better to use the Unicode APIs directly if you plan to use them.

2. On Win9x, the Unicode APIs don't work (*). So you need to choose at run-time whether you want to use wide or narrow API. Unless a) we ship two binaries in the future, one for W9x, one for NT+ (I hope this won't happen), or b) we drop support for W9x. I'm in favour of doing so sooner or later, but perhaps not for Python 2.5.

Regards,
Martin

(*) Can somebody please report whether the *W file APIs fail on W9x because the entry points are not there (so you can't even run the binary), or because they fail with an error when called?
Re: Unicode entries on sys.path
Just wrote:
>> The real question here is: why does Python not support arbitrary Unicode strings on sys.path? It could, in principle, at least on Windows NT+ (and also on OSX). Patches are welcome.

> Works for me on OSX 10.3.6, as it should: prior to using the sys.path entry, a unicode string is encoded with Py_FileSystemDefaultEncoding. I'm not sure how well it works together with zipimport, though.

As Vincent's message already implies, I'm asking for Windows patches. In a Windows system, there are path names which just *don't have* a representation in the file system default encoding. So you just can't use the standard file system API (open, read, write) to access those files - instead, you have to use specific Unicode variants of the file system API.

The only operating system in active use that can reliably represent all file names in the standard API is OS X. Unix can do that as long as the locale is UTF-8; for all other systems, there are restrictions when you try to use the file system API to access files with funny characters.

Regards,
Martin
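To see which entries of a running interpreter's sys.path have no representation in its file system encoding, a small diagnostic along these lines may help. It is a sketch that assumes Python 3, where `sys.getfilesystemencoding()` reports the encoding in use; the helper name `unrepresentable` is made up for illustration.

```python
import sys


def unrepresentable(entries, encoding=None):
    """Return the str entries that cannot be encoded in the given encoding.

    When no encoding is given, the interpreter's file system encoding is used.
    """
    encoding = encoding or sys.getfilesystemencoding()
    bad = []
    for entry in entries:
        if not isinstance(entry, str):
            continue  # skip bytes or other non-str entries
        try:
            entry.encode(encoding)
        except UnicodeEncodeError:
            bad.append(entry)
    return bad


if __name__ == "__main__":
    print(sys.getfilesystemencoding())
    print([ascii(e) for e in unrepresentable(sys.path)])
```

On a system whose file system encoding is UTF-8 the list would normally come back empty, which matches the remark about OS X and UTF-8 locales.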
Re: Unicode entries on sys.path
In article [EMAIL PROTECTED], Martin v. Lowis [EMAIL PROTECTED] wrote:
>> Hm, maybe more a windows question than a python question...

> The real question here is: why does Python not support arbitrary Unicode strings on sys.path? It could, in principle, at least on Windows NT+ (and also on OSX). Patches are welcome.

Works for me on OSX 10.3.6, as it should: prior to using the sys.path entry, a unicode string is encoded with Py_FileSystemDefaultEncoding. I'm not sure how well it works together with zipimport, though.

Just
Re: Unicode entries on sys.path
Just wrote:
> In article [EMAIL PROTECTED], Martin v. Lowis [EMAIL PROTECTED] wrote:
>>> Hm, maybe more a windows question than a python question...

>> The real question here is: why does Python not support arbitrary Unicode strings on sys.path? It could, in principle, at least on Windows NT+ (and also on OSX). Patches are welcome.

> Works for me on OSX 10.3.6, as it should: prior to using the sys.path entry, a unicode string is encoded with Py_FileSystemDefaultEncoding.

For this conversion mbcs will be used on Windows machines, implying that such conversions are made using the current system Ansi codepage. (As a matter of interest: What is this on OSX?). This conversion is likely to be useless for unicode directory names containing characters that do not have a mapping to a character in this particular codepage.

The technique described by Martin may solve the problem for what in this case are Japanese characters, but what if I have directory names from another language group, such as simplified Chinese, as well? The only way to get around this is to allow - as Martin suggests - arbitrary unicode strings in sys.path on those platforms that may have unicode file names.

--
Vincent Wehren

> I'm not sure how well it works together with zipimport, though.

> Just
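The "no single code page fits" problem can be checked mechanically. The sketch below assumes Python 3 and uses hypothetical path entries, one with an umlaut and one Japanese, as a stand-in pair; `can_encode` and `covering_codecs` are illustrative names, and the same check could be pointed at any other mix of names and code pages.

```python
def can_encode(text, codec):
    """Return True if every character of text has a mapping in codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def covering_codecs(entries, codecs):
    """Return the codecs that can represent all path entries at once."""
    return [c for c in codecs if all(can_encode(e, c) for e in entries)]


if __name__ == "__main__":
    # Hypothetical sys.path entries: a Western name and a Japanese name.
    entries = ["B\u00fccher", "\u5b66\u6821\u30c7xx"]
    print(covering_codecs(entries, ["cp1252", "cp932"]))
    print(covering_codecs(entries, ["cp1252", "cp932", "utf-8"]))
```

Switching the system code page only moves the gap: cp1252 handles the umlaut but not the Japanese name, cp932 the reverse, and only a Unicode encoding covers both.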
Re: Unicode entries on sys.path
In article [EMAIL PROTECTED], vincent wehren [EMAIL PROTECTED] wrote:
> Just wrote:
>> In article [EMAIL PROTECTED], Martin v. Lowis [EMAIL PROTECTED] wrote:
>>>> Hm, maybe more a windows question than a python question...

>>> The real question here is: why does Python not support arbitrary Unicode strings on sys.path? It could, in principle, at least on Windows NT+ (and also on OSX). Patches are welcome.

>> Works for me on OSX 10.3.6, as it should: prior to using the sys.path entry, a unicode string is encoded with Py_FileSystemDefaultEncoding.

> For this conversion mbcs will be used on Windows machines, implying that such conversions are made using the current system Ansi codepage. (As a matter of interest: What is this on OSX?).

UTF-8.

Just
Unicode entries on sys.path
I was trying to track down a bug in py2exe where the executable does not work when it is in a directory containing Japanese characters.

Then, I discovered that part of the problem is in the zipimporter that py2exe uses, and finally I found that it didn't even work in Python itself.

If the entry in sys.path contains normal western characters, umlauts for example, it works fine. But when I copied some Japanese characters from a random web page, and named a directory after that, it didn't work any longer.

The windows command prompt is not able to print these characters, although windows explorer has no problems showing them.

Here's the script. The subdirectory contains the file 'somemodule.py', but importing this fails:

    import sys
    sys.path = [u'\u5b66\u6821\u30c7xx']
    print sys.path

    import somemodule

It seems that Python itself converts unicode entries in sys.path to normal strings using windows default conversion rules - is this a problem that I can fix by changing some regional setting on my machine?

Hm, maybe more a windows question than a python question...

Thanks,
Thomas
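The difference between the umlaut case and the Japanese case can be approximated without touching the file system. This is a minimal sketch, assuming Python 3 and assuming the ANSI code page is cp1252 (Western European); the helper `to_ansi` is hypothetical and only imitates the narrowing conversion with `str.encode`, it is not what the import machinery itself calls.

```python
def to_ansi(entry, codepage="cp1252"):
    """Imitate narrowing a Unicode path entry to a byte string.

    Characters with no mapping in the code page become '?', so the
    resulting name no longer matches the directory on disk.
    """
    return entry.encode(codepage, errors="replace")


if __name__ == "__main__":
    for entry in ["B\u00fccher", "\u5b66\u6821\u30c7xx"]:
        narrow = to_ansi(entry)
        lossless = narrow.decode("cp1252") == entry
        print(ascii(entry), narrow, "round-trips" if lossless else "lossy")
```

The umlaut name survives the conversion unchanged, while the three Japanese characters collapse to question marks, leaving a name that does not exist on disk.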
Re: Unicode entries on sys.path
Thomas Heller wrote:
> It seems that Python itself converts unicode entries in sys.path to normal strings using windows default conversion rules - is this a problem that I can fix by changing some regional setting on my machine?

You can set the system code page on the third tab on the XP regional settings (character set for non-unicode applications). This, of course, assumes that there is a character set that supports all directories in sys.path. If you have Japanese characters on sys.path, you certainly need to set the system locale to Japanese (is that CP932?).

Changing this setting requires a reboot.

> Hm, maybe more a windows question than a python question...

The real question here is: why does Python not support arbitrary Unicode strings on sys.path? It could, in principle, at least on Windows NT+ (and also on OSX). Patches are welcome.

Regards,
Martin