Re: Unicode entries on sys.path

2004-12-30 Thread vincent wehren
Thomas Heller wrote:
Martin v. Löwis [EMAIL PROTECTED] writes:

Thomas Heller wrote:
How should these patches be approached?
Please have a look as to how posixmodule.c and fileobject.c deal with
this issue.

On windows, it would probably
be easiest to use the MS generic text routines: _tcslen instead of
strlen, for example, and to rely on the _UNICODE preprocessor symbol to
map this function to strlen or wcslen.
No. This fails for two reasons:
1. We don't compile Python with _UNICODE, and never will do so. This
   macro is only a mechanism to simplify porting code from ANSI APIs
   to Unicode APIs, so you don't have to reformulate all the API calls.
   For new code, it is better to use the Unicode APIs directly if you
   plan to use them.
2. On Win9x, the Unicode APIs don't work (*). So you need to choose at
   run-time whether you want to use wide or narrow API. Unless
   a) we ship two binaries in the future, one for W9x, one for NT+
      (I hope this won't happen), or
   b) we drop support for W9x. I'm in favour of doing so sooner or
      later, but perhaps not for Python 2.5.

I wasn't asking about the *W functions, I'm asking about string/unicode
handling in Python source files. Looking into Python/import.c, wouldn't
it be required to change the signature of a lot of functions to receive
PyObject* arguments, instead of char* ?
For example, find_module should change from
  static struct filedescr *find_module(char *, char *, PyObject *,
   char *, size_t, FILE **, PyObject **);
to 

  static struct filedescr *find_module(char *, char *, PyObject *,
   PyObject **, FILE **, PyObject **);
where the fourth argument would now be either a PyString or PyUnicode
object pointer?

(*) Can somebody please report whether the *W file APIs fail on W9x
because the entry points are not there (so you can't even run the
binary), or because they fail with an error when called?

I always thought that the *W apis would not be there in win98, but it
seems that is wrong.  Fortunately, how could Python, which links to the
FindFirstFileW exported function for example, run on win98 otherwise...
Normally I would have thought this would require using the Microsoft 
Layer for Unicode (unicows.dll).

According to MSDN 9x already does have a handful of unicode APIs.
FindFirstFile does not seem to be one of them - unless the list on
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/mslu/winprog/other_existing_unicode_support.asp
is bogus (?).
--
Vincent Wehren

Thomas
--
http://mail.python.org/mailman/listinfo/python-list


Re: Unicode entries on sys.path

2004-12-30 Thread Martin v. Löwis
vincent wehren wrote:
FindFirstFile does not seem to be one of them - unless the list on
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/mslu/winprog/other_existing_unicode_support.asp
is bogus (?).
It might perhaps be misleading: I think the entry points are there, but 
calling the functions will always fail.

Regards,
Martin


Re: Unicode entries on sys.path

2004-12-30 Thread JanC
vincent wehren schreef:

 Normally I would have thought this would require using the Microsoft 
 Layer for Unicode (unicows.dll).

If Python is going to use unicows.dll, it might want to use libunicows for 
compatibility with mingw etc.: http://libunicows.sourceforge.net/


-- 
JanC

Be strict when sending and tolerant when receiving.
RFC 1958 - Architectural Principles of the Internet - section 3.9


Re: Unicode entries on sys.path

2004-12-29 Thread Thomas Heller
Martin v. Löwis [EMAIL PROTECTED] writes:

 Thomas Heller wrote:
 How should these patches be approached?

 Please have a look as to how posixmodule.c and fileobject.c deal with
 this issue.

 On windows, it would probably
 be easiest to use the MS generic text routines: _tcslen instead of
 strlen, for example, and to rely on the _UNICODE preprocessor symbol to
 map this function to strlen or wcslen.

 No. This fails for two reasons:
 1. We don't compile Python with _UNICODE, and never will do so. This
 macro is only a mechanism to simplify porting code from ANSI APIs
 to Unicode APIs, so you don't have to reformulate all the API calls.
 For new code, it is better to use the Unicode APIs directly if you
 plan to use them.
 2. On Win9x, the Unicode APIs don't work (*). So you need to choose at
 run-time whether you want to use wide or narrow API. Unless
 a) we ship two binaries in the future, one for W9x, one for NT+
 (I hope this won't happen), or
 b) we drop support for W9x. I'm in favour of doing so sooner or
 later, but perhaps not for Python 2.5.

I wasn't asking about the *W functions, I'm asking about string/unicode
handling in Python source files. Looking into Python/import.c, wouldn't
it be required to change the signature of a lot of functions to receive
PyObject* arguments, instead of char* ?
For example, find_module should change from
  static struct filedescr *find_module(char *, char *, PyObject *,
   char *, size_t, FILE **, PyObject **);

to 

  static struct filedescr *find_module(char *, char *, PyObject *,
   PyObject **, FILE **, PyObject **);

where the fourth argument would now be either a PyString or PyUnicode
object pointer?

 (*) Can somebody please report whether the *W file APIs fail on W9x
 because the entry points are not there (so you can't even run the
 binary), or because they fail with an error when called?

I always thought that the *W apis would not be there in win98, but it
seems that is wrong.  Fortunately, how could Python, which links to the
FindFirstFileW exported function for example, run on win98 otherwise...

Thomas


Re: Unicode entries on sys.path

2004-12-29 Thread Martin v. Löwis
Thomas Heller wrote:
I wasn't asking about the *W functions, I'm asking about string/unicode
handling in Python source files. Looking into Python/import.c, wouldn't
it be required to change the signature of a lot of functions to receive
PyObject* arguments, instead of char* ?
Yes, that would be one solution. Another solution would be to provide an
additional Py_UNICODE*, and to allow that pointer to be NULL. Most
systems would ignore that pointer (and it would be NULL most of the
time), except on NT+, which would use the Py_UNICODE* if available,
and the char* otherwise.
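
A minimal sketch of the second option, written in Python rather than C so
it can be run as-is. The function name, the is_nt flag and the returned
tuple are hypothetical and only model the idea of an optional wide pointer
travelling next to the narrow one; this is not code from import.c.

```python
from typing import Optional, Tuple, Union


def select_path_argument(narrow: bytes,
                         wide: Optional[str],
                         is_nt: bool) -> Tuple[str, Union[bytes, str]]:
    """Model of 'use the Py_UNICODE* if available, and the char* otherwise'.

    narrow -- the path as an encoded byte string (always present)
    wide   -- the same path as Unicode, or None (the NULL pointer case)
    is_nt  -- assumption: True only on systems with a working wide API
    """
    if is_nt and wide is not None:
        # NT+ with a Unicode path available: prefer the wide variant.
        return ("wide", wide)
    # Every other system, or no Unicode path given: ignore the extra argument.
    return ("narrow", narrow)
```

Callers that never deal with Unicode paths keep passing None for the
second argument, which is why existing call sites would need little change
under this scheme.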
I always thought that the *W apis would not be there in win98, but it
seems that is wrong.  Fortunately, how could Python, which links to the
FindFirstFileW exported function for example, run on win98 otherwise...
Thanks, that is convincing.
Regards,
Martin


Re: Unicode entries on sys.path

2004-12-27 Thread Bengt Richter
On Thu, 23 Dec 2004 19:24:58 +0100, "Martin v. Löwis"
[EMAIL PROTECTED] wrote:

Thomas Heller wrote:
 It seems that Python itself converts unicode entries in sys.path to
 normal strings using windows default conversion rules - is this a
 problem that I can fix by changing some regional setting on my machine?

You can set the system code page on the third tab on the XP
regional settings (character set for non-unicode applications).
This, of course, assumes that there is a character set that supports
all directories in sys.path. If you have Japanese characters on
sys.path, you certainly need to set the system locale to Japanese
(is that CP932?).

Changing this setting requires a reboot.

 Hm, maybe more a windows question than a python question...

The real question here is: why does Python not support arbitrary
Unicode strings on sys.path? It could, in principle, at least on
Windows NT+ (and also on OSX). Patches are welcome.

What about removable drives? And mountable multiple file system types?
Maybe some collections of potentially homogeneous file system references
such as sys.path need to be virtualized to carry relevant file system
encoding and protocol info etc. That could cover synthetic or compressed
info sources too, I would think. Homogeneous package representation could
be a similar problem, I guess.

Regards,
Bengt Richter


Re: Unicode entries on sys.path

2004-12-27 Thread Thomas Heller
Martin v. Löwis [EMAIL PROTECTED] writes:

 Thomas Heller wrote:
 It seems that Python itself converts unicode entries in sys.path to
 normal strings using windows default conversion rules - is this a
 problem that I can fix by changing some regional setting on my machine?

 You can set the system code page on the third tab on the XP
 regional settings (character set for non-unicode applications).
 This, of course, assumes that there is a character set that supports
 all directories in sys.path. If you have Japanese characters on
 sys.path, you certainly need to set the system locale to Japanese
 (is that CP932?).

 Changing this setting requires a reboot.

 Hm, maybe more a windows question than a python question...

 The real question here is: why does Python not support arbitrary
 Unicode strings on sys.path? It could, in principle, at least on
 Windows NT+ (and also on OSX). Patches are welcome.

How should these patches be approached?  On windows, it would probably
be easiest to use the MS generic text routines: _tcslen instead of
strlen, for example, and to rely on the _UNICODE preprocessor symbol to
map this function to strlen or wcslen.  Is there a similar thing in the
non-windows world?

Thomas


Re: Unicode entries on sys.path

2004-12-27 Thread Martin v. Löwis
Thomas Heller wrote:
How should these patches be approached?  
Please have a look as to how posixmodule.c and fileobject.c deal with
this issue.
On windows, it would probably
be easiest to use the MS generic text routines: _tcslen instead of
strlen, for example, and to rely on the _UNICODE preprocessor symbol to
map this function to strlen or wcslen.
No. This fails for two reasons:
1. We don't compile Python with _UNICODE, and never will do so. This
   macro is only a mechanism to simplify porting code from ANSI APIs
   to Unicode APIs, so you don't have to reformulate all the API calls.
   For new code, it is better to use the Unicode APIs directly if you
   plan to use them.
2. On Win9x, the Unicode APIs don't work (*). So you need to choose at
   run-time whether you want to use wide or narrow API. Unless
   a) we ship two binaries in the future, one for W9x, one for NT+
      (I hope this won't happen), or
   b) we drop support for W9x. I'm in favour of doing so sooner or
      later, but perhaps not for Python 2.5.
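
The run-time choice described in point 2 can be sketched as follows. This
is a Python model, not the C code it would become: choose_api, the
wide_api_works flag and the ansi_codepage default of cp1252 are all
assumptions made for illustration, and no Windows API is actually called.

```python
def choose_api(path, wide_api_works, ansi_codepage="cp1252"):
    """Decide at run time which API family one path should go to.

    path           -- str (Unicode) or bytes (already encoded)
    wide_api_works -- assumption: True on NT+, False on Win9x
    ansi_codepage  -- assumption: stands in for the system ANSI code page

    Returns a (family, argument) pair. Raises UnicodeEncodeError when a
    Unicode path has to go through the narrow API but has no
    representation in the code page.
    """
    if isinstance(path, str):
        if wide_api_works:
            # The wide API takes the Unicode path unchanged.
            return ("wide", path)
        # Narrow API only: the name must survive the ANSI code page.
        return ("narrow", path.encode(ansi_codepage))
    # Byte strings always go to the narrow API as they are.
    return ("narrow", path)
```

The point of the model is that the decision is made per call from a
run-time flag, which a compile-time symbol such as _UNICODE cannot do in a
single binary.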
Regards,
Martin
(*) Can somebody please report whether the *W file APIs fail on W9x
because the entry points are not there (so you can't even run the
binary), or because they fail with an error when called?


Re: Unicode entries on sys.path

2004-12-26 Thread Martin v. Löwis
Just wrote:
The real question here is: why does Python not support arbitrary
Unicode strings on sys.path? It could, in principle, at least on
Windows NT+ (and also on OSX). Patches are welcome.

Works for me on OSX 10.3.6, as it should: prior to using the sys.path 
entry, a unicode string is encoded with Py_FileSystemDefaultEncoding. 
I'm not sure how well it works together with zipimport, though.
As Vincent's message already implies, I'm asking for Windows patches.
In a Windows system, there are path names which just *don't have*
a representation in the file system default encoding. So you just
can't use the standard file system API (open, read, write) to access
those files - instead, you have to use specific Unicode variants
of the file system API.
The only operating system in active use that can reliably represent
all file names in the standard API is OS X. Unix can do that as
long as the locale is UTF-8; for all other systems, there are
restrictions when you try to use the file system API to access
files with funny characters.
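
A small sketch for checking which situation a given interpreter is in. It
relies only on sys.getfilesystemencoding() and codecs.lookup() from the
standard library; the two helper names are hypothetical. What gets
reported depends on the platform, the locale and the Python version.

```python
import codecs
import sys


def is_utf8(encoding_name):
    """Return True if encoding_name is some spelling of UTF-8."""
    try:
        # codecs.lookup() normalizes aliases such as 'UTF8' or 'utf_8'.
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        # Unknown codec on this platform: certainly not UTF-8.
        return False


def file_system_encoding_is_utf8():
    """Return True if Python encodes file names as UTF-8 on this system."""
    return is_utf8(sys.getfilesystemencoding())
```

If file_system_encoding_is_utf8() returns False, some Unicode file names
may have no representation when passed through the narrow file system API.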
Regards,
Martin


Re: Unicode entries on sys.path

2004-12-24 Thread Just
In article [EMAIL PROTECTED],
 Martin v. Lowis [EMAIL PROTECTED] wrote:

  Hm, maybe more a windows question than a python question...
 
 The real question here is: why does Python not support arbitrary
 Unicode strings on sys.path? It could, in principle, at least on
 Windows NT+ (and also on OSX). Patches are welcome.

Works for me on OSX 10.3.6, as it should: prior to using the sys.path 
entry, a unicode string is encoded with Py_FileSystemDefaultEncoding. 
I'm not sure how well it works together with zipimport, though.

Just


Re: Unicode entries on sys.path

2004-12-24 Thread vincent wehren
Just wrote:
In article [EMAIL PROTECTED],
 Martin v. Lowis [EMAIL PROTECTED] wrote:

Hm, maybe more a windows question than a python question...
The real question here is: why does Python not support arbitrary
Unicode strings on sys.path? It could, in principle, at least on
Windows NT+ (and also on OSX). Patches are welcome.

Works for me on OSX 10.3.6, as it should: prior to using the sys.path 
entry, a unicode string is encoded with Py_FileSystemDefaultEncoding. 
For this conversion mbcs will be used on Windows machines, implying 
that such conversions are made using the current system Ansi codepage.
(As a matter of interest: What is this on OSX?). This conversion is 
likely to be useless for unicode directory names containing characters 
that do not have a mapping to a character in this particular codepage.

The technique described by Martin may solve the problem for what in this 
case are Japanese characters, but what if I have directory names from 
another language group, such as Simplified Chinese, as well?

The only way to get around this is to allow - as Martin suggests - 
arbitrary unicode strings in sys.path on those platforms that may have 
unicode file names.
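
To see the limitation for a concrete set of directory names, one can test
which code pages are able to encode all of them. A minimal sketch: the
function name is hypothetical, the candidate codecs are ordinary Python
codec names standing in for Windows ANSI code pages, and the sample names
in the usage note are made up for illustration.

```python
def code_pages_covering(names, candidates):
    """Return the candidate codecs that can encode every name in names."""
    covering = []
    for codec in candidates:
        try:
            for name in names:
                # Raises UnicodeEncodeError if any character has no mapping.
                name.encode(codec)
        except UnicodeEncodeError:
            continue  # this code page loses at least one directory name
        covering.append(codec)
    return covering
```

For example, code_pages_covering([u"\u5b66\u6821\u30c7xx", u"L\u00f6wis"],
["cp1252", "cp932", "utf-8"]) rejects cp1252 because of the Japanese name,
while utf-8 always qualifies. Any list of names for which no legacy code
page is returned can only be handled through Unicode APIs.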

--
Vincent Wehren
I'm not sure how well it works together with zipimport, though.

Just


Re: Unicode entries on sys.path

2004-12-24 Thread Just
In article [EMAIL PROTECTED],
 vincent wehren [EMAIL PROTECTED] wrote:

 Just wrote:
  In article [EMAIL PROTECTED],
   Martin v. Lowis [EMAIL PROTECTED] wrote:
  
  
 Hm, maybe more a windows question than a python question...
 
 The real question here is: why does Python not support arbitrary
 Unicode strings on sys.path? It could, in principle, at least on
 Windows NT+ (and also on OSX). Patches are welcome.
  
  
  Works for me on OSX 10.3.6, as it should: prior to using the sys.path 
  entry, a unicode string is encoded with Py_FileSystemDefaultEncoding. 
 
 For this conversion mbcs will be used on Windows machines, implying 
 that such conversions are made using the current system Ansi codepage.
 (As a matter of interest: What is this on OSX?).

UTF-8.

Just


Unicode entries on sys.path

2004-12-23 Thread Thomas Heller
I was trying to track down a bug in py2exe where the executable does
not work when it is in a directory containing Japanese characters.

Then, I discovered that part of the problem is in the zipimporter that
py2exe uses, and finally I found that it didn't even work in Python
itself.

If the entry in sys.path contains normal Western characters, umlauts for
example, it works fine.  But when I copied some Japanese characters from
a random web page and named a directory after them, it didn't work any
longer.

The windows command prompt is not able to print these characters,
although windows explorer has no problems showing them.

Here's the script, the subdirectory contains the file 'somemodule.py',
but importing this fails:

  import sys
  sys.path = [u'\u5b66\u6821\u30c7xx']
  print sys.path

  import somemodule

It seems that Python itself converts unicode entries in sys.path to
normal strings using windows default conversion rules - is this a
problem that I can fix by changing some regional setting on my machine?
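
A self-contained way to repeat the experiment without preparing the
directory by hand, as a sketch for a current Python 3 interpreter (the
script above is Python 2). The function name and the module name
somemodule_demo are hypothetical; the directory name is the one from the
script. The result depends on the platform and file system, so the
function only reports whether the import succeeded.

```python
import importlib
import os
import sys
import tempfile


def try_import_from_unicode_dir(dirname="\u5b66\u6821\u30c7xx"):
    """Return True if a module in a non-ASCII directory can be imported."""
    with tempfile.TemporaryDirectory() as base:
        target = os.path.join(base, dirname)
        try:
            os.mkdir(target)
            with open(os.path.join(target, "somemodule_demo.py"), "w") as fh:
                fh.write("VALUE = 42\n")
        except (OSError, UnicodeError):
            # The file system or its encoding rejected the name itself.
            return False
        sys.path.insert(0, target)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("somemodule_demo")
            return module.VALUE == 42
        except ImportError:
            return False
        finally:
            # Leave sys.path and sys.modules as they were found.
            sys.path.remove(target)
            sys.modules.pop("somemodule_demo", None)
```

Calling try_import_from_unicode_dir() with other directory names gives a
quick check of which scripts a particular setup can handle.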

Hm, maybe more a windows question than a python question...

Thanks,
Thomas


Re: Unicode entries on sys.path

2004-12-23 Thread Martin v. Löwis
Thomas Heller wrote:
It seems that Python itself converts unicode entries in sys.path to
normal strings using windows default conversion rules - is this a
problem that I can fix by changing some regional setting on my machine?
You can set the system code page on the third tab on the XP
regional settings (character set for non-unicode applications).
This, of course, assumes that there is a character set that supports
all directories in sys.path. If you have Japanese characters on
sys.path, you certainly need to set the system locale to Japanese
(is that CP932?).
Changing this setting requires a reboot.
Hm, maybe more a windows question than a python question...
The real question here is: why does Python not support arbitrary
Unicode strings on sys.path? It could, in principle, at least on
Windows NT+ (and also on OSX). Patches are welcome.
Regards,
Martin