On Fri, Jan 23, 2009 at 11:18:37PM +0100, "Martin v. Löwis" wrote:
> > I don't see how starting with an empty directory helps. The filename
> > comes from the client, and the FTP server can't know what the actual
> > encoding of that filename is.
>
> Sure it can. If the client supports RFC 2640, it will send file names
> in UTF-8. If the client does not support RFC 2640, the client must
> restrict itself to 7-bit file names (i.e. ASCII). If the client violates
> the protocol, the server must respond with error 501.

Perhaps that is true, but that is the world of standards. In real life, I remember a situation where users uploaded files from Windows, with names in CP866 encoding, to a UNIX-based FTP server that itself had KOI8-R as the encoding for LC_CTYPE. Since the administrator was unhappy that he could not read the file names correctly, he found and installed a specialized ("russified") version of the FTP daemon, which had configuration settings specifying the network encoding and the filesystem encoding. So both the FTP daemon and the FTP clients violated the RFC, but the users and the administrator were happy.

I think we should preserve the ability of an FTP client to download every file it sees in the listing from the server. What should be done with user-specified filenames when they cannot be encoded in ASCII and the server does not support UTF-8, but violates the RFC and allows 8-bit bytes in file names?

The ideal FTP client will ask the user which encoding he thinks the filenames are stored in on the server side, and then recode from the user's encoding. It will also let the user try several variants if the first doesn't work. That will allow the user to download files with names in several different encodings from the same server in a single FTP session. A dumb client will send the filename from the user as bytes, and will succeed if the user was able to specify the filename verbatim. Anything in between makes the idea of using Unicode as the character encoding for filenames absurd, since it will only break the i18n capabilities of the library.

If the Python library has the filename encoding hardwired to latin1, but the arguments can only be unicode strings, well, a lot of people will not even notice, since they use only the ASCII part of UTF-8. But then there will again be numerous "russification"-like patches to Python and to its modules, which are incompatible with everything but work well in some very specific situations. This is the evil that was supposed to be defeated by i18n and by the total adoption of Unicode.
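
To make the "try several variants" idea above concrete, here is a minimal sketch in Python 3. It is only an illustration, not a proposal for the actual ftplib API: the function name candidate_wire_names and the default list of encodings are my own assumptions, and the code only produces the byte strings a client could try in turn. It does not talk to a server.

```python
def candidate_wire_names(filename, encodings=("utf-8", "koi8-r", "cp866")):
    """Return (encoding, raw_bytes) pairs to try, in order, for a user-typed name.

    Hypothetical helper: the name and the default encoding list are
    assumptions made for this illustration only.
    """
    candidates = []
    seen = set()
    for enc in encodings:
        try:
            raw = filename.encode(enc)
        except UnicodeEncodeError:
            # This encoding cannot represent the name at all; skip it.
            continue
        if raw not in seen:
            # Different encodings may yield identical bytes (e.g. pure
            # ASCII names), so keep only the first occurrence.
            seen.add(raw)
            candidates.append((enc, raw))
    return candidates
```

For a pure ASCII name all three encodings produce the same bytes, so only one candidate remains. For a Cyrillic name each encoding gives a different byte string, and the client would have to send them one after another (or ask the user) until the server accepts one. The same mismatch explains the administrator's problem: bytes produced with CP866 decode under KOI8-R without any error, but into different characters.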

Alexey G.