STINNER Victor <victor.stin...@haypocalc.com> added the comment:

> - PKG-INFO (METADATA in distutils2), that already uses a trick to support
> Unicode, but your change would replace it in a better way;

Which "trick"?

> - MANIFEST, which with your fix would gain the ability to handle non-ASCII
> paths, which is a feature or a bugfix depending on your point of view;

Wait. Non-encodable bytes are a separate issue. I would like to work on
the first problem: distutils in Python 3 uses open() without an encoding
argument, so the encoding depends on the user's locale. Said differently:
if you produce a file with distutils on one computer, you cannot be sure
that the file can be read with the same version of Python on another
computer (if the locale encoding is different). E.g. Windows uses the mbcs
encoding whereas UTF-8 is the preferred encoding on Linux.

What is the encoding of the MANIFEST file?

> - .def files, used by the compilers for the C linking step; I don’t know if
> it’s appropriate to allow UTF-8 there.

I don't know these files.

> - RPM spec files, which use ASCII or UTF-8 according to
> http://en.opensuse.org/openSUSE:Specfile_guidelines#Specfile_Encoding but
> it’s not confirmed in
> http://www.rpm.org/max-rpm/s1-rpm-build-creating-spec-file.html (linked
> from the LSB site), so there’s no guarantee this works for all RPM
> platforms. This sort of platform-specific thing is the reason why RPM
> support has been removed in distutils2.

UTF-8 is a superset of ASCII. If you use UTF-8 but only write ASCII
characters, your output file will be encoded as UTF-8... but it will also
be valid ASCII. It's magical :-)

> - record and .pth files created by the install command.

.pth files contain directory names, which can be non-ASCII.

> I agree that there is something to be fixed, but I don’t know if they can
> be fixed in distutils. Unicode in PKG-INFO is unrelated to files, whereas
> there are files or directories in MANIFEST, spec, record and .pth.

You can use non-ASCII characters for things other than filenames, e.g. in
the description of a package :-)

> If this is going to be fixed, write_file should not use UTF-8 unconditionally
> but grow a keyword argument IMO, so that use cases requiring ASCII
> continue to work.

As written before, UTF-8 is a superset of ASCII. If you read a file using
the UTF-8 encoding, you will be able to read ASCII files. But if you use
UTF-8 and write non-ASCII characters, older versions of distutils using
ASCII or another encoding will not be able to read these files.

Anyway, I think that in most cases, all files only contain ASCII text. So
it doesn't really matter.

About the keyword solution: yes, it would be a smooth way to fix this
issue.

> When you say “patch *all* functions reading files”, I guess you mean all
> functions that read distutils files, i.e. MANIFEST and PKG-INFO.

I don't know distutils well enough to answer my own question.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9561>
_______________________________________
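
A minimal sketch of the encoding points made in this message, not the actual distutils code. The write_file and read_file helpers here are hypothetical stand-ins that only illustrate the "keyword argument" idea, and the file contents are made up. The sketch shows that ASCII-only text produces the same bytes in ASCII and UTF-8, that non-ASCII text round-trips when writer and reader agree on the encoding, and that a reader assuming ASCII fails on such a file.

```python
import os
import tempfile


def write_file(filename, contents, encoding="utf-8"):
    """Hypothetical helper (not the distutils function): write each string
    in `contents` as one line, using an explicit encoding instead of the
    locale default."""
    with open(filename, "w", encoding=encoding) as f:
        for line in contents:
            f.write(line + "\n")


def read_file(filename, encoding="utf-8"):
    """Hypothetical helper: read the file back as a list of lines, again
    with an explicit encoding."""
    with open(filename, encoding=encoding) as f:
        return f.read().splitlines()


# ASCII-only text produces the same bytes in ASCII and in UTF-8.
assert "Summary: a package".encode("utf-8") == "Summary: a package".encode("ascii")

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "PKG-INFO")

# Non-ASCII text round-trips as long as writer and reader agree.
lines = ["Name: demo", "Author: \u00c9ric"]
write_file(path, lines)
roundtrip = read_file(path)

# A reader that assumes ASCII cannot decode the same file.
try:
    read_file(path, encoding="ascii")
    ascii_reader_failed = False
except UnicodeDecodeError:
    ascii_reader_failed = True

# A caller that requires ASCII can still ask for it explicitly.
ascii_path = os.path.join(tmpdir, "MANIFEST")
write_file(ascii_path, ["setup.py", "README"], encoding="ascii")
ascii_lines = read_file(ascii_path, encoding="ascii")
```

Passing the encoding explicitly on both sides is what removes the dependence on the locale; which default a real write_file should use is exactly the open question in this discussion.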