On Wed, Mar 17, 2021 at 1:11 AM Michał Górny <mgo...@gentoo.org> wrote:

> On Wed, 2021-03-17 at 13:55 +0900, Inada Naoki wrote:
> > OK. setuptools doesn't specify encoding at all. So locale-specific
> > encoding is used.
> > We can not fix it in short term.
>
> How about writing paths as bytestrings in the long term?  I think this
> should eliminate the necessity of knowing the correct encoding for
> the filesystem.
>
On Linux and many Unixes, there is no "correct" filesystem encoding.  ASCII
and UTF-8 are probably the most common encodings for individual filenames,
maybe even for large collections of them, but nevertheless, paths are
bytestrings.  Treating paths as UTF-8 works fine for most files, but once
in a while there'll be a filename that fails to convert, and that's not the
fault of the filename.

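For example, what happens if you need a file with the name this command
creates?

    touch "Ma$(echo | tr '\012' '\361')ana"

That name is "Ma", then the single byte 0xF1 (octal 361, "ñ" in Latin-1),
then "ana".  It is a perfectly legal filename, but it is not valid UTF-8.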
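A rough Python sketch of what happens to such a name, assuming the bytes on
disk are "Ma", 0xF1, "ana".  The helper names are made up for illustration;
"surrogateescape" is the error handler Python itself uses for undecodable
filename bytes on POSIX (PEP 383).

```python
# Hypothetical illustration: the filename from the shell command above, as raw bytes.
RAW_NAME = b"Ma\xf1ana"  # 0xF1 is "ñ" in Latin-1, but not a valid UTF-8 sequence here


def decodes_as_utf8(name: bytes) -> bool:
    """Return True if the byte string is valid UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def lossless_text(name: bytes) -> str:
    """Decode with surrogateescape so undecodable bytes survive as lone surrogates."""
    return name.decode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    print(decodes_as_utf8(RAW_NAME))  # False
    text = lossless_text(RAW_NAME)
    print(ascii(text))  # 'Ma\udcf1ana'
    # Encoding with the same handler gives back the original bytes.
    print(text.encode("utf-8", errors="surrogateescape") == RAW_NAME)  # True
```

The escaped string round-trips, but it is not printable text: writing it to a
strict UTF-8 stream raises UnicodeEncodeError, so the problem is deferred
rather than removed.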

For a presentation application, for example, assuming UTF-8 is probably
fine, maybe even a good thing.  But for a filesystem backup tool, it's
important not to assume an encoding, so that you can back up and restore
all filenames irrespective of what encoding the files' creators intended.
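For the backup case, a minimal sketch of the bytes-only approach might look
like the following.  copy_tree_bytes is a hypothetical helper, not part of
any real tool; it handles only regular files in one directory and copies
contents, not metadata.  The point is that passing bytes to os.scandir()
yields bytes names, so nothing is ever decoded.

```python
import os
import shutil
from typing import List


def copy_tree_bytes(src: bytes, dst: bytes) -> List[bytes]:
    """Copy regular files from src into dst, treating names as opaque bytes.

    Returns the names that were copied. Illustrative only: no recursion,
    no metadata, no error recovery.
    """
    os.makedirs(dst, exist_ok=True)
    copied = []
    # A bytes argument makes entry.name and entry.path bytes as well,
    # so no filesystem encoding is ever assumed.
    with os.scandir(src) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                target = os.path.join(dst, entry.name)
                with open(entry.path, "rb") as fin, open(target, "wb") as fout:
                    shutil.copyfileobj(fin, fout)
                copied.append(entry.name)
    return copied
```

Called as copy_tree_bytes(os.fsencode(source_dir), os.fsencode(backup_dir)),
it would carry a name like b"Ma\xf1ana" across unchanged, on filesystems
that accept such names in the first place.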
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/HLTFATPMRA57UU3KQOXHIMELZZGXUUJJ/
Code of Conduct: http://python.org/psf/codeofconduct/
