Barry Scott writes:

 > Also beware that zip file format does not include the encoding of
 > the files that are in the zip file.

The most recent zipfile format, which is now a decade or so old, does
specify the encoding of file names, via a flag with the values
0 = ASCII, 1 = UTF-8.[1]
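
For anyone who wants to see the flag in action, here is a minimal
sketch using Python's zipfile module. As far as I know the flag is bit
11 (0x800) of each entry's general purpose bit flag, and the stdlib
sets it when a name cannot be encoded as ASCII. The helper name
name_encoding_flags and the sample file names are mine, purely for
illustration.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # bit 11 of the general purpose bit flag

def name_encoding_flags(names):
    """Write empty entries to an in-memory zip, then report which carry the UTF-8 flag."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    buf.seek(0)
    # Re-open the archive so the flags are read back from the central directory.
    with zipfile.ZipFile(buf) as zf:
        return {info.filename: bool(info.flag_bits & UTF8_FLAG)
                for info in zf.infolist()}

if __name__ == "__main__":
    print(name_encoding_flags(["plain.txt", "日本語.txt"]))
```

When the flag is clear, Python's zipfile decodes the stored name as
cp437 by default, so a pure-ASCII name reads back unchanged either way.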

 > This means that for practical purposes only ASCII filenames are
 > portable across systems. Is this limitation a problem for this
 > proposal?

As far as I know, with the exception of a few Japanese bureaucrats,
everybody uses zip implementations that handle non-ASCII properly.
Info-ZIP is one such implementation, and it is portable, although I
don't recall how it handles filesystems with non-Unicode file name
encodings.

From the point of view of this proposal, just require that filename
encodings be properly specified, and provide an option to use the
appropriate codec.  This isn't too hard.  The main thing to rule out
is multiple encodings in one file system (yes, I've seen it, but not
recently, thank the powers).
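
A rough sketch of what "an option to use the appropriate codec" might
look like on the reading side, assuming Python's zipfile with its
default behavior of decoding unflagged names as cp437. The function
name recover_name is hypothetical, and the caller has to know the real
codec out of band; nothing here tries to guess it.

```python
import zipfile

UTF8_FLAG = 0x800  # bit 11 of the general purpose bit flag

def recover_name(info: zipfile.ZipInfo, codec: str) -> str:
    """Return the entry name, re-decoding unflagged names with the given codec."""
    if info.flag_bits & UTF8_FLAG:
        # The archive says the name is UTF-8; zipfile already decoded it correctly.
        return info.filename
    # Undo zipfile's default cp437 decoding to get the raw bytes back,
    # then decode them with the codec the caller asked for.
    return info.filename.encode("cp437").decode(codec)
```

Typical use would be something like
[recover_name(i, "shift_jis") for i in zf.infolist()]. Python 3.11 and
later also accept a metadata_encoding argument to zipfile.ZipFile for
reading such archives, in which case the cp437 round trip above does
not apply.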

This could even be handled (on POSIX filesystems) with an auxiliary
utility that converts whatever-encoded filenames to UTF-8 (could be a
symlink tree).  Then you can just require a UTF-8 filesystem
throughout the zipapp handling system.  The only remaining question in
my mind is backward compatibility with any existing zipapp specs
(which I have no idea about, but if I were participating in
implementation I'd be sure to check).
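
To make the auxiliary utility idea concrete, here is a minimal sketch
of a symlink-tree builder for POSIX. It assumes the whole source tree
uses a single known codec (multiple encodings in one tree are ruled
out, as above), and the name build_utf8_symlink_tree and its parameters
are invented for illustration. It works on bytes paths throughout so
that it does not depend on the locale's filesystem encoding.

```python
import os

def build_utf8_symlink_tree(src_root: bytes, dst_root: bytes, src_codec: str) -> None:
    """Mirror src_root under dst_root with UTF-8 names; files become symlinks."""
    def to_utf8(name: bytes) -> bytes:
        # Raises UnicodeDecodeError if a name is not valid in src_codec.
        return name.decode(src_codec).encode("utf-8")

    src_root = os.path.abspath(src_root)
    sep = os.sep.encode("ascii")
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        parts = [] if rel == b"." else rel.split(sep)
        # Directories are recreated for real; only files are linked.
        dst_dir = os.path.join(dst_root, *(to_utf8(p) for p in parts))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            os.symlink(os.path.join(dirpath, name),
                       os.path.join(dst_dir, to_utf8(name)))
```

The links point at absolute paths in the original tree, so the UTF-8
view stays valid only as long as the source tree does not move.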


Footnotes: 
[1]  Or maybe it's 0 = ISO-8859-1, 1 = UTF-8.  Sorry, don't have a
copy of the spec handy.
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/I7AE3GYD7T57NEMVGFWIEWC2DQZ6MMPN/
Code of Conduct: http://python.org/psf/codeofconduct/
