[issue20329] zipfile.extractall fails in Posix shell with utf-8 filename

2017-12-18 Thread STINNER Victor

STINNER Victor  added the comment:

> I don't think that we can fix this bug, sadly. But I'm happy to see that the 
> PEP 538 and PEP 540 are already useful!

Oops, I mean "we cannot *close* this bug" (right now). Sorry.

I mean that IMHO we still have to fix the bug.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20329] zipfile.extractall fails in Posix shell with utf-8 filename

2017-12-14 Thread STINNER Victor

STINNER Victor  added the comment:

> I created an environment under 3.3.1 in which this error was still occurring, 
> but within that same environment, it is not occurring for 3.7.  I believe 
> this can be closed.

Python 3.7 now uses the UTF-8 encoding when the LC_CTYPE locale is POSIX (PEP 
538, PEP 540). You should still be able to reproduce the bug with a locale with 
an encoding different than UTF-8.
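
To see which case a given interpreter falls into, the encodings it picked at startup can be inspected. This is only a minimal sketch: the helper name `describe_encodings` is made up for illustration, and `sys.flags.utf8_mode` is read with `getattr` because it is only present from Python 3.7 on.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python selected for OS interfaces at startup."""
    return {
        # Encoding used to convert str filenames to bytes for the OS.
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Error handler paired with it (usually surrogateescape on Unix).
        "filesystem_errors": sys.getfilesystemencodeerrors(),
        # Encoding derived from the locale, without changing the locale.
        "preferred_encoding": locale.getpreferredencoding(False),
        # 1 when UTF-8 mode (PEP 540) is on; None on interpreters before 3.7.
        "utf8_mode": getattr(sys.flags, "utf8_mode", None),
    }


if __name__ == "__main__":
    for key, value in describe_encodings().items():
        print(key, "=", value)
```

Running it once under `LC_ALL=POSIX` and once under a UTF-8 locale, on each Python version of interest, should show whether the filesystem encoding falls back to ASCII.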

Moreover, I understand that Python 3.6 is still affected by the bug.

I don't think that we can fix this bug, sadly. But I'm happy to see that the 
PEP 538 and PEP 540 are already useful!

--




[issue20329] zipfile.extractall fails in Posix shell with utf-8 filename

2017-12-14 Thread Cheryl Sabella

Cheryl Sabella  added the comment:

I created an environment under 3.3.1 in which this error was still occurring, 
but within that same environment, it is not occurring for 3.7.  I believe this 
can be closed.

--
nosy: +csabella




[issue20329] zipfile.extractall fails in Posix shell with utf-8 filename

2014-02-27 Thread Laurent Mazuel

Laurent Mazuel added the comment:

Thanks for your answer.

Unfortunately, I cannot easily test Python 3.4 for now. But I have downloaded 
the source code and diffed the zipfile module between 3.3 and 3.4, and I see no 
difference relating to this problem. I may be wrong: maybe some core 
improvement in Python changes something?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20329



[issue20329] zipfile.extractall fails in Posix shell with utf-8 filename

2014-01-22 Thread R. David Murray

R. David Murray added the comment:

Believe me, we are *well* aware of the issue that Linux stores filenames as 
bytes.

I agree that the inability to always transcode is an issue.  That's why I'd 
like the opinion of someone who has studied this problem in more depth.

--
nosy: +ncoghlan




[issue20329] zipfile.extractall fails in Posix shell with utf-8 filename

2014-01-22 Thread Nick Coghlan

Nick Coghlan added the comment:

The POSIX locale tells Python 3 to use ASCII for all operating system 
interfaces, including the standard streams. This is an antiquated behaviour in 
the POSIX spec that Python 3 doesn't currently work around.

Issue 19977 is a proposal to work around this limitation by default.

As an immediate workaround, it's possible to either set PYTHONIOENCODING 
explicitly so Python ignores the incorrect encoding claims from the OS, or else 
to do your own encoding and write directly to the sys.stdout.buffer binary 
interface.
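
The second workaround (doing your own encoding and writing to the binary interface) could look roughly like the sketch below. The helper name `write_name` and the choice of UTF-8 are assumptions for illustration, not something prescribed here.

```python
import sys


def write_name(name, stream=None, encoding="utf-8"):
    """Encode a name ourselves and write the bytes to a binary stream.

    When no stream is given, sys.stdout.buffer is used, which bypasses the
    text layer and therefore whatever encoding the locale announced.
    """
    if stream is None:
        stream = sys.stdout.buffer
    data = (name + "\n").encode(encoding)
    stream.write(data)
    return len(data)
```

The other workaround needs no code change: setting `PYTHONIOENCODING=utf-8` in the environment before starting the interpreter makes the standard streams use UTF-8.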

Python 3.4 also allows setting *just* the default error handler for the 
streams, while still getting the encoding from the OS.

--
resolution:  -> duplicate
status: open -> closed
superseder:  -> Use surrogateescape error handler for sys.stdin and 
sys.stdout on UNIX for the C locale




[issue20329] zipfile.extractall fails in Posix shell with utf-8 filename

2014-01-22 Thread Nick Coghlan

Changes by Nick Coghlan ncogh...@gmail.com:


--
superseder: Use surrogateescape error handler for sys.stdin and sys.stdout on 
UNIX for the C locale -> 




[issue20329] zipfile.extractall fails in Posix shell with utf-8 filename

2014-01-22 Thread Nick Coghlan

Nick Coghlan added the comment:

My apologies, I completely misread the issue and thought it was related to 
displaying file names, rather than opening them.

I believe Python 3.4 includes some changes in this area - are you in a position 
to retry this on the latest 3.4 beta release?

--
resolution: duplicate -> 
status: closed -> open




[issue20329] zipfile.extractall fails in Posix shell with utf-8 filename

2014-01-21 Thread Laurent Mazuel

New submission from Laurent Mazuel:

Hello,

Considering a zip file which contains utf-8 filenames (such as the uploaded 
zip file), the following code fails if launched in a Posix shell.

>>> import zipfile
>>> with zipfile.ZipFile("test_ut8.zip") as fd:
...     fd.extractall()
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/opt/python/3.3/lib/python3.3/zipfile.py", line 1225, in extractall
    self.extract(zipinfo, path, pwd)
  File "/opt/python/3.3/lib/python3.3/zipfile.py", line 1213, in extract
    return self._extract_member(member, path, pwd)
  File "/opt/python/3.3/lib/python3.3/zipfile.py", line 1276, in _extract_member
    open(targetpath, "wb") as target:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-14: ordinal not in range(128)

With shell:
$ locale
LANG=POSIX
...

But the filesystem is not encoding dependent. On a Unix system, filenames are 
only bytes, so there is no reason to refuse to unzip a zip file (in fact, the 
unzip command line tool doesn't fail to unzip the file in a Posix shell).

Since open can take a bytes filename, changing line 1276 from
    open(targetpath)
to:
    open(targetpath.encode("utf-8"))

fixes the problem.

zipfile should not care about the encoding of the filename and should use the 
bytes sequence of the filename extracted directly from the bytes sequence of 
the zipfile. Having ZipInfo.filename as a string (and not bytes) is great for 
an API, but is not needed to open/write a file on the disk. So ZipInfo should 
store the raw bytes sequence of the filename in a bytes_filename field and 
use it in the open of extract.

In addition, considering the patch of bug 10614, the right patch could use the 
new ZipInfo.encoding field:
    open(targetpath.encode(member.encoding))
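
Until something like this exists in zipfile itself, the same idea can be applied from user code. The following is only a sketch of the workaround, not a patch to zipfile: the function name `extract_with_bytes_names` is hypothetical, and it assumes the member names should be written to disk as UTF-8 bytes. It also drops empty, "." and ".." path components as a basic safety measure.

```python
import os
import zipfile


def extract_with_bytes_names(zip_path, dest_dir, encoding="utf-8"):
    """Extract all members, building the target paths as bytes.

    Passing bytes paths to open() and os.makedirs() means the locale's
    filesystem encoding is never used to encode the member names.
    """
    dest = os.fsencode(dest_dir)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Drop empty, "." and ".." components so members stay under dest.
            parts = [p for p in info.filename.split("/")
                     if p not in ("", ".", "..")]
            if not parts:
                continue
            target = os.path.join(dest, *(p.encode(encoding) for p in parts))
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with archive.open(info) as source, open(target, "wb") as out:
                out.write(source.read())
            written.append(target)
    return written
```

Calling `extract_with_bytes_names("test_ut8.zip", "out")` returns the list of bytes paths that were written. Note that `ZipInfo.is_dir()` requires Python 3.6 or later.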

--
components: Extension Modules
files: test_ut8.zip
messages: 208648
nosy: Laurent.Mazuel
priority: normal
severity: normal
status: open
title: zipfile.extractall fails in Posix shell with utf-8 filename
type: behavior
versions: Python 3.3
Added file: http://bugs.python.org/file33589/test_ut8.zip




[issue20329] zipfile.extractall fails in Posix shell with utf-8 filename

2014-01-21 Thread R. David Murray

R. David Murray added the comment:

If you live in a current-posix world, this might make sense.  However, one can 
also argue that the filename should be *transcoded* from the tarfile encoding 
to the local FS filename encoding, which I believe is what we are currently 
doing.  Which, if you are using POSIX as the locale, will fail a lot.  If you 
use a sensible modern locale that includes utf-8, you wouldn't have a problem.
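
The failure mode is easy to reproduce in isolation, without any zip file. In this small sketch the helper `can_encode` and the sample filename are made up for illustration; "ascii" stands in for the encoding Python 3.3 derives from the POSIX locale.

```python
def can_encode(name, encoding):
    """Return True if `name` can be represented in `encoding`."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    sample = "données.txt"  # hypothetical member name with a non-ASCII letter
    for enc in ("ascii", "utf-8"):
        print(enc, can_encode(sample, enc))
```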

Unfortunately, the reality is probably that sometimes you want one behavior and 
sometimes you want the other :(

Encoding using member.encoding is probably wrong, though.  If you are trying to 
preserve the original bytes, it is probably best to do so, and not assume that 
the tarfile encoding field is valid.

I'm adding Victor Stinner to nosy: he's thought about these issues much more 
deeply than I have.  The answer may be that we will only support transcoding 
filenames in our tarfile module...and certainly it looks like doing anything 
else, even if we want to, would be a new feature.

--
nosy: +haypo, r.david.murray




[issue20329] zipfile.extractall fails in Posix shell with utf-8 filename

2014-01-21 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
nosy: +serhiy.storchaka




[issue20329] zipfile.extractall fails in Posix shell with utf-8 filename

2014-01-21 Thread Laurent Mazuel

Laurent Mazuel added the comment:

Thanks for your answer.

I think you can't transcode internal zip filenames to the FS encoding. Actually, 
on Unix the FS only stores bytes for filenames; there is no FS encoding. So if 
you change your locale, the filename printed in your console will change too. 
If you transcode the filename using the current locale, unzipping the same file 
twice with two different locales will lead to two different files, which is not 
(I think) what you intend.
The problem will not arise on Windows (NTFS is UTF-16) or Mac OS X (UTF-8).

Moreover, a simple unzip works like a charm. It doesn't care about the encoding 
or the current locale and extracts the file using the initial bytes in the zip. 
Unzipping twice with the two different locales creates only one file.
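
The "two different files" effect can be seen without touching the disk. This sketch uses a made-up filename and two example encodings to stand in for two locales; the helper name `on_disk_names` is only for illustration.

```python
def on_disk_names(name, encodings):
    """Map each encoding to the byte string a Unix filesystem would store."""
    return {enc: name.encode(enc) for enc in encodings}


names = on_disk_names("café.txt", ["utf-8", "iso-8859-1"])

if __name__ == "__main__":
    for enc, raw in names.items():
        print(enc, raw)
    # utf-8 b'caf\xc3\xa9.txt'
    # iso-8859-1 b'caf\xe9.txt'
```

Since the two byte strings differ, a Unix filesystem treats them as two distinct directory entries.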

An interesting link (even if it is not an official reference):
http://unix.stackexchange.com/questions/2089/what-charset-encoding-is-used-for-filenames-and-paths-on-linux

--
