[issue16569] Preventing errors of simultaneous access in zipfile
Stepan Kasal added the comment:

I agree that reading from a file that is open for writing should be forbidden, no matter whether ZipFile was called with an fp or a name. It is not forbidden yet, and two of the tests in the zipfile.py test suite actually rely on this misfeature. The first chunk of the patch
http://bugs.python.org/file24624/Proposed-fix-of-issue14099-second.patch
fixes this bug in the test suite.

OTOH, decompressing several files from a given zip file simultaneously does not sound that bad. With all the current file managers, people look at a zip as if it were a kind of directory.

--
nosy: +kasal

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16569
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14099] ZipFile.open() should not reopen the underlying file
Stepan Kasal added the comment:

Re: children counting

You need to know the number of open children and whether the parent ZipFile object is still open. As soon as all children and the parent ZipFile are closed, the underlying fp (corresponding to the file name given initially) shall be closed as well. The code submitted in the patch ensures that, but other implementations are possible.

In any case, it is necessary to ensure that the children stay usable even if the parent ZipFile is closed, because of code like this:

    def datafile(self):
        with ZipFile(self.datafilezip, "r") as f:
            return f.open("data.txt")

This idiom currently works and should not be broken.

Re: seek()

A read can interfere not only with a parallel file expansion, but also with a ZipFile metadata read (the user can list the contents of the zip again). Both of these would have to be forbidden by the documentation and, ideally, also enforced (as discussed in issue #16569).

OTOH, zipfile.py is already slow, because the decompression is implemented in Python as interpreted code. I guess that the slowdown caused by seek() is negligible compared to this. Also note that we most often seek to the current position; the OS should notice that and return swiftly.
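To make the children-counting idea concrete, here is a minimal sketch. It is not the code from the attached patch: the names SharedFile, Archive, and Member are hypothetical, and byte ranges of a BytesIO stand in for real zip members. The underlying fp is closed only when the parent and every child have released it, and each read seeks first so that children do not disturb one another.

```python
import io


class SharedFile:
    """Hypothetical helper: one file object shared by a parent and its children."""

    def __init__(self, fp):
        self._fp = fp
        self._refs = 1  # the parent holds the first reference

    def acquire(self):
        self._refs += 1
        return self

    def release(self):
        self._refs -= 1
        if self._refs == 0:
            # Last user is gone, so the underlying file can be closed.
            self._fp.close()

    def read_at(self, pos, size):
        # Seek before every read; another user may have moved the position.
        self._fp.seek(pos)
        return self._fp.read(size)


class Member:
    """Hypothetical child: reads one byte range of the shared file."""

    def __init__(self, shared, start, length):
        self._shared = shared
        self._pos = start
        self._end = start + length

    def read(self, size=-1):
        remaining = self._end - self._pos
        if size < 0 or size > remaining:
            size = remaining
        data = self._shared.read_at(self._pos, size)
        self._pos += len(data)
        return data

    def close(self):
        if self._shared is not None:
            self._shared.release()
            self._shared = None


class Archive:
    """Hypothetical parent: hands out children that share its file object."""

    def __init__(self, fp):
        self._shared = SharedFile(fp)

    def open_member(self, start, length):
        return Member(self._shared.acquire(), start, length)

    def close(self):
        if self._shared is not None:
            self._shared.release()
            self._shared = None


raw = io.BytesIO(b"aaaabbbb")
archive = Archive(raw)
first = archive.open_member(0, 4)
second = archive.open_member(4, 4)
archive.close()           # children stay usable after the parent is closed
chunk_a = first.read(2)   # interleaved reads do not interfere
chunk_b = second.read(2)
first.close()
second.close()            # the last release closes the underlying file
```

With this arrangement the datafile() idiom keeps working, because closing the parent only drops one reference.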
[issue16569] Preventing errors of simultaneous access in zipfile
Stepan Kasal added the comment:

> but I'm afraid it's impossible to do without performance regression due to seek before every read.

I agree that this is the key question.

I would hope that the performance hit would not be so bad unless there are actually two decompressions running concurrently. So we can have an implementation that is generally correct, even though some usage scenarios result in slow execution.

OTOH, if the seek() call were a problem even when the new position is the same as the old one, such calls could be optimized out by a simple wrapper around fp.
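A wrapper of that kind could look roughly like the sketch below. The class name PositionTrackingFile and the seeks_issued counter are hypothetical and exist only for illustration. The sketch assumes a binary file, absolute offsets only, and that nothing else touches the wrapped fp directly.

```python
import io


class PositionTrackingFile:
    """Hypothetical wrapper that skips seek() calls which would not move the position."""

    def __init__(self, fp):
        self._fp = fp
        self._pos = fp.tell()
        self.seeks_issued = 0  # counts real seeks, for illustration only

    def seek(self, offset):
        # Only absolute offsets are supported in this sketch.
        if offset != self._pos:
            self._fp.seek(offset)
            self._pos = offset
            self.seeks_issued += 1
        return self._pos

    def read(self, size=-1):
        data = self._fp.read(size)
        self._pos += len(data)  # keep the cached position in sync
        return data

    def tell(self):
        return self._pos


wrapped = PositionTrackingFile(io.BytesIO(b"0123456789"))
wrapped.seek(0)
head = wrapped.read(3)
wrapped.seek(3)           # already at offset 3, so no real seek happens
middle = wrapped.read(3)
```

A single sequential reader therefore never triggers a real seek; only interleaved readers pay for repositioning.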
[issue14099] ZipFile.open() should not reopen the underlying file
Stepan Kasal added the comment:

I'm not sure when I'll get to this, sorry. Hopefully sometime soon.
[issue14099] ZipFile.open() should not reopen the underlying file
Stepan Kasal ka...@ucw.cz added the comment:

Attached please find a second iteration of the fix. This time the signature of ZipExtFile is kept backward compatible, with one new parameter added.

--
Added file: http://bugs.python.org/file24624/Proposed-fix-of-issue14099-second.patch
[issue14099] zipfile: ZipFile.open() should not reopen the underlying file
New submission from Stepan Kasal ka...@ucw.cz:

When a file inside a zip is opened, the underlying zip file is opened again. (Unless the file name is unknown, because the ZipFile object was created with an fp only.)

This design is incorrect, insecure, and inefficient:
- the reopen uses the same string as the file name, but on unix-like systems that file name may no longer exist, or may point to a different file
- opening n files from the same zip archive consumes n OS file descriptors, wasting resources

I believe that the parent ZipFile object and all the child ZipExtFile objects should share the same fp. The last one to be closed would close it.

I'm currently working on a patch.

--
components: Library (Lib)
messages: 154058
nosy: kasal
priority: normal
severity: normal
status: open
title: zipfile: ZipFile.open() should not reopen the underlying file
type: resource usage
versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3
[issue14099] zipfile: ZipFile.open() should not reopen the underlying file
Stepan Kasal ka...@ucw.cz added the comment:

Attached please find a patch that fixes this issue by reusing the original fp from the ZipFile object.

Two of the test cases attempted to read a file from a zip as soon as write() was called. I believe that this is not correct usage: the zip file is not even fully written to disk at that stage! So I took the liberty of changing these two test cases so that they first write the file and then read it.

Let me thank Martin Sikora for discovering the issue and Matej Cepl for testing it on the current source tree.

--
keywords: +patch
Added file: http://bugs.python.org/file24617/Proposed-fix-of-issue14099.patch
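For illustration, the write-then-read order described above can be sketched as follows. This is not the code of the changed test cases; the in-memory buffer and the member name data.txt are assumptions made for the example.

```python
import io
import zipfile

buffer = io.BytesIO()  # stands in for a zip file on disk

# First write the archive and close it, so the central directory is written.
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.txt", "hello")

# Only then open it again for reading.
with zipfile.ZipFile(buffer, "r") as archive:
    with archive.open("data.txt") as member:
        content = member.read()
```

Closing the writer before reading matters because the archive's directory is only written out on close.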