[issue16569] Preventing errors of simultaneous access in zipfile

2012-12-03 Thread Stepan Kasal

Stepan Kasal added the comment:

I agree that reading from a file open for write should be forbidden, no matter 
whether ZipFile was called with fp or a name.

Actually, it is not yet forbidden, and two of the tests in the zipfile.py test 
suite rely on this misfeature.
The first chunk in the patch 
http://bugs.python.org/file24624/Proposed-fix-of-issue14099-second.patch 
contains a fix for this bug in the test suite.

OTOH, decompressing several files from a given zip file simultaneously does not 
sound that bad.  You know, with all the current file managers, people look at a 
zip as if it were a kind of directory.
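
To make the "several members at once" use case concrete, here is a small 
sketch that reads two members of one archive in interleaved chunks.  The 
member names and sizes are made up for illustration, and the behaviour shown 
is that of a current CPython 3 zipfile module, not necessarily of the 
versions discussed in this thread.

```python
import io
from zipfile import ZipFile, ZIP_DEFLATED

# Build a small in-memory archive with two compressed members
# (names and contents are illustrative only).
buf = io.BytesIO()
with ZipFile(buf, "w", compression=ZIP_DEFLATED) as z:
    z.writestr("ones.txt", b"1" * 1000)
    z.writestr("twos.txt", b"2" * 1000)

# Open both members from the same ZipFile and alternate between them.
chunks_a, chunks_b = [], []
with ZipFile(buf, "r") as z:
    with z.open("ones.txt") as a, z.open("twos.txt") as b:
        for _ in range(4):
            chunks_a.append(a.read(250))
            chunks_b.append(b.read(250))

data_a = b"".join(chunks_a)
data_b = b"".join(chunks_b)
```

If the two readers shared one file position without any coordination, the 
interleaved reads would corrupt each other; that is the hazard this issue is 
about.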

--
nosy: +kasal

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16569
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14099] ZipFile.open() should not reopen the underlying file

2012-12-03 Thread Stepan Kasal

Stepan Kasal added the comment:

Re: children counting

You need to know the number of open children and whether the parent ZipFile 
object is still open.
As soon as all children and the parent ZipFile are closed, the underlying 
fp (corresponding to the file name given initially) shall be closed as well.

The code submitted in the patch ensures that.  But other implementations are 
possible.
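
One possible shape for such counting, as a minimal sketch only: the class 
SharedFp and its methods are hypothetical names invented for this 
illustration, and this is not the code from the attached patch.

```python
import io


class SharedFp:
    """Hypothetical holder that closes fp once the last user releases it."""

    def __init__(self, fp):
        self.fp = fp
        self.users = 1  # the parent ZipFile-like object counts as one user

    def acquire(self):
        # Called when a child (an opened archive member) starts using fp.
        self.users += 1
        return self.fp

    def release(self):
        # Called when the parent or any child is closed.
        self.users -= 1
        if self.users == 0:
            self.fp.close()


raw = io.BytesIO(b"archive bytes")
shared = SharedFp(raw)   # parent is open
shared.acquire()         # child 1 opened
shared.acquire()         # child 2 opened
shared.release()         # parent closed; the children must stay usable
still_open_after_parent = not raw.closed
shared.release()         # child 1 closed
shared.release()         # child 2 closed; last user gone, fp is closed
closed_at_end = raw.closed
```

Counting the parent as just another user keeps the rule simple: whoever 
releases last closes the fp, regardless of the order of the close() calls.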

In any case, it is necessary to ensure that the children stay usable even if 
the parent ZipFile is closed, because of code like this:

    def datafile(self):
        with ZipFile(self.datafilezip, "r") as f:
            return f.open("data.txt")

This idiom currently works and should not be broken.
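
A self-contained version of the same idiom, for experimenting.  The 
temporary path and the member contents are assumptions made for this sketch; 
the check reflects how a current CPython 3 behaves.

```python
import os
import shutil
import tempfile
from zipfile import ZipFile

# Create a throwaway archive on disk (path and contents are illustrative).
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "data.zip")
with ZipFile(path, "w") as z:
    z.writestr("data.txt", "hello")


def datafile(zip_path):
    # The parent ZipFile is closed when the with block exits,
    # but the returned member object is expected to remain readable.
    with ZipFile(zip_path, "r") as f:
        return f.open("data.txt")


member = datafile(path)
content = member.read()
member.close()
shutil.rmtree(tmpdir)
```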

Re: seek()

The read can interfere not only with a parallel file expansion, but also with a 
ZipFile metadata read (user can list the contents of the zip again).  Both of 
these would have to be forbidden by the documentation, and, ideally, also 
enforced.  (As discussed in issue #16569.)

OTOH, zipfile.py is already slow, because the decompression is implemented in 
Python as interpreted code.  I guess that the slowdown caused by seek() is 
negligible compared to this.
Also note that we most often seek to the current position; the OS should notice 
that and return swiftly.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14099
___



[issue16569] Preventing errors of simultaneous access in zipfile

2012-12-03 Thread Stepan Kasal

Stepan Kasal added the comment:

> but I'm afraid it's impossible to do without performance regression due to 
> seek before every read.

I agree that this is the key question.

I would hope that the performance hit wouldn't be so bad, unless there are 
actually two decompressions running concurrently.
So we can have an implementation that is generally correct, though some use 
scenarios result in slow execution.

OTOH, if the seek() calls were a problem even when the new position is the same 
as the old one, they could be optimized out by a simple wrapper around fp.
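
Such a wrapper could look roughly like the sketch below.  The class name 
PositionTrackingFile and the real_seeks counter are hypothetical and exist 
only for this illustration; the sketch handles absolute seeks only and 
assumes that every access to fp goes through the wrapper.

```python
import io


class PositionTrackingFile:
    """Hypothetical wrapper that skips seek() when already at the target."""

    def __init__(self, fp):
        self._fp = fp
        self._pos = fp.tell()
        self.real_seeks = 0  # counter kept for demonstration only

    def seek(self, offset):
        # Forward the call only if the position actually changes.
        if offset != self._pos:
            self._fp.seek(offset)
            self._pos = offset
            self.real_seeks += 1
        return self._pos

    def read(self, n):
        data = self._fp.read(n)
        self._pos += len(data)  # keep the tracked position in sync
        return data


f = PositionTrackingFile(io.BytesIO(b"abcdefgh"))
f.seek(0)
first = f.read(4)    # already at offset 0, no real seek
f.seek(4)
second = f.read(4)   # sequential read, still no real seek
f.seek(0)
again = f.read(2)    # position differs, one real seek
```

With a single sequential reader, every seek is a no-op in this scheme; real 
seeks happen only when two readers actually interleave.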

--




[issue14099] ZipFile.open() should not reopen the underlying file

2012-12-03 Thread Stepan Kasal

Stepan Kasal added the comment:

I'm not sure when I'll get to this, sorry.
Hopefully sometime soon.

--




[issue14099] ZipFile.open() should not reopen the underlying file

2012-02-24 Thread Stepan Kasal

Stepan Kasal ka...@ucw.cz added the comment:

Attached please find a second iteration of the fix.
This time the signature of ZipExtFile is kept backward compatible, with one new 
parameter added.

--
Added file: 
http://bugs.python.org/file24624/Proposed-fix-of-issue14099-second.patch




[issue14099] zipfile: ZipFile.open() should not reopen the underlying file

2012-02-23 Thread Stepan Kasal

New submission from Stepan Kasal ka...@ucw.cz:

When a file inside a zip is opened, the underlying zip file is opened again.
(Unless the file name is unknown, because the ZipFile object was created with 
fp only.)

This design is incorrect, insecure, and inefficient:
- the reopen uses the same string as file name, but on unix-like systems that 
file name may no longer exist, or may point to a different file
- opening n files from the same zip archive consumes n OS file descriptors, 
wasting resources

I believe that the parent ZipFile object and all the child ZipExtFile objects 
should keep the same fp.  The last one would close it.

I'm working on a patch currently.

--
components: Library (Lib)
messages: 154058
nosy: kasal
priority: normal
severity: normal
status: open
title: zipfile: ZipFile.open() should not reopen the underlying file
type: resource usage
versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3




[issue14099] zipfile: ZipFile.open() should not reopen the underlying file

2012-02-23 Thread Stepan Kasal

Stepan Kasal ka...@ucw.cz added the comment:

Attached please find a patch that fixes this issue by reusing the original fp 
from ZipFile object.

Two of the test cases attempted to read a file from a zip as soon as write() 
was called.  I believe that this is not correct usage: the zip file is not even 
fully written to disk at that stage!
So I took the liberty of changing these two test cases so that they first write 
the file and then read it.

Let me thank Martin Sikora for discovering the issue and Matej Cepl for 
testing it on the current source tree.

--
keywords: +patch
Added file: http://bugs.python.org/file24617/Proposed-fix-of-issue14099.patch
