Gregory P. Smith <g...@krypto.org> added the comment:

ZipFile.open() is not the code for opening a zip file. :)

That's the code for opening a file embedded within an already constructed 
mode='r' archive, as done by the ZipFile.__init__() constructor.  By the time 
you've gotten to the open() method, you've loaded the entire central directory, 
which is unbounded in size, into memory as ZipInfo objects [constructor] and are 
checking the signature of an individual file header you are attempting to read 
out of the archive.
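
To illustrate the split between the two steps, here is a minimal sketch using an 
in-memory archive.  The member names and contents are made up for the example; 
only the zipfile calls themselves are real API.

```python
import io
import zipfile

# Build a small archive in memory so the example is self-contained.
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode='w') as zf:
    zf.writestr('a.txt', 'alpha')
    zf.writestr('b.txt', 'beta')
buf.seek(0)

# Step 1: the constructor parses the central directory and builds one
# ZipInfo object per member.  No member data has been read yet.
archive = zipfile.ZipFile(buf, mode='r')
names = [info.filename for info in archive.infolist()]

# Step 2: open() looks at one member's local file header and returns a
# file-like object for that member only.
with archive.open('a.txt') as member:
    data = member.read()

archive.close()
```

The ZipInfo list exists as soon as the constructor returns, which is why the 
cost of step 1 scales with the number of entries in the archive.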

Follow the ZipFile() constructor: it calls ZipFile._RealGetContents(), which is 
the true start of parsing the archive.  
https://github.com/python/cpython/blob/master/Lib/zipfile.py#L1317

Sure, more and more steps can be done.  But if you want to do that, you may as 
well just get rid of is_zipfile() entirely - a function whose point is to be 
fast and not consume an amount of memory determined by the input data - and 
have people just call `zipfile.ZipFile(path_in_question, mode='r')` and live 
with the consequences of attempting to load and parse the whole thing.  If that 
doesn't raise an exception, it is more likely to be a zip file.  But opening 
each of the files inside could still raise an exception, so you'd need to 
iterate over the members, open each one, and make sure they're all valid.
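
As a rough sketch of what that heavier check could look like (the helper name 
looks_like_valid_zip is hypothetical and not part of zipfile; the broad except 
clause is an assumption made for brevity):

```python
import io
import zipfile


def looks_like_valid_zip(source):
    """Hypothetical helper: parse the archive and read every member.

    Unlike is_zipfile(), this loads the whole central directory and
    reads all member data, so its cost depends on the input.
    """
    try:
        with zipfile.ZipFile(source, mode='r') as zf:
            for info in zf.infolist():
                with zf.open(info) as member:
                    # Reading to the end makes zipfile verify the CRC-32.
                    while member.read(1 << 16):
                        pass
    except Exception:
        # Assumption: any failure counts as "not valid".  Corrupt input
        # can surface as BadZipFile, OSError, decompressor errors, etc.
        return False
    return True


# Demo: an archive with one corrupted byte still passes the quick check.
_buf = io.BytesIO()
with zipfile.ZipFile(_buf, mode='w') as _zf:
    _zf.writestr('a.txt', b'hello world')  # stored uncompressed by default
good_bytes = _buf.getvalue()
bad_bytes = good_bytes.replace(b'hello world', b'hello w0rld')
```

With these inputs, zipfile.is_zipfile(io.BytesIO(bad_bytes)) would still be 
expected to report a zip file, since the end of the archive is intact, while the 
full read fails the CRC check.  That is the trade-off being described: the quick 
guess stays cheap, the thorough check does not.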

is_zipfile() isn't a verify_zipfile_integrity() routine.  Just a quick best 
guess.

is_zipfile() cannot be perfect and is not intended to be.  There is always 
going to be yet another thing it _could_ try.  It isn't worth chasing an 
impossible goal at the cost of making it slow.

Just update the is_zipfile() docs.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue42096>
_______________________________________