On Tue, Oct 27, 2015 at 7:00 AM, David Cournapeau <courn...@gmail.com> wrote:
>
> On Tue, Oct 27, 2015 at 1:12 PM, Daniel Holth <dho...@gmail.com> wrote:
>>
>> The drawback of .zip is file size since it compresses each file
>> individually rather than giving the compression algorithm a larger input,
>> it's a great format otherwise. Ubiquitous including Apple iOS packages,
>> Java, word processor file formats. And most Python packages are small.
>
> I don't really buy the indexing advantages, especially w/ the current
> implementation of zipfile in python (e.g. loading the whole set of archives
> at creation time)

Can you elaborate about what you mean? AFAICT from a quick skim of the
source code, zipfile does eagerly read in the table of contents for the
zip file (i.e., it reads out the list of files and their metadata), but
no actual files are decompressed until you ask for them individually,
and when you do request a specific file then it can be accessed in O(1)
time.

This is really different from .tar.gz, where you have to decompress the
entire archive just to get a list of files, and then you need to
decompress the whole thing again each time you want to access a single
file inside.

(Regarding the size thing, yeah, .tar.gz is smaller, and .tar.bz2
smaller than that, and .tar.xz smaller again, ... but this doesn't
strike me as an argument for throwing up our hands and leaving the
choice to individual projects, because it's not like they know what the
optimal trade-off is either. IMO we should pick one, and zip is Good
Enough.)

-n

--
Nathaniel J. Smith -- http://vorpus.org
_______________________________________________
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig
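
To make the zip vs. .tar.gz access pattern concrete, here is a minimal
sketch using only the standard-library zipfile and tarfile modules. The
sample file names and the helper functions (build_zip, build_tar_gz,
read_one_from_zip, read_one_from_tar_gz) are made up for illustration
and do not come from the thread; the comments describe the general
behaviour of the two formats, not a measured benchmark.

```python
import io
import tarfile
import zipfile

# Hypothetical sample data standing in for a small Python package.
FILES = {
    "pkg/__init__.py": b"",
    "pkg/core.py": b"def answer():\n    return 42\n",
    "pkg/data.txt": b"x" * 10_000,
}


def build_zip(files):
    """Return the bytes of a zip archive; each member is deflated on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, payload in files.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def build_tar_gz(files):
    """Return the bytes of a .tar.gz; the whole tar stream is gzipped as one unit."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, payload in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_one_from_zip(archive_bytes, member):
    # Opening the archive parses the central directory (names, sizes,
    # offsets) stored at the end of the file; no member data is inflated.
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        names = zf.namelist()          # answered from the parsed directory
        payload = zf.read(member)      # seeks to this member, inflates only it
    return names, payload


def read_one_from_tar_gz(archive_bytes, member):
    # A tar file has no index: headers are interleaved with the data, and
    # the gzip layer has to be decompressed sequentially to reach them.
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tf:
        names = tf.getnames()          # scans the decompressed stream for headers
        payload = tf.extractfile(member).read()  # re-reads the stream up to the member
    return names, payload


if __name__ == "__main__":
    zip_names, zip_payload = read_one_from_zip(build_zip(FILES), "pkg/core.py")
    tar_names, tar_payload = read_one_from_tar_gz(build_tar_gz(FILES), "pkg/core.py")
    print(zip_names, len(zip_payload))
    print(tar_names, len(tar_payload))
```

Both helpers return the same names and bytes; the difference is in how
much of the archive has to be read and decompressed to get there.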