> On Jun 15, 2017, at 4:55 PM, Brett Cannon <br...@python.org> wrote:
>
> On Thu, 15 Jun 2017 at 08:25 Donald Stufft <don...@stufft.io> wrote:
>
>> On Jun 15, 2017, at 10:10 AM, C Anthony Risinger
>> <c...@anthonyrisinger.com> wrote:
>>
>> On Tue, Jun 13, 2017 at 8:53 PM, Nick Coghlan <ncogh...@gmail.com> wrote:
>> On 13 June 2017 at 19:44, Thomas Kluyver <tho...@kluyver.me.uk> wrote:
>> > On Tue, Jun 13, 2017, at 02:27 AM, Nick Coghlan wrote:
>> >>
>> > I've updated the PR to specify zip archives for build_wheel and .tar.gz
>> > archives for build_sdist.
>>
>> +1
>>
>> I've added one suggestion, which is to explicitly require PAX_FORMAT
>> for the sdist tarballs produced this way (that's a POSIX format
>> standardised in 2001 and supported by both 2.7 and 3.x that
>> specifically requires that the paths be encoded as UTF-8). While the
>> standard library does still default to GNU_FORMAT in general, the
>> stated rationale for doing so (it being more widely supported than
>> PAX_FORMAT) was last updated more than 10 years ago, and I don't think
>> it applies here.
>>
>> I'm not trying to open a bikeshedding opportunity here -- and I tried to
>> ignore it, honest! -- but why are tarballs preferable to zipfiles for sdists?
>>
>> I looked around the 517 threads to see if it had been covered already, and
>> all I found was that zipfiles have additional PKG-INFO expectations in
>> existing implementations, and other honorable mentions of their features
>> over tarballs.
>>
>> I've never understood the anti-affinity towards zip because the format
>> itself seems superior in many ways, such as the ability to easily append or
>> replace-via-append (which might actually help perf when being used as an
>> interchange format, with a repack/prune at the end), compress individual
>> files, and the brilliance of placing the central directory/manifest at the
>> end, allowing it to be appended to binaries, etc. and allowing rapid
>> indexing of files. Tarballs are a black box.
>>
>> Just seems a little odd/arbitrary to me that wheel is zip, python supports
>> zip importing, sdists are often zip, and Windows is zip-central, but we'd
>> decide to codify tar.gz. It doesn't affect me personally because I'm Linux
>> all the way down and barely remember how to use Windows, but with all the
>> existing zip usage, and technical superiority(?), if we are going to pick
>> something, why not that? At that point Python is all-zip and no-tar.
>
> Yeah, the inconsistency bugs me as well and I was about to email about this
> until this started up. :)
>
>>
>> It's not a strong opinion really, but since the PEP does attempt to limit
>> what's currently possible, can we add some verbiage as to why tar.gz is
>> preferred? Or consider it with more scrutiny?
>>
>
> Basically it’s the least disruptive option, the vast bulk of sdists are using
> ``.tar.gz`` already, multiple downstream redistributors need to do extra work
> to consume a .zip rather than a .tar.gz, and the technical benefits of wheel
> don’t really matter much in the context of a sdist. Zip isn’t a flat out win
> technical wise either, for instance .tar.gz can compress smaller than a .zip
> file because it’s compression will act over the entire set of files and not
> on a per file basis.
>
> Then shouldn't we be pushing for .tar.xz instead? (The Rust community is
> actually moving to .tar.xz for distributing Rust itself:
> https://users.rust-lang.org/t/rustup-1-4-0-released/11268 ; I don't know
> what their plans are for crates)
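
For anyone who wants to check the quoted claim that a .tar.gz can come out smaller than a .zip because the compression runs over the whole set of files, here is a rough sketch using only the standard library. The file names and contents (``make_files`` and its boilerplate header) are made up for illustration and are not taken from any real sdist; real projects will see different ratios.

```python
import io
import tarfile
import zipfile


def make_files(count=200):
    """Return hypothetical source files that share a lot of boilerplate."""
    header = "# Copyright (c) Example Project contributors\n" * 20
    return {
        f"pkg/module_{i}.py": (header + f"VALUE = {i}\n").encode("utf-8")
        for i in range(count)
    }


def zip_size(files):
    """Size in bytes of a .zip where each member is deflated on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())


def tar_gz_size(files):
    """Size in bytes of a .tar.gz where one gzip stream covers every member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


files = make_files()
print("zip bytes:   ", zip_size(files))
print("tar.gz bytes:", tar_gz_size(files))
```

With many small, similar files like these, the single gzip stream can reuse redundancy across members, while the zip pays per-member overhead and compresses each file in isolation. The flip side, as noted in the quoted text, is that zip keeps per-file access and a central directory.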

Absent all other rationale, yes, pushing for .tar.xz would be better (as would
using the ZIP_LZMA option). However, that would cut out support for Python 2.7
by default, and it would require an optional library to be present for
Python’s standard library (.tar.gz and .zip do too, but that’s zlib, which is
nearly ubiquitous). It’s a balancing act, and a format that something like
80-90% of the downloads from PyPI couldn’t support is way out of balance.

> But mostly it’s just that most sdists are .tar.gz, and most Pythons except
> older ones on Windows default to producing .tar.gz.
>
> Well, I've been actively overriding that default and uploading only .zip
> files since it's so much easier to work with zip files on Windows and UNIX
> than tar.gz which are only easy on UNIX. :) And it is rather ironic that
> historically projects lack a Windows wheel and thus need the sdist and yet
> it's typically in a format that's the most painful to work with on Windows.

Eh, if you have Python it’s not very hard to work with .tar.gz files:
``python3 -m tarfile -e foo-1.0.tar.gz``, etc.

Honestly though, any argument about the technical merits of one or the other
basically pales next to the real argument: standardizing around .tar.gz means
the least amount of change in the ecosystem. We have 591,653 .tar.gz sdists
uploaded but only 68,816 .zip sdists (or 86,845 vs. 13,701 counting unique
projects) [1]. It’s more disruptive to tell the 86k projects that they need to
change instead of the 13k, and absent any really compelling reason that isn’t
just relatively minor benefits, reducing churn should be our number one
concern.

[1] We can play with these numbers in lots of different ways; for instance, in
the past year 36,687 projects have released a .tar.gz sdist but only 3,719
have released a .zip sdist.
At the end of the day though, unless you’re really trying, all the numbers end
up amounting to roughly an order of magnitude more projects/files/whatever
using .tar.gz than .zip.

-- Donald Stufft
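
P.S. The point about handling .tar.gz from Python itself, on any platform, can also be sketched programmatically with the standard library's tarfile module. This is only an illustration: ``write_tarball``, ``read_tarball``, and the two member files are hypothetical, and the PAX_FORMAT choice simply follows the suggestion quoted earlier in the thread rather than anything settled.

```python
import io
import os
import tarfile
import tempfile


def write_tarball(path, files):
    """Write a gzip-compressed tarball in PAX format (UTF-8 path names)."""
    with tarfile.open(path, mode="w:gz", format=tarfile.PAX_FORMAT) as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))


def read_tarball(path):
    """Return {member name: bytes} for the regular files in a .tar.gz."""
    contents = {}
    with tarfile.open(path, mode="r:gz") as tf:
        for member in tf.getmembers():
            if member.isfile():
                contents[member.name] = tf.extractfile(member).read()
    return contents


# Hypothetical sdist-like contents, including one non-ASCII path name.
files = {
    "foo-1.0/PKG-INFO": b"Name: foo\nVersion: 1.0\n",
    "foo-1.0/caf\u00e9.txt": b"non-ASCII path name\n",
}

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "foo-1.0.tar.gz")
    write_tarball(archive, files)
    assert read_tarball(archive) == files
```

Reading members through ``extractfile`` keeps the sketch from writing anything outside the temporary directory; unpacking to disk would use ``extractall`` instead, with the usual care about untrusted archives.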
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig