On 01/11/2017 06:29 PM, Assaf Gordon wrote:
> Hello Eric and all,
>
>> On Wed, Jan 11, 2017 at 3:21 PM, Eric Blake <[email protected]> wrote:
>>> there are strong arguments for including .lzip
>>> distributions, either in addition or in place of .xz:
>
> I would ask at least to keep XZ and not switch solely to lzip.

I certainly agree with this point - it is too early to make lzip the
sole distribution format. The question is whether shipping BOTH xz and
lzip, and letting people choose which format they prefer, is worth the
extra effort (which, as shown by this patch, is not that much - a
one-line addition to automake's options, and the possible installation
of lzip on the maintainers' machines if it was not already there).

I know there was a window of time where we were shipping .gz, .bz2, and
.xz, while waiting for distros to catch up; now that .xz is a lot more
widely supported, we were able to easily justify dropping .bz2 due to
clear differences in levels of compression, and effort required to
uncompress. But xz and lzip are much closer in levels of compression,
so there is less of a clear winner.

I don't know if there is any (easy) way to count how many downloads of
.gz vs. .xz happen, to get a feel for what percentage of consumers
prefers a particular format; if such metrics exist, they would also let
us track how popular .lzip turns out to be during a trial of running
both xz and lzip tarballs in parallel. But most likely that's a pipe
dream, as GNU encourages the use of mirrors for getting tarballs, making
it harder to centrally track what got downloaded where.

> While I am in no position to evaluate lzip benchmark/robustness/format claims,

I'm also in this boat. I merely proposed the patch as an RFC to start
the discussion, so I appreciate the points being made.

> I do have some concerns about the lzip program:
>
> First,
> It is written in C++. Not a problem by itself, but seems a bit at odds as a
> requirement for system-level package like coreutils.

coreutils depends on gperf, which is written in C++. Then again, the
dependency on gperf is at maintainer time (it does not have to be
present on the tarball user's machine).
Having lzip as the ONLY distribution format is a very strong burden on
ALL downstream users to have lzip installed (unlike the gperf case); but
having lzip and xz in parallel means that users that can't build lzip
can use the xz tarball. So that strengthens my claim that this patch (if
taken) is additive, and not replacement, in nature.

> Second,
> I'm not sure how portable and well-tested the program is on the large number
> of platforms that coreutils aim to cater to.
> Being a C++ program, I'm not even sure if all these system could easily build
> it or provide it as package.

That's certainly a valid point against a sole distribution format, but
doesn't rule it out as a parallel distribution format.

> Third,
> I'm a bit wary of the closed development model: there is no public git
> repository, only published tarballs, and not clear how active the development
> or the community are.

For years, GNU bash had a very closed development model. Only recently
has the bash maintainer started posting weekly snapshots via a git
repository (by no means as fine-grained as most git projects are used to
having), so that is not necessarily a showstopper, but it does bear
consideration. And yes, I concur that it is harder to work with software
that is harder to clone and tweak. My RFC patch proposal even
highlighted the fact that lzip has a CVS repository, but not a git
repository, at savannah - I really wanted to point to a git repo but
could not quickly find one.

> Lastly,
> I think the test suite is a bit lacking, especially compared to all the
> claims about recovery and robustness of the lzip format.

I have not made any personal investigations on this front. And
everything we require of lzip should also be applied to any
consideration of whether to use zstd (with the additional hurdle that
automake does not yet have a 'dist-zstd' option), since that is another
up-and-coming compression format that may or may not have a win in
(de)compression speeds and size.
>
> ---
>
> I'm not saying 'xz' is perfect or that it answers all the above issues. But
> it has a "community buy-in" which can't be denied compared to lzip.

Let's stop and consider how much of the community buy-in is a
side-effect of coreutils being one of the early adopters of dist-xz?
From personal experience, the only reason Cygwin started considering the
inclusion of xz in the distro years ago was because the coreutils
tarball came in xz; and now Cygwin uses xz for all of its distribution
files (it used to use bz2). Then again, lessons learned from Cygwin's
switch from bzip2 to xz will help ease any future transition from xz to
(lzip/zstd/compression-of-the-day), if such a future switch is
warranted. But at the same time, it can take years to prove whether a
new format has enough going for it to make it a primary format.

That said, if coreutils starts shipping lzip packages, wouldn't that
alone be a way to kickstart some more activity on the lzip front?

> If coreutils switches, I think it should switch to something that is provably
> superior not only in benchmark/robustness.

I wrote this email based on an IRC conversation with Matias (selk),
mainly because I wanted the discussion archived for public consumption,
and not something done in private with just me. At this point, I would
really love for selk and/or Antonia to chime in with the arguments I am
unable to provide. Private conversations are not the way to instigate
change; and even if a public conversation doesn't change the status quo,
hopefully it at least raises some talking points and ideas for future
improvements.

--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
