Control: retitle -1 dpkg: Please add decompression support for zstd (Zstandard) compressed packages

Hi Guillem,

On Fri, Apr 20, 2018 at 5:04 AM Guillem Jover <guil...@debian.org> wrote:
>
> Hi!
>
> On Wed, 2018-04-18 at 11:56:27 +0200, Balint Reczey wrote:
> > On Mon, Apr 16, 2018 at 3:51 AM, Balint Reczey
> > <balint.rec...@canonical.com> wrote:
> > > On Sun, Mar 18, 2018 at 3:38 AM, Guillem Jover <guil...@debian.org> wrote:
> > >> On Sun, 2018-03-11 at 21:51:05 +0100, Balint Reczey wrote:
> > >>> Package: dpkg
> > >>> Version: 1.19.0.5
> > >>> Severity: wishlist
> > >>> Tags: patch
> > >>
> > >>> Please add support for Zstandard compression to dpkg and other
> > >>> programs generated by the dpkg source package [1].
> > >>
> > >> Thanks. I started implementing this several weeks ago after having
> > >> discussed it with Julian Andres Klode on IRC, but stopped after seeing
> > >> the implementation getting messy given the current code structure.
> > >
> > > I think it is not that bad. :-)
>
> Well, that file is a mess already. :)
>
> > >> So, the items that come to mind (most from the dpkg FAQ [F]):
> > >>
> > >> * Availability in general Unix systems would be one. I think the code
> > >>   should be portable, but I've not checked properly.
> > >
> > > The libzstd package does not have any special dependency and there are
> > > packages for other Unix-like systems [2][3][4].
>
> Right, as suspected, but it's nice to get confirmation, thanks.
>
> > >> * Size of the shared library another, it would be by far the fattest
> > >>   compression lib used by dpkg. It's not entirely clear whether the
> > >>   shlib embeds a zlib library?
> > >
> > > I agree that the libzstd library is fairly big and I'd like to look
> > > into ways of making it leaner, maybe creating a variant with limited
> > > features covering what is needed in dpkg, apt, btrfs-progs and other
> > > system packages.
>
> That could be an option, ideally sanctioned by upstream to avoid a
> perpetual fork, and possible divergence from upstream format, encoding,
> etc.
>
> > > It does not seem to embed the zlib library, but it offers many
> > > features which may be obsolete for dpkg.
> > >
> > > I tried dropping support for legacy file formats for example
> > > (ZSTD_LEGACY_SUPPORT=8) and the size of the library dropped to 382K
> > > from the original 490K.
>
> Still pretty fat. :)
>
> > >> * Increase in the (build-)essential set (directly and transitively).
> > >
> > > Yes, that's true, while apt also started supporting Zstd and .
>
> apt is not part of the essential-set though.
>
> > >> * It also seems the format has changed quite some times already, and
> > >>   it's probably the reason for the fat shlib. Not sure if the format
> > >>   has stabilized enough to use this as good long-term storage format,
> > >>   and what's the policy regarding supporting old formats for example,
> > >>   given that this is intended mainly to be used for real-time and
> > >>   streaming content and similar. For example the Makefile for libzstd
> > >>   defaults to supporting v0.4+ only, which does not look great.
> > >
> > > Format stability is a very valid concern and upstream claims the
> > > current format to be stable [5] (since zstd v0.8.1).
>
> I understand that to mean the current format will not change, but what
> will happen when and iff a new format is needed/wanted, what's their
> stability guarantees, etc.? As I mentioned, one thing is to target
> streaming compression, the other long-term storage; the time-frames
> expected from each of those might be completely opposite.
>
> > >> * The license seems fine, as being very permissive, or it could affect
> > >>   availability. This one I need to add to the FAQ.
> > >> * Memory usage seemed fine or slightly better depending on the compression
> > >>   level, but not when wanting equal or less space used?
> > >> * Space used seemed worse.
> > >
> > > Yes, space used is worse than with xz compression, but I think the
> > > much better compression and decompression speed would make up for
> > > that.
>
> That still depends at least on the local hardware used and on the
> network speed.
>
> > >> * Compression and decompression speed seemed better depending on the
> > >>   compression and decompression levels.
> > >>
> > >> [F] <https://wiki.debian.org/Teams/Dpkg/FAQ#Q:_Can_we_add_support_for_new_compressors_for_.deb_packages.3F>
> > >>
> > >> Overall I'm still not sure whether this is worth it. Also the
> > >> tradeoffs for stable are different to unstable/testing, or for
> > >> fast/slow networks, or long-term storage, one-time installations,
> > >> or things like CI and similar.
> > >>
> > >> In any case this would still need discussion on debian-devel, and
> > >> involvement from other parts of the project, at least ftp-masters for
> > >> example. And whether the added "eternal" support makes sense if we are
> > >> or not planning to eventually switch to the compressor as the default,
> > >> for example, etc.
> > >
> > > I agree that the tradeoffs are very different for the use cases and
> > > please feel free to bring this topic to debian-devel quoting any part
> > > of my emails.
>
> I'll try to do that probably tomorrow. I'll probably start the
> conversation and CC you guys, so that you can chime in and fill in any
> blanks/details you want to provide.
>
> > >>> $ rm -rf firefox-xz/* ;time  dpkg-deb -R firefox-xz.deb firefox-xz/
> > >>> real 0m4,270s
> > >>> user 0m4,220s
> > >>> sys 0m0,630s
> > >>> $ rm -rf firefox-zstd/* ;time  dpkg-deb -R firefox-zstd.deb firefox-zstd/
> > >>> real 0m0,765s
> > >>> user 0m0,556s
> > >>> sys 0m0,462s
> > >>
> > >> Right, although that might end up being noise when factored into a
> > >> normal dpkg installation, due to the fsync()s, or maintscript
> > >> execution, etc.
> > >
> > > I agree that fsync()s and scripts add more time overall to the
> > > installation time, but fsync()'s effect is decreasing with faster
> > > storage. I would like to look into speeding up maintscript execution.
>
> Well the best way to speed-up maintscripts is to completely get rid of
> them. :)
>
> > > I need to reproduce my results on latest sid, but when installing big
> > > package-sets like ubuntu-desktop on a server VM the package
> > > decompression was the biggest (~40%) contributor to CPU utilization
> > > and to make a meaningful improvement in that area I think switching to
> > > a very fast decompressor is necessary.
>
> CPU utilization does not mean the overall time might be better or worse,
> as I'd assume the biggest amount of time here will come from I/O anyway.
>
> > > I think the biggest price we have to pay here is the slower download
> > > of the somewhat bigger compressed packages, but IMO the real solution
> > > here is rolling out DeltaDebs [6] support, which is planned to be an
> > > improvement over debdelta [7]. DeltaDebs could save around 90% of
> > > bandwidth - or download time needed for packages.
> > >
> > > Since DeltaDeb generation also involves decompression, Zstd could
> > > speed this up, too.
>
> Right, and while I think the idea is very nice, it still needs to be
> seen if for example Debian would be interested in providing those at
> all, or if there would be interest for what suites or similar given
> the increased mirror usage, etc. I'd not like to tie a decision on
> this to something that might or might not happen.
>
> > >>> Tests on the full Ubuntu main archive showed ~6% average increase in
> > >>> the size of the binary packages.
> > >>
> > >> What about the total increase? Because it's not the same say a 15%
> > >> increase in a 500 MiB .deb, than a 2% in a 100 KiB one obviously. :)
> > >
> > > Yes, I was not clear enough in my email, the total increase was 6%.
>
> Ok, not that bad then I guess.
>
> > >> In theory the proper way to introduce this is to first enable
> > >> decompression and then after a full stable release cycle add compression
> > >> support.
> > >
> > > I'm OK with that. I'm attaching the updated patch which I also
> > > uploaded to Salsa, addressing your review comments. Feel free to
> > > disable compression in a way you please, if you do it in a separate
> > > commit it could be easily reverted later to enable compression.
> >
> > By writing "Feel free to disable compression ..." I did not want to
> > mean pushing the work on you and I would be happy to write that patch
> > if you think it is needed. I agree that in theory disabling
> > compression initially would be the cleanest way forward, but
> > compression left enabled would help running tests with the new
> > compression more easily on the other hand.
> >
> > IMO the real commitment to the format starts with shipping the first
> > zstd-compressed packages in the official archive and this won't happen
> > without DSA team upgrading the infrastructure to accept binary
> > packages using that compression.
> >
> > Also note that people can already hand-craft zstd compressed packages
> > the way it is done in the added dpkg test.
> >
> > What do you think? Should compression be disabled initially and if so
> > would you like me to write the patch for it?
>
> Unfortunately that's not how things work with dpkg. This is a tool
> being used beyond Debian (and Ubuntu), so once upstream contains
> support for just unpacking, then that means the commitment is already
> there, because we have always supported hand-crafting .debs from
> "standard" tools, and that's why the format has always been documented
> in detail.
>
> This is the same reason why the lzma compression format is still
> supported to decompress-only (compression has been obsoleted), while
> it was never accepted in the Debian archive. People should be able to
> unpack old/historic packages with current dpkg-deb. It even still
> supports building and extracting format 0.939000 .deb archives!
>
> If downstreams patch support in for other compressors, I think it might
> be a bit irresponsible as it creates unnecessary compatibility problems
> in the .deb ecosystem. But then, this is free software and people can
> patch in whatever they want. And in the end I consider it not really my
> problem. ;) Some derivative for example added .lz support in, which has
> never been accepted upstream, of course that's not at the same scale
> as Ubuntu, so in this case this might get worse if it ends up not
> making sense to add the support upstream, but oh well.
>
> Disabling compression is mostly to make sure no builds accidentally
> try to use it while it's too soon (mostly in Debian), or on the wrong
> suite, once and iff DAK accepts them for example. That obviously would
> not prevent people from adding the support in other generators, or
> hand-crafting .deb's.

Sure, I have updated the patch to disable compression and to apply
cleanly to the latest dpkg.

I'm wondering if decompression support could be accepted for Bullseye,
so that compression could be enabled, too, in Bookworm.
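
For anyone who wants to check what a given .deb uses, here is a minimal
sketch (not part of the patch) that lists the members of the ar container
and reports the compressor of the data member. It assumes the common ar
layout (8-byte magic, 60-byte member headers, even padding) and assumes
that zstd-compressed members carry upstream's usual ".zst" suffix. The
function names are made up for illustration.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60  # name(16) mtime(12) uid(6) gid(6) mode(8) size(10) magic(2)


def deb_members(data: bytes) -> list:
    """Return the member names of an ar archive such as a .deb."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    names = []
    offset = len(AR_MAGIC)
    while offset + HEADER_LEN <= len(data):
        header = data[offset:offset + HEADER_LEN]
        if header[58:60] != b"`\n":
            raise ValueError("bad ar member header")
        # Names are space padded; GNU ar may also append a trailing '/'.
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        names.append(name)
        # Member data is padded to an even number of bytes.
        offset += HEADER_LEN + size + (size % 2)
    return names


def data_compressor(names: list) -> str:
    """Return the suffix of the data.tar member, e.g. 'xz' or 'zst'."""
    for name in names:
        if name.startswith("data.tar"):
            suffix = name[len("data.tar"):].lstrip(".")
            return suffix or "none"
    raise ValueError("no data.tar member found")
```

Reading a package with open(path, "rb").read() and passing the bytes to
deb_members() should be enough for a quick check; this only inspects
member names and does not decompress anything.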

The compression ratios and the compressed sizes have not changed much
since my last tests, but many projects have adopted zstd support and many
frequently installed packages have started to depend on it, including
gcc-10 and libsystemd0:

buster: $ apt-cache rdepends libzstd1 | wc -l
40
sid: $ apt-cache rdepends libzstd1 | wc -l
105
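
As far as I know the plain "wc -l" counts above also include the two
header lines that apt-cache prints (the package name and "Reverse
Depends:") and any repeated entries, so they are slight upper bounds. A
rough sketch for counting unique reverse dependencies from the same
output, assuming that entries are the indented lines and that
alternatives are marked with a leading "|" (count_rdepends is a
hypothetical helper name):

```python
def count_rdepends(output: str) -> int:
    """Count unique reverse dependencies in `apt-cache rdepends` output."""
    packages = set()
    for line in output.splitlines():
        # Header lines are not indented; dependency entries are.
        if not line.startswith(" "):
            continue
        # Drop the indentation and the '|' marker used for alternatives.
        name = line.strip().lstrip("|").strip()
        if name:
            packages.add(name)
    return len(packages)
```

It could be fed from subprocess.run(["apt-cache", "rdepends", "libzstd1"],
capture_output=True, text=True).stdout on each release.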

Libzstd did not get leaner, but it will be available on most systems
anyway, so dpkg depending on it would not increase the image/filesystem
size.

The file format stayed stable and other distributions like Fedora and
Arch already adopted it as the default:
https://fedoraproject.org/wiki/Changes/Switch_RPMs_to_zstd_compression
https://archlinux.org/news/now-using-zstandard-instead-of-xz-for-package-compression/

They use different compression levels (Fedora: 19, Arch: 20 with
--ultra), but the level can be decided later for Bookworm.

If you are already convinced and will add decompression support for
Bullseye, that's great and thank you; if not, please share your
remaining concerns.

Cheers,
Balint

PS: Sorry for picking this up so close to the freeze.

> > > There is already a thread on ubuntu-devel if you are interested [8].
> > >
> > > I'm wondering if you are OK with the proposed .deb format (extensions,
> > > etc.), because Ubuntu is very close to releasing 18.04 and if we could
> > > agree on at least the package format Ubuntu's dpkg could add
> > > decompression zstd support without risking diverging later from
> > > Debian.
>
> The extension looks fine, that's the standard one used by upstream,
> and the one I used too when starting the implementation. When it comes
> to the .deb format itself, as I've mentioned before, I make no guarantees
> this might get accepted upstream. So the divergence is potentially
> something you might (or might not) need to carry forward and possibly
> might need/want (or not) to unwind yourselves.
>
> Thanks,
> Guillem
--
Balint Reczey
Ubuntu & Debian Developer
