Hi Guillem,

On Mon, Apr 16, 2018 at 3:51 AM, Balint Reczey
<balint.rec...@canonical.com> wrote:
> Hi Guillem
>
> On Sun, Mar 18, 2018 at 3:38 AM, Guillem Jover <guil...@debian.org> wrote:
>> Hi!
>>
>> On Sun, 2018-03-11 at 21:51:05 +0100, Balint Reczey wrote:
>>> Package: dpkg
>>> Version: 1.19.0.5
>>> Severity: wishlist
>>> Tags: patch
>>
>>> Please add support for Zstandard compression to dpkg and other
>>> programs generated by the dpkg source package [1].
>>
>> Thanks. I started implementing this several weeks ago after having
>> discussed it with Julian Andres Klode on IRC, but stopped after seeing
>> the implementation getting messy given the current code structure.
>
> I think it is not that bad. :-)
>
>> In any case, as I mentioned on IRC to Andres, this is something I
>> pondered about already in 2016, when Paul Wise blogged about it, and
>> which I also told Andres about at the time when he was adding lz4
>> support to apt. :) But parked it for later as there were several
>> apparent problems with it at the time.
>>
>> So, the items that come to mind (most from the dpkg FAQ [F]:
>>
>> * Availability in general Unix systems would be one. I think the code
>>   should be portable, but I've not checked properly.
>
> The libzstd package does not have any special dependency and there are
> packages for other Unix-like systems [2][3][4].
>
>> * Size of the shared library another, it would be by far the fattest
>>   compression lib used by dpkg. It's not entirely clear whether the
>>   shlib embeds a zlib library?
>
> I agree that the libzstd library is fairly big and I'd like to look
> into ways of making it leaner, maybe creating a variant with limited
> features covering what is needed in dpkg, apt, btrfs-progs and other
> system packages.
> It does not seem to embed the zlib library, but it offers many
> features which may be obsolete for dpkg.
>
> I tried dropping support for legacy file formats for example
> (ZSTD_LEGACY_SUPPORT=8) and the size of the library dropped to 382K
> from the original 490K.
>
>> * Increase in the (build-)essential set (directly and transitively).
>
> Yes, that's true, while apt also started supporting Zstd and .
>
>> * It also seems the format has changed quite some times already, and
>>   it's probably the reason for the fat shlib. Not sure if the format
>>   has stabilized enough to use this as good long-term storage format,
>>   and what's the policy regarding supporting old formats for example,
>>   given that this is intended mainly to be used for real-time and
>>   streaming content and similar. For example the Makefile for libzstd
>>   defaults to supporting v0.4+ only, which does not look great.
>
> Format stability is a very valid concern and upstream claims the
> current format to be stable [5] (since zstd v0.8.1).
>
>> * The license seems fine, as being very permissive, or it could affect
>>   availability. This one I need to add to the FAQ.
>> * Memory usage seemed fine or slight better depending on the compression
>>   level, but not when wanting equal or less space used?
>> * Space used seemed worse.
>
> Yes, space used is worse than with xz compression, but I think the
> much better compression and decompression speed would make up for
> that.
>
>> * Compression and decompression speed seemed better depending on the
>>   compression and decompression levels.
>>
>> [F] 
>> <https://wiki.debian.org/Teams/Dpkg/FAQ#Q:_Can_we_add_support_for_new_compressors_for_.deb_packages.3F>
>>
>> Overall I'm still not sure whether this is worth it. Also the
>> tradeoffs for stable are different to unstable/testing, or for
>> fast/slow networks, or long-term storage, one-time installations,
>> or things like CI and similar.
>>
>> In any case this would still need discussion on debian-devel, and
>> involvement from other parts of the project, at least ftp-masters for
>> example. And whether the added "eternal" support makes sense if we are
>> or not planning to eventually switch to the compressor as the default,
>> for example, etc.
>
> I agree that the tradeoffs are very different for the use cases and
> please feel free to bring this topic to debian-devel quoting any part
> of my emails.
>
>>
>>> $ rm -rf firefox-xz/* ;time  dpkg-deb -R firefox-xz.deb firefox-xz/
>>> real 0m4,270s
>>> user 0m4,220s
>>> sys 0m0,630s
>>> $ rm -rf firefox-zstd/* ;time  dpkg-deb -R firefox-zstd.deb firefox-zstd/
>>> real 0m0,765s
>>> user 0m0,556s
>>> sys 0m0,462s
>>
>> Right, although that might end up being noise when factored into a
>> normal dpkg installation, due to the fsync()s, or maintscript
>> execution, etc.
>
> I agree that fsync()s and scripts add more time overall to the
> installation time, but fsync()'s effect is decreasing with faster
> storage. I would like to look into speeding up maintscript execution.
>
> I need to reproduce my results on latest sid, but  when installing big
> package-sets like ubuntu-desktop on a server VM the package
> decompression was the biggest (~40%) contributor to CPU utilization
> and to make a meaningful improvement in that area I think switching to
> a very fast decompressor is neccessary.
>
> I think the biggest price we have to pay here is the slower download
> of the somewhat bigger compressed packages, but IMO the real solution
> here is rolling out DeltaDebs [6] support, which is planned to be an
> improvement over debdelta [7]. DeltaDebs could save around 90% of
> bandwidth - or download time needed for packages.
>
> Since DeltaDeb generation also involves decompression, Zstd could
> speed this up, too.
>
>>
>>> Tests on the full Ubuntu main archive showed ~6% average increase in
>>> the size of the binary packages.
>>
>> What about the total increase? Because it's not the same say a 15%
>> increase in a 500 MiB .deb, than a 2% in a 100 KiB one obviously. :)
>
> Yes, I was not clear enough in my email, the total increase was 6%.
>
>>
>> And follows a quick code review, not very deep, given that whether to
>> include support for this is still open.
>>
>>
>>> From 79aad733cbc7edd44e124702f82b8a46a3a4aea9 Mon Sep 17 00:00:00 2001
>>> From: Balint Reczey <balint.rec...@canonical.com>
>>> Date: Thu, 8 Mar 2018 09:53:36 +0100
>>> Subject: [PATCH 1/4] dpkg: Add Zstandard compression support
>>
>>> diff --git a/dpkg-deb/main.c b/dpkg-deb/main.c
>>> index 52e9ce67d..7f898210e 100644
>>> --- a/dpkg-deb/main.c
>>> +++ b/dpkg-deb/main.c
>>> @@ -108,7 +108,7 @@ usage(const struct cmdinfo *cip, const char *value)
>>>  "      --[no-]uniform-compression   Use the compression params on all 
>>> members.\n"
>>>  "  -z#                              Set the compression level when 
>>> building.\n"
>>>  "  -Z<type>                         Set the compression type used when 
>>> building.\n"
>>> -"                                     Allowed types: gzip, xz, none.\n"
>>> +"                                     Allowed types: gzip, xz, zstd, 
>>> none.\n"
>>>  "  -S<strategy>                     Set the compression strategy when 
>>> building.\n"
>>>  "                                     Allowed values: none; extreme 
>>> (xz);\n"
>>>  "                                     filtered, huffman, rle, fixed 
>>> (gzip).\n"
>>
>> In theory the proper way to introduce this is to first enable
>> decompression and then after a full stable release cycle add compression
>> support.
>
> I'm OK with that. I'm attaching the updated patch which I also
> uploaded to Salsa, addressing your review comments. Feel free to
> disable compression in a way you please, if you do it in a separate
> commit it could be easily reverted later to enable compression.

By writing "Feel free to disable compression ..." I did not want to
mean pushing the work on you and I would be happy to write that patch
if you think it is needed. I agree that in theory disabling
compression initially would be the cleanest way forward, but
compression left enabled would help running tests with the new
compression more easily on the other hand.

IMO the real commitment to the format starts with shipping the first
zstd-compressed packages in the official archive and this won't happen
without DSA team upgrading the infrastructure to accept binary
packages using that compression.

Also note that people can already hand-craft zstd compressed packages
the way it is done in the added dpkg test.

What do you think? Should compression be disabled initially and if so
would you like me to write the patch for it?

>
>>
>>> diff --git a/lib/dpkg/compress.c b/lib/dpkg/compress.c
>>> index 44075cdb6..e20add3b7 100644
>>> --- a/lib/dpkg/compress.c
>>> +++ b/lib/dpkg/compress.c
>>> @@ -750,6 +753,127 @@ static const struct compressor compressor_lzma = {
>>>       .decompress = decompress_lzma,
>>>  };
>>>
>>> +/*
>>> + * Zstd compressor.
>>> + */
>>> +
>>> +#define ZSTD         "zstd"
>>> +
>>> +#ifdef WITH_LIBZSTD
>>> +
>>> +static void
>>> +decompress_zstd(int fd_in, int fd_out, const char *desc)
>>> +{
>>> +  size_t const buf_in_size = ZSTD_DStreamInSize();
>>
>> The indentation should be tabs, not spaces here. There are several
>> other problems with spacing and blank lines, etc in other places.
>
> Fixed.
>
>>
>>> +  void*  const buf_in  = malloc(buf_in_size);
>>
>> m_malloc().
>>
>>> +  size_t const buf_out_size = ZSTD_DStreamOutSize();
>>> +  void*  const buf_out = malloc(buf_out_size);
>>> +  size_t init_result, just_read, to_read;
>>> +  ZSTD_DStream* const dstream = ZSTD_createDStream();
>>> +  if (dstream == NULL) {
>>> +    ohshit(_("ZSTD_createDStream() error "));
>>> +  }
>>
>> Unnecessary braces, and non-obvious error message.
>
> Fixed.
>
>>
>>> +  /* TODO: a file may consist of multiple appended frames (ex : pzstd).
>>> +   * The following implementation decompresses only the first frame */
>>
>> The commit fixing this should be folded into this one.
>
> Done, all commits are squashed.
>
>>
>>> +  init_result = ZSTD_initDStream(dstream);
>>> +  if (ZSTD_isError(init_result)) {
>>> +    ohshit(_("ZSTD_initDStream() error : %s"), 
>>> ZSTD_getErrorName(init_result));
>>> +  }
>>> +  to_read = init_result;
>>> +  while ((just_read = fd_read(fd_in, buf_in, to_read))) {
>>> +    ZSTD_inBuffer input = { buf_in, just_read, 0 };
>>> +    while (input.pos < input.size) {
>>> +      ZSTD_outBuffer output = { buf_out, buf_out_size, 0 };
>>> +      to_read = ZSTD_decompressStream(dstream, &output , &input);
>>> +      if (ZSTD_isError(to_read)) {
>>> +        ohshit(_("ZSTD_decompressStream() error : %s \n"),
>>> +               ZSTD_getErrorName(to_read));
>>> +      }
>>> +      fd_write(fd_out, output.dst, output.pos);
>>
>> Return value not checked.
>
> Fixed.
>
>>
>>> +    }
>>> +  }
>>> +}
>>
>>> diff --git a/man/dpkg-source.man b/man/dpkg-source.man
>>> index 2233d7a8d..991162003 100644
>>> --- a/man/dpkg-source.man
>>> +++ b/man/dpkg-source.man
>>> @@ -176,7 +176,7 @@ Specify the compression to use for created tarballs and 
>>> diff files
>>>  (\fB\-\-compression\fP since dpkg 1.15.5).
>>>  Note that this option will not cause existing tarballs to be recompressed,
>>>  it only affects new files. Supported values are:
>>> -.IR gzip ", " bzip2 ", " lzma " and " xz .
>>> +.IR gzip ", " bzip2 ", " lzma ", " zstd " and " xz .
>>>  The default is \fIxz\fP for formats 2.0 and newer, and \fIgzip\fP for
>>>  format 1.0. \fIxz\fP is only supported since dpkg 1.15.5.
>>>  .TP
>>
>>> diff --git a/scripts/Dpkg/Compression.pm b/scripts/Dpkg/Compression.pm
>>> index 3dbc4adf0..4ea512fdc 100644
>>> --- a/scripts/Dpkg/Compression.pm
>>> +++ b/scripts/Dpkg/Compression.pm
>>> @@ -72,6 +72,12 @@ my $COMP = {
>>>       decomp_prog => [ 'unxz', '--format=lzma' ],
>>>       default_level => 6,
>>>      },
>>> +    zstd => {
>>> +     file_ext => 'zst',
>>> +     comp_prog => [ 'zstd', '-q' ],
>>> +     decomp_prog => [ 'unzstd', '-q' ],
>>> +     default_level => 19,
>>> +    },
>>>      xz => {
>>>       file_ext => 'xz',
>>>       comp_prog => [ 'xz' ],
>>
>> I don't think it makes sense to add support for source packages, as
>> long as there's no significant usage by upstreams, which I don't think
>> there is.
>
> Agreed.
>
>>
>>> From 9dec1a3f6be2e3d525a92f5a123300618407cb19 Mon Sep 17 00:00:00 2001
>>> From: Balint Reczey <balint.rec...@canonical.com>
>>> Date: Thu, 8 Mar 2018 10:14:30 +0100
>>> Subject: [PATCH 2/4] Add test for zstd decompression
>>
>> This should be folded into the main implementation commit.
>>
>>> From c927d94df0fdc59c25961505a5438b0dfc58710a Mon Sep 17 00:00:00 2001
>>> From: Balint Reczey <balint.rec...@canonical.com>
>>> Date: Fri, 9 Mar 2018 15:19:43 +0100
>>> Subject: [PATCH 3/4] dpkg: Support Zstandard compressed packages with 
>>> multiple
>>>  frames
>>
>> This should be folded with the main implementation commit.
>>
>>
>>> From d4b3f22299339f4b54f0013b5f86eff48db1e8c4 Mon Sep 17 00:00:00 2001
>>> From: Balint Reczey <balint.rec...@canonical.com>
>>> Date: Fri, 9 Mar 2018 11:19:24 +0100
>>> Subject: [PATCH 4/4] dpkg: Enable zstd uniform compression
>>>
>>> ---
>>>  dpkg-deb/main.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/dpkg-deb/main.c b/dpkg-deb/main.c
>>> index 7f898210e..7a40ecb80 100644
>>> --- a/dpkg-deb/main.c
>>> +++ b/dpkg-deb/main.c
>>> @@ -245,6 +245,7 @@ int main(int argc, const char *const *argv) {
>>>    if (opt_uniform_compression &&
>>>        (compress_params.type != COMPRESSOR_TYPE_NONE &&
>>>         compress_params.type != COMPRESSOR_TYPE_GZIP &&
>>> +       compress_params.type != COMPRESSOR_TYPE_ZSTD &&
>>>         compress_params.type != COMPRESSOR_TYPE_XZ))
>>>      badusage(_("unsupported compression type '%s' with uniform 
>>> compression"),
>>>               compressor_get_name(compress_params.type));
>>
>> I'm not sure why this would be split into a different commit, or
>> introduced at a different point. Older dpkg-deb will either not understand
>> the data.tar.zst or both the data.tar.zst and the control.tar.zst, so
>> there's no point in delaying this.
>>
>> Thanks,
>> Guillem
>
> Thank you for your feedback. I hope I could address your concerns and
> I'm looking forward to the discussion on debian-devel.

If you would like me to start the discussion here, just ping me on irc
or via email and I'll gladly do it.

>
> There is already a thread on ubuntu-devel if you are interested [8].
>
> I'm wondering if you are OK with the proposed .deb format (extensions,
> etc.), because Ubuntu is very close to releasing 18.04 and if we could
> agree on at least the package format Ubuntu's dpkg could add
> decompression zstd support without risking diverging later from
> Debian.

I'm sorry for getting back to you after you kindly reviewed my
original patches, it was my unfortunate mistake.

Thanks,
Balint

>
> Cheers,
> Balint
>
> --
> Balint Reczey
> Ubuntu & Debian Developer
>
> [2] https://www.opencsw.org/packages/zstd/
> [3] https://brewinstall.org/Install-zstd-on-Mac-with-Brew/
> [4] https://www.freshports.org/archivers/zstd/
> [5] https://github.com/facebook/zstd/issues/731
> [6] https://wiki.debian.org/Teams/Dpkg/Spec/DeltaDebs
> [7] http://debdelta.debian.net/
> [8] https://lists.ubuntu.com/archives/ubuntu-devel/2018-March/040211.html



-- 
Balint Reczey
Ubuntu & Debian Developer

Reply via email to