Re: Patches to make GNU gzip and BusyBox gzip produce identical compression results

2019-09-05 Thread Daniel Edgecumbe
Well, not just the default behaviour.

It's probably nonsensical to assume that a specific compression level will
produce identical (or even similar) results across different implementations 
because
there are other possible variables, like the GZIP_FAST flag.

There is though a slight complication in that due to BusyBox being embedded
software we skip some checks. For example, with GZIP_LEVELS off, -1..9 flags are
just ignored transparently. So in that case scripts that rely on the compression
levels will fail anyway.

The patch set was more aimed at fixing some behaviours that I think aren't 
intuitive.
Specifically I think it's very strange that the default would go from -9 to -6 
when
levels are enabled. Much head scratching was had.

Daniel


On 05/09/2019 14.32, Kang-Che Sung wrote:
> Well, it seems that Denys merged the changes before I have the time to
> respond to comments, but anyway:
> 
> I actually disliked the argument about "de-facto standard" on compression
> levels. I think scripts should not depend on the default compression level,
> and should instead specify it explicitly. The reason is that you can not
> guarantee the 'gzip' on one machine is the same implementation on another
> (there are implementations that use more extensive search, for example,
> 7-zip and zopfli), it is just a coincidence that BusyBox gzip uses an
> implementation that's compatible with the most popular (zlib, I think), but
> it shouldn't be "guaranteed", let alone becoming a "standard" on
> implementaion or on the default behaviors.
> 
> I am not suggesting to change the patch now. I'm just mentioning that
> relying on default behavior of any program, for any script, is a bad idea
> in general. (E.g. Do specify -6 if your script expects it, even though it
> seems redundant, because user setting can always override the program
> defaults.)
> 

-- 
Daniel Edgecumbe | esotericnonsense
Kalix NO, Sverige | +358 46 584 2810
em...@esotericnonsense.com | https://esotericnonsense.com
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Patches to make GNU gzip and BusyBox gzip produce identical compression results

2019-09-05 Thread Kang-Che Sung
Well, it seems that Denys merged the changes before I have the time to
respond to comments, but anyway:

I actually disliked the argument about "de-facto standard" on compression
levels. I think scripts should not depend on the default compression level,
and should instead specify it explicitly. The reason is that you can not
guarantee the 'gzip' on one machine is the same implementation on another
(there are implementations that use more extensive search, for example,
7-zip and zopfli), it is just a coincidence that BusyBox gzip uses an
implementation that's compatible with the most popular (zlib, I think), but
it shouldn't be "guaranteed", let alone becoming a "standard" on
implementaion or on the default behaviors.

I am not suggesting to change the patch now. I'm just mentioning that
relying on default behavior of any program, for any script, is a bad idea
in general. (E.g. Do specify -6 if your script expects it, even though it
seems redundant, because user setting can always override the program
defaults.)
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Patches to make GNU gzip and BusyBox gzip produce identical compression results

2019-09-05 Thread Denys Vlasenko
Applied, thanks

On Mon, Sep 2, 2019 at 11:58 PM Daniel Edgecumbe
 wrote:
>
> Let's try once more, patches inline (new email client, gah!)
>
> 0001-gzip-default-level-with-ENABLE_FEATURE_GZIP_LEVELS-s.patch
>
> From 9d06f01e2805a5d6f1d775ceb651ae18ae2e1808 Mon Sep 17 00:00:00 2001
> From: Daniel Edgecumbe 
> Date: Mon, 2 Sep 2019 22:03:14 +0100
> Subject: [PATCH 1/3] gzip: default level with ENABLE_FEATURE_GZIP_LEVELS
>  should be 6
>
> Fixes an off-by-one that actually resulted in level 7 being used
> ---
>  archival/gzip.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/archival/gzip.c b/archival/gzip.c
> index 17341de45..37db347b8 100644
> --- a/archival/gzip.c
> +++ b/archival/gzip.c
> @@ -,7 +,7 @@ int gzip_main(int argc UNUSED_PARAM, char **argv)
>  #if ENABLE_FEATURE_GZIP_LEVELS
> opt >>= (BBUNPK_OPTSTRLEN IF_FEATURE_GZIP_DECOMPRESS(+ 2) + 1); /* 
> drop cfkvq[dt]n bits */
> if (opt == 0)
> -   opt = 1 << 6; /* default: 6 */
> +   opt = 1 << 5; /* default: 6 */
> opt = ffs(opt >> 4); /* Maps -1..-4 to [0], -5 to [1] ... -9 to [5] */
> max_chain_length = 1 << gzip_level_config[opt].chain_shift;
> good_match   = gzip_level_config[opt].good;
> --
> 2.23.0
>
> 0002-gzip-set-compression-flags-correctly-as-per-standard.patch
>
> From 4280c9633b359dcbf2ddadcf33790b8690f81c82 Mon Sep 17 00:00:00 2001
> From: Daniel Edgecumbe 
> Date: Mon, 2 Sep 2019 22:05:26 +0100
> Subject: [PATCH 2/3] gzip: set compression flags correctly as per standard
>
> With this change and CONFIG_GZIP_FAST=2, CONFIG_FEATURE_GZIP_LEVELS=y,
>
> GNU gzip and BusyBox gzip now produce identical output at each compression
> level (excluding 1..3, as BusyBox does not implement these levels).
> ---
>  archival/gzip.c | 22 --
>  1 file changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/archival/gzip.c b/archival/gzip.c
> index 37db347b8..f13748aa1 100644
> --- a/archival/gzip.c
> +++ b/archival/gzip.c
> @@ -259,6 +259,7 @@ enum {
>
>  #if !ENABLE_FEATURE_GZIP_LEVELS
>
> +   comp_level = 9,
> max_chain_length = 4096,
>  /* To speed up deflation, hash chains are never searched beyond this length.
>   * A higher limit improves compression ratio but degrades the speed.
> @@ -334,10 +335,12 @@ struct globals {
>  #define head (G1.prev + WSIZE) /* hash head (see deflate.c) */
>
>  #if ENABLE_FEATURE_GZIP_LEVELS
> +   unsigned comp_level;
> unsigned max_chain_length;
> unsigned max_lazy_match;
> unsigned good_match;
> unsigned nice_match;
> +#define comp_level (G1.comp_level)
>  #define max_chain_length (G1.max_chain_length)
>  #define max_lazy_match   (G1.max_lazy_match)
>  #define good_match  (G1.good_match)
> @@ -1919,7 +1922,7 @@ static void bi_init(void)
>  /* 
> ===
>   * Initialize the "longest match" routines for a new file
>   */
> -static void lm_init(unsigned *flags16p)
> +static void lm_init(void)
>  {
> unsigned j;
>
> @@ -1927,8 +1930,6 @@ static void lm_init(unsigned *flags16p)
> memset(head, 0, HASH_SIZE * sizeof(*head));
> /* prev will be initialized on the fly */
>
> -   /* speed options for the general purpose bit flag */
> -   *flags16p |= 2; /* FAST 4, SLOW 2 */
> /* ??? reduce max_chain_length for binary files */
>
> //G1.strstart = 0; // globals are zeroed in pack_gzip()
> @@ -2076,10 +2077,16 @@ static void zip(void)
>
> bi_init();
> ct_init();
> -   deflate_flags = 0;  /* pkzip -es, -en or -ex equivalent */
> -   lm_init(_flags);
> +   lm_init();
>
> -   put_16bit(deflate_flags | 0x300); /* extra flags. OS id = 3 (Unix) */
> +   deflate_flags = 0x300; /* extra flags. OS id = 3 (Unix) */
> +#if ENABLE_FEATURE_GZIP_LEVELS
> +   /* Note that comp_levels < 4 do not exist in this version of gzip */
> +   if (comp_level == 9) {
> +   deflate_flags |= 0x02; /* SLOW flag */
> +   }
> +#endif
> +   put_16bit(deflate_flags);
>
> /* The above 32-bit misaligns outbuf (10 bytes are stored), flush it 
> */
> flush_outbuf_if_32bit_optimized();
> @@ -2224,6 +2231,9 @@ int gzip_main(int argc UNUSED_PARAM, char **argv)
> if (opt == 0)
> opt = 1 << 5; /* default: 6 */
> opt = ffs(opt >> 4); /* Maps -1..-4 to [0], -5 to [1] ... -9 to [5] */
> +
> +   comp_level = opt + 4;
> +
> max_chain_length = 1 << gzip_level_config[opt].chain_shift;
> good_match   = gzip_level_config[opt].good;
> max_lazy_match   = gzip_level_config[opt].lazy2 * 2;
> --
> 2.23.0
>
> 0003-gzip-set-default-compression-level-to-6-when-CONFIG_.patch
>
> From 12d30559486502feec4e2821b3ab45ae6139e7aa Mon Sep 17 00:00:00 2001
> From: Daniel Edgecumbe 
> Date: Mon, 2 Sep 2019 22:09:15 +0100
> Subject: [PATCH 3/3] gzip: set default compression 

Re: Patches to make GNU gzip and BusyBox gzip produce identical compression results

2019-09-03 Thread Aaro Koskinen
Hi,

On Tue, Sep 03, 2019 at 10:31:20AM +0800, Kang-Che Sung wrote:
> gzip -9 is quite fast in modern processors, and if someone builds busybox
> without CONFIG_FEATURE_GZIP_LEVELS, I think they are moke likely to stick
> with -9 as default instead of -6.

No, -9 is really slow with little gain:

https://marc.info/?l=busybox=142989731614406=2

A.
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Patches to make GNU gzip and BusyBox gzip produce identical compression results

2019-09-03 Thread Martijn Dekker

Op 03-09-19 om 04:31 schreef Kang-Che Sung:

Excuse me, but I wonder one thing on the third patch: Why should we follow
strictly with gzip on the no-options default behavior?


For two reasons.

First, the default 6 compression level is a de-facto standard. BSD gzip 
and Apple gzip (on macOS) use this default as well. So there is a 
reasonable expectation that different gzip implementations act the same. 
For instance, if the default for busybox gzip becomes 9, then someone 
writing a script using busybox gzip could reasonably expect that the 
compression level will still be 9 when the same script is run on another 
system. That would be wrong. Implementations should not deviate from 
de-facto standards without a strong reason.


Second, the inherent reason for this default has not gone away. While 
processor speeds have exploded since the default was set, so has the 
typical size of compressed files. Multiple gigabytes are nothing unusual 
these days. And gzip is often used for compression on the fly, precisely 
because it offers a good compromise between speed and compression ratio. 
So I believe 6 continues to be a reasonable default.



The better change would be to allow the builder to choose the compression level
at build time.


I disagree with that as well, for the same reasons.

- Martijn

--
modernish -- harness the shell
https://github.com/modernish/modernish
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Patches to make GNU gzip and BusyBox gzip produce identical compression results

2019-09-03 Thread Daniel Edgecumbe
I split out the third patch for this reason. I agree, it can be dropped
if people see no need for it. Adding a compile-time option I think is
overkill.

Daniel

On 03/09/2019 04.31, Kang-Che Sung wrote:
>> From 12d30559486502feec4e2821b3ab45ae6139e7aa Mon Sep 17 00:00:00 2001
>> From: Daniel Edgecumbe 
>> Date: Mon, 2 Sep 2019 22:09:15 +0100
>> Subject: [PATCH 3/3] gzip: set default compression level to 6 when
>>  CONFIG_FEATURE_GZIP_LEVELS=n
>>
>> With this change, GNU gzip -n and BusyBox gzip now produce identical output
>> assuming that CONFIG_GZIP_FAST=2.
>> ---
> 
> Excuse me, but I wonder one thing on the third patch: Why should we follow
> strictly with gzip on the no-options default behavior? gzip -9 is quite fast
> in modern processors, and if someone builds busybox without
> CONFIG_FEATURE_GZIP_LEVELS, I think they are moke likely to stick with -9 as
> default instead of -6.
> 
> The better change would be to allow the builder to choose the compression 
> level
> at build time. It would be better to resolve the debate on which level should
> be the default, Otherwise, I think the third patch can be dropped.
> 

-- 
Daniel Edgecumbe | esotericnonsense
Kalix NO, Sverige | +358 46 584 2810
em...@esotericnonsense.com | https://esotericnonsense.com
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Patches to make GNU gzip and BusyBox gzip produce identical compression results

2019-09-02 Thread Kang-Che Sung
> From 12d30559486502feec4e2821b3ab45ae6139e7aa Mon Sep 17 00:00:00 2001
> From: Daniel Edgecumbe 
> Date: Mon, 2 Sep 2019 22:09:15 +0100
> Subject: [PATCH 3/3] gzip: set default compression level to 6 when
>  CONFIG_FEATURE_GZIP_LEVELS=n
>
> With this change, GNU gzip -n and BusyBox gzip now produce identical output
> assuming that CONFIG_GZIP_FAST=2.
> ---

Excuse me, but I wonder one thing on the third patch: Why should we follow
strictly with gzip on the no-options default behavior? gzip -9 is quite fast
in modern processors, and if someone builds busybox without
CONFIG_FEATURE_GZIP_LEVELS, I think they are moke likely to stick with -9 as
default instead of -6.

The better change would be to allow the builder to choose the compression level
at build time. It would be better to resolve the debate on which level should
be the default, Otherwise, I think the third patch can be dropped.
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Patches to make GNU gzip and BusyBox gzip produce identical compression results

2019-09-02 Thread Daniel Edgecumbe
Let's try once more, patches inline (new email client, gah!)

0001-gzip-default-level-with-ENABLE_FEATURE_GZIP_LEVELS-s.patch

>From 9d06f01e2805a5d6f1d775ceb651ae18ae2e1808 Mon Sep 17 00:00:00 2001
From: Daniel Edgecumbe 
Date: Mon, 2 Sep 2019 22:03:14 +0100
Subject: [PATCH 1/3] gzip: default level with ENABLE_FEATURE_GZIP_LEVELS
 should be 6

Fixes an off-by-one that actually resulted in level 7 being used
---
 archival/gzip.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/archival/gzip.c b/archival/gzip.c
index 17341de45..37db347b8 100644
--- a/archival/gzip.c
+++ b/archival/gzip.c
@@ -,7 +,7 @@ int gzip_main(int argc UNUSED_PARAM, char **argv)
 #if ENABLE_FEATURE_GZIP_LEVELS
opt >>= (BBUNPK_OPTSTRLEN IF_FEATURE_GZIP_DECOMPRESS(+ 2) + 1); /* drop 
cfkvq[dt]n bits */
if (opt == 0)
-   opt = 1 << 6; /* default: 6 */
+   opt = 1 << 5; /* default: 6 */
opt = ffs(opt >> 4); /* Maps -1..-4 to [0], -5 to [1] ... -9 to [5] */
max_chain_length = 1 << gzip_level_config[opt].chain_shift;
good_match   = gzip_level_config[opt].good;
-- 
2.23.0

0002-gzip-set-compression-flags-correctly-as-per-standard.patch

>From 4280c9633b359dcbf2ddadcf33790b8690f81c82 Mon Sep 17 00:00:00 2001
From: Daniel Edgecumbe 
Date: Mon, 2 Sep 2019 22:05:26 +0100
Subject: [PATCH 2/3] gzip: set compression flags correctly as per standard

With this change and CONFIG_GZIP_FAST=2, CONFIG_FEATURE_GZIP_LEVELS=y,

GNU gzip and BusyBox gzip now produce identical output at each compression
level (excluding 1..3, as BusyBox does not implement these levels).
---
 archival/gzip.c | 22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/archival/gzip.c b/archival/gzip.c
index 37db347b8..f13748aa1 100644
--- a/archival/gzip.c
+++ b/archival/gzip.c
@@ -259,6 +259,7 @@ enum {
 
 #if !ENABLE_FEATURE_GZIP_LEVELS
 
+   comp_level = 9,
max_chain_length = 4096,
 /* To speed up deflation, hash chains are never searched beyond this length.
  * A higher limit improves compression ratio but degrades the speed.
@@ -334,10 +335,12 @@ struct globals {
 #define head (G1.prev + WSIZE) /* hash head (see deflate.c) */
 
 #if ENABLE_FEATURE_GZIP_LEVELS
+   unsigned comp_level;
unsigned max_chain_length;
unsigned max_lazy_match;
unsigned good_match;
unsigned nice_match;
+#define comp_level (G1.comp_level)
 #define max_chain_length (G1.max_chain_length)
 #define max_lazy_match   (G1.max_lazy_match)
 #define good_match  (G1.good_match)
@@ -1919,7 +1922,7 @@ static void bi_init(void)
 /* ===
  * Initialize the "longest match" routines for a new file
  */
-static void lm_init(unsigned *flags16p)
+static void lm_init(void)
 {
unsigned j;
 
@@ -1927,8 +1930,6 @@ static void lm_init(unsigned *flags16p)
memset(head, 0, HASH_SIZE * sizeof(*head));
/* prev will be initialized on the fly */
 
-   /* speed options for the general purpose bit flag */
-   *flags16p |= 2; /* FAST 4, SLOW 2 */
/* ??? reduce max_chain_length for binary files */
 
//G1.strstart = 0; // globals are zeroed in pack_gzip()
@@ -2076,10 +2077,16 @@ static void zip(void)
 
bi_init();
ct_init();
-   deflate_flags = 0;  /* pkzip -es, -en or -ex equivalent */
-   lm_init(_flags);
+   lm_init();
 
-   put_16bit(deflate_flags | 0x300); /* extra flags. OS id = 3 (Unix) */
+   deflate_flags = 0x300; /* extra flags. OS id = 3 (Unix) */
+#if ENABLE_FEATURE_GZIP_LEVELS
+   /* Note that comp_levels < 4 do not exist in this version of gzip */
+   if (comp_level == 9) {
+   deflate_flags |= 0x02; /* SLOW flag */
+   }
+#endif
+   put_16bit(deflate_flags);
 
/* The above 32-bit misaligns outbuf (10 bytes are stored), flush it */
flush_outbuf_if_32bit_optimized();
@@ -2224,6 +2231,9 @@ int gzip_main(int argc UNUSED_PARAM, char **argv)
if (opt == 0)
opt = 1 << 5; /* default: 6 */
opt = ffs(opt >> 4); /* Maps -1..-4 to [0], -5 to [1] ... -9 to [5] */
+
+   comp_level = opt + 4;
+
max_chain_length = 1 << gzip_level_config[opt].chain_shift;
good_match   = gzip_level_config[opt].good;
max_lazy_match   = gzip_level_config[opt].lazy2 * 2;
-- 
2.23.0

0003-gzip-set-default-compression-level-to-6-when-CONFIG_.patch

>From 12d30559486502feec4e2821b3ab45ae6139e7aa Mon Sep 17 00:00:00 2001
From: Daniel Edgecumbe 
Date: Mon, 2 Sep 2019 22:09:15 +0100
Subject: [PATCH 3/3] gzip: set default compression level to 6 when
 CONFIG_FEATURE_GZIP_LEVELS=n

With this change, GNU gzip -n and BusyBox gzip now produce identical output
assuming that CONFIG_GZIP_FAST=2.
---
 archival/gzip.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/archival/gzip.c b/archival/gzip.c
index 

Re: Patches to make GNU gzip and BusyBox gzip produce identical compression results

2019-09-02 Thread Eli Schwartz
On 9/2/19 5:31 PM, Daniel Edgecumbe wrote:
> A discussion with eschwartz on the Arch Linux freenode IRC channel led
> to the discovery of some minor implementation details lacking in the
> BusyBox gzip applet which can cause output to differ both across GNU
> gzip and BusyBox, and different versions of BusyBox.
> 
> Please find attached three seperate patches for the solution of these
> issues.
> 
> I've also pushed the branch at
> https://git.esotericnonsense.com/busybox.git/

Thanks. :D

-- 
Eli Schwartz
Arch Linux Bug Wrangler and Trusted User



signature.asc
Description: OpenPGP digital signature
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox