Re: Fast lzma radix matchfinder

Adam Tuja Tue, 14 Jun 2022 17:50:49 -0700

Hello,

> Speed gains depend on the nature of the source data

That is more or less true of every LZ compressor, and in the general case it doesn't change much. Some specific data compressed worse than with lzma and some compressed better, but the difference was small and only exceptionally noticeable.

> to achieve about the same ratio as 7-Zip requires double the dictionary size

In general, to be "compatible" with lzma compression ratios, he chose to increase the dictionary size. This is described in `man fxz`, under "Compression preset levels".

The same could, more or less, be achieved by adjusting the match finder. In practice that doesn't work so well, and increasing the dictionary is the better way.

To illustrate this I used lzip's presets in xz and fastlzma2; I also increased the match length by 50% but, as it turned out, that didn't change much. [1]

Given that, even with the increased dictionary, it is still 2 times faster, and that utilizing more processors doesn't use much more memory either, it was the obvious choice.

A larger dictionary increases the decompression memory requirement, but it's still 6 times smaller than what compression needs. And these days phones have 8+ times more memory than the highest preset (128MB).

Speaking of dictionary sizes and presets, I'm surprised that lzip's presets for levels 8 and 9 don't increase by 100% as the lower levels do, and are not 32M and 64M respectively.
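
The step sizes between presets can be checked with a few lines of Python. This is only a sketch: the dictionary sizes below are assumed from the preset table in the lzip manual (levels -1 to -9) and should be verified against the installed version; the names `PRESET_DICT_KIB` and `growth_factors` are made up for illustration.

```python
# Dictionary size per lzip level, in KiB.
# ASSUMPTION: values taken from the preset table in the lzip manual;
# verify them against the lzip version you actually use.
PRESET_DICT_KIB = {
    1: 1024, 2: 1536, 3: 2048, 4: 3072, 5: 4096,
    6: 8192, 7: 16384, 8: 24576, 9: 32768,
}


def growth_factors(table):
    """Return {level: size(level) / size(level - 1)} for consecutive levels."""
    return {
        level: table[level] / table[level - 1]
        for level in sorted(table)
        if level - 1 in table
    }


if __name__ == "__main__":
    for level, factor in growth_factors(PRESET_DICT_KIB).items():
        print(f"-{level - 1} -> -{level}: x{factor:.2f}")
```

Under these assumed values the steps from -5 to -6 and from -6 to -7 are doublings, while -7 to -8 and -8 to -9 grow by 1.5 and about 1.33.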

> Also, having a level 11 that compresses less than level 9 is confusing to users.

Compressors that use more than one algorithm distinguish between them in exactly this way. [2]

As long as it's stated in the manual/help, it should be no problem.

> Increasing the number of levels also hinders data recovery

How so? It produces an lzma stream that can be decompressed by an lzma decompressor. The decompressor neither knows nor cares about levels.

> options like -11 or -19 are not compatible with POSIX or GNU standards

Then maybe an option to choose the mode. There are two already, fast and normal; they are not selectable at the moment but, again, the decompressor neither knows nor cares about that: it only needs to know the dictionary size.
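
To make the point about the dictionary size concrete, here is a minimal sketch of reading it from an lzip member header. It assumes the layout described in the file format section of the lzip manual (4 magic bytes "LZIP", a version byte, then one coded dictionary size byte); the function name `lzip_dictionary_size` is hypothetical. Nothing in the header records the level or mode used by the compressor.

```python
def lzip_dictionary_size(header: bytes) -> int:
    """Decode the dictionary size from the first 6 bytes of an lzip member.

    ASSUMPTION: header layout as described in the lzip manual:
    bytes 0-3 magic "LZIP", byte 4 version (1), byte 5 coded dictionary size.
    """
    if len(header) < 6 or header[:4] != b"LZIP":
        raise ValueError("not an lzip member header")
    if header[4] != 1:
        raise ValueError("unsupported lzip format version")
    coded = header[5]
    base = 1 << (coded & 0x1F)   # bits 4-0: log2 of the base size
    wedges = coded >> 5          # bits 7-5: sixteenths of base to subtract
    size = base - wedges * (base // 16)
    if not (1 << 12) <= size <= (1 << 29):
        raise ValueError("dictionary size out of range")
    return size


if __name__ == "__main__":
    # 0xD3 decodes to 2^19 - 6 * 2^15 bytes.
    print(lzip_dictionary_size(b"LZIP\x01\xd3"))
```

The compression level never appears in this calculation; two files made with different presets but the same dictionary size decode to the same value.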

Anyway, you may find something useful there.

[1] https://pastebin.com/ckEv4Yc3

[2] for example: https://github.com/inikep/lizard

14.06.2022, 18:23, "Antonio Diaz Diaz" <anto...@gnu.org>:

Adam Tuja wrote:
> The comparison here would be the same as with lzma, that is slightly faster. [1]
> Bigger advantage, beside compression speed, is revealed in memory consumption
> for multiple threads - it's halved for single thread but 1/4 for 2 threads and
> 1/8 for 4 threads [1][2].

Very interesting. Thank you for bringing this to my attention. I expect to
look at it in depth when I find the time, but I guess it may be difficult
(or impossible) to integrate it meaningfully into plzip because it seems
very different from what plzip does. See for example
https://github.com/conor42/fast-lzma2#readme

"Speed gains depend on the nature of the source data."

"The largest caveat is that the match-finder is a block algorithm, and to
achieve about the same ratio as 7-Zip requires double the dictionary size,
which raises the decompression memory usage."

>> it is not worth the trouble of breaking lzip's reproducibility
> Don't know what you mean by "reproducibility"

Lzip is more than a compressor. It is a set of tools designed around a
format tuned for long-term archiving. It is important that the output of
lzip does not change frequently between versions because such changes may
hinder some kinds of data recovery. See for example
http://www.nongnu.org/lzip/manual/lziprecover_manual.html#Reproducing-one-sector

We need to think about the consequences of the consequences (sic) of any
change to the interface or to the algorithm.

> but I didn't mean to replace current encoder/s, rather complement them.
> If it was used it could be different compression levels, like 11-19.

Increasing the number of levels also hinders data recovery.

Moreover, options like -11 or -19 are not compatible with POSIX or GNU
standards. See
http://www.nongnu.org/arg-parser/manual/arg_parser_manual.html#Argument-syntax

Also, having a level 11 that compresses less than level 9 is confusing to users.

So these may also be difficult to integrate meaningfully into lzip.

Best regards,
Antonio.
