Hi Steffen,
Steffen Nurpmeso wrote:
> #?0|kent:plzip-1.11$ cp /x/balls/gcc-13.2.0.tar.xz X1
> #?0|kent:plzip-1.11$ cp X1 X2
> [...]
> -rw-r----- 1 steffen steffen 89049959 May 7 22:14 X1.lz
> -rw-r----- 1 steffen steffen 89079463 May 7 22:14 X2.lz
Note that if you use incompressible files as input, you'll always obtain
similar compressed sizes, no matter the compression level or the dictionary
size. Try the test with gcc-13.2.0.tar and you'll see the difference. (As in
your other test with /x/doc/coding/austin-group/202x_d4.txt).
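The effect is easy to demonstrate on any system. A minimal sketch, using
gzip only because it is universally available (lzip behaves the same way):
random data stays at about the input size at every level, while redundant
data shrinks dramatically at -9:

```shell
# Incompressible input: compression level barely changes the result.
head -c 100000 /dev/urandom > rand.bin
gzip -1 -c rand.bin | wc -c    # about the input size
gzip -9 -c rand.bin | wc -c    # nearly the same as -1

# Highly redundant input: level (and, for lzip, dictionary size) matters.
yes 'the quick brown fox' | head -c 100000 > text.txt
gzip -9 -c text.txt | wc -c    # a small fraction of the input size
```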
> I think dynamically scaling according to the processors, taking
> into account the dictionary size, as you said above, is the sane
> approach for "saturating" with plzip; in the above job there are
> quite a lot of files of varying size (the spam DB being very
> large), and one recipe is not good for them all.
Maybe there is a better way (almost optimal for many files) to compress the
spam DB that does not require a parallel compressor, but uses all the
processors in your machine. (And, as a bonus, achieves maximum compression
on files of any size and produces reproducible files).
ls | xargs -n1 -P4 lzip -9
The command above should produce better results than a saturated plzip.
'ls' may be replaced by any way to generate a list of the files to be
compressed. See
http://www.gnu.org/software/findutils/manual/html_node/find_html/xargs-options.html
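One caveat with plain 'ls' is that filenames containing spaces or newlines
break the pipe into xargs. A sketch of the same pattern using null-terminated
names, which is safe for any filename (gzip stands in for lzip here only so
the example runs anywhere; substitute 'lzip -9' in real use):

```shell
# Scratch directory with a couple of files, purely for demonstration.
dir=$(mktemp -d)
printf 'hello\n' > "$dir/a.txt"
printf 'world\n' > "$dir/b c.txt"    # filename containing a space

# find -print0 / xargs -0 pass names null-terminated; -n1 gives each
# compressor process one file, and -P4 runs up to four in parallel.
find "$dir" -type f -print0 | xargs -0 -n1 -P4 gzip -9
```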
Hope this helps,
Antonio.