Curiosity got the best of me here.  Found I had a machine with several
major compression programs on it, and while this tired argument isn't
worth the effort, I found the idea of a comparison interesting, so I
thought I'd share...

[EMAIL PROTECTED] wrote:
>> you obviously missed the part about the preposterously slow compression
>> times.

> Well..
> You missed my point. I know you don't cross-compile, but you can compress
> all Archives on a stronger Computer. So a VAX isn't the best hardware to
> start with LZMA compressing. Yes..

ick.  no.

> Something else: You compress install Sets for CDs mostly just once, but
> they get decompressed many times. :)

depends.  I build sets often on slow machines.  More often than I install
'em.  In OpenBSD, I don't get much of a vote, but I get more than you. :)

> And if somebody wanted to change these install Sets, he could use another Box
> for the compression. But I don't think there's anybody out there who just
> owns a 486 and a VAX.

As a NATIVE BUILDING OS, we assume you have only one machine.  OpenBSD
does not need a second machine to build, and hopefully will not.  When
you need a second machine, you might as well be cross-building.  Heck,
you ARE cross-building.

> You would save a lot of bandwidth and also storage space on the CDs.

Bandwidth is cheap.
If you can cut the time to download from 30 minutes to 3 minutes, you
have accomplished something.  Changing it from 30 minutes to 15 minutes
doesn't change a thing, as far as I'm concerned.  You will still get up
and go do something else (or you need a job).

Far less painful to all than changing compression would probably be
DVD media.  If you are going to inconvenience some people, heck, at least
give them a REAL benefit, not 20% more.

> E.g. the src gets compressed very well and the decompression isn't that
> much slower.

Oh?  Where are your numbers?

Since you've brought this topic up in the past, I'm going to show you
some real numbers.

Took comp39.tgz for i386 and ran these tests on a slow amd64 system (yeah,
the choice of an i386 file for a test run on an amd64 box is kinda strange,
but it really doesn't matter much).  Unzipped it, copied it several
times, ran these compressions, and watched RAM usage using top.
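
The prep, roughly -- decompress the set once, then make working copies
(the copy names here are inferred from the ls listing further down, not
copied from my shell history):

~/comptest $ gzip -dc comp39.tgz > comp39a.tar
~/comptest $ for x in b c d e; do cp comp39a.tar comp39$x.tar; done

Then the compression runs themselves: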

~/comptest $ time bzip2 comp39b.tar
    1m7.03s real     0m55.97s user     0m0.43s system
(maximum RAM used: around 8M)

~/comptest $ time rzip comp39c.tar
    0m36.42s real     0m26.62s user     0m1.14s system
(maximum RAM used: around 118M)

~/comptest $ time lzma e comp39d.tar comp39d.tar.lz
    7m5.59s real     6m54.79s user     0m0.59s system
(maximum RAM used: around 80M, I think)

~/comptest $ time gzip -9 comp39e.tar
    1m16.87s real     1m15.28s user     0m0.42s system
(maximum RAM used: around 700k)

Results:
~/comptest $ ls -l comp*
-rw-r--r--  1 njholland  njholland   75288260 Apr 17 19:20 comp39.tgz
-rw-r--r--  1 njholland  njholland  218347520 Apr 17 19:25 comp39a.tar
-rw-r--r--  1 njholland  njholland   50369737 Apr 17 19:26 comp39b.tar.bz2
-rw-r--r--  1 njholland  njholland   25860279 Apr 17 19:29 comp39c.tar.rz
-rw-r--r--  1 njholland  njholland   20849017 Apr 17 19:38 comp39d.tar.lz
-rw-r--r--  1 njholland  njholland   75288272 Apr 17 19:45 comp39e.tar.gz

Comments:
rzip and lzma turned in some good numbers (REALLY good numbers),
but the RAM required was absolutely absurd (at least, for this
application).  Your average mac68k or 486 would be in swap hell for
a week, and many good firewall systems would take many times as long
to load.  I'm rather amazed that rzip was such a screamer in
compression speed for this app; I've never seen it outrun anything
before.  I repeated the test, btw -- the results were consistent.

bzip2 seems to be an answer in search of a problem.  It does things a
little better for a big price.

gzip is the only one with a RAM footprint that is acceptable for a
multi-platform OS.  Really, the amount of change in our lives that
switching to a "better" compressor would make is not worth the
difference here.

HOWEVER, I did spot one interesting thing... the first time I ran the gzip
test, I got a funny answer:

~/comptest $ time gzip comp39e.tar
    0m29.39s real     0m26.70s user     0m0.44s system

~/comptest $ ls -l comp39*
-rw-r--r--  1 njholland  njholland   75288260 Apr 17 19:20 comp39.tgz
  ...
-rw-r--r--  1 njholland  njholland   75990958 Apr 17 19:26 comp39e.tar.gz

Since the size didn't line up, I figured that was probably due to the
wrong compression "factor" being specified -- apparently, the file sets are
built with a -9.  Note that in my humble opinion, the -9 on gzip that
tar and/or the build process uses is Just Not Worth It -- for a tiny
improvement in file size, you wait well over twice as long.  You think
that doesn't matter?  Re-pack a mac68k build some day.
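
If you want to see that tradeoff yourself, a pair of one-liners on one of
the tar copies will do it (the output names here are just for illustration;
-6 is gzip's default level):

~/comptest $ time gzip -6c comp39a.tar > comp39.l6.tar.gz ; ls -l comp39.l6.tar.gz
~/comptest $ time gzip -9c comp39a.tar > comp39.l9.tar.gz ; ls -l comp39.l9.tar.gz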

ok, one last nail:
~ $ ls -l `which rzip` `which lzma` `which bzip2` `which gzip`
-r-xr-xr-x  6 root  bin     29248 Feb 19 14:08 /usr/bin/gzip*
-r-xr-xr-x  1 root  bin     33024 Oct 22 22:02 /usr/local/bin/bzip2*
-rwxr-xr-x  1 root  wheel  202845 Jan 31 15:06 /usr/local/bin/lzma*
-rwxr-xr-x  1 root  bin     34208 Oct 22 22:16 /usr/local/bin/rzip*

Keep in mind, another major criterion for OpenBSD is to keep it so
that the install process can be run off a floppy disk.  Even
four or five extra K for bzip2 would be a challenge.  Oh, and check
out the library dependencies -- lzma seems to need the 1.1MB
libstdc++ that none of the others needed.  Between the app and the
library, your floppy is now full.  BZZZT.  Game over.  Sorry.
IF we were to do something silly like this, rzip is probably a lot
closer to being used (though building on a mac68k would take a
month instead of a week, I suspect).
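
Anybody who wants to double-check the library baggage can just ask ldd --
lzma is the one that drags in libstdc++, and the exact paths and versions
will vary by system:

~ $ ldd /usr/bin/gzip
~ $ ldd /usr/local/bin/lzma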

> That's why I asked.. and because I read about it on undeadly.org (-:

There is no one best file compression system; you have to weigh a LOT
of considerations, including:
  RAM needed
  Compression time
  End file size
  Portability
  License
  Pipe-ability
  Goal

AND you have to do it for YOUR particular dataset.  For this sample,
LZMA worked very well from a compression-efficiency standpoint.  However,
comp39.tgz has lots of text files and other very compressible things;
if this idea were not so easily rejected for many other reasons, you
would have to repeat this test over the entire OpenBSD file set.
xfont39.tgz shows very little gain.
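
If somebody really wanted to run that experiment, a quick-and-dirty loop
like this would do it -- the release path is made up, and all it does is
re-pack every set with gzip -9 and lzma and print the sizes side by side:

for set in /path/to/3.9/i386/*.tgz; do
        b=`basename $set .tgz`
        gzip -dc $set > $b.tar            # unpack the gzip'd set once
        gzip -9c $b.tar > $b.tar.gz       # re-pack with gzip -9
        lzma e $b.tar $b.tar.lz           # re-pack with lzma
        ls -l $b.tar.gz $b.tar.lz         # compare the sizes
        rm -f $b.tar $b.tar.gz $b.tar.lz
done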

I did some evaluations of various compression systems a while back for
an app where I COULD throw RAM and processor at the task, and the end
size mattered more than most other things.  rzip was the winner then,
before lzma was ported, but it nearly got disqualified because there
didn't seem to be an app that would decompress the files under Windows.
This mattered to us.  When lzma was ported, reports showed that it was
"better" than rzip in a few ways -- however for this application, rzip
was still a very clear winner in compression and speed.  Your results
with your data will vary.  For this application, we didn't care about
the RAM demands, time, pipe-ability, etc. of the application; we just
needed to cram as much on a DVD-R as we could.  That being said, if
rzip and lzma didn't exist, I would NOT have used bzip2 over gzip, as
the benefit was virtually non-existent, not worth the headaches for
another 5% capacity.

On the other hand, if my app needed to be piped (and it almost, but not
quite, does), rzip would be out of the game, as it requires random file
access on both the compression and decompression steps.  Regardless
of its other virtues, it can't be piped.
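
In concrete terms (a made-up example, not how the file sets are actually
built): gzip drops straight into a tar pipeline, while rzip has to be
staged through the disk:

# gzip can sit right in the pipe:
tar cf - src | gzip -9 > src.tar.gz

# rzip needs a real, seekable file on both the compress and decompress
# sides, so the tarball has to hit the disk first:
tar cf - src > src.tar
rzip src.tar            # produces src.tar.rz; there is no stdin/stdout mode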

OpenBSD file sets have their own criteria.  Being the absolutely
smallest size possible is not very high on the list.


heh. figured I should run some decompression times, and really remove
all remaining curiosities in my mind about compression alternatives:

~/comptest $ time gunzip comp39.tgz
    0m21.76s real     0m2.03s user     0m0.32s system
~/comptest $ time bunzip2 comp39b.tar.bz2
    0m24.30s real     0m21.62s user     0m0.54s system
~/comptest $ time rzip -d comp39c.tar.rz
    1m16.76s real     0m9.89s user     0m2.75s system
~/comptest $ time lzma d comp39d.tar.lz comp39d.tar
    0m17.13s real     0m4.59s user     0m0.53s system

amazingly, rzip was slower to decompress than to compress.  I'm
guessing it is all the disk thrashing -- it sounds pretty horrible
at times.

So now...can we leave the engineering of OpenBSD to the people who
have a proven track record at looking at the whole picture, and
lay off the back-seat engineering?

Nick.
