On Sat, Sep 25, 1999 at 04:26:55AM +0200, Wolfgang Weisselberg wrote:
> Hi!
>
> Trying to kill the keyboard, [EMAIL PROTECTED] produced:
> > On Wed, Sep 22, 1999 at 11:36:20AM +0200, Wolfgang Weisselberg wrote:
> > > > > Second, and most importantly, is that using compression
> > > > > on a tar backup could result in total data loss in an archive after a bad
>
> > bzip2 handles this OK - the archive is split into blocks of a few hundred K as
> > part of the compression, and each block is independently checksummed so that
> > a damaged archive can have all undamaged blocks recovered.
>
> bzip2 has blocks of 100 to 900KB (corresponding to -1 to -9).
> That is a bit too big for my taste: you can lose an awful lot
> of data in 2 MB worth of files/scripts ...
This is worse with gzip. Gzip uses smaller (max 32K) blocks, but the blocks have
no checksums of their own; if you lose 1 bit in a gzip stream you lose the whole
backup. Of course you can compress files individually, but this loses even more
compression. I don't know of any compressor other than bzip2 that allows recovery
of parts of a file. Of course, the Right Way to avoid tape errors is to keep more
than 1 copy of each FS (this backup and the one before it at least).
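
To illustrate the idea of independently checksummed blocks, here is a
minimal Python sketch. It is NOT bzip2's real on-disk format: the 64K
block size, the (crc, data) tuples and the function names are all made up
for the example. It just compresses each block on its own, stores a CRC32
next to it, and shows that a single flipped bit costs one block rather
than the whole archive.

```python
import bz2
import zlib

BLOCK_SIZE = 64 * 1024  # hypothetical block size, chosen only for this sketch


def pack_blocks(data, block_size=BLOCK_SIZE):
    """Compress each block on its own and keep a CRC32 of the compressed bytes."""
    blocks = []
    for start in range(0, len(data), block_size):
        compressed = bz2.compress(data[start:start + block_size])
        blocks.append((zlib.crc32(compressed), compressed))
    return blocks


def recover_blocks(blocks):
    """Return (recovered_chunks, damaged_indexes); damaged blocks are skipped."""
    recovered, damaged = [], []
    for index, (crc, compressed) in enumerate(blocks):
        if zlib.crc32(compressed) != crc:
            damaged.append(index)
            continue
        recovered.append(bz2.decompress(compressed))
    return recovered, damaged


data = bytes(range(256)) * 1024  # 256 KiB of sample data, i.e. 4 blocks
blocks = pack_blocks(data)

# Simulate a tape error: flip one bit in the middle of the second block.
crc, compressed = blocks[1]
corrupted = bytearray(compressed)
corrupted[len(corrupted) // 2] ^= 0x01
blocks[1] = (crc, bytes(corrupted))

recovered, damaged = recover_blocks(blocks)
print("damaged blocks:", damaged)
print("bytes recovered:", sum(len(chunk) for chunk in recovered), "of", len(data))
```

With a single continuous stream and no per-block checksum there is no
comparable place to resynchronise after the damaged spot.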
> > It also gives very good compression (much better than gzip), but uses a lot
> > of CPU so this is not useful for running a fast tape at full speed.
> > (I get 66% of a P2-450 used writing to a QIC-250 that does 80K/s.)
>
> Much better is in the eye of the beholder. You get an
> additional 10 or 20% over gzip, usually with a hefty
> time increase. Over my kernel source, tarred:
>
> method     time       rel. time   rel. size   rel. time/size
> gzip -1     25 secs   1           1/3.18      1
> gzip -6     49 secs   1.96        1/3.77      1.65
> gzip -9    120 secs   4.80        1/3.80      4.02
>
> bzip2 -1   172 secs   6.88        1/4.00      5.47
> bzip2 -6   186 secs   7.44        1/4.51      5.29
> bzip2 -9   184 secs   7.36        1/4.70      4.98
>
> In other words, bzip2 -6 is 280% longer than a gzip -6 while
> it decreases the compressed space by an additional 20%.
> The savings for bzip2 -9 are better, though; in this case
> it is even a bit faster. But taking 3 times as long for 20%
> less is only worth it if it's gonna be transferred often or
> if it's critical to fit it on the medium.
Unfortunately compression is like that: you can use huge amounts
of CPU to get small gains in compression ratio. But looking at the
ACT (Archive Comparison Test), most compressors as efficient as bzip2
are much slower. Szip is an exception, but like gzip it has no
block-based error recovery. As long as the tape keeps streaming,
though, it doesn't really matter how much CPU the compressor uses.
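
For anyone who wants to redo the arithmetic on the timings quoted above,
here is a small Python sketch. The numbers are copied from the quoted
table; the dictionary layout and the compare() helper are just my own
naming for the example. Note that "20% better" can be read two ways: the
compression ratio goes up by about 20%, which makes the output about 16%
smaller.

```python
# method -> (seconds, compression ratio), copied from the quoted table
RESULTS = {
    "gzip -1": (25, 3.18),
    "gzip -6": (49, 3.77),
    "gzip -9": (120, 3.80),
    "bzip2 -1": (172, 4.00),
    "bzip2 -6": (186, 4.51),
    "bzip2 -9": (184, 4.70),
}


def compare(base, other):
    """Return (% more time, % better ratio, % smaller output) of other vs base."""
    base_secs, base_ratio = RESULTS[base]
    other_secs, other_ratio = RESULTS[other]
    extra_time = (other_secs / base_secs - 1) * 100
    ratio_gain = (other_ratio / base_ratio - 1) * 100
    size_saving = (1 - base_ratio / other_ratio) * 100
    return extra_time, ratio_gain, size_saving


for level in ("-6", "-9"):
    extra_time, ratio_gain, size_saving = compare("gzip " + level, "bzip2 " + level)
    print(f"bzip2 {level} vs gzip {level}: {extra_time:.0f}% more time, "
          f"{ratio_gain:.0f}% better ratio, {size_saving:.0f}% smaller output")
```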
> Worth it?
>
> IMHO:
> - not because of the block size of 600KB in this case, that's
> too large (you'll really want a "per file" compression or
> blocks that are much smaller, say 10-30KB).
> - for compression only if your computer is fast enough to keep
> the tape still streaming.
>
In my case it is, but I guess most fast machines have a faster and
bigger tape than my 250MB QIC. Even so, I need 40MB of memory
buffers to keep it streaming (although the latest bzip2 doesn't
slow down so much on very repetitive data, so this may not be
needed any more).
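
A rough back-of-the-envelope check of those figures, as a Python sketch.
It rests on two assumptions of mine: that CPU use scales linearly with
the rate of compressed output, and that the 40MB buffer holds data ready
to go to tape. The function names are only for illustration.

```python
TAPE_RATE_KB_S = 80   # QIC write speed quoted earlier in this thread
CPU_FRACTION = 0.66   # share of the P2-450 used at that rate
BUFFER_MB = 40        # memory buffer in front of the tape


def max_tape_rate(tape_rate_kb_s, cpu_fraction):
    """Fastest tape (KB/s) the compressor could feed, assuming linear CPU scaling."""
    return tape_rate_kb_s / cpu_fraction


def buffer_seconds(buffer_mb, tape_rate_kb_s):
    """How long a full buffer keeps the tape streaming if the compressor stalls."""
    return buffer_mb * 1024 / tape_rate_kb_s


max_rate = max_tape_rate(TAPE_RATE_KB_S, CPU_FRACTION)
stall_cover = buffer_seconds(BUFFER_MB, TAPE_RATE_KB_S)
print(f"compressor tops out at about {max_rate:.0f} KB/s of tape speed")
print(f"a full buffer covers about {stall_cover:.0f} seconds of stalls")
```

So under those assumptions this machine could not keep up with a tape
much faster than about 120K/s, while the buffer rides out stalls of
several minutes.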
--
[EMAIL PROTECTED]