On Tuesday 24 November 2015 21:41:44 Jon LaBadie wrote:

> On Tue, Nov 24, 2015 at 09:51:56PM +0000, Debra S Baddorf wrote:
> > Others: how about also some comments about software vs hardware
> > compression? I was under the impression that software compression
> > allows amanda better control, since she then knows how much will
> > actually fit on the tape, though it is slower, since she has to DO
> > the compression. Where’s the balance (speed vs space gain)?
> > Clearly, David has a lot of data here, so space may be the driving
> > factor for him.
>
> Many factors go into this decision.
>
>   Compress in the tape drive (hardware) or let amanda use gzip
> (software)? Why not both?
>   If software, should I compress on the amanda client or on the
> server?
>
> Hard- or software?
>
> Some parts of amanda's decision making depend on knowing how big the
> tape is (e.g. during the planning stage) and how much room remains
> (e.g. before each taping).  [Note, someone (likely jlm) may correct me]
>
> If you use hardware compression, amanda does not have an accurate
> picture of the situation.  With splitting DLEs across multiple tapes,
> this is less of a concern.  And if you find your tapes are
> consistently underfilled, just "lie" to amanda about the tape's
> capacity: your 6.25TB tape becomes 7.0TB.  I'm not certain, but I
> believe that once amanda begins writing to a tape, it continues to
> write the chunks until it receives notice that the tape is full.  Then
> it tapes the failed chunk on the next tape (assuming it can use
> another tape).
>
> Basically, I would not hesitate to use hardware compression on an
> LTO-6. Other opinions may vary.  See later about whether you should
> software compress.
>
> Why not both?
>
> At one time the compressor algorithm used on tape drives could
> actually `expand' random data while trying to compress it.  Both
> encrypted and already compressed data are sufficiently random to cause
> this problem. Thus the oft repeated admonition "don't use both
> software and hardware compression".
>
> Modern tape formats avoid this.  These formats include LTO-x through
> at least 4 and I have no reason to assume any change in 5&6.  The way
> they avoid it is to read data from the bus to memory and compress it
> to other memory.  If compression causes a size increase, the original,
> uncompressed data is sent to tape instead of the data that expanded.
>
> What about software compression?
>
> The LTO ?foundation? says LTO-6 does a 2.5:1 compression.  Lots of
> people feel gzip gives about a 2.0:1 compression.  So hardware
> compression is better - right?  Well, those numbers are for "generic
> data".  Is your data "generic"?  Unlikely.  I've seen text compress
> to 10% and other data, like media, barely compress.

I have seen formatted text like g-code, which has a limited "vocabulary",
compress to 4% of its original size with gzip --best here.  But not all
text is so limited as to do that.
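
To check how your own data behaves rather than trusting the generic
ratios, something like this minimal sketch may help.  It assumes a POSIX
shell with gzip and wc available; gzip_percent is a name made up here for
illustration, not an amanda or gzip command.

```shell
# Print the gzip-compressed size of one file as a percentage of the
# original size.  The compressed output is only counted, never stored.
gzip_percent() {
    file=$1
    # $(( ... )) strips any padding some wc implementations add.
    orig=$(( $(wc -c < "$file") ))
    if [ "$orig" -eq 0 ]; then
        echo "empty file: $file" >&2
        return 1
    fi
    # gzip -9 is the same setting as gzip --best.
    comp=$(( $(gzip -9 -c "$file" | wc -c) ))
    echo $(( comp * 100 / orig ))
}
```

Run it as "gzip_percent some_representative_file" on a few typical files
from each DLE; the result is rounded down to a whole percent.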

> You may wish to check your data.  Create a huge file on that 2TB
> holding disk.  Make it 10-20% of the tape capacity, say 700GB-1TB.
> Create it by "cat'ing" representative data together.  Then tape
> the file using a dd in a loop something like this (untested).
>
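>   # <your_tape_device> must be the non-rewinding device (e.g. /dev/nst0
>   # on Linux), or every dd will start again from the beginning of tape.
>   SECONDS=0
>   count=0
>   while dd if=<above_file> of=<your_tape_device> bs=<1M or 2M>
>   do
>     count=$((count + 1))
>     echo "$count $SECONDS"
>   done
>   echo "$count full passes, $SECONDS seconds"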
>
> The loop will terminate when dd reaches the end of the tape.
> You can then count up the number of dd's run (plus a partial)
> and figure out how much data was written and how long it took.
>
> Set your tape drive to use 1 or 2MB blocks (match above) and
> actually, make several runs under these conditions:
>
> 1. hw compress off -- gives total capacity of tape and time to tape
> 2. hw compress on  -- should be more dd's run to see effect of hw comp
>
> Use gzip to compress the file
>
> 3. hw compress off -- should give same tape capacity as 1, and you can
>                       calculate if hw or sw compress is more effective
> 4. hw compress on  -- can hw compress shrink it more?  Is the combo
>                       better than the best of 2. or 3.
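
For the bookkeeping in the four runs above, a minimal sketch of the
arithmetic, assuming a shell with 64-bit integer arithmetic (bash, or sh
on a 64-bit system).  tape_summary and its argument order are invented
here for illustration; the byte count for the partial pass is whatever dd
reported for its last, failed run.

```shell
# Summarise one run of the dd loop.
#   $1 = size of the test file in bytes
#   $2 = number of complete dd passes
#   $3 = bytes written by the final, partial pass
#   $4 = elapsed seconds for the whole run
tape_summary() {
    total=$(( $1 * $2 + $3 ))
    echo "bytes_written=$total"
    echo "gib_written=$(( total / 1073741824 ))"
    # Skip the rate if no time was recorded, to avoid dividing by zero.
    if [ "$4" -gt 0 ]; then
        echo "mib_per_sec=$(( total / 1048576 / $4 ))"
    fi
}
```

Comparing bytes_written between the runs gives the effective capacity
under each setting; all figures are rounded down by the integer division.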

This scenario is much easier said than done, Jon.  The ONLY reliable way
to disable hw compression once it has been enabled is to read the first
block of the tape (the 32k amanda label) into a file, rewind the tape,
turn hw compression off, and then, without doing anything that will move
the tape, write that label block back to the tape.  Only that sequence
seems to reset the hw compression flag in the tape header.
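
As a rough, untested sketch of that sequence: the device name, the
scratch file and the function name below are all assumptions, and
"mt ... compression 0" is the Linux mt-st spelling, which not every drive
honours.  By default it only prints the commands it would run.

```shell
# Assumed names; adjust for your system.
TAPE=${TAPE:-/dev/nst0}              # non-rewinding tape device
LABEL=${LABEL:-/tmp/tape-label.32k}  # scratch copy of the label block
RUN=${RUN-echo}                      # dry run unless RUN is set to empty

reset_hw_compression() {
    # Start from the beginning of the tape.
    $RUN mt -f "$TAPE" rewind || return 1
    # Save the 32k label block to a file.
    $RUN dd if="$TAPE" of="$LABEL" bs=32k count=1 || return 1
    # Go back to the start, then switch hw compression off.
    $RUN mt -f "$TAPE" rewind || return 1
    $RUN mt -f "$TAPE" compression 0 || return 1
    # Write the label back without moving the tape in between.
    $RUN dd if="$LABEL" of="$TAPE" bs=32k count=1
}
```

Calling reset_hw_compression as-is lists the five commands; setting RUN to
the empty string first makes it execute them against the drive, which
rewrites the start of the tape.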

I've been using v-tapes for around a decade now, and they have no hw
compression ability.  Recovery, should it be needed, is 10 to 100x
faster from v-tapes, as the hard drive is random access whereas the
tape is not.  I am not set up to do offsite, but so far, knock on wood,
smartctl has given me more than enough warning of an impending drive
failure.

> Client or Server SW Compression?
>
> Face it, compression is cpu intensive.  Some clients may not be able
> to easily handle gzip'ing the data.  On the other hand, if you don't
> client compress, you are sending much more data over the network.  Can
> the net handle it?  On the third hand ;) your server is liable to be
> dumping multiple clients simultaneously.  If you server compress, can
> the server handle multiple gzips plus saving to disk plus sending to
> tape?
>
> Only you can decide what is right for your situation.  But keep in
> mind, it is not a one-size-fits-all thing.  You can choose to compress or
> not, client or server on a host by host or even DLE by DLE basis.

Which I do by using assorted dumptypes in the individual DLEs.

> Good Luck!
>
> HTH,
> Jon
>
> > Very good progress David!
> > Deb Baddorf
> >
> > > On Nov 24, 2015, at 3:27 PM, David Simpson
> > > <david.simp...@eng.ox.ac.uk> wrote:
> > >
> > > I’ve now got a working dump configuration – or at least I was able
> > > to test a dump [to completion] of a subset of the data. I’m now
> > > expanding that test.
> > >
> > > I can see how important it is to get the DLEs right/manageable now
> > > and the implications of data structure (for Amanda). I might have
> > > quite a lot to do – to define them all, when I get to that, for
> > > the full archive. I’m likely to split up by host with labels to
> > > make it a bit easier. I think I’ve been able to form something
> > > vaguely sensible for the first server (in terms of disklist) by
> > > firstly running a du then examining it. Forming the DLE’s using
> > > includes/excludes then a catch all (hope that works! will find out
> > > when I’ve done the dump – ongoing).  I can see the problem of
> > > large data deep in the directory structure. Not so nice from a
> > > DLE/disklist definition POV.
> > >
> > > The problem now is throughput.
> > >
> > > My average dump rate is quite poor on the test I’ve run – I’m not
> > > surprised though as this client is limited by network. This will
> > > be made better at some point by running a test over a separate
> > > logical network.
> > >
> > > The average write/throughput to tape (HP LTO6 + Quantum
> > > superloader 3 + Centos7) is around 83 MB/s. That’s the best I’ve
> > > seen from my testing so far. This is with 5x DLEs ranging from
> > > around 100 to around 200 GB. I’ve been working with that same
> > > data.
> > >
> > > Current settings:
> > > LTO6 tape
> > > part_size set to 200GB
> > > chunksize set to 500MB
> > > holding disk usable 2000GB/2TB
> > > software compression off (Amanda)
> > > hardware compression on (tape)
> > >
> > > record no
> > > strategy noinc
> > > skip-incr yes
> > > auth "ssh"
> > > GNUTAR
> > >
> > >
> > > I’m starting to think more CPU (clock) + striped RAID (for Amanda
> > > holding disk) to feed it would help. What other factors should I
> > > look at?
> > >
> > > What are the best ways to tune it?
> > >
> > > Still not convinced about my tape profile (amtapetype). It
> > > returned quite a low speed...
> > >
> > > What kernel module are you using?
> > >
> > > thanks
> > > David
> > >
> > > --------------------
> > > David Simpson - Computing Support Officer
> > > IBME - Institute of Biomedical Engineering
> > > Old Road Campus Research Building
> > > Oxford, OX3 7DQ
> > > Tel: 01865 617697 ext: 17697
> > >
> >>> End of included message <<<


Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page <http://geneslinuxbox.net:6309/gene>
