On Tue, Nov 24, 2015 at 09:51:56PM +0000, Debra S Baddorf wrote:
> Others:   how about also some comments about software vs  hardware 
> compression?
> I was under the impression that software compression allows amanda better 
> control, since she then knows how much will actually fit on the tape, though 
> it is slower, since she has to DO the compression.
> Where’s the balance  (speed vs space gain)?   Clearly, David has a lot of 
> data here, so space may be the driving factor for him.
> 

Many factors go into this decision.

  Compress in the tape drive (hardware) or let amanda use gzip (software)?
  Why not both?
  If software, should I compress on the amanda client or on the server?

Hard- or software?

Some parts of amanda's decision making depend on knowing how big the
tape is (e.g. during the planning stage) and how much room remains
(e.g. before each taping).  [Note, someone (likely jlm) may correct me]

If you use hardware compression, amanda does not have an accurate
picture of the situation.  With splitting DLEs across multiple tapes,
this is less of a concern.  And if you find your tapes are consistently
underfilled, just "lie" to amanda about the tape's capacity: your 6.25TB
tape becomes 7.0TB.  I'm not certain, but I believe that once amanda
begins writing to a tape, it continues writing chunks until it is
told the tape is full.  It then writes the failed chunk to the next
tape (assuming it may use another tape).
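
If you do "lie", the place to tell it is the tapetype definition in
amanda.conf.  A rough sketch, assuming the usual LTO-6 figures (the
name is invented, and the filemark and speed values are placeholders;
amtapetype measures the real ones):

  define tapetype LTO6-PADDED {
      comment "LTO-6, hw compression on, length overstated on purpose"
      length 7000 gbytes    # instead of the nominal 6250
      filemark 0 kbytes     # placeholder
      speed 160000 kps      # placeholder; LTO-6 native is ~160 MB/s
  }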

Basically, I would not hesitate to use hardware compression on an LTO-6.
Opinions vary.  See below on whether you should also software compress.

Why not both?

At one time the compressor algorithm used on tape drives could actually
`expand' random data while trying to compress it.  Both encrypted and
already compressed data are sufficiently random to cause this problem.
Thus the oft-repeated admonition "don't use both software and hardware
compression".

Modern tape formats avoid this.  That includes LTO versions through at
least LTO-4, and I have no reason to assume any change in LTO-5 or
LTO-6.  They avoid it by reading data from the bus into memory and
compressing it into other memory.  If compression causes a size
increase, the original, uncompressed data is sent to tape instead of
the data that expanded.
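
A toy shell illustration of that per-block decision, with gzip standing
in for the drive's compressor (which it is not; assumes GNU stat):

  # toy model of the firmware's choice for one block, not real firmware
  gzip -c block.dat > block.gz
  if [ "$(stat -c%s block.gz)" -lt "$(stat -c%s block.dat)" ]
  then
      cat block.gz     # compressed copy is smaller: that goes to tape
  else
      cat block.dat    # compression expanded it: the original goes to tape
  fi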

What about software compression?

The LTO consortium claims LTO-6 does a 2.5:1 compression.  Lots of
people figure gzip gives about a 2.0:1 compression.  So hardware
compression is better - right?  Well, those numbers are for "generic
data".  Is your data "generic"?  Unlikely.  I've seen text compress
to 10% of its size and other data, like media, barely compress at all.
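
A cheap way to spot-check gzip's ratio on a representative sample of
your own data, before involving the tape at all (assumes GNU coreutils
and bc):

  orig=$(stat -c%s sample.dat)          # uncompressed size in bytes
  comp=$(gzip -c sample.dat | wc -c)    # compressed size, no temp file
  echo "scale=2; $orig / $comp" | bc    # e.g. "2.00" means 2.0:1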

You may wish to check your data.  Create a huge file on that 2TB
holding disk, say 10-20% of the tape capacity (700GB-1TB), by
cat'ing representative data together.  Then write the file to tape
using dd in a loop, something like this (untested):

  count=0 SECONDS=0   # SECONDS is bash's built-in elapsed-time counter
  # dd exits non-zero when the tape fills, which ends the loop
  while dd if=<above_file> of=<your_tape_device> bs=<1M or 2M>
  do
    count=$((count + 1))
    echo "copy $count done at $SECONDS seconds"
  done
  echo "end of tape: $count full copies plus a partial, $SECONDS seconds"

The loop terminates when dd reaches the end of the tape.  Count the
dd's that completed (plus the final partial one, from dd's records-out
line) to figure out how much data was written and how long it took.
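
With made-up numbers: say 9 full copies of a 700GB file, plus a 150GB
partial read off dd's records-out line, went by in 21000 seconds:

  echo "$(( (9 * 700 + 150) * 1000 / 21000 )) MB/s"   # prints "307 MB/s"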

Set your tape drive to use 1 or 2MB blocks (matching the bs= above; see
the mt sketch after this list) and make several runs under these
conditions:

1. hw compress off -- gives total capacity of tape and time to tape
2. hw compress on  -- should be more dd's run to see effect of hw comp

Then use gzip to compress the file and repeat:

3. hw compress off -- should give the same tape capacity as 1, and you
                      can calculate whether hw or sw compression is
                      more effective
4. hw compress on  -- can hw compression shrink it further?  Is the
                      combo better than the best of 2 or 3?
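
On Linux with the st driver, the knobs for those runs look something
like this (device name and mt-st syntax assumed; check your mt man
page):

  mt -f /dev/nst0 compression 0     # runs 1 and 3: hw compression off
  mt -f /dev/nst0 compression 1     # runs 2 and 4: hw compression on
  mt -f /dev/nst0 setblk 1048576    # fixed 1MB blocks, matching bs=1M
  gzip -c <above_file> > <above_file>.gz   # input for runs 3 and 4
  mt -f /dev/nst0 setblk 0          # variable blocks again when done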

Client or Server SW Compression?

Face it, compression is cpu intensive.  Some clients may not be able to
easily handle gzip'ing the data.  On the other hand, if you don't client
compress, you are sending much more data over the network.  Can the net
handle it?  On the third hand ;) your server is liable to be dumping
multiple clients simultaneously.  If you server compress, can the server
handle multiple gzips plus saving to disk plus sending to tape?

Only you can decide what is right for your situation.  But keep in mind,
it is not an all-or-nothing thing.  You can choose to compress or not,
client or server, on a host-by-host or even DLE-by-DLE basis.
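
In config terms that is just a dumptype parameter.  A rough sketch
(names invented, trimmed down to the compress lines):

  define dumptype client-comp {
      program "GNUTAR"
      compress client fast    # gzip on the client, less data on the wire
  }
  define dumptype server-comp {
      program "GNUTAR"
      compress server fast    # gzip on the amanda server
  }
  define dumptype no-comp {
      program "GNUTAR"
      compress none           # e.g. media that will not shrink anyway
  }

  # disklist: mix and match per host or even per DLE
  fastclient.example.com  /home   client-comp
  weakclient.example.com  /data   server-comp
  mediabox.example.com    /video  no-comp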

Good Luck!

HTH,
Jon
> Very good progress David!
> Deb Baddorf
> 
> 
> > On Nov 24, 2015, at 3:27 PM, David Simpson <david.simp...@eng.ox.ac.uk> 
> > wrote:
> > 
> > I’ve now got a working dump configuration – or at least I was able to test 
> > a dump [to completion] on a subset of the data. I’m now expanding that test.
> > 
> > I can see how important it is to get the DLEs right/manageable now and the 
> > implications of data structure (for Amanda). I might have quite a lot to do 
> > – to define them all, when I get to that, for the full archive. I’m likely 
> > to split up by host with labels to make it a bit easier. I think I’ve been 
> > able to form something vaguely sensible for the first server (in terms of 
> > disklist) by first running a du and then examining it. Forming the DLEs 
> > using includes/excludes then a catch all (hope that works! will find out 
> > when I’ve done the dump – ongoing).  I can see the problem of large data 
> > deep in the directory structure. Not so nice from a DLE/disklist definition 
> > POV.
> > 
> > The problem now is throughput.
> > 
> > My average dump rate is quite poor on the test I’ve run – I’m not surprised 
> > though as this client is limited by network. This will be made better at 
> > some point by running a test over a separate logical network.
> > 
> > The average write/throughput to tape (HP LTO6 + Quantum superloader 3 + 
> > Centos7) is around 83 MB/s. That’s the best I’ve seen from my testing so 
> > far. This is with 5x DLEs  ranging from around 100 to around 200 GB. I’ve 
> > been working with that same data. 
> > 
> > Current settings:
> > LTO6 tape
> > part_size set to 200GB
> > chunksize set to 500MB
> > holding disk usable 2000GB/2TB
> > software compression off (Amanda)
> > hardware compression on (tape)
> > 
> > record no
> > strategy noinc
> > skip-incr yes
> > auth "ssh"
> > GNUTAR
> > 
> > 
> > I’m starting to think more CPU (clock) + striped RAID (for Amanda holding 
> > disk) to feed it would help. What other factors should I look at? 
> > 
> > What are the best ways to tune it?
> > 
> > Still not convinced about my tape profile (amtapetype). It returned quite a 
> > low speed... 
> > 
> > What kernel module are you using?
> > 
> > thanks
> > David
> >  
> > --------------------
> > David Simpson - Computing Support Officer
> > IBME - Institute of Biomedical Engineering 
> > Old Road Campus Research Building
> > Oxford, OX3 7DQ
> > Tel: 01865 617697 ext: 17697
>>> End of included message <<<

-- 
Jon H. LaBadie                 j...@jgcomp.com
 11226 South Shore Rd.          (703) 787-0688 (H)
 Reston, VA  20190              (703) 935-6720 (C)
