On Tue, Nov 24, 2015 at 09:51:56PM +0000, Debra S Baddorf wrote:
> Others: how about also some comments about software vs hardware
> compression?
> I was under the impression that software compression allows amanda better
> control, since she then knows how much will actually fit on the tape, though
> it is slower, since she has to DO the compression.
> Where’s the balance (speed vs space gain)? Clearly, David has a lot of
> data here, so space may be the driving factor for him.
>

Many factors go into this decision.

  Compress in the tape drive (hardware) or let amanda use gzip (software)?
  Why not both?
  If software, should I compress on the amanda client or on the server?

Hard- or software?

Some parts of amanda's decision making depend on knowing how big the
tape is (e.g. during the planning stage) and how much room remains
(e.g. before each taping). [Note, someone (likely jlm) may correct me]
If you use hardware compression, amanda does not have an accurate picture
of the situation, because how much fits on the tape depends on how well
each dump happens to compress.

With splitting DLEs across multiple tapes, this is less of a concern.
And if you find your tapes are consistently underfilled, just "lie"
to amanda about the tape's capacity: your 6.25TB tape becomes 7.0TB.

I'm not certain, but I believe that once amanda begins writing to a
tape, it continues to write the chunks until it receives notice that
the tape is full. Then it tapes the failed chunk on the next tape
(assuming it can use another tape).

Basically, I would not hesitate to use hardware compression on an LTO-6.
Other opinions may vary. See later about whether you should software
compress.

Why not both?

At one time the compression algorithm used on tape drives could actually
"expand" random data while trying to compress it. Both encrypted and
already compressed data are sufficiently random to cause this problem.
Thus the oft-repeated admonition "don't use both software and hardware
compression".

Modern tape formats avoid this. These formats include LTO-x through
at least 4, and I have no reason to assume any change in 5 & 6. The way
they avoid it is to read data from the bus into memory and compress it
into other memory. If compression causes a size increase, the original,
uncompressed data is sent to tape instead of the data that expanded.

What about software compression?

The LTO ?foundation? says LTO-6 does a 2.5:1 compression. Lots of
people feel gzip gives about a 2.0:1 compression. So hardware
compression is better - right?
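
To put those nominal ratios in tape terms, here is a rough sketch of the arithmetic. It assumes the LTO-6 native (uncompressed) capacity of 2500GB; the 2.5:1 and 2.0:1 figures are just the quoted ones, not measurements, and the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: effective tape capacity at a nominal compression ratio.
# NATIVE_GB is the LTO-6 native (uncompressed) capacity in GB.
NATIVE_GB=2500

# effective_gb RATIO_X10
#   RATIO_X10 is the compression ratio times 10 (25 means 2.5:1),
#   so the calculation stays in integer shell arithmetic.
effective_gb() {
    echo $(( $NATIVE_GB * $1 / 10 ))
}

echo "hw compression at 2.5:1 -> $(effective_gb 25) GB"
echo "gzip at 2.0:1           -> $(effective_gb 20) GB"
echo "incompressible at 1.0:1 -> $(effective_gb 10) GB"
```

The 2.5:1 line is where the 6.25TB figure for an LTO-6 tape comes from. Data that barely compresses lands much nearer the last line.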

Well, those numbers are for "generic data". Is your data "generic"?
Unlikely. I've seen text compress to 10% and other data, like media,
barely compress.

You may wish to check your data. Create a huge file on that 2TB holding
disk. Make it 10-20% of the tape capacity, say 700GB-1TB. Create it by
"cat'ing" representative data together. Then tape the file using a dd
in a loop, something like this (untested):

    # SECONDS is the bash/ksh elapsed-time counter; reset it to zero.
    # Use the non-rewinding tape device (e.g. /dev/nst0 on Linux) so
    # each dd appends after the previous one.
    SECONDS=0
    while dd if=<above_file> of=<your_tape_device> bs=<1M or 2M>
    do
        echo $SECONDS
    done
    echo $SECONDS

The loop will terminate when dd reaches the end of the tape. You can
then count up the number of dd's run (plus a partial) and figure out
how much data was written and how long it took.

Set your tape drive to use 1 or 2MB blocks (matching the bs= above) and
make several runs under these conditions:

  1. hw compress off -- gives total capacity of tape and time to tape
  2. hw compress on  -- should be more dd's run, to see effect of hw comp

  Use gzip to compress the file, then:

  3. hw compress off -- should give same tape capacity as 1, and you
                        can calculate whether hw or sw compress is
                        more effective
  4. hw compress on  -- can hw compress shrink it more? Is the combo
                        better than the best of 2 or 3?

Client or Server SW Compression?

Face it, compression is cpu intensive. Some clients may not be able
to easily handle gzip'ing the data. On the other hand, if you don't
client compress, you are sending much more data over the network.
Can the net handle it?

On the third hand ;) your server is liable to be dumping multiple
clients simultaneously. If you server compress, can the server
handle multiple gzips plus saving to disk plus sending to tape?

Only you can decide what is right for your situation. But keep in
mind, it is not an all-or-nothing thing. You can choose to compress
or not, client or server, on a host by host or even DLE by DLE basis.

Good Luck!

HTH,
Jon

> Very good progress David!
> Deb Baddorf
>
> > On Nov 24, 2015, at 3:27 PM, David Simpson <david.simp...@eng.ox.ac.uk>
> > wrote:
> >
> > I’ve now got a working dump configuration – or at least I was able to test
> > a dump [to completion] of a subset of the data. I’m now expanding that test.
> >
> > I can see how important it is to get the DLEs right/manageable now and the
> > implications of data structure (for Amanda). I might have quite a lot to do
> > – to define them all, when I get to that, for the full archive. I’m likely
> > to split up by host with labels to make it a bit easier. I think I’ve been
> > able to form something vaguely sensible for the first server (in terms of
> > disklist) by firstly running a du then examining it. Forming the DLEs
> > using includes/excludes then a catch all (hope that works! will find out
> > when I’ve done the dump – ongoing). I can see the problem of large data
> > deep in the directory structure. Not so nice from a DLE/disklist definition
> > POV.
> >
> > The problem now is throughput.
> >
> > My average dump rate is quite poor on the test I’ve run – I’m not surprised
> > though as this client is limited by network. This will be made better at
> > some point by running a test over a separate logical network.
> >
> > The average write/throughput to tape (HP LTO6 + Quantum superloader 3 +
> > Centos7) is around 83 MB/s. That’s the best I’ve seen from my testing so
> > far. This is with 5x DLEs ranging from around 100 to around 200 GB. I’ve
> > been working with that same data.
> >
> > Current settings:
> > LTO6 tape
> > part_size set to 200GB
> > chunksize set to 500MB
> > holding disk usable 2000GB/2TB
> > software compression off (Amanda)
> > hardware compression on (tape)
> >
> > record no
> > strategy noinc
> > skip-incr yes
> > auth "ssh"
> > GNUTAR
> >
> >
> > I’m starting to think more CPU (clock) + striped RAID (for Amanda holding
> > disk) to feed it would help. What other factors should I look at?
> >
> > What are the best ways to tune it?
> >
> > Still not convinced about my tape profile (amtapetype). It returned quite a
> > low speed...
> >
> > What kernel module are you using?
> >
> > thanks
> > David
> >
> > --------------------
> > David Simpson - Computing Support Officer
> > IBME - Institute of Biomedical Engineering
> > Old Road Campus Research Building
> > Oxford, OX3 7DQ
> > Tel: 01865 617697 ext: 17697
>>> End of included message <<<

--
Jon H. LaBadie                 j...@jgcomp.com
 11226 South Shore Rd.          (703) 787-0688 (H)
 Reston, VA  20190              (703) 935-6720 (C)