That is somewhat of a complicated question.
The simplest statement is that if Amanda manages the compression, and you have told it the capacity
of the tape, then it knows what can fit on the tape. It also knows, from history, what a particular
DLE compresses to. If the tape drive is doing the compression, then it is a black box: Amanda
doesn't know what the DLE got compressed to, nor how that relates to the capacity of the tape,
which makes planning more difficult. Also, computers are getting faster and are typically
multi-core, so having gzip run compressions for multiple DLEs on multiple cores is easily
manageable.
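For reference, this choice is made per dumptype in amanda.conf; a minimal sketch (the dumptype names here are illustrative, not from any real configuration):

```
# Software compression on the client: Amanda records the compressed
# size of each DLE and can plan tape usage from that history.
define dumptype sw-compressed {
    compress client fast
}

# No software compression: the drive's hardware compression (if
# enabled on the device) is a black box to Amanda's planner.
define dumptype hw-only {
    compress none
}
```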
Then there are the howevers. I'm currently dealing with a couple of servers that are each getting
into the range of 50 to 100 TB of capacity that needs to be backed up to LTO6. One of those servers
has been too frequently running into 36 hour or even 60 hour backup cycles. As I was comparing the
two servers, I noticed that on one server, the largest amount of data consists of TIFF files for the
digitized herbarium collection. Those don't compress, so I had set those DLEs to not use
compression. I was getting well over 200MB/s from disk to holding disk for those, and then on the
order of 155MB/s out to tape. Then, on both this same server and on the server that was running over
a day, the DLEs that were being compressed were getting something on the order of 15MB/s from disk
to holding disk, followed by on the order of 155MB/s out to tape. On the one server, that wasn't
such a big deal, because the largest amount of data was not being compressed. On the other server,
all of the data is being compressed, and the compression is significant, but it has become the
bottleneck. top shows several of Amanda's gzip processes at the top of the list all day.
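That single-stream gzip bottleneck is easy to confirm outside Amanda; a rough sketch (paths and sizes are illustrative; run it against real data for meaningful numbers):

```shell
#!/bin/sh
# Rough single-stream vs parallel compression throughput check.

# Build a 64 MB test file (zeros compress trivially; substitute real data).
dd if=/dev/zero of=/tmp/comp-test bs=1M count=64 2>/dev/null

# Single-stream gzip, the same codec "compress client" uses.
time gzip -c /tmp/comp-test > /tmp/comp-test.gz

# Parallel compression with pigz, if installed, for comparison.
if command -v pigz >/dev/null 2>&1; then
    time pigz -c /tmp/comp-test > /tmp/comp-test.pigz
fi
```

If the parallel run is markedly faster on real data, Amanda can be pointed at an external compressor via "compress client custom" with client_custom_compress in the dumptype; check the amanda.conf man page for your version.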
So, I'm beginning to rethink things for this server. These are SuperMicro servers, each with two
AMD Opteron 6344 processors (2.6GHz, 12 cores) running Ubuntu 14.04 LTS. They both have large
external SAS multipath disk cabinets that are managed with mdadm and LVM. They both currently have
about 24 external drives, ranging from 1TB to 6TB, built into a number of RAID5 and RAID6 arrays,
and they both have two 1TB enterprise SSDs for holding disks. The tape systems are Overland NEO
200s series libraries with
IBM LTO6 tape drives. My understanding of LTO6 is that the compression is hardware accelerated and
is not supposed to slow down the data transfer. It is certainly going to be a bit of an experiment,
but I'm reaching the point where I need to figure out how to get these backups done more quickly. As
it is now, the tape is getting a lot of idle time while it waits for DLEs to be completed and ready
to be written out to tape.
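For the tape-idle side specifically, Amanda's flushing behavior is tunable in amanda.conf; a sketch of the relevant knobs (values are illustrative; check the amanda.conf man page for your version):

```
# Start writing to tape before a full tape's worth of dumps has
# accumulated on the holding disk, so the drive idles less.
flush-threshold-dumped 0
flush-threshold-scheduled 0

# Pick the next dump to write by best fit rather than arrival order.
taperalgo largestfit

# Flush any leftover dumps from previous runs automatically.
autoflush yes
```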
I've been using Amanda for more than 10 years on these servers and their predecessors with LTO6 and
previously with AIT5, and it has always worked well. I'm only now getting the rapidly increasing
demand for large data arrays that is putting real stress on our backup capabilities. I've got 3
Amanda servers with LTO6 libraries backing up about 12 servers in 4 different departments.
On 10/28/16 12:40 PM, Ochressandro Rettinger wrote:
Why does Amanda recommend the use of software compression vs: the built in
hardware compression of the tape drive itself? Is that in fact still the current recommendation?
-Sandro
--
---------------
Chris Hoogendyk
-
O__ ---- Systems Administrator
c/ /'_ --- Biology & Geosciences Departments
(*) \(*) -- 315 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst
<hoogen...@bio.umass.edu>
---------------
Erdös 4