That is a somewhat complicated question.

The simplest statement is this: if Amanda manages the compression, and you have told it the capacity of the tape, then it knows what will fit on the tape. It also knows, from the history, what a particular DLE compresses to. If the tape drive is doing the compression, then it is a black box. Amanda doesn't know what the DLE was compressed to, and it doesn't know how that relates to the capacity of the tape. That makes planning more difficult. Also, computers are getting faster and are typically multi-core, so having gzip run compressions for multiple DLEs on multiple cores is easily manageable.
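For reference, that choice is made per dumptype in amanda.conf. A minimal sketch of the two approaches, with made-up dumptype names (the rest is standard dumptype syntax):

    # software compression: gzip runs under Amanda's control, so the
    # planner sees and records the compressed size of each DLE
    define dumptype comp-client {
        global
        program "GNUTAR"
        compress client fast
    }

    # no software compression: hand the data to the drive as-is,
    # optionally letting the drive's hardware compression work on it
    define dumptype no-comp {
        global
        program "GNUTAR"
        compress none
    }

With "compress client fast", the sizes recorded in Amanda's history are post-compression sizes, which is exactly what the planner needs to fit dumps against the tapetype's stated length.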

Then there are the howevers. I'm currently dealing with a couple of servers that are each getting into the range of 50 to 100 TB of capacity that needs to be backed up to LTO6. One of those servers has too frequently been running into 36 hour or even 60 hour backup cycles. As I was comparing the two servers, I noticed that on one of them, the largest share of the data consists of TIFF files for the digitized herbarium collection. Those don't compress, so I had set those DLEs not to use compression. I was getting well over 200 MB/s from disk to holding disk for those, and then on the order of 155 MB/s out to tape. Then, on both this same server and on the server that was running over a day, the DLEs that were being compressed were getting something on the order of 15 MB/s from disk to holding disk, followed by on the order of 155 MB/s out to tape. On the one server that wasn't such a big deal, because the largest share of the data was not being compressed. On the other server, all of the data is being compressed, and the compression is significant, but it has become the bottleneck. top shows several of Amanda's gzip processes at the head of the list all day.
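If you want to confirm that gzip itself is the ceiling on a given machine, a quick single-core test looks something like the following; the file path is just a placeholder for a large, representative file, and gzip --fast is (as far as I know) the level Amanda uses for its "fast" setting:

    # push a big file through a single gzip; the first dd's summary
    # line shows how fast gzip consumes input, the second shows the
    # rate of the compressed output
    dd if=/data/sample.tar bs=1M | gzip --fast | dd of=/dev/null bs=1M

If the input rate comes out in the neighborhood of what you see from disk to holding disk, then the arrays aren't the problem; one gzip per DLE is.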

So, I'm beginning to rethink things for this server. These are SuperMicro servers, each with two AMD Opteron 6344 processors (2.6 GHz, 12 cores each), running Ubuntu 14.04 LTS. They both have large external SAS multipath disk cabinets that are managed with mdadm and LVM. They both currently have about 24 external drives ranging from 1 TB to 6 TB built into a number of RAID5 and RAID6 arrays, and they both have two 1 TB enterprise SSDs for holding disks. The tape systems are Overland NEO 200s series libraries with IBM LTO6 tape drives. My understanding of LTO6 is that the compression is hardware accelerated and is not supposed to slow down the data transfer. It will certainly be a bit of an experiment, but I'm reaching the point where I need to figure out how to get these backups done more quickly. As it is now, the tape gets a lot of idle time while it waits for DLEs to be completed and ready to be written out to tape.
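If the experiment goes ahead, the drive-side switch is simple. The device node below is an assumption for my setup, and tapeinfo comes from the mtx package:

    # enable the drive's hardware compression (mt-st syntax)
    mt -f /dev/nst0 compression 1

    # verify what the drive reports
    tapeinfo -f /dev/nst0 | grep -i comp

On the Amanda side, the corresponding DLEs would then move to a "compress none" dumptype, and the tapetype length would stay at the native LTO6 capacity, since Amanda has no way to predict what the drive's compression will achieve.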

I've been using Amanda for more than 10 years on these servers and their predecessors, with LTO6 and previously with AIT5, and it has always worked well. It's only now that rapidly increasing demand for large data arrays is putting real stress on our backup capabilities. I've got 3 Amanda servers with LTO6 libraries backing up about 12 servers in 4 different departments.


On 10/28/16 12:40 PM, Ochressandro Rettinger wrote:

Why does Amanda recommend the use of software compression vs. the built-in hardware compression of the tape drive itself? Is that in fact still the current recommendation?

                -Sandro


--
---------------

Chris Hoogendyk

-
   O__  ---- Systems Administrator
  c/ /'_ --- Biology & Geosciences Departments
 (*) \(*) -- 315 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst

<hoogen...@bio.umass.edu>

---------------

Erdös 4
