questions about proposed amanda hardware solution

2002-10-11 Thread Mike Simpson

Folks --

I've been using amanda for a couple of years now on Sun hardware 
(DLT8000-based L9 tape library, with stctl as the tape changer 
controller).  It's worked great, but I'm nearing capacity with the 
current system and trying to decide whether to expand (another L9 and 
a load of DLT-IV and Sun-bought holding disk) or switch to another 
hardware platform and tape format.  I was hoping folks could take a 
look at my suggested new setup, and offer advice, warnings, horror 
stories, etc.

The amanda control host would be a smallish Linux rackmount job with
attached SCSI+RAID disk enclosure -- currently I'm considering a Dell
PowerEdge 1650 (1U, PIII-based) with a Dell PowerVault 220S (3U, 14 x
36 GB Ultra160 SCSI disks + PERC/3 RAID controller).  Doing a RAID-5 +
hot spare across the fourteen disks leaves twelve disks of usable
space, which works out to about 400 "real" GB (1024^3) of holding
disk.  I'd use RedHat Linux 7.3 for the OS.  Main
reason for picking Dell and RedHat over other PC/Linux options is just
that they're the standard for my group these days.  For a tape library
controller, I'm hoping that either the "mtx" or "scsi-changer" drivers
will be able to control the library hardware (see below).
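
As a first sanity check on the library side, I'd expect to be able to
talk to the changer with plain mtx commands along these lines (the
/dev/sg node is a guess -- it would be whatever generic SCSI device the
library's medium changer shows up as):

  mtx -f /dev/sg1 inquiry      # does the changer answer at all?
  mtx -f /dev/sg1 status       # drive and slot inventory
  mtx -f /dev/sg1 load 1 0     # move the tape in slot 1 into drive 0
  mtx -f /dev/sg1 unload 1 0   # and put it back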

My standard backup "depth" has been to try to have around two weeks of
"dailies", with a short dumpcycle to get a high proportion of level
zeroes onto the tapes (more for convenience of restores than anything
else).  I've also got a separate "offsite" rotation of all level
zeroes, run once a week, with about four weeks of depth, in case the 
data center ever burns down.  With my current backups, I wind up 
having the operators flip the DLT tapes in the L9 every day, which 
causes lots of dropped tapes and more wear and tear on the library.  
I'd like to get away from that, and wind up with a library capacity 
that lets me keep the daily rotation loaded all the time, with an 
extra slot so I can load the offsite tape for the week on Monday 
morning and then forget about it until the next Monday.
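
In amanda.conf terms, I'm picturing something along these lines for the
daily rotation (the numbers here are illustrative, not final):

  dumpcycle 3 days      # short cycle, so most tapes carry level zeroes
  runspercycle 3        # one amdump run per day
  tapecycle 15 tapes    # roughly two weeks of dailies before reuse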

After poking around, the AIT3 tape format from Sony feels like a nice
direction in which to move (alternatives being sticking with DLT, or
moving to SDLT or Ultrium).  I don't know much about hardware vendors
for AIT3 libraries, but two of my candidates are the QualStar CLS-4216
(2U, with 1 AIT3 drive and 16 tape slots) or the Overland LibraryPro
(4U-ish, with 1 AIT3 drive and 19 tape slots).  Either of these would 
let me load two weeks worth of tapes and a weekly offsite.  I'm 
assuming I'd wind up with two libraries and two amanda rotations with 
about 200+ GB of data in each.  I could go with software 2:1 
compression to save on holding disk, or dump to disk uncompressed and 
then try to get the mythical 2.6:1 hardware compression advertised for 
the AIT3 format.
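
Either way, the choice would mostly come down to the dumptype,
roughly like this (the dumptype names are made up):

  define dumptype client-comp {
      comment "gzip on the client, drive compression off"
      compress client fast
  }

  define dumptype drive-comp {
      comment "send data uncompressed, let the AIT3 drive compress"
      compress none
  }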

Does the above sound reasonable?  Terrible?  Have I missed any 
compatibility issues, and does anyone use a setup similar to this one?

Thanks for any and all advice,

-mgs





Re: questions about proposed amanda hardware solution

2002-10-14 Thread Mike Simpson

Folks --

I want to thank everybody for all the feedback.  Diskwise, it's nice 
to know that IDE solutions are workable, and I'm going to look into 
them.  Politically (isn't it always about the politics?) I'm in all 
likelihood going to be forced into using Dell hardware, and as far as 
I can tell from some site browsing at dell.com, Dell stuff is all 
SCSI-based once you get into that kind of capacity.

I'm still curious about hooking up an AIT3 tape library to Linux, and 
whether or not the mtx or scsi-changer drivers can actually control 
something like a QualStar or Overland tape library.  Is anyone out 
there using AIT3 tape libraries, and if so, what are you using to 
connect the amanda software to them?
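
My working assumption is that the amanda.conf side would look roughly
like this, with the chg-zd-mtx glue script sitting on top of mtx
(device paths and file locations are guesses):

  tpchanger "chg-zd-mtx"
  tapedev "/dev/nst0"                                 # the AIT3 drive
  changerdev "/dev/sg1"                               # the library's changer device
  changerfile "/usr/local/etc/amanda/daily/changer"   # chg-zd-mtx config/state prefix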

Thanks,

-mgs





question about blocksize and tape buffers

2003-02-24 Thread Mike Simpson
Hi --

In doing some simple testing with my new AIT-3 based tape libraries, 
I note that they really seem to prefer larger block sizes, despite 
the admonition in Sony's manual to use 512 byte blocks for best 
performance.  Here's what I'm seeing using "dd" to write to the tape 
device (/dev/nst0) under RedHat Linux 7.3:

  bs=  512    5.4 mb/s
      1024    6.0 mb/s
      2048    6.1 mb/s
      4096    6.7 mb/s

  bs=  32k    7.5 mb/s
       64k    7.3 mb/s
      128k   10.6 mb/s
      256k   10.2 mb/s

  bs=  2mb    8.6 mb/s

The vendor-cited maximum native data transfer rate is 12 mb/s.
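
Each data point was just a plain dd write to the tape device, something
like this with bs varied per row above (the input file and count here
are placeholders; with the drive's compression disabled, the choice of
input data shouldn't skew the numbers much):

  time dd if=/path/to/big/test/file of=/dev/nst0 bs=128k count=8192
  mt -f /dev/nst0 rewind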

The "st" driver man page notes that it uses up to 16 x 128k internal 
buffers, which is why I tried 2mb for a top-end blocksize.  I've got 
the drive's dip switches set to disable compression and to keep the 
driver from changing it.  The manual for the AIT-3 drive seems to 
indicate there are two buffers on the drive, an "interface" buffer 
(2 MB) and a "group" buffer (8 MB), although I'm not sure what each 
one is used for.
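
(For anyone trying to reproduce this: assuming the standard Linux mt-st
version of mt, the driver-side block size and compression settings can
be checked and forced with something like

  mt -f /dev/nst0 setblk 0         # variable block mode, so dd's bs sets the block size
  mt -f /dev/nst0 compression 0    # ask the drive to switch off compression
  mt -f /dev/nst0 status           # show current block size, density, etc.

though in my case the compression setting is pinned by the dip
switches.)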

From watching "iostat" during the test runs, the lower block sizes 
look like they're starving the drive, i.e. you can see the kb/s drop 
to zero every few seconds, then come back up to peak (11-12 mb/s) for 
5-6 seconds, then slide back to zero, etc.  At the higher blocksizes 
(128k, 256k) the drive pegs at 11-12 mb/s and stays there.  For some 
reason 2mb doesn't work as well; I'm not sure why.

From the Amanda docs and code, it looks like the Amanda taper
allocates "tapebufs" number of 32k buffers as shared memory.  Data
from the holding disk files gets pushed into the buffers, and then the
buffers flush to the tape device.  It also looks like there's a
"blocksize" config directive in the tapetypes section, but the man
page says that, "The minimum blocksize value is 32 KBytes.  The
maximum blocksize value is 32 KBytes." (?)
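
For context, the config I have in mind looks roughly like this (the
length is just the AIT-3 native capacity; whether blocksize is actually
allowed to go above 32k presumably depends on how Amanda was built):

  define tapetype AIT3 {
      comment "Sony AIT-3"
      length 100 gbytes
      blocksize 128 kbytes    # the part the man page seems to rule out
  }

  tapebufs 40                 # default is 20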

My question is this:  given the better performance under the higher
blocksizes, would increasing the buffer size from 32k up to something
like 128k give me better performance?  Can I obtain the same objective
(keeping the drive streaming) by just increasing "tapebufs" to some
fairly large number?

Thanks for any thoughts,

-mgs




Re: Who uses amanda?

2003-03-12 Thread Mike Simpson
Dr. Kirkby --

I support about thirty or so UNIX servers (Solaris, AIX, Linux) that 
represent the development, testing, and production environments for 
the electronic resources of the University of Wisconsin at Madison 
Libraries.  I started using Amanda 2.4.x about two years ago to do 
backups on these systems, originally using a collection of 
miscellaneous resources (hosts, holding disks, and tape drives) left 
over from previous backup strategies.

We're just now finishing a re-implementation of the whole system that
will give us 1 TB of capacity on our primary (and now, finally,
dedicated) Amanda host.  We use a modestly-priced Linux host, coupled
with 500 GB of IDE-to-SCSI holding disk, and five Overland LibraryPro
AIT-3 autoloaders.  We configure for at least fourteen days of mixed
full and incremental backups (level zeroes every three days), plus at
least six weeks of level zero offsites.  Cost of the entire system,
including tape media and three years of onsite next-day support for
all hardware, was $60k.  Doing it again, shaving support costs and
with recent pricing changes, I think I could probably do it for $50k
even.

By comparison, I did a three-year cost workup for buying backups on 
our central storage solution (Tivoli-based).  Even assuming low 
initial capacity, ramping up gradually to 1 TB across the three 
years, the cost was easily $200k+, for a system that (in my opinion) 
would be almost useless to us in a true disaster-recovery scenario, 
as opposed to occasional file restores when someone fumble-fingers an 
"rm -rf".

I have been told that I didn't allow for the cost of my time to
implement and babysit our system;  but from watching one of my
colleagues struggle with our Tivoli implementation, I'm not sure it
makes that much of a difference.  And that same colleague has now
approached me about buying some of the excess time and capacity on our
Amanda system, so we'll do some cost-recovery there as well.

-mgs




amrecover failure, corrupted gzip file?

2003-03-21 Thread Mike Simpson
Hi --

Running Amanda 2.4.4 servers and clients, with a RedHat 7.3 tape host, 
backing up ext2 filesystems on a RedHat 7.2 client host using the DUMP 
method (dump/restore):
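
The disklist entry in question is essentially the following (the
dumptype name here is made up, but the real one is a DUMP-based
dumptype with client-side gzip compression):

  whisper  /home  comp-user-dump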

I tried to do an amrecover on the /home filesystem (~8 GB), which
recovered all of the directories (as expected) and about two-thirds of the
actual data before terminating with a message asking if I wanted to 
change volumes (which I stupidly forgot to cut and paste, and can't 
find in any of my scrollback buffers, sorry).  It then prompted about 
aborting, then about whether or not to dump core, then terminated.  Nothing 
particularly unusual in the amrecover debug file on the client side.

The corresponding amidxtaped debug file on the tape host side looked 
normal until it terminated on a gzip error:

  amidxtaped: time 10.959: Ready to execv amrestore with:
  path = /usr/local/sbin/amrestore
  argv[0] = "amrestore"
  argv[1] = "-p"
  argv[2] = "-h"
  argv[3] = "-l"
  argv[4] = "LSX-13"
  argv[5] = "-f"
  argv[6] = "2"
  argv[7] = "/dev/tapex"
  argv[8] = "^whisper$"
  argv[9] = "^/home$"
  argv[10] = "20030315"
  amrestore:   2: restoring whisper._home.20030315.0

  gzip: stdin: invalid compressed data--format violated
  Error 32 (Broken pipe) offset -2147483648+131072, wrote 0
  amrestore: pipe reader has quit in middle of file.
  amidxtaped: time 3606.244: amrestore terminated normally with status: 2
  amidxtaped: time 3606.244: rewinding tape ...
  amidxtaped: time 3623.767: done
  amidxtaped: time 3623.768: pid 5140 finish time Thu Mar 20 11:11:19 2003

I was able to recover the raw file from tape using dd, i.e.

  dd if=/dev/tapex of=./label-x bs=128k count=1

which recovered:

  AMANDA: TAPESTART DATE 20030315 TAPE LSX-13

Then:

  mt -f /dev/tapex asf 2
  dd if=/dev/tapex of=./label-2 bs=128k count=1

which recovered:

  AMANDA: FILE 20030315 whisper /home lev 0 comp .gz program /sbin/dump
  To restore, position tape at start of file and run:
  dd if= bs=128k skip=1 | /usr/bin/gzip -dc | sbin/restore -f... -

I did that, and was successful in recovering the file from tape:

  -rw-r--r--  1 msimpson msimpson 2872049664 Mar 20 13:20 whisper_home_0.gz
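
The exact incantation was along these lines -- nothing exotic, just the
recipe from the tape header pointed at the same device as before, with
skip=1 stepping over the 128k Amanda file header:

  mt -f /dev/tapex asf 2
  dd if=/dev/tapex of=whisper_home_0.gz bs=128k skip=1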

I tried to do the pipe to "restore", with a failure similar to the 
above.  The gzip file looks like it's become corrupted:

  $ file whisper_home_0.gz
  whisper_home_0.gz: gzip compressed data, from Unix, max speed

  $ file -z whisper_home_0.gz
  whisper_home_0.gz: new-fs dump file (little endian), This dump Sat Mar 15
  20:03:59 2003, Previous dump Wed Dec 31 18:00:00 1969, Volume 1, Level zero,
  type: tape header, Label none, Filesystem /home, Device /dev/datavg/homelv,
  Host whisper.doit.wisc.edu, Flags 1 (gzip compressed data, from Unix, max speed)

_but_:

  $ gzip -l /projects/archives/whisper_home_0.gz
    compressed  uncompressed  ratio  uncompressed_name
    2872049664             0   0.0%  whisper_home_0

and when I try to unzip it, even using the trick I found at 
www.gzip.org to avoid the 4gb file limit that's apparently a problem 
on some versions of gzip, I get the same error as in the debug file:

  $ gunzip < whisper_home_0.gz > whisper_home_0

  gunzip: stdin: invalid compressed data--format violated

  $ ls -l whisper_home*
  -rw-r--r--  1 msimpson msimpson 2872049664 Mar 20 13:20 whisper_home_0.gz
  -rw-r--r--  1 msimpson msimpson 6030524416 Mar 20 15:23 whisper_home_0

Any tips or tricks or other thoughts?  Is this the Linux dump/restore 
problem I've seen talked about on the mailing list?  I don't 
understand how the gzip file could be corrupted by a problem internal 
to the dump/restore cycle.

Thanks for any help,

-mgs




Re: amrecover failure, corrupted gzip file?

2003-03-28 Thread Mike Simpson
Hi --

> Any tips or tricks or other thoughts?  Is this the Linux dump/restore 
> problem I've seen talked about on the mailing list?  I don't 
> understand how the gzip file could be corrupted by a problem internal 
> to the dump/restore cycle.

Answering my own question after a week of testing ... I think I've 
discovered a bug in Amanda 2.4.4.  This is what I've deciphered:

(1) Restores of backup sets that compressed to < 1 gb worked fine.
Backup sets that, when compressed, were > 1 GB blew up every time
with gzip corruption error messages.  This was consistent across
OS's (Solaris 8, RedHat 7.x), filesystem types (ufs, vxfs, 
ext2/3), and backup modes (DUMP, GNUTAR).

(2) The gzip corruption message always occurred at the same spot, i.e.

gzip: stdin: invalid compressed data--format violated
Error 32 (Broken pipe) offset 1073741824+131072, wrote 0

which is 1024^3 bytes + 128k.  I note that in my Amanda 
configuration, I had "chunksize" defined to "1 gbyte" and 
"blocksize" set to "128 kbytes" (the chunksize was just for
convenience, the blocksize seems to maximize my write 
performance).

(3) I used "dd" to retrieve one of the compressed images that was 
failing.  At the 1 gb mark in the file, the more-or-less random
bytes of the compressed stream were interrupted by exactly 32k of
zeroed bytes.  I note that 32k is Amanda's default blocksize.

(4) For last night's backups, I set "chunksize" to an arbitrarily
high number, to prevent chunking, which works fine in my setup
because I use one very large ext3 partition for all of my Amanda
holding disk, which nullifies concerns about filesystem size and
max file size.  The restores I've done this morning have all 
worked fine, including the ones that had previously shown the
corruption.
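
For anyone who wants to apply the same workaround as (4), the relevant
bit of the holdingdisk definition ends up looking roughly like this
(the directory path is a stand-in for my one big ext3 partition; the
chunksize value just needs to be larger than any single dump image):

  holdingdisk hd1 {
      directory "/amanda/holding"   # stand-in path
      chunksize 1000 gbytes         # effectively disables chunking
  }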

I'm not enough of a C coder to come up with a real patch to fix this. 
I'm hoping the above gives enough clues to let someone who _is_ a 
real C coder do so.

If this should be posted to the amanda-hackers list, please feel free 
to do so, or let me know and I'll do it.  Also, if any other 
information would be helpful, just ask.

Thanks,

-mgs