Hello all,
I've been using amanda for, well, years now, and I have never run across
this problem before. I have one brand new host that is taking *WAY* too
long to back up.

Here's the skinny: It's a Sun SPARC Ultra-5 running Solaris 2.7 with
amanda 2.4.2p2. There are a total of six machines that I installed, all
on the same day, one of which has a nearly identical hardware setup.
(The only difference is that this particular host has two hard drives
instead of one.) All the other new machines send backups in reasonable
amounts of time -- only this one seems to be affected. (And, of course,
over 20 old hosts running identical versions of the same software work
just fine. :)

I started a test run of amdump at 3 pm (15:00) yesterday:

| $ amdump tiem

According to amstatus, amanda happily started processing and began to
dump this particular computer, named alnus, at 15:08:57. At 09:47:30
this morning amstatus reports that it has only gotten 1.24% of the way
through 705 megs of data! I sat down and did some figuring. Dump has
been running on this host for 18:21:27, and has only gotten 1.24%
complete by sending 8992k of the dump back to the tape host. According
to a little math, I figure that it's moving about 139 bytes/sec, and has
about 61 days 16:26:36 left until it completes! To say that this is
unacceptable is a minor understatement. I've been struggling with this
problem for four days now, and all I can do is scratch my head and
wonder.

The next thing I did was run the dump by hand according to the command
listed in the alnus:/tmp/amanda/sendbackup.*.debug file:

| $ time /bin/sh -c "ssh alnus '/usr/sbin/ufsdump 0sf 1048576 - \
|       /dev/dsk/c0t0d0s0' | wc -c"

This command should, if I understand correctly, open a channel to alnus
(via ssh), run ufsdump, send the data back over the net to the tape host
(where I issued the command), count the number of bytes sent, and give
me a time readout of how long it took.
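For anyone who wants to check my arithmetic, here's the figuring as a
little shell/awk sketch (the numbers are the ones amstatus gave me
above; nothing beyond a standard shell and awk is assumed):

```shell
# Figures from amstatus: 8992k sent, running for 18:21:27, ~705 MB total.
sent_kb=8992
elapsed_s=$((18*3600 + 21*60 + 27))   # 18:21:27 -> 66087 seconds
total_kb=$((705 * 1024))

# Transfer rate in bytes per second (awk does the floating-point math).
rate=$(awk "BEGIN { printf \"%d\", $sent_kb * 1024 / $elapsed_s }")

# Days remaining at that rate for the rest of the dump.
days_left=$(awk "BEGIN { printf \"%.1f\", ($total_kb - $sent_kb) * 1024 / ($rate * 86400) }")

echo "rate: $rate bytes/sec, roughly $days_left days to go"
# -> rate: 139 bytes/sec, roughly 60.8 days to go
```

(The slight difference from my 61-day figure is just rounding and the
exact dump-size estimate used.)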
Granted, it's not exactly how amanda does things, but it should give me
an idea of how long it *should* take. Here's the output from that
command:

|   DUMP: Writing 32 Kilobyte records
|   DUMP: Date of this level 0 dump: Thu Sep 27 10:23:45 2001
|   DUMP: Date of last level 0 dump: the epoch
|   DUMP: Dumping /dev/rdsk/c0t0d0s0 (alnus:/) to standard output.
|   DUMP: Mapping (Pass I) [regular files]
|   DUMP: Mapping (Pass II) [directories]
|   DUMP: Estimated 1444816 blocks (705.48MB) on 0.01 tapes.
|   DUMP: Dumping (Pass III) [directories]
|   DUMP: Dumping (Pass IV) [regular files]
|   DUMP: 49.32% done, finished in 0:10
|   DUMP: 1444798 blocks (705.47MB) on 1 volume at 601 KB/sec
|   DUMP: DUMP IS DONE
| 739737600
|
| real    20:10.9
| user     1:37.3
| sys         6.8

As I would have expected, it only took about 20 minutes to send 705 megs
over the net back to the tape host. (Mind you, this is *while* the
amanda test dump is still running on alnus!)

The next thing I did was take a look at the dump program running on
alnus. Solaris' ufsdump spawns about four children when it runs. Who
knows what they're all doing, but I was intent on finding out. As it
turns out, according to a process trace with truss, most of the time one
child sits there and polls for more input, while the others sit
sleeping. (Maybe that'll mean something to someone...?)

Anyone have any ideas? Why should this work normally when run manually
and then slow to a snail's pace when run by amanda?

The first thing that occurred to me is that I screwed something up
during the initial installation. I used this machine as a prototype for
setting up the others. When I did my initial setup, it was sitting in
my office, using a different IP address on a different subnet. While
sitting on my desk, everything worked fine -- including amanda. But
when I moved it to its home in another office amanda started acting up.
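In case anyone wants to reproduce the isolation tests, this is roughly
the idea: time the ssh channel and ufsdump separately, then see which
half is slow. (A sketch only -- alnus and the device path are from my
setup, and the <pid> is a placeholder; substitute your own.)

```shell
# 1. The ssh channel alone: push ~100 MB of zeroes from the client back
#    to the tape host and time it.
time ssh alnus 'dd if=/dev/zero bs=32k count=3200' | wc -c

# 2. ufsdump alone: dump on the client and discard the data there, so
#    no network is involved.
time ssh alnus '/usr/sbin/ufsdump 0f - /dev/rdsk/c0t0d0s0 > /dev/null'

# 3. The truss I mentioned: find the ufsdump children and count which
#    system calls they spend their time in (poll vs. read/write).
ssh alnus 'ps -ef | grep ufsdump'
ssh alnus 'truss -c -p <pid-of-ufsdump-child>'
```

If (1) and (2) are both fast but the amanda-driven dump is slow, the
problem is somewhere in how the two are glued together, not in either
piece by itself.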
Since all of my hosts are installed from the network automatically, the
first thing I thought was that I had screwed up something in the /etc
networking config files, or in some state file somewhere, that must
still have some lingering value from the old network stuck in it. So I
reinstalled it from scratch, reformatting the hard drives and
everything. Nothing doing.

The next thing that sprang to mind is that there is something wrong with
the system. I installed a kit from Sun that included the extra hard
drive and a two-drive IDE cable, replacing the floppy with the hard
drive+bracket and the existing one-drive IDE cable with the new
two-drive cable. I thought perhaps there is something amiss with the
system hardware, or the way that Solaris is accessing two IDE hard
drives, that is causing this problem. But that doesn't explain why
amanda ran perfectly while the machine was still sitting on my desk, nor
does it explain why running a dump by hand now only takes 20 minutes.

(The third thing to occur to me is that the machine is just plain
possessed. I've run across that one before, but it usually affects
other systems than just amanda... :) And, if all else fails,
reinstalling from scratch has always fixed the problem -- assuming that
it's not a hardware failure, that is.)

Attached below are my config files and sendbackup.*.debug from alnus, if
that helps anything.
---[ sendbackup.20010926150859.debug ]----------------------------------------
sendbackup: debug 1 pid 5026 ruid 0 euid 0 start time Wed Sep 26 15:08:59 2001
/usr/libexec/sendbackup: version 2.4.2p2
sendbackup: got input request: DUMP / 0 1970:1:1:0:0:0 OPTIONS |;bsd-auth;index;
  parsed request as: program `DUMP' disk `/' lev 0 since 1970:1:1:0:0:0
  opt `|;bsd-auth;index;'
sendbackup: try_socksize: send buffer size is 65536
sendbackup: stream_server: waiting for connection: 0.0.0.0.877
sendbackup: stream_server: waiting for connection: 0.0.0.0.878
sendbackup: stream_server: waiting for connection: 0.0.0.0.879
  waiting for connect on 877, then 878, then 879
sendbackup: stream_accept: connection from 160.36.46.151.819
sendbackup: stream_accept: connection from 160.36.46.151.820
sendbackup: stream_accept: connection from 160.36.46.151.821
  got all connections
sendbackup: spawning /usr/sbin/ufsdump in pipeline
sendbackup: argument list: dump 0usf 1048576 - /dev/rdsk/c0t0d0s0
sendbackup: started index creator: "/usr/sbin/ufsrestore -tvf - 2>&1 | sed -e '
s/^leaf[ ]*[0-9]*[ ]*\.//
t
/^dir[ ]/ {
s/^dir[ ]*[0-9]*[ ]*\.//
s%$%/%
t
}
d
'"
------------------------------------------------------------------------------
---[ amanda.conf ]------------------------------------------------------------
#
# AMANDA.CONF
#
##############################################################################
##############################################################################
#
# Organization Configuration
# ==========================
#
# org "<name>"      - your organization name for reports
#
# mailto "<user>"   - space-separated list of operators at your
#                     site to mail reports to.
#
# dumpuser "<user>" - The username to run dumps under.
#
org "tiem"
mailto "root"
dumpuser "root"

##############################################################################
#
# Server Setup Options
# ====================
#
#
# Networking Options:
#
# inparallel <n>    - Run a maximum of n dumps simultaneously.
#
# netusage <n> Kbps - Maximum network bandwidth to use.
#
inparallel 8
netusage 10000 Kbps

#
# Tape Options:
#
# dumpcycle <n> [days|weeks]
#                   - Number of days in a normal dump cycle.
#
# runspercycle <n>  - The number of times amdump will be run in a
#                     single dumpcycle.
#
# tapecycle <n> tapes - The number of tapes in rotation.
#
# runtapes <n>      - The number of tapes to use in a single run.
#
# Ex: The following configuration will tell Amanda that:
#     (a) You run Amanda 5 times a week
#     (b) You want to use two tapes every time you run
#     (c) You have enough tapes to last you two months before you want
#         Amanda to start over with the first tape again, and
#     (d) You expect to get a full level-zero backup of everything once
#         each week.
#
#     dumpcycle 7 days
#     runspercycle 5
#     tapecycle 80 tapes
#     runtapes 2
#
dumpcycle 7 days
runspercycle 6
tapecycle 80 tapes
runtapes 2

#
# Backup Level Options:
#
# bumpsize <n> Mb   - The minimum savings threshold, in megabytes,
#                     before Amanda bumps a dump up to the next
#                     level.
#
# bumpdays <n>      - The minimum days that a partition is to
#                     spend at a level before Amanda bumps it to
#                     the next level.
#
# bumpmult <n>      - threshold = bumpsize * bumpmult**(level-1)
#
# etimeout <n>      - number of seconds per filesystem for
#                     estimates.
#
# dtimeout <n>      - number of idle seconds before dump is
#                     aborted.
#
# tapebufs <n>      - how many 32k buffers to allocate.
#
bumpsize 20 Mb
bumpdays 1
bumpmult 4
etimeout 300
dtimeout 1800
tapebufs 20

#
# Tape Device Options:
#
# Specify tape device and/or tape changer.  If you don't have a tape
# changer, and you don't want to use more than one tape per run of
# amdump, just comment out the definition of tpchanger.
#
# Some tape changers require tapedev to be defined; others will use
# their own tape device selection mechanism.  Some use a separate tape
# changer device (changerdev), others will simply ignore this parameter.
# Some rely on a configuration file (changerfile) to obtain more
# information about tape devices, number of slots, etc; others just need
# to store some data in files, whose names will start with changerfile.
# For more information about individual tape changers, read
# docs/TAPE.CHANGERS.
#
# At most one changerfile entry must be defined; select the most
# appropriate one for your configuration.  If you select man-changer,
# keep the first one; if you decide not to use a tape changer, you may
# comment them all out.
#
# The following parameters are only needed if you are using a tape
# changer:
#
# tpchanger "<filename>" - The name of the tape-changer script
#
# changerfile "<path>/<filename>"
#                       - Tape changer file (see paragraph two above)
#
# changerdev "<device>" - The device name used by the tape-changer
#                         script to control the tape library or
#                         autoloader device.
#
# tapedev "<device>"    - The device name used by taper to access the
#                         tape drive.  (May be different from the
#                         device used to change the tapes.)
#
# tapetype <configuration name>
#                       - The name of the tape configuration to use.
#                         (Tape configuration is defined below.)
#
# labelstr "<regex>"    - A regular expression that matches all
#                         labels of all Amanda tapes associated with
#                         this configuration.
#
tpchanger "chg-scsi"
#changerfile "/var/adm/amanda/tiem/changer-status"
changerfile "/etc/amanda/tiem/chg-scsi.conf"
#changerdev "/dev/nrst29"       # Drive with compression
tapedev "0"                     # Drive with compression
# changerdev "/dev/nrst5"       # Drive without compression
# tapedev "/dev/nrst5"          # Drive without compression
#tapetype TIEM-DAT-WITH-COMPRESSION
tapetype TIEM-DAT-WITH-COMPRESSION

# Syntax: VOL-[2-digit week number]-[single-character tape letter]
labelstr "^VOL-[0-9][0-9]-[A-Z]$"

#
# Holding Disk Options:
#
# Specify holding disks.  These are used as a temporary staging area for
# dumps before they are written to tape and are recommended for most
# sites.  The advantages include: the tape drive is more likely to
# operate in streaming mode (which reduces tape and drive wear and
# reduces total dump time); multiple dumps can be done in parallel
# (which can dramatically reduce total dump time).  The main
# disadvantage is that dumps on the holding disk need to be flushed
# (with amflush) to tape after an operating system crash or a tape
# failure.  If no holding disks are specified then all dumps will be
# written directly to tape.  If a dump is too big to fit on the holding
# disk then it will be written directly to tape.  If more than one
# holding disk is specified then they will all be used round-robin.
#
# holdingdisk <holding-disk-configuration-name> {
#     [comment "<string>"]
#     directory "<path>"
#     [use <n> <Kb|Mb|Gb>]
#     [chunksize <n> <Kb|Mb|Gb>]
# }
#
# where:
#
# comment "<string>"  - Optional comment
#
# directory "<path>"  - The directory where Amanda will place its
#                       temporary files.
#
# use <n> <Kb|Mb|Gb>  - The amount of space Amanda is allowed to
#                       use.
#
# chunksize <n> <Kb|Mb|Gb>
#                     - Large dumps may be broken into smaller
#                       chunks to be held on the holding disk.  This
#                       parameter specifies the size of the chunks
#                       that you want Amanda to break a large dump
#                       into.
#
#                       n > 0 = split chunks into sizes of n
#                       0     = split chunks into INT_MAX chunks
#                       n < 0 = don't split at all, dump large
#                               filesystems directly to tape
#
# If Amanda cannot find a tape on which to store backups, it will run as
# many backups as it can and store the results on the holding disk(s).
# Under this scenario, Amanda will switch to running only incremental
# backups in order to save space.  This is referred to by Amanda as
# "degraded mode" backups.
#
# By default, Amanda will use 100% of the holding disks for these
# degraded mode backups.  However, the user can specify a percentage of
# the holding space to use.  The remaining space will be used by Amanda
# to store as many regular (non-degraded mode) backups as it can fit,
# saving the reserved portion for degraded-only backups.
#
# reserve <n>         - Percentage of holding disks to reserve for
#                       degraded mode backups.
#

# Main holding disk -- we can allocate the entire drive just for Amanda,
# if we want.
holdingdisk apis_feldspar_homes_amanda {
    comment "Main holding disk"
    directory "/feldspar/homes/amanda/tiem"
#    use 8192 Mb
#    chunksize -1
}

# Secondary holding disk -- there is other data that also resides on
# this drive, so we can't count on being able to use the whole thing.
holdingdisk apis_export_amanda {
    comment "Secondary holding disk (partition is shared with other data)"
    directory "/export/amanda/tiem"
#    use 1800 Mb
#    chunksize -1
}

reserve 30

#
# Tape Index Options:
#
# Amanda needs a few Mb of disk space for the log and debug files, as
# well as a database.  This stuff can grow large, so the conf directory
# isn't usually appropriate.  Some sites use /usr/local/var and some
# /usr/adm.  Create an amanda directory under there.  You need a
# separate infofile and logdir for each configuration, so create
# subdirectories for each conf and put the files there.  Specify the
# locations below.
#
# infofile "<path>"   - Pathname to the database.
#
# logdir "<path>"     - Pathname to the log files.
#
# indexdir "<path>"   - Pathname to the database index directory.
#                       (Holds a list of the files that have been
#                       backed up for each disk entry in the
#                       database.)
#
infofile "/var/adm/amanda/tiem/curinfo"    # database filename
logdir "/var/adm/amanda/tiem"              # log directory
indexdir "/var/adm/amanda/tiem/index"      # index directory

#
# Tape Configuration Options:
# (See Tape Device Options above, parameter "tapetype")
#
# Define the type of tape you use here, and use it in "tapetype" above.
# Some typical types of tapes are included here.  The tapetype tells
# amanda how many MB will fit on the tape, how big the filemarks are,
# and how fast the tape device is.
#
# There's a script by Icarus Sparry <[EMAIL PROTECTED]> that may help in
# calculating the size of a tape.  Get it from the URL
# http://www.amanda.org/amanda-users/users/Oct-Dec.1995/msg00208.html
#
# For completeness Amanda should calculate the inter-record gaps too,
# but it doesn't.  For EXABYTE and DAT tapes this is ok.  Anyone using
# 9-track tapes for amanda and need IRG calculations?  Drop me a note if
# so.
#
define tapetype TIEM-DAT {
    comment "TIEM-DAT tape drives"
    #
    # How much can we fit on one tape?
    #
    #length 12297 mbytes        # Theoretical max.
    #length 10445 mbytes        # Average max.
    length 10000 mbytes
    #filemark 100 kbytes
    speed 1024 kbytes
}

define tapetype TIEM-DAT-WITH-COMPRESSION {
    comment "TIEM-DAT tape drives"
    length 20890 mbytes         # these numbers are not accurate
    #filemark 100 kbytes        # but you get the idea
    speed 1024 kbytes
}

#
# Dump Configuration Options:
#
# Dump configurations are used in the disklist file to specify how to
# back up a particular disk, and any optional handling of that backup
# data.  For instance, you can specify to back up one disk on a client
# using DUMP, and another disk on a client using GNUTAR.  Once you have
# that set, you can specify whether or not to encrypt that backup data
# before sending it over the network from the client to the server.  And
# once the data's on the server, you can specify whether or not to
# compress the data before sending it to the tape drive.
#
# define dumptype <dumptype-configuration-name> {
#     [comment "<string>"]
#     [auth <bsd|krb4>]
#     [comprate <x> [<y>]]
# }
#
# These are referred to by the disklist file.  The dumptype specifies
# certain parameters for dumping, including:
# auth      - authentication scheme to use between server and client.
#             Valid values are "bsd" and "krb4".  Default: [auth bsd]
# comment   - just a comment string
# comprate  - set default compression rate.  Should be followed by one
#             or two numbers, optionally separated by a comma.  The
#             first is the full compression rate; the second is the
#             incremental rate.  If the second is omitted, it is assumed
#             equal to the first.  The numbers represent the fraction of
#             its original size that the compressed file is expected to
#             take up.  Default: [comprate 0.50, 0.50]
# compress  - specify compression of the backed-up data.  Valid values:
#             "none"        - don't compress the dump output.
#             "client best" - compress on the client using the best (and
#                             probably slowest) algorithm.
#             "client fast" - compress on the client using a fast
#                             algorithm.
#             "server best" - compress on the tape host using the best
#                             (and probably slowest) algorithm.
#             "server fast" - compress on the tape host using a fast
#                             algorithm.  This may be useful when a fast
#                             tape host is backing up slow clients.
#             Default: [compress client fast]
# dumpcycle - set the number of days in the dump cycle, ie, set how
#             often a full dump should be performed.  Default: from
#             DUMPCYCLE above
# exclude   - specify files and directories to be excluded from the
#             dump.  Currently only useful with gnutar.  Valid values:
#             "regular expression" - an re defining which files to
#                                    exclude
#             list "filename"      - a file (on the client!) containing
#                                    re's (one per line) defining which
#                                    files to exclude.
#             Default: include all files
# holdingdisk - should the holding disk be used for this dump.  Useful
#             for dumping the holding disk itself.  Default:
#             [holdingdisk yes]
# ignore    - do not back this filesystem up.  Useful for sharing a
#             single disklist in several configurations.
# index     - keep an index of the files backed up.  Default: [index no]
# kencrypt  - encrypt the data stream between the client and server.
#             Default: [kencrypt no]
# maxdumps  - max number of concurrent dumps to run on the client.
#             Default: [maxdumps 1]
# priority  - priority level of the dump.  Valid levels are "low",
#             "medium" or "high".  These are really only used when
#             Amanda has no tape to write to because of some error.  In
#             that "degraded mode", as many incrementals as will fit on
#             the holding disk are done, higher priority first, to
#             ensure the important disks are at least dumped.  Default:
#             [priority medium]
# program   - specify the dump system to use.  Valid values are "DUMP"
#             and "GNUTAR".  Default: [program "DUMP"]
# record    - record the dump in /etc/dumpdates.  Default: [record yes]
# skip-full - skip the disk when a level 0 is due, to allow full backups
#             outside Amanda, eg when the machine is in single-user
#             mode.
# skip-incr - skip the disk when the level 0 is NOT due.  This is used
#             in archive configurations, where only full dumps are done
#             and the tapes saved.
# starttime - delay the start of the dump?  Default: no delay
# strategy  - set the dump strategy.  Valid strategies are currently:
#             "standard" - the standard one.
#             "nofull"   - do level 1 dumps every time.  This can be
#                          used, for example, for small root filesystems
#                          that only change slightly relative to a
#                          site-wide prototype.  Amanda then backs up
#                          just the changes.
#             "noinc"    - do level 0 dumps every time.
#             "skip"     - skip all dumps.  Useful for sharing a single
#                          disklist in several configurations.
#             Default: [strategy standard]
#
# Note that you may specify previously defined dumptypes as a shorthand
# way of defining parameters.
#
define dumptype home {
    comment "Home area"
    #compress client fast
    #compress client best
    #compress server best
    compress none
    priority high
    index yes
}

define dumptype system {
    comment "System area"
    #compress client fast
    #compress client best
    #compress server best
    compress none
    priority low
    index yes
}

#
# Network Options:
#
# These are referred to by the disklist file.  They define the
# attributes of the network interface that the remote machine is
# accessed through.
# Notes: - netusage above defines the attributes that are used when the
#          disklist entry doesn't specify otherwise.
#        - the values below are only samples.
#        - specifying an interface does not force the traffic to pass
#          through that interface.  Your OS routing tables do that.
#          This is just a mechanism to stop Amanda trashing your
#          network.
# Attributes are:
# use      - bandwidth we can consume.  Note that this is
#            not guaranteed; it is only a guideline that
#            Amanda will try to keep within.
#
define interface local {
    comment "a local disk"
    use 1000 kbps
}

define interface le0 {
    comment "10 Mbps ethernet"
    use 600 kbps
}
------------------------------------------------------------------------------
---[ chg-scsi.conf ]----------------------------------------------------------
number_configs 1
eject 0        # Tapedrives need an eject command
sleep 10       # Seconds to wait until the tape gets ready
cleanmax 10    # How many times a cleaning tape can be used
changerdev /dev/nrst29
#
# Next comes the data for drive 0
#
config 0
drivenum 0
dev /dev/nrst29    # the device that is used for tapedrive 0
                   # (BSD type, no rewind, no compression)
startuse 0         # The slots associated with drive 0
enduse 5
statfile /var/adm/amanda/tiem/changer-status
                   # The file where the actual slot is stored
cleancart 0        # the slot where the cleaning cartridge for drive 0
                   # is located
cleanfile /var/adm/amanda/tiem/clean-status
                   # The file where the cleanings are recorded
usagecount /var/adm/amanda/tiem/totaltime-status
tapestatus /var/adm/amanda/tiem/tape-status
#labelfile /etc/amanda/tiem/labelfile
                   # Use this if you have a barcode reader
------------------------------------------------------------------------------
---[ disklist ]---------------------------------------------------------------
###############################################################################
#
# Only back up:
#   (a) Home areas
#   (b) Directories that contain user-specific information, like:
#           /usr/spool/calendar
#           /usr/spool/cron
#   (c) Directories that contain information that changes over time
#       (except log files)
#   (d) Any OS directories that are not auto-installed
#
###############################################################################
#
# Fri Jul 28 11:14:22 EDT 2000
# ----------------------------
#
# NOTE: This doesn't work very well:
#
#   <hostname> /usr/spool/calendar home
#   <hostname> /usr/spool/cron     home
#
# The problem is that /usr/spool/calendar only exists if a user has run
# /usr/openwin/bin/cm sometime during the machine's lifetime.  If we use
# an entry like the ones listed above then amanda will have a cow when
# it encounters a machine that has no /usr/spool/calendar.
#
# While amanda still runs and backs up, I don't want to have to sort
# through the error messages.  Ideally, I just want to be able to glance
# at a report, see if there are errors or not, and then just move on if
# there are none.
#
# There are two solutions that come to mind.  One is to simply back up
# all of /usr/spool, even though this will get things like lock files
# and print queues -- transitory items that we will not want to back up.
# The second solution is to back up /usr/spool with a new backup
# definition that excludes all the lock and queue directories.  Maybe
# create a "spool" definition in addition to "home" and "system"?
#
# Defining an exclude list is only useful with gnutar -- at least with
# the current version of Amanda.  At this writing I am using ufsdump on
# all Solaris hosts, and for the sake of all things anal-retentive I
# don't want to start mixing my backup programs under Solaris.  I like
# things to be homogeneous.  So I think that for now I'm going to go
# with the idea of backing up all of /usr/spool.  It's small, and space
# is cheap (at the moment, anyway).
#
# I think that the best way to handle it is to treat /usr/spool/ with
# "home" priority instead of "system" priority.  This means that
# /usr/spool will get a higher priority than any other system directory,
# but it'll ensure that we get more frequent backups of
# /usr/spool/calendar for users like Dr. Hallam, who have formed a
# symbiotic dependency on programs like OpenWindows' calendar manager.
# (Have I mentioned yet how stupid I think it is to store user data in
# /usr instead of in their home areas?)
#
# -mp
#
# acer / system
# acer /usr system
# acer /usr/openwin system
acer /usr/spool home

#
# Alces is a research symmetric multi-parallel machine.  It has 512GB of
# hard drive space -- there is no way we could back it all up even if we
# wanted to.  So instead we're backing up the system and that's it.
# There are a few special home areas located in /export, so we're
# backing it up too.  I don't think that it is necessary to back up
# /usr/local+/src, since it's only a mirror image of what's on ecology.
#
alces / system
alces /usr system
#alces /usr/local system
#alces /usr/local/src system
alces /export system

#
# Alnus is a slave server like apis.
#
alnus / system
alnus /export system
alnus /alnus/homes/a home
alnus /alnus/homes/b home

#
# Apis was installed by hand, not auto-installed.  Back up the entire
# system.
#
apis / system
apis /usr system
apis /var system
apis /apis/homes home
apis /database/homes home
apis /export system
apis /usr/local system
apis /usr/local/src system

balaena /usr/spool home
balaena /balaena/homes home
bison /usr/spool home
bison /bison/homes home
bison /clay/homes/a home
bison /clay/homes/b home
bufo /usr/spool home
diatom /usr/spool home
ecoli /usr/spool home
fragaria /usr/spool home
fragaria /fragaria/homes home
omykiss /usr/spool home

#
# Raz is a pseudo-slave server...  Do we need to do any more than this?
#
#raz / system
#raz /usr system
#raz /usr/openwin system
raz /usr/spool home
raz /raz/homes/a home
raz /raz/homes/b home

salmo /usr/spool home

equus /usr/spool home
equus /chalk/homes/a home
equus /chalk/homes/b home
equus /flint/homes/a home
equus /flint/homes/b home
equus /granite/homes/a home
equus /granite/homes/b home
equus /lava/homes/a home
equus /lava/homes/b home
equus /opal/homes/a home
equus /opal/homes/b home
equus /quartz/homes/a home
equus /quartz/homes/b home
equus /slate/homes home

#
# Pinky was installed by hand; it is a linux box.
#
pinky / system
pinky /pinky/homes home

#
# Ecology, our main master server -- back up everything!
#
ecology / system
ecology /usr/local system
ecology /usr/local/src system
ecology /ecology/homes/a home
ecology /ecology/homes/b home
ecology /ecology/homes/c home

gulo /usr/spool home
gulo /gulo/homes home
gulo /diamond/homes home
buki /usr/spool home
buki /buki/homes home
ulmus /usr/spool home
ulmus /ulmus/homes home
ursus /usr/spool home
ursus /ursus/homes home
vitis /usr/spool home
vitis /vitis/homes home
vedi /usr/spool home
vedi /vedi/homes/a home
vedi /vedi/homes/b home
vedi /vedi/homes/c home
dama /usr/spool home
dama /dama/homes/a home
dama /dama/homes/b home
dama /dama/homes/c home
grus /usr/spool home
grus /grus/homes home
ovis /usr/spool home
ovis /ovis/homes/a home
ovis /ovis/homes/b home
ovis /ovis/homes/c home
falco /usr/spool home
falco /falco/homes/a home
falco /falco/homes/b home
falco /falco/homes/c home
vireo /usr/spool home
vireo /vireo/homes/a home
vireo /vireo/homes/b home
vireo /vireo/homes/c home
------------------------------------------------------------------------------