strange dump summary
Hi,

I recently upgraded a system from RHL8 to RHL9 and am seeing the following errors in Amanda's mailed backup report. I'm not sure whether the cause is Amanda itself or one of the underlying tools, such as dump(1):

  FAILURE AND STRANGE DUMP SUMMARY:
    localhost feff9f0 lev 0 ERROR [not in disklist]
    localhost feff9f0 lev 0 ERROR [not in disklist]
    localhost feff9f0 lev 0 ERROR [not in disklist]
    localhost feff9f0 lev 0 ERROR [not in disklist]
    localhost feff9f0 lev 0 ERROR [not in disklist]
    localhost feff9f0 lev 0 ERROR [not in disklist]
    localhost feff9f0 lev 0 ERROR [not in disklist]
    localhost feff9f0 lev 0 ERROR [not in disklist]

The dumping and taping seem to proceed correctly in spite of these strange errors. Has anyone seen this before?

Thanks,
Ben
next tape selection?
Hi. How does Amanda decide which tape to use next -- such as when mentioning it in mailed backup reports? The next tape Amanda expects to use is .. I ask because I have labelled a dozen tapes and, after the first dump is complete, Amanda says The next tape Amanda expects to use is Daily12. Why isn't it asking for a new tape? Can someone help explain this, or perhaps explain the selection algorithm? Thanks, Ben
re-doing bad tapes
On Friday, amdump wrote my dump images to a tape successfully, but a subsequent amverify showed that the tape is defective. Now, my dumps have been removed from the holding disk, but I don't think the images on the tape are usable. Is there any way to re-do this backup onto a good tape or am I hosed? Ben
CD-RW taper
Has anyone successfully used the Tivano CD-RW taper replacement with Amanda? After installing it, I get fatal errors from amdump about Connection reset by peer. If anyone has it working, could you please contact me so that we can compare setups? Thanks, Ben
Re: Amanda Newbie
"Casey" == Casey Scott [EMAIL PROTECTED] writes: Casey Hi all. I am new to Amanda, so please forgive the newbie question. Casey How do I blank a tape? amlabel -f ? When I attempted my first real Casey backup, it finished sucessfully but was a mix of Level 0 and 1. Casey Shouldn't all be Level 0 on the first backup to a tape? Is it possible that you had done test runs beforehand that included some real level 0 dumps? There is an FAQ item on clearing Amanda's databases before starting with production runs. I followed these steps and my first backup included solely level 0s dumps. Ben
Re: Amanda Newbie
"Casey" == Casey Scott [EMAIL PROTECTED] writes: Casey Does doing amlabel -f reset the tape? I only have one tape right now, Casey and I have completed the procedure for flush out the database. I would Casey like the tape to be reset also You can relabel the tape if you want. Just to be sure, you can `rm' the tapelist file also when starting afresh. Ben
Re: start
"Alain" == Alain Muls [EMAIL PROTECTED] writes: Alain 2. how to do tests to check whether the configuration is Alain working? Once you've configured Amanda, you run `amcheck conf' to test the configuration. This is documented in the `amcheck' manual page. To *really* test the configuration, of course, you should do a few backups, verifies and try to recover some data from the tapes. Ben
Re: hostname lookup failed
"David" == David Lloyd [EMAIL PROTECTED] writes: David If that doesn't work, try dig 83.0.168.192.in-addr.arpa dig -x 192.168.0.83 is less of a mouthful. :-) Ben
Re: Backup time
"Olivier" == Olivier Collet [EMAIL PROTECTED] writes: Olivier I have a doubt. I launched the backup yesterday at 18:00 and today it is Olivier 10:35 and it is still not done! I have put 'only' three servers in this Olivier backup as a test. Is it normal it is so slow ? How can I tell the backup Olivier is still going on ? You can run `amstatus' (see `man amstatus' for details). You might be having the same problem I did as a new Amanda user -- I was backing up the holiding disk! Check to make sure this is not the case. Ben
Re: amandad failing
"John" == John Jolet [EMAIL PROTECTED] writes: John i have an intel-based server running solaris x86 8. inetd John causes it to listen on the amanda port, until you do an John amcheck or amdump or something on the server. then it stops John listening on that port. this is what is in the syslog: John poisonivy.austx.runt.theresidentclub.com inetd[182]: [ID John 667328 daemon.error] amanda/udp server failing (looping), John service terminated. We've reinstalled amanda, and are John reluctant to strip the machine and start over (especially John without a backup!). Any ideas what could be going wrong? What do you have in inetd.conf? I bet your `amandad' service is using the `nowait' option. Ben
Re: Don't open attachment!!!
"Jonathan" == Jonathan Dill [EMAIL PROTECTED] writes: Jonathan In case you haven't heard, don't open that Snow White attachment! I'll Jonathan send more details shortly so you know this isn't a hoax... I would have thought that by now, users would have been sufficiently slapped around the head with a wet fish to make your suggestion a given. Obviously not! :-( Ben
Re: PowerVault 100T
For anyone interested--and for the purposes of archiving for the next poor b*stard--I solved my problem with the Dell PowerVault 100T producing excessive soft errors with DDS4 media (but working fine with DDS, DDS2 and DDS3). It turns out to be a bug in the drive's firmware which is present in revision 8071 and is solved by upgrading to rev. 8130. Ben
Soft errors with PowerVault 100T
Apologies for the semi-off-topic posting, but this no doubt reaches many people with experience in tape drives. My Dell PowerVault 100T (an Archive Python) has developed a strange problem where it reports excessive soft errors for perfectly good, brand new DDS4 tapes (having now tried five in a row from a fresh box). This leads to a) terrible throughput as the drive farts around retrying and b) failure via I/O errors once it all gets a bit too much. It's also concerning that the LED is indicating this. The drive has been replaced once with no improvement. What could be the problem here? I've checked cabling, termination, tapes, etc. I'm running `mt setblk 32768; dd if=/dev/zero of=/dev/st0` and provoking the problem this way. Any pointers would be *much* appreciated! Cheers, Ben
Re: Soft errors with PowerVault 100T
Ben Apologies for the semi-off-topic posting, but this no doubt reaches Ben many people with experience in tape drives. My Dell PowerVault 100T Ben (an Archive Python) has developed a strange problem where it reports Ben excessive soft errors for perfectly good, brand new DDS4 tapes (having Ben now tried five in a row from a fresh box). I should add that I don't see these problems when using plain old DDS 90m tapes. I'm stumped! :-( Ben
forcing level 0 backups
Is there any easy way to force a level 0 backup of every disk in the disklist? Ben
Re: Need help...
"John" == John R Jackson [EMAIL PROTECTED] writes: John You must have some kind of hostname lookup problem, which is unrelated John to Amanda. Amanda tried to look up "master" and "localhost" and got John an error. Start checking your DNS configuration, /etc/resolv.conf, John /etc/hosts, YP/NIS/NIS+ maps, and whatever tells your system how to John do lookups. I wonder if Amanada needs a ``TROUBLESHOOTING'' file like that which appears in the Samba distribtion? A lot of these questions seem to be the same old, same old -- perhaps a document which walks you through the common problems is in order? Not that I have time to volunteer, unfortunately. Ben
Re: tweaking the schedule
"John" == John R Jackson [EMAIL PROTECTED] writes: John Planner generates the list of what to do based on size, requests for John special service (forced full dumps, etc) and so on. That's a text "file" John with one line per disk piped into driver. Does the planner consider the bandwidth that the tape server has to each client? If the estimation says "50 MB" and the link is 14 kbps, then it should be started first. :-) Ben
tweaking the schedule
I have a couple of machines that are slow and poorly connected to my tape server. In every instance, I happen to *know* that if these machines were to appear first in the backup schedule, the backup would finish more quickly. Is there a way I can push them to the front? Ben
Re: RedHat7.0 and Xinetd
"Sergio" == Sergio Pereira [EMAIL PROTECTED] writes: Sergio Hi folks, I'm trying to use AMANDA 2.4.2p1 but some Sergio problems are happening. So, first of all I need to know Sergio how can I call amandad, amindexd and amidxtaped on xinetd Here's my files, which work for me. Ben [/etc/xinetd.d/amanda] # default: on # description: The AMANDA backup client. service amanda { socket_type = dgram wait= yes user= amanda server = /usr/local/libexec/amandad } [/etc/xinetd.d/amandaidx] # default: on # description: The Amanda indexer. service amandaidx { socket_type = stream wait= no user= amanda server = /usr/local/libexec/amindexd } [/etc/xinetd.d/amidxtape] # default: on # description: The Amanda index/tape server. service amidxtape { socket_type = stream wait= no user= amanda server = /usr/local/libexec/amidxtaped }
slightly altering dump sizes?
Hi. I'm about to try and wedge a program into the AMANDA pipeline that will encrypt a dump image as it's being written to tape. Due to a couple of complications with the algorithm I'm using, the size of the file in the holding disk may be increased by up to 10 bytes when it's written to tape. Will this upset AMANDA? Ben
Re: Error msg interpretation
FAILURE AND STRANGE DUMP SUMMARY: mars sda7 lev 3 FAILED [data timeout] It means dumper on your server stopped getting data for 30 minutes (or whatever you set dtimeout to in amanda.conf). That's interesting, considering `mars' is the tape server. Ben
Re: Estimates
jrj wrote: When amstatus reports on estimates, are these supposed to be the estimates for the volume of data dumped at the specified dump level? ... Yes. They are the estimates for the level Amanda picked to be done. Every day when Amanda runs, I see the same huge estimates for disks that, once backed up, only had a few hundred KB of incrementals to do. You might poke through the amdump.NN files. The first part is the estimating phase and it's not too hard to see the results coming back. If those are really wrong, the next place I'd look is a typical /tmp/amanda/sendsize*debug file on a client and see what's going on in there. In the case of the full dump only two days into the cycle, it seems that dump on that workstation reported an identical size for levels 0 and 1, so of course, Amanda decided that level 1 was unwise and did a level 0. Today, it did a realistic level 1 of 700KB. Why should I be seeing this strange behaviour early on when Amanda has little backup history? Does it keep this backup history over many dump cycles? Ben
Hostname lookup failed?
I just got a note from amcheck about a problem with one of my client workstations:

  Amanda Backup Client Hosts Check
  ERROR: metro: [addr 203.24.38.228: hostname lookup failed]
  Client check: 5 hosts checked in 1.692 seconds, 1 problem found

This is nonsense -- I can ping the host, so my resolver is not broken. What else can cause this?

Thanks,
Ben
Re: self check request timed out
jrj wrote: The only things I can think of to try at this point are a complete rebuild of Amanda (blow away all traces of the build area you used before), or upgrading to the latest gcc and building that for the specific host, or making sure you have all the latest Solaris patches. You might like to try building without any -O optimisation flags, just to be sure GDB isn't being confused by optimised code and reporting something altogether different when you print values. Ben
encryption
I see that it's possible to encrypt client/server traffic with Kerberos 4. Might it be possible to implement a simpler, less onerous security scheme such as the private key CAST-128 algorithm? I'd be prepared to have a shot at implementing it, if my patches have some chance of being accepted. :-) Ben
Re: Hostname lookup failed?
Is it possible that the error message could be improved? Sure. Do you have a suggestion? For starters, it should state that the error is coming from the client, not from amcheck on the tape server. Something like: "Client-side error: unable to perform reverse lookup on IP". Ben
Re: Problem with amverify
jrj wrote: (** Cannot do /usr/bin/gtar dumps) This says that GNU tar is installed in /usr/bin/gtar on the client, but since amverify runs on the server, it is apparently not in that same location, so it could not run it. Version 2.4.2 tries to handle this better by looking for a basename match as well as a complete match, i.e. if the basename of the version of GNU tar on your server is "gtar", amverify would have used it. Why does it try to match the name at all? If you know the file on tape is a tar file, why not just run the designated tar command on the tape server? Ben
Re: Problem with amverify
I'll ask again, how do you do that in a shell script? And remember that it has to run on every flavor of Unix in the world, not just Linux or Solaris, so no cheating and using other than the most generic commands and options :-).

I believe this is quite portable:

  dd bs=1 skip=257 count=5 if=/dev/tape of=/some/tmp/file
  if test `cat /some/tmp/file` = ustar ; then
    : This is a tar file.
  fi

Ben
Re: Problem with amverify
I believe this is quite portable:

  dd bs=1 skip=257 count=5 if=/dev/tape of=/some/tmp/file
  if test `cat /some/tmp/file` = ustar ; then
    : This is a tar file.
  fi

But wait... You just threw away the first 257 bytes from stdin. Now what are you going to put into stdout for tar to read?

Err, rewind to the beginning of the file? ;-)

Ben
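The magic-number check from this thread, wrapped up as a function. This is only a sketch under a few assumptions: `is_ustar' is a made-up name, it takes a path rather than reading stdin, and it expects something you can read again from the start afterwards (a regular file, or a tape you rewind), which is exactly the objection raised above for pipes. The comparison is quoted so that an empty read does not upset `test'.

```shell
is_ustar() {
    # POSIX tar headers carry the string "ustar" at byte offset 257.
    # Read just those five bytes; anything dd complains about is discarded.
    magic=`dd bs=1 skip=257 count=5 if="$1" 2>/dev/null`
    test "$magic" = ustar
}

# Example use (the path is a placeholder):
#   if is_ustar /some/tmp/image ; then echo "This is a tar file." ; fi
```

Very old V7-format tar archives have no magic string at all, so a negative answer only means "not a ustar-style archive".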
110%?
amstatus says that one of my partitions has been dumped to 110%. Then it just sits there, never writing the dump to tape (there are no tapers running). scooby:sda5 0 127811k dumping 141056k (110.36%) What's up with this? I'm running CVS Amanda on the tape server now. Thanks, B.
Re: 110%?
It means the estimate (127811k) was wrong and the filesystem is larger than the estimate. It must be dumping something, look for process activity on your client and server.

There are five concurrent dump processes running for /dev/sda5:

  404 ?  S  0:00 dump 0usf 1048576 - /dev/sda5
  408 ?  S  0:00 dump 0usf 1048576 - /dev/sda5
  409 ?  S  0:01 dump 0usf 1048576 - /dev/sda5
  410 ?  S  0:01 dump 0usf 1048576 - /dev/sda5
  412 ?  S  0:01 dump 0usf 1048576 - /dev/sda5

They don't seem to be doing much:

  [root@scooby /root]# strace -p 404
  wait4(-1, unfinished ...
  [root@scooby /root]# strace -p 408
  read(19, unfinished ...
  [root@scooby /root]# strace -p 409
  pause( unfinished ...
  [root@scooby /root]# strace -p 410
  write(1, "%\317\301[\4\364s\6\227\245\312\250\307-\354\247\2\225"..., 6144 unfinished ...
  [root@scooby /root]# strace -p 412
  pause( unfinished ...

Processes 408 and 410 *are* in read/write calls, but they're not doing much. Is something wedged, and if so, why?

Ben
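For what it's worth, the percentage in the amstatus line from this thread appears to be simply the amount dumped so far divided by the planner's estimate, which is why it can pass 100% when the estimate is low. A quick check with the two figures quoted (`percent_done' is a throwaway helper, not an Amanda command):

```shell
percent_done() {
    # $1 = estimated size in KB, $2 = KB dumped so far
    awk -v est="$1" -v dumped="$2" 'BEGIN { printf "%.2f\n", dumped * 100 / est }'
}

percent_done 127811 141056    # prints 110.36
```

So the 110.36% by itself is consistent with an estimate that was about 10% too small; it says nothing about whether the dump processes are wedged.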
Holding incrementals on disk
Hi. You wrote in the Amanda FAQ: ``This can be wasteful, specially if you have a small amount of data to back up, but expensive large-capacity tapes. One possible approach is to run amdump with tapes only, say once a week, to perform full backups, and run it without tape on the other days, so that it performs incremental backups and stores them in the holding disk. Once or twice a week, you flush all backups in the holding disk to a single tape.'' Can you expand on this? What should I use for the cycle parameters in amanda.conf? How should I flush the incrementals in the holding disk to tape? Thanks, Ben
Never-ending dump
I've been running an `amdump' for about 16 hours now -- the following dump (a level 0, as it happens) keeps happening over and over again. When I reach 100%, it starts again! dublin:sda4 0 1878319k dumping 1679872k (89.43%) (8:08:27) This file system is too large for the holding disk, if that makes any difference. Any idea what's going on here? I've just had to abort the backup. Thanks, Ben
Re: Never-ending dump
cmarble wrote: Is this 2.4.2? If so, there's a known bug that we think is fixed in the latest CVS sources. If you can't or don't want to get them, there will be a 2.4.2p1 shortly, or ask me offline and I'll send you the patch. Will I only have to update my 2.4.2 installation on my Amanda server or will the clients need the new version too? According to John, server only (thank goodness!) Ben
HP-DAT.ps
I want to use this template for labels. I just tried printing the template (through Ghostscript) and get an empty PCL output file. Is there something weird with this PostScript file that I should know about? Ben
Ejecting tapes
Is there a way for Amanda to eject the tape at the end of a run so that I can simply remove it each day (and to indicate that the tape run has actually completed)? Or should I just run `mt offline' myself? Ben
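The second option is easy to script as a small wrapper around the nightly run. A rough sketch only: the config name `Daily' and the device `/dev/nst0' are assumptions, `run_and_eject' is a made-up name, and it uses nothing beyond `amdump <config>' and `mt -f <device> offline'.

```shell
run_and_eject() {
    conf=$1                     # Amanda configuration name (assumed: Daily)
    tape=$2                     # tape device (assumed: /dev/nst0)

    amdump "$conf"
    status=$?                   # remember how the backup run went

    # Eject once amdump has returned, so an ejected tape means the run is over.
    mt -f "$tape" offline

    return $status
}

# Example use from cron or by hand:
#   run_and_eject Daily /dev/nst0
```

The tape is ejected whether or not amdump reported success; the wrapper passes amdump's exit status back so cron can still flag a failed run.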
Re: HP-DAT.ps
I want to use this template for labels. I just tried printing the template (through Ghostscript) and get an empty PCL output file. Is there something weird with this PostScript file that I should know about? You're supposed to add it to amanda.conf's lbl-templ, so that Amanda will print the tape label after each backup, with the complete list of each backup that made it to the tape. As it happens, when I did this and did a backup run, a correct label was printed. Without the list of backups, I wouldn't be surprised if you just got an empty page :-) I got *nothing*, yet in Ghostview, I could see the skeleton tape label. I don't know why I couldn't print that, but there's no problem after all. Thanks! Ben
Re: running as user amanda instead of operator?
oliva wrote: Amanda Backup Client Hosts Check ERROR: running as user "amanda" instead of "operator" Looks like it is the client that is complaining. Maybe xinetd is still running an older version of amandad, that wants to be started as user operator? xinetd is running /usr/local/libexec/amandad, which I just re-installed to be certain. It's from a build tree whose config.status reads: # ../amanda-2.4.2/configure --with-fqdn --with-user=amanda --with-group=disk Any other ideas? Ben
tar 1.12
The docs/INSTALL file says to use GNU tar 1.12 -- is there any reason why a later version, such as 1.13, could not be used (with the provided patches)? Ben
Reusing tapes
I am in the process of debugging my Amanda setup and wish to reuse the same tape over and over. Short of using `mt erase' to completely erase the tape, is there a way I can prevent Amanda from aborting because it thinks I'm overwriting a tape from the backup set? Thanks, Ben
Only one dumper running
I'm doing a test run with about 8 entries in my disklist and Amanda is only running one dumper. When I run `amstatus' I see:

  3 dumpers idle  : no-diskspace
  taper idle
  network free kps: 1984
  holding space   : 806496k ( 76.91%)

What does `no-diskspace' mean here? I seem to have free network capacity, so why aren't up to 4 dumpers (as I've configured) running?

Ben
Re: Only one dumper running
3 dumpers idle : no-diskspace A bit of simple arithmetic shows me that there is no room in the holding disk. :-) Sorry, Ben
Diagnosing client-side errors
I am trying to back up a single partition having just installed Amanda. I thought I'd try with this in my `disklist':

  scooby sda8 always-full

I get the following error, but can't work out what I'm doing wrong. Any tips? In general, how can I diagnose client-side problems?

Thanks,
B.

  Amanda Backup Client Hosts Check
  ERROR: scooby: [can not access sda8 (sda8): No such file or directory]
  Client check: 1 host checked in 0.014 seconds, 1 problem found
  (brought to you by Amanda 2.4.2)
Re: RedHat 7.0
Anyone got any suggestions?, I am running Amanda v2.4.1 First, this should probably have gone to amanda-users, and not amanda-hackers, but that nit aside, my guess is the upgrade "upgraded" your /etc/inetd.conf file and you will need to put back the entries for amandad, etc. Red Hat Linux 7 uses `xinetd', so the format of the configuration file for the superserver has changed quite dramatically. You need to create a file in /etc/xinetd.d/ similar to the other files in that directory. It should be fairly self-explanatory. Cheers, Ben
Re: infamous amcheck selfcheck request timed out problem
Hi, This is what my /etc/inetd.conf looks like:

  amanda dgram udp wait amanda /usr/local/libexec amandad

This is incorrect. The second-to-last field is the full path to your server program, not just its directory. It should be /usr/local/libexec/amandad.

Ben
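A quick way to sanity-check an entry like this before restarting inetd. This is a sketch, not anything shipped with Amanda: `check_inetd_line' is a hypothetical helper that looks only at the wait flag and the server path, assuming the usual inetd.conf field order (service, socket type, protocol, wait flag, user, server program, arguments).

```shell
check_inetd_line() (
    # Runs in a subshell so "set -f" and the positional parameters stay local.
    set -f                      # no filename expansion while splitting
    set -- $1                   # $4 = wait flag, $6 = server program

    if [ "$4" != wait ]; then
        echo "wait flag is '$4', expected 'wait' for the amanda udp service"
        exit 1
    fi

    case $6 in
        /*/amandad) echo "ok" ;;
        *) echo "server field '$6' is not a full path to amandad"
           exit 1 ;;
    esac
)

check_inetd_line "amanda dgram udp wait amanda /usr/local/libexec amandad"
# prints: server field '/usr/local/libexec' is not a full path to amandad
```

With the path corrected to /usr/local/libexec/amandad the same call prints "ok". It does not check that the file exists or is executable; `ls -l' on the path covers that.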