Re: amdump inconsistency.
"John R. Jackson" wrote: Thanks John, I think you are absolutely right to question the dump program. ... It's 0.4b19 version ... Well, I'd be a lot happier if it was ancient so I could blame it :-). I've read in this list that there is a Linux dump 0.4b20 out there, so there might be hope for you, John :-) -- Regards Chris Karakas Dont waste your cpu time - crack rc5: http://www.distributed.net
Re: amdump inconsistency.
Thanks John, I think you are absolutely right to question the dump program.

The inconsistency may be due to the dump command, but it's more likely that the processes it depends on are the main cause, and I don't know how to find out what these processes are.

>> Anytime a filesystem failed to back up, the tape drive seemed to sit idle until the READ_TIMEOUT period lapsed, i.e. no activity was shown.
>
> That could be normal. Looking at the amdump.NN file, are the file systems that time out being done with PORT-DUMP (direct to tape) or FILE-DUMP (through the holding disk)? If they are going through the holding disk, it could be normal for the tape to be idle waiting on something to do.

It's being done with PORT-DUMP, as I don't use the holding disk. However, I did try using the holding disk, and the results were no different.

> What happens if you do something like this:
>
>   /sbin/dump 0f - / > /dev/null

This is what I got when it completed successfully:

  DUMP: Date of this level 0 dump: Tue Dec 12 16:02:22 2000
  DUMP: Date of last level 0 dump: the epoch
  DUMP: Dumping /dev/sda1 (/) to standard output
  DUMP: Label: none
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 1990332 tape blocks.
  DUMP: Volume 1 started at: Tue Dec 12 16:02:29 2000
  DUMP: dumping (Pass III) [directories]
  DUMP: dumping (Pass IV) [regular files]
  DUMP: Volume 1 completed at: Tue Dec 12 16:05:27 2000
  DUMP: Volume 1 took 0:02:58
  DUMP: Volume 1 transfer rate: 12022 KB/s
  DUMP: 2139918 tape blocks (2089.76MB)
  DUMP: finished in 178 seconds, throughput 12022 KBytes/sec
  DUMP: Date of this level 0 dump: Tue Dec 12 16:02:22 2000
  DUMP: Date this dump completed: Tue Dec 12 16:05:27 2000
  DUMP: Average transfer rate: 12022 KB/s
  DUMP: DUMP IS DONE

but when it wasn't successful, it got stuck after the "Pass IV" step and waited indefinitely for something.

> or this:
>
>   /sbin/dump 0f - / | rsh localhost "cat > /dev/null"
>
> What version of dump are you using?
> You really, really, really want to get the latest stuff from SourceForge. It was reportedly pretty bad for a while but has gotten a good maintainer now and is much better.

It's version 0.4b19, which came with Red Hat 7.0, so I guess it's pretty current.

> Clearly the dumps are getting started and moving some data, so something must be freezing up. I'm not sure how to track this down other than to try and catch it in the act and run gcore on the various programs and then a debugger to see what they are waiting on.

Yes, it would be great to know exactly how to track it down. What are the various programs you are talking about here, John?

> John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]

Thanks,
Hien.
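The successful run above also gives a baseline for how fast dump streams on this machine when nothing is wrong. A minimal sh/awk sketch that recomputes dump's own figures, assuming one "tape block" is 1 KB (an assumption inferred from the printed numbers, not taken from dump's documentation); blocks_to_mb and blocks_rate are hypothetical helper names:

```shell
#!/bin/sh
# Recompute the figures dump printed for the successful run above.
# Assumption: one dump "tape block" is 1 KB (1024 bytes); this is inferred
# from the numbers in the log, not taken from dump's documentation.
# blocks_to_mb and blocks_rate are hypothetical helper names.

# Convert a count of 1 KB tape blocks to MB, two decimal places.
blocks_to_mb() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1024 }'
}

# Average throughput in KB/s (blocks / seconds), rounded to a whole number.
blocks_rate() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.0f\n", b / s }'
}

# Values from the log: "2139918 tape blocks", "finished in 178 seconds".
echo "size: $(blocks_to_mb 2139918) MB"
echo "rate: $(blocks_rate 2139918 178) KB/s"
```

With the logged values this prints 2089.76 MB and 12022 KB/s, matching dump's "(2089.76MB)" and "12022 KB/s" lines, so the healthy run really did move data at about 12,000 KB/s from start to finish.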
Re: amdump inconsistency.
> Thanks John, I think you are absolutely right to question the dump program.
> ...
> It's 0.4b19 version ...

Well, I'd be a lot happier if it was ancient so I could blame it :-). You're right that it seems pretty recent, so it may not be the culprit.

> The inconsistency may be due to the dump command, but it's more likely that the processes it depends on are the main cause, and I don't know how to find out what these processes are.

"ps" is your friend. Find all the "dump" processes, use "ps -fp PID" to display their parent, and keep working your way up the process tree (there may be some program that does this on your OS, but I don't know what it is). Draw a picture as you go.

As I recall, there will be three or more dump processes at the lowest layer with no children. Above them will be one or more single dump processes. The parent of the last one will be sendbackup. Sendbackup may have other children (e.g. gzip). You do not need to go back any further than sendbackup -- it's the one doing the network I/O.

Here's a sample using a program I have called "pstree":

    \-+- 01191 root /usr/sbin/inetd -s
      \-+- 02071 backup amandad
        \-+- 02072 backup /opt/amanda-2.4.2/libexec/sendbackup
          \-+- 05481 backup ufsdump 0sf 1048576 - /dev/rdsk/c0t0d0s0
            \-+- 05488 backup ufsdump 0sf 1048576 - /dev/rdsk/c0t0d0s0
              |--- 05489 backup ufsdump 0sf 1048576 - /dev/rdsk/c0t0d0s0
              |--- 05490 backup ufsdump 0sf 1048576 - /dev/rdsk/c0t0d0s0
              |--- 05492 backup ufsdump 0sf 1048576 - /dev/rdsk/c0t0d0s0
              \--- 05491 backup ufsdump 0sf 1048576 - /dev/rdsk/c0t0d0s0

On the server side, you need to find which dumper process is getting the data (look at the amdump file or use something like lsof) and then the two taper processes. Run truss or the equivalent (strace on Linux) on each process and see if it is making any progress. If none of them are, try gcore-ing the Amanda ones (sendbackup, dumper and the two tapers), then run gdb on the binary and core file to see where they are stopped.
If you haven't already, you might want to rebuild Amanda first with -g and without -O (and maybe without shared libraries) so you can get good traceback data.

>> What happens if you do something like this:
>>
>>   /sbin/dump 0f - / > /dev/null
>
> This is what I got when it completed successfully:

OK, that tells us it isn't something fatally flawed, a.k.a. obvious :-). It's probably some network congestion issue someplace. Sigh.

> Hien.

John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]
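John's "keep working your way up the process tree" step can be scripted instead of done by hand. Below is a minimal POSIX sh sketch, not part of Amanda: it assumes a ps that accepts the standard -o args= and -o ppid= output selectors together with -p (Linux procps and Solaris /usr/bin/ps should; check your ps man page), and walk_up is a hypothetical helper name. It prints each ancestor of a PID and stops at the first one whose command line contains sendbackup.

```shell
#!/bin/sh
# walk_up PID [PATTERN]
# Hypothetical helper: print PID and each ancestor ("pid command line"),
# stopping at the first process whose command line contains PATTERN
# (default: sendbackup). Assumes POSIX "ps -o ... -p PID" support.
walk_up() {
    wu_pid=$1
    wu_stop=${2:-sendbackup}
    # PID 1 (init) and PID 0 mark the top of the tree.
    while [ -n "$wu_pid" ] && [ "$wu_pid" -gt 1 ]; do
        wu_args=$(ps -o args= -p "$wu_pid") || return 1   # process vanished
        printf '%s %s\n' "$wu_pid" "$wu_args"
        case $wu_args in
            *"$wu_stop"*) return 0 ;;   # reached the stop process
        esac
        # ps pads the PPID column with spaces; strip them before looping.
        wu_pid=$(ps -o ppid= -p "$wu_pid" | tr -d ' ')
    done
    return 1   # reached the top without matching PATTERN
}
```

For example, on a system that has pgrep, `for p in $(pgrep -x dump); do walk_up "$p"; echo; done` should print one chain per dump process, each ending at sendbackup, which gives you the "picture" John describes without drawing it by hand.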
Re: amdump inconsistency.
> Anytime a filesystem failed to back up, the tape drive seemed to sit idle until the READ_TIMEOUT period lapsed, i.e. no activity was shown.

That could be normal. Looking at the amdump.NN file, are the file systems that time out being done with PORT-DUMP (direct to tape) or FILE-DUMP (through the holding disk)? If they are going through the holding disk, it could be normal for the tape to be idle waiting on something to do.

What happens if you do something like this:

  /sbin/dump 0f - / > /dev/null

or this:

  /sbin/dump 0f - / | rsh localhost "cat > /dev/null"

What version of dump are you using? You really, really, really want to get the latest stuff from SourceForge. It was reportedly pretty bad for a while but has gotten a good maintainer now and is much better.

> These are some typical errors from the log:
> ...

Clearly the dumps are getting started and moving some data, so something must be freezing up. I'm not sure how to track this down other than to try and catch it in the act and run gcore on the various programs and then a debugger to see what they are waiting on.

> Hien Viet Lieu

John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]
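To answer John's PORT-DUMP vs. FILE-DUMP question without reading the whole amdump.NN file, it is enough to filter for those two words. A minimal sketch; dump_modes is a hypothetical helper name, and it assumes only that the amdump file mentions PORT-DUMP or FILE-DUMP on the lines for each dump, as John's question implies:

```shell
#!/bin/sh
# dump_modes AMDUMP_FILE
# Hypothetical helper: show every line of an amdump.NN file that mentions
# PORT-DUMP (data sent straight to the taper) or FILE-DUMP (data written
# through the holding disk), so you can see which path a timed-out file
# system took.
dump_modes() {
    grep -E 'PORT-DUMP|FILE-DUMP' "$1"
}
```

Run it as `dump_modes /path/to/amdump.1`; the actual location of the amdump files should be the logdir set in your amanda.conf, so the path here is only a placeholder.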