Re: problem backing up a host with more than 171 disklist entries of root-tar
Hi,

> This sounds like a classic case of running out of file descriptors: either on a per-process basis, or on a system-wide basis (more likely per-process, as you seem to be able to reproduce it at will with the same number of disklist entries on that host).

Probably not on a system-wide basis (on that host):

# cat /proc/sys/fs/file-max
4096
# cat /proc/sys/fs/file-nr
1071 650 4096

Regards,
Bernie
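To tell the per-process and system-wide cases apart, you can compare a hung process's descriptor count with its limit, and compare the system-wide counters with the maximum. The sketch below assumes Linux with /proc mounted. per_process_fd_limit(), count_open_fds() and system_fd_usage() are hypothetical helper names and are not part of Amanda.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/* Per-process limit on open descriptors. Once a process holds rlim_cur of
 * them, further open()/pipe()/socket() calls fail with EMFILE. rlim_max is
 * the ceiling an unprivileged process may raise its soft limit to.
 * Returns 0 on success, -1 on failure. */
int per_process_fd_limit(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Count the descriptors a process has open by listing /proc/<pid>/fd.
 * pid is a string such as "26840" or "self"; reading another user's
 * process normally requires root. Returns -1 if the directory can't be read.
 * For "self", the count includes the descriptor opendir() is using. */
int count_open_fds(const char *pid)
{
    char path[64];
    DIR *dir;
    struct dirent *de;
    int count = 0;

    snprintf(path, sizeof path, "/proc/%s/fd", pid);
    dir = opendir(path);
    if (dir == NULL)
        return -1;
    while ((de = readdir(dir)) != NULL) {
        if (strcmp(de->d_name, ".") != 0 && strcmp(de->d_name, "..") != 0)
            count++;
    }
    closedir(dir);
    return count;
}

/* System-wide counters on Linux. /proc/sys/fs/file-nr holds three numbers;
 * on 2.2/2.4 kernels they are allocated handles, free (allocated but unused)
 * handles, and the maximum, which matches /proc/sys/fs/file-max.
 * Returns the number of fields parsed (3 on success) or -1. */
int system_fd_usage(long *allocated, long *unused, long *max)
{
    FILE *fp = fopen("/proc/sys/fs/file-nr", "r");
    int n;

    if (fp == NULL)
        return -1;
    n = fscanf(fp, "%ld %ld %ld", allocated, unused, max);
    fclose(fp);
    return n;
}
```

Applied to the figures above, and assuming the 2.2/2.4 field layout, 1071 allocated minus 650 free leaves roughly 421 handles in use out of 4096. That is consistent with ruling out the system-wide limit. Note that per_process_fd_limit() reports the limit of the calling process. Because amandad inherits its limit from inetd, the value in an interactive shell ("ulimit -n") is only a guide.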
Re: problem backing up a host with more than 171 disklist entries of root-tar
Hi,

> Please give the following patch a try and let me know if it solves the problem.

The problem persists: selfcheck checked only the last 100 lines of the disklist. Yes, the patched amandad has been started:

amandad: debug 1 pid 26880 ruid 37 euid 37 start time Sat May 12 12:06:57 2001
amandad: version 2.4.2p2
amandad: build: VERSION=Amanda-2.4.2p2
amandad:        BUILT_DATE=Sat May 12 12:00:17 CEST 2001

Regards,
Bernie
Re: missing files
John R. Jackson wrote:
> > What shows an error occurred?
> ...
> By error I meant you didn't see all the entries in amrecover that you expected to see. There isn't anything in the sendbackup*debug file to indicate what went wrong (or even that anything did go wrong). The more useful place to look is the index file (see below).

Thanks for this complete response, but you really should have said so in the first place, especially before firing off a bunch of (testy) questions.

Anyway, the answer to your quandary is that I didn't install indexing. The INSTALL instructions say

  If you are going to use the indexing capabilities of Amanda, then add these to your inetd.conf
  ...
  amandaidx stream
  amidxtape stream ... amidxtaped

(Section 2.1.E.), which indicates that its installation and use are optional. Unlike the designers of a certain piece of software, I believe in the KISS principle and was trying to get a small backup working before making an even more complicated mess. I'm pretty sure I won't be using Amanda now, so yes, I am just testing at this point. But after what is by all appearances a wasted 2 weeks, I thought it would be reasonable to find out why Amanda/tar skipped 6 of 10 items (counting empty directories) in a small, single-directory backup.

For what it's worth, my suspicion is still that Amanda has a problem correctly creating and saving the tar --listed-incremental file in some situations, e.g. with .files when indexing is off. I'm not prepared to postulate an infinite-worlds hypothesis and test for all possibilities (or study source code), though, so I'm hoping someone has a better clue.

george

> > I am not cleaning out everything between tests. I don't know how to do that or what that means.
>
> Since you're having amrecover/index problems, the important thing to remove is all the index files related to the test. Run "amgetconf config indexdir" to see where the top of the index directory is, then cd to the host directory within there and then cd to the disk directory within that. Remove all the *.gz files. Then when you run another test you can be sure amrecover (amindexd) isn't seeing old data by mistake.
>
> You can zcat the most recent file Amanda creates to see what amrecover will have to work with. If you see all the files you expect to, but amrecover doesn't show them, then that's one kind of problem (with the index file itself or amrecover). If you don't see the files you expect, then that's a problem with the backup itself.
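The cleanup and inspection steps above can be sketched in C. The sketch assumes the disk directory has already been located by hand: the top from "amgetconf config indexdir", then the host and disk subdirectories. newest_gz() and remove_gz() are hypothetical helpers and are not part of Amanda. Because remove_gz() deletes files, point it only at a test configuration's index directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* True if name ends in ".gz" and has something before the suffix. */
static int has_gz_suffix(const char *name)
{
    size_t len = strlen(name);
    return len > 3 && strcmp(name + len - 3, ".gz") == 0;
}

/* Copy the name of the most recently modified regular *.gz file in dir
 * into out. Returns 0 on success, -1 if dir can't be opened or holds no
 * .gz file. On an mtime tie, whichever readdir() returns first is kept. */
int newest_gz(const char *dir, char *out, size_t outlen)
{
    DIR *d = opendir(dir);
    struct dirent *de;
    struct stat st;
    char path[4096];
    time_t best = 0;
    int found = 0;

    if (d == NULL)
        return -1;
    while ((de = readdir(d)) != NULL) {
        if (!has_gz_suffix(de->d_name))
            continue;
        snprintf(path, sizeof path, "%s/%s", dir, de->d_name);
        if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
            continue;
        if (!found || st.st_mtime > best) {
            best = st.st_mtime;
            snprintf(out, outlen, "%s", de->d_name);
            found = 1;
        }
    }
    closedir(d);
    return found ? 0 : -1;
}

/* Delete the regular *.gz files (not symlinks or directories) in dir,
 * e.g. stale index files from an earlier test. Returns how many were
 * removed, or -1 if dir can't be opened. */
int remove_gz(const char *dir)
{
    DIR *d = opendir(dir);
    struct dirent *de;
    struct stat st;
    char path[4096];
    int removed = 0;

    if (d == NULL)
        return -1;
    while ((de = readdir(d)) != NULL) {
        if (!has_gz_suffix(de->d_name))
            continue;
        snprintf(path, sizeof path, "%s/%s", dir, de->d_name);
        if (lstat(path, &st) == 0 && S_ISREG(st.st_mode) && unlink(path) == 0)
            removed++;
    }
    closedir(d);
    return removed;
}
```

Run remove_gz() on that directory before a test. After the test, newest_gz() gives the name of the file to pass to zcat.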
Re: problem backing up a host with more than 171 disklist entries of root-tar
> the problem persists: selfcheck checked only the last 100 lines of the disklist.

Well, nuts. I was pretty sure that patch was involved. In the first letter you said:

> After adding one or more lines to the disklist file, only the last 100 lines get checked, then an amandad and a selfcheck process are hanging around:
> ...

So the next step is to make sure amandad and sendsize were compiled with -g, get them hung and attach a debugger to them, then get a stack traceback (where) so we can see where they are stopped.

If your OS has gcore, you might use it instead of attaching the debugger. That way, if there are other questions (e.g. what is in variable X), you'll be able to answer them right away without rerunning the test case.

If you need more explicit instructions on attaching a debugger to a process or running gcore, just ask.

You can do this either with the patch or without it, but let us know which way it was so we can get the line numbers matched up.

> Bernie

John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]
Re: problem backing up a host with more than 171 disklist entries of root-tar
Hi,

> So the next step is to make sure amandad and sendsize were compiled with -g, get them hung and attach a debugger to them, then get a stack traceback (where) so we can see where they are stopped.

I compiled the whole suite (without amandad.diff, but with make CFLAGS=-g) and copied client-src/.libs/{amandad,selfcheck} to /usr/libexec/amanda/. Using 172 disklist entries of type root-tar:

$ ps x
  PID TTY      STAT   TIME COMMAND
26808 pts/2    S      0:00 -bash
26840 ?        S      0:00 amandad
26847 pts/1    S      0:00 -bash
26842 ?        S      0:00 /usr/libexec/amanda/selfcheck
26874 pts/1    R      0:00 ps x
$ gdb /usr/libexec/amanda/amandad 26840
GNU gdb 19991004
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
/var/lib/amanda/26840: No such file or directory.
Attaching to program: /usr/libexec/amanda/amandad, Pid 26840
Reading symbols from /usr/lib/amanda/libamanda-2.4.2p2.so...done.
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /usr/lib/libreadline.so.3...done.
Reading symbols from /lib/libtermcap.so.2...done.
Reading symbols from /lib/libnsl.so.1...done.
Reading symbols from /usr/lib/amanda/libamclient-2.4.2p2.so...done.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux.so.2...done.
Reading symbols from /lib/libnss_files.so.2...done.
0x40136af4 in __libc_write () from /lib/libc.so.6
(gdb) where
#0  0x40136af4 in __libc_write () from /lib/libc.so.6
#1  0x401801cc in ?? () from /lib/libc.so.6
#2  0x400a89cb in __libc_start_main (main=0x8048ff0 <main>, argc=1, argv=0xbe74, init=0x8048c34 <_init>, fini=0x804ad9c <_fini>, rtld_fini=0x4000aea0 <_dl_fini>, stack_end=0xbe6c) at ../sysdeps/generic/libc-start.c:92
(gdb) quit
The program is running. Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/libexec/amanda/amandad, Pid 26840
$ gdb /usr/libexec/amanda/selfcheck 26842
GNU gdb 19991004
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
/var/lib/amanda/26842: No such file or directory.
Attaching to program: /usr/libexec/amanda/selfcheck, Pid 26842
Reading symbols from /usr/lib/amanda/libamanda-2.4.2p2.so...done.
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /usr/lib/libreadline.so.3...done.
Reading symbols from /lib/libtermcap.so.2...done.
Reading symbols from /lib/libnsl.so.1...done.
Reading symbols from /usr/lib/amanda/libamclient-2.4.2p2.so...done.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux.so.2...done.
Reading symbols from /lib/libnss_files.so.2...done.
0x40136af4 in __libc_write () from /lib/libc.so.6
(gdb) where
#0  0x40136af4 in __libc_write () from /lib/libc.so.6
#1  0x401801cc in ?? () from /lib/libc.so.6
#2  0x400e68a4 in new_do_write (fp=0x4017e960, data=0x4002d000 "access /home/User/info (/home/User/info): Permission denied]\nERROR [could not access /home/User/ilnu (/home/User/ilnu): Permission denied]\nERROR [could not access /home/User/ijk (/home/Us"..., to_do=4096) at fileops.c:328
#3  0x400e6360 in _IO_new_do_write (fp=0x4017e960, data=0x4002d000 "access /home/User/info (/home/User/info): Permission denied]\nERROR [could not access /home/User/ilnu (/home/User/ilnu): Permission denied]\nERROR [could not access /home/User/ijk (/home/Us"..., to_do=4096) at fileops.c:301
#4  0x400e5a1e in _IO_new_file_overflow (f=0x4017e960, ch=-1) at fileops.c:441
#5  0x400e71a7 in __overflow (f=0x4017e960, ch=-1) at genops.c:197
#6  0x400e60a0 in _IO_new_file_xsputn (f=0x4017e960, data=0x804ce78, n=69) at fileops.c:803
#7  0x400d752c in _IO_vfprintf (s=0x4017e960, format=0x804a9b7 "ERROR [%s]\n", ap=0xbe10) at vfprintf.c:1259
#8  0x400de050 in printf (format=0x804a9b7 "ERROR [%s]\n") at printf.c:31
#9  0x804a016 in check_disk (program=0x804de30 "GNUTAR", disk=0x804de37 "/home/User/cn", level=0) at selfcheck.c:462
#10 0x8049380 in main (argc=1, argv=0xbf24) at selfcheck.c:157
(gdb) quit
The program is running. Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/libexec/amanda/selfcheck, Pid 26842
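Both traces stop in __libc_write(). amandad is there directly, and selfcheck reaches it from printf() while flushing a full stdio buffer (to_do=4096) of ERROR lines. A write() to a pipe puts the caller to sleep once the pipe's buffer is full and nothing is reading the other end. This thread does not establish that the same thing is happening between amandad and selfcheck. The hypothetical pipe_capacity() below (not Amanda code) only illustrates the mechanism: it measures how much a pipe holds before a writer would block on the machine where it runs.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Measure how many bytes a fresh pipe accepts while nobody reads from it.
 * A blocking writer would sleep inside write() at this point. Here the
 * write end is set to O_NONBLOCK, so write() fails with EAGAIN instead and
 * the loop can stop and report the count. Returns -1 on error. */
long pipe_capacity(void)
{
    int fds[2];
    int flags;
    long total = 0;
    const char byte = 'x';

    if (pipe(fds) != 0)
        return -1;
    flags = fcntl(fds[1], F_GETFL);
    if (flags == -1 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    for (;;) {
        ssize_t n = write(fds[1], &byte, 1);   /* one byte at a time */
        if (n == 1)
            total++;
        else if (n == -1 && errno == EINTR)
            continue;                          /* interrupted: retry */
        else
            break;                             /* buffer full, or an error */
    }
    if (errno != EAGAIN && errno != EWOULDBLOCK)
        total = -1;                            /* stopped for another reason */
    close(fds[0]);
    close(fds[1]);
    return total;
}
```

The capacity differs between kernels, so the sketch does not assume any particular size.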
Re: problem backing up a host with more than 171 disklist entries of root-tar
> I compiled the whole suite (without amandad.diff, but with make CFLAGS=-g) and copied client-src/.libs/{amandad,selfcheck} to /usr/libexec/amanda/.

I forgot to mention that advfs.diff is applied.