Re: problem backing up a host with more than 171 disklist entries of root-tar

2001-05-12 Thread Bernhard R. Erdmann

Hi,

 This sounds like a classic case of running out of file descriptors --
 either on a per-process basis, or on a system-wide basis (more likely
 per-process, as you seem to be able to reproduce it at will with the
 same number of disklist entries on that host).

probably not on a system-wide basis:
(on that host)
# cat /proc/sys/fs/file-max 
4096
# cat /proc/sys/fs/file-nr 
1071    650     4096

Regards,
Bernie
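
The system-wide counters above can be compared with the per-process limit, which is the other candidate named in the quoted diagnosis. The following is only a sketch, assuming a Linux host with /proc mounted; the helper names (`parse_file_nr`, `per_process_fd_limit`, `open_fd_count`) are made up for illustration and are not part of Amanda. The last field of file-nr is the maximum, the same value as file-max.

```python
import os
import resource


def parse_file_nr(text):
    """Split the contents of /proc/sys/fs/file-nr into a tuple of integers."""
    return tuple(int(field) for field in text.split())


def per_process_fd_limit():
    """Return the (soft, hard) limit on open descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count the descriptors a process currently holds (Linux /proc only)."""
    return len(os.listdir("/proc/%s/fd" % pid))


if __name__ == "__main__":
    soft, hard = per_process_fd_limit()
    print("per-process limit: soft=%d hard=%d" % (soft, hard))
    if os.path.isdir("/proc/self/fd"):
        print("descriptors open in this process: %d" % open_fd_count())
    if os.path.exists("/proc/sys/fs/file-nr"):
        with open("/proc/sys/fs/file-nr") as handle:
            print("system-wide counters:", parse_file_nr(handle.read()))
```

Passing the pid of a hung process to `open_fd_count` (as the same user or as root) would show whether that process is anywhere near its soft limit.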



Re: problem backing up a host with more than 171 disklist entries of root-tar

2001-05-12 Thread Bernhard R. Erdmann

Hi,

 Please give the following patch a try and let me know if it solves the
 problem.

the problem persists: selfcheck checked the last 100 lines of the
disklist.

Yes, the patched amandad has been started:

amandad: debug 1 pid 26880 ruid 37 euid 37 start time Sat May 12 12:06:57 2001
amandad: version 2.4.2p2
amandad: build: VERSION=Amanda-2.4.2p2
amandad:        BUILT_DATE=Sat May 12 12:00:17 CEST 2001

Regards,
Bernie



Re: missing files

2001-05-12 Thread George Herson

John R. Jackson wrote:
 
 What shows an error occurred?  ...
 
 By error I meant you didn't see all the entries in amrecover that you
 expected to see.  There isn't anything in the sendbackup*debug file to
 indicate what went wrong (or even that anything did go wrong).  The more
 useful place to look is the index file (see below).

Thanks for this complete response, but you really should have said so in
the first place, especially before firing off a bunch of (testy)
questions.  Anyway, the answer to your quandary is that I didn't
install indexing.  The INSTALL instructions say "If you are going to use
the indexing capabilities of Amanda, then add these to your inetd.conf
... amandaidx stream  amidxtape stream ... amidxtaped" (Section
2.1.E.), which indicates its installation and use is optional.  Unlike
designers of a certain piece of software, I believe in the KISS
principle and was trying to get a little backup working before making an
even more complicated mess.

I'm pretty sure I won't be using Amanda now, so yes, I am just testing
at this point, but after what is by all appearances a wasted 2 weeks, I
thought it'd be reasonable to pursue finding out why Amanda/tar skipped
6 of 10 items (counting empty directories) in a small, single-directory
backup.  For what it's worth, my suspicion is still that Amanda has a
problem correctly creating and saving the tar --listed-incremental file
in some situations, e.g. with dot files when indexing is off.  I'm not
prepared to postulate an infinite-worlds hypothesis and test for all
possibilities (or study source code) though, so I'm hoping someone has
a better clue.

george

 I am not cleaning out everything between tests.  I don't know how to do
 that or what that means.
 
 Since you're having amrecover/index problems, the important thing to
 remove is all the index files related to the test.  Run "amgetconf
 <config> indexdir" to see where the top of the index directory is, then
 cd to the host directory within there and then cd to the disk directory
 within that.  Remove all the *.gz files.  Then when you run another test
 you can be sure amrecover (amindexd) isn't seeing old data by mistake.
 
 You can zcat the most recent file Amanda creates to see what amrecover
 will have to work with.  If you see all the files you expect to, but
 amrecover doesn't show them, then that's one kind of problem (with the
 index file itself or amrecover).  If you don't see the files you expect,
 then that's a problem with the backup itself.
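
The check described in the quoted advice (look at the newest index file and compare it with what you expected to be backed up) could be scripted roughly as follows. This is a sketch only: it assumes the indexdir/host/disk layout described above and gzip-compressed index files ending in .gz, and it compares each line literally against the expected paths, which may not match Amanda's exact entry format. The function names are hypothetical.

```python
import glob
import gzip
import os


def newest_index_file(disk_dir):
    """Return the most recently modified *.gz file in disk_dir, or None."""
    candidates = glob.glob(os.path.join(disk_dir, "*.gz"))
    return max(candidates, key=os.path.getmtime) if candidates else None


def index_entries(path):
    """Return the lines of a gzip-compressed index file (what zcat prints)."""
    with gzip.open(path, "rt", errors="replace") as handle:
        return [line.rstrip("\n") for line in handle]


def missing_entries(expected, entries):
    """Return the expected paths that do not appear in the index listing."""
    present = set(entries)
    return [path for path in expected if path not in present]
```

If `missing_entries` returns an empty list but amrecover still hides files, the problem would lie with the index file or amrecover; if it lists paths, the backup itself skipped them.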




Re: problem backing up a host with more than 171 disklist entries of root-tar

2001-05-12 Thread John R. Jackson

the problem persists: selfcheck checked the last 100 lines of the
disklist.

Well, nuts.  I was pretty sure that patch was involved.

In the first letter you said:

  After adding one or more lines to the disklist file, only the last 100
  lines get checked, then an amandad and a selfcheck process is hanging
  around: ...

So the next step is to make sure amandad and sendsize were compiled
with -g, get them hung and attach a debugger to them, then get a stack
traceback (where) so we can see where they are stopped.

If your OS has gcore, you might use it instead of attaching the debugger.
That way, if there are other questions (e.g. what is in variable X),
you'll be able to answer them right away without rerunning the test case.

If you need more explicit instructions on attaching a debugger to a
process or running gcore, just ask.

You can either do this with the patch or without, but let us know which
way it was so we can get line numbers matched up.

Bernie

John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]
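
For anyone who wants the attach-and-traceback step as something repeatable, here is a minimal sketch. It assumes a gdb recent enough to accept `-batch` and `-ex` (the 1999 build used in this thread is not) and the `gcore` script shipped with gdb, which writes its dump to prefix.pid. The helper names are illustrative only.

```python
import subprocess


def backtrace_command(program, pid):
    """Build a gdb command line that attaches, prints 'where', then detaches."""
    return ["gdb", "-batch", "-ex", "where", program, str(pid)]


def gcore_command(pid, prefix="core"):
    """Build a gcore command line; the dump is written to <prefix>.<pid>."""
    return ["gcore", "-o", prefix, str(pid)]


def run(command):
    """Run a command and return its combined stdout/stderr as text."""
    result = subprocess.run(command, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return result.stdout
```

Calling `run(backtrace_command(path_to_binary, pid))` needs to happen as the user that owns the hung process (or as root), and the binary should have been built with -g for the line numbers to be useful.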



Re: problem backing up a host with more than 171 disklist entries of root-tar

2001-05-12 Thread Bernhard R. Erdmann

Hi,

 So the next step is to make sure amandad and sendsize were compiled
 with -g, get them hung and attach a debugger to them, then get a stack
 traceback (where) so we can see where they are stopped.

I compiled the whole suite (without amandad.diff, but with make
CFLAGS=-g) and copied client-src/.libs/{amandad,selfcheck} to
/usr/libexec/amanda/.

Using 172 disklist entries of type root-tar:

$ ps x
  PID TTY      STAT   TIME COMMAND
26808 pts/2    S      0:00 -bash
26840 ?        S      0:00 amandad
26847 pts/1    S      0:00 -bash
26842 ?        S      0:00 /usr/libexec/amanda/selfcheck
26874 pts/1    R      0:00 ps x

$ gdb /usr/libexec/amanda/amandad 26840
GNU gdb 19991004
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...

/var/lib/amanda/26840: No such file or directory.
Attaching to program: /usr/libexec/amanda/amandad, Pid 26840
Reading symbols from /usr/lib/amanda/libamanda-2.4.2p2.so...done.
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /usr/lib/libreadline.so.3...done.
Reading symbols from /lib/libtermcap.so.2...done.
Reading symbols from /lib/libnsl.so.1...done.
Reading symbols from /usr/lib/amanda/libamclient-2.4.2p2.so...done.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux.so.2...done.
Reading symbols from /lib/libnss_files.so.2...done.
0x40136af4 in __libc_write () from /lib/libc.so.6
(gdb) where
#0  0x40136af4 in __libc_write () from /lib/libc.so.6
#1  0x401801cc in ?? () from /lib/libc.so.6
#2  0x400a89cb in __libc_start_main (main=0x8048ff0 <main>, argc=1,
    argv=0xbe74, init=0x8048c34 <_init>, fini=0x804ad9c <_fini>,
    rtld_fini=0x4000aea0 <_dl_fini>, stack_end=0xbe6c)
    at ../sysdeps/generic/libc-start.c:92
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/libexec/amanda/amandad, Pid 26840

$ gdb /usr/libexec/amanda/selfcheck 26842
GNU gdb 19991004
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...

/var/lib/amanda/26842: No such file or directory.
Attaching to program: /usr/libexec/amanda/selfcheck, Pid 26842
Reading symbols from /usr/lib/amanda/libamanda-2.4.2p2.so...done.
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /usr/lib/libreadline.so.3...done.
Reading symbols from /lib/libtermcap.so.2...done.
Reading symbols from /lib/libnsl.so.1...done.
Reading symbols from /usr/lib/amanda/libamclient-2.4.2p2.so...done.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux.so.2...done.
Reading symbols from /lib/libnss_files.so.2...done.
0x40136af4 in __libc_write () from /lib/libc.so.6
(gdb) where
#0  0x40136af4 in __libc_write () from /lib/libc.so.6
#1  0x401801cc in ?? () from /lib/libc.so.6
#2  0x400e68a4 in new_do_write (fp=0x4017e960,
    data=0x4002d000  access /home/User/info (/home/User/info): Permission denied]\nERROR [could not access /home/User/ilnu (/home/User/ilnu): Permission denied]\nERROR [could not access /home/User/ijk (/home/Us...,
    to_do=4096) at fileops.c:328
#3  0x400e6360 in _IO_new_do_write (fp=0x4017e960,
    data=0x4002d000  access /home/User/info (/home/User/info): Permission denied]\nERROR [could not access /home/User/ilnu (/home/User/ilnu): Permission denied]\nERROR [could not access /home/User/ijk (/home/Us...,
    to_do=4096) at fileops.c:301
#4  0x400e5a1e in _IO_new_file_overflow (f=0x4017e960, ch=-1) at fileops.c:441
#5  0x400e71a7 in __overflow (f=0x4017e960, ch=-1) at genops.c:197
#6  0x400e60a0 in _IO_new_file_xsputn (f=0x4017e960, data=0x804ce78, n=69)
    at fileops.c:803
#7  0x400d752c in _IO_vfprintf (s=0x4017e960, format=0x804a9b7 "ERROR [%s]\n",
    ap=0xbe10) at vfprintf.c:1259
#8  0x400de050 in printf (format=0x804a9b7 "ERROR [%s]\n") at printf.c:31
#9  0x804a016 in check_disk (program=0x804de30 "GNUTAR",
    disk=0x804de37 "/home/User/cn", level=0) at selfcheck.c:462
#10 0x8049380 in main (argc=1, argv=0xbf24) at selfcheck.c:157
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/libexec/amanda/selfcheck, Pid 26842



Re: problem backing up a host with more than 171 disklist entries of root-tar

2001-05-12 Thread Bernhard R. Erdmann

 I compiled the whole suite (without amandad.diff, but with make
 CFLAGS=-g) and copied client-src/.libs/{amandad,selfcheck} to
 /usr/libexec/amanda/.

I forgot to mention that advfs.diff is applied.