Re: FW: selfcheck hangs
Steven M. Wilson wrote:

That could very well be the problem I'm having, since I just tried a df on the client system and it ground to a halt trying to locate NFS mounts. We rely heavily on NFS here, so I'll need to figure out how to get around this problem in the future. Thanks for the info.

Mount your non-critical NFS filesystems with the mount options soft,intr instead of hard,nointr (or possibly hard,intr) to avoid the blocking.

Yes, the man page says soft just causes a lot of trouble, but hard mounts result in non-killable processes (unless you specify intr, in which case you can kill the process, but it will not time out by itself).

And be very careful about what you categorize as critical: a disk with software packages, even one that includes amanda, is not critical in this sense; an NFS-mounted root partition is critical. Be careful to put your critical NFS shares on stable hardware and software that does not need frequent reboots.

my 0.02

--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...    *
* ...  Are you sure?  ...   YES   ...   Phew ...   I'm out             *
***
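To see which NFS mounts on a client fall into which of the categories described above, something like the following might help. It is only a sketch: it reads text in the /proc/mounts format, the sample entries (fileserver, /export/sw, /scratch, the addresses) are made up for illustration, and it assumes that an entry without "soft" is a hard mount, since hard is the default.

```python
# Sketch only: classify NFS entries in /proc/mounts-style text by their
# soft/hard and intr/nointr options. The sample data below is hypothetical.
SAMPLE_MOUNTS = """\
/dev/hda3 / ext3 rw 0 0
fileserver:/export/sw /usr/local/sw nfs rw,hard,nointr,addr=192.0.2.10 0 0
fileserver:/export/scratch /scratch nfs rw,soft,intr,addr=192.0.2.10 0 0
"""


def classify_nfs_mounts(mounts_text):
    """Return {mount_point: verdict} for every NFS entry in mounts_text."""
    verdicts = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        mount_point = fields[1]
        options = set(fields[3].split(","))
        if "soft" in options:
            verdicts[mount_point] = "soft: I/O fails after the timeout"
        elif "intr" in options:
            verdicts[mount_point] = "hard,intr: blocks, but the process can be killed"
        else:
            # No "soft" and no "intr": treated as hard,nointr (assumption).
            verdicts[mount_point] = "hard,nointr: blocks and cannot be killed"
    return verdicts


if __name__ == "__main__":
    for mount_point, verdict in classify_nfs_mounts(SAMPLE_MOUNTS).items():
        print(mount_point, "->", verdict)
```

On a real client you would pass the contents of /proc/mounts instead of SAMPLE_MOUNTS.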
RE: FW: selfcheck hangs
No, the list was no help. The problem was that the client had NFS-mounted a disk that was no longer on the net, so anything that iterated over the mounts (like df) was hanging. That is probably why a reboot solved it. I don't allow key machines to be NFS clients anymore.

JLM

-----Original Message-----
From: Steven M. Wilson [mailto:[EMAIL PROTECTED]]
Sent: Wed 6/18/2003 2:32 PM
To: Jeremy L. Mordkoff
Cc:
Subject: Re: FW: selfcheck hangs

Jeremy,

Did anyone respond off-list to your posting? I have the same problem here from time to time, and the only way I've been able to correct it is by rebooting the offending client system.

Steve

Jeremy L. Mordkoff wrote:

One system has started refusing to run backups. amcheck reports a timeout. A ps on the client shows several orphaned selfcheck processes. I tried killing all the amandad processes and sending xinetd a SIGHUP, and then I tried amcheck again, to no avail. I then reinstalled amanda and repeated. Still no luck. Here's the debug log. Any ideas would be appreciated.

JLM

-----Original Message-----
From: root [mailto:[EMAIL PROTECTED]]
Sent: Fri 6/13/2003 9:20 AM
To: [EMAIL PROTECTED]
Cc:
Subject:

amandad: debug 1 pid 23823 ruid 527 euid 527: start at Fri Jun 13 09:16:52 2003
amandad: version 2.4.3
amandad: build: VERSION="Amanda-2.4.3"
amandad:        BUILT_DATE="Fri Apr 4 10:37:17 EST 2003"
amandad:        BUILT_MACH="Linux lux1 2.4.18-18.7.xsmp #1 SMP Wed Nov 13 19:01:42 EST 2002 i686 unknown"
amandad:        CC="gcc"
amandad:        CONFIGURE_COMMAND="'./configure' '--with-user=amanda' '--with-group=disk'"
amandad: paths: bindir="/usr/local/bin" sbindir="/usr/local/sbin"
amandad:        libexecdir="/usr/local/libexec" mandir="/usr/local/man"
amandad:        AMANDA_TMPDIR="/tmp/amanda" AMANDA_DBGDIR="/tmp/amanda"
amandad:        CONFIG_DIR="/usr/local/etc/amanda" DEV_PREFIX="/dev/"
amandad:        RDEV_PREFIX="/dev/" DUMP="/sbin/dump"
amandad:        RESTORE="/sbin/restore" SAMBA_CLIENT="/usr/bin/smbclient"
amandad:        GNUTAR="/bin/gtar" COMPRESS_PATH="/bin/gzip"
amandad:        UNCOMPRESS_PATH="/bin/gzip" MAILER="/usr/bin/Mail"
amandad:        listed_incr_dir="/usr/local/var/amanda/gnutar-lists"
amandad: defs:  DEFAULT_SERVER="lux1" DEFAULT_CONFIG="DailySet1"
amandad:        DEFAULT_TAPE_SERVER="lux1" DEFAULT_TAPE_DEVICE="/dev/null"
amandad:        HAVE_MMAP HAVE_SYSVSHM LOCKING=POSIX_FCNTL SETPGRP_VOID
amandad:        DEBUG_CODE AMANDA_DEBUG_DAYS=4 BSD_SECURITY USE_AMANDAHOSTS
amandad:        CLIENT_LOGIN="amanda" FORCE_USERID HAVE_GZIP
amandad:        COMPRESS_SUFFIX=".gz" COMPRESS_FAST_OPT="--fast"
amandad:        COMPRESS_BEST_OPT="--best" UNCOMPRESS_OPT="-dc"
amandad: time 0.000: got packet:
Amanda 2.4 REQ HANDLE 000-58790808 SEQ 1055510212
SECURITY USER amanda
SERVICE selfcheck
OPTIONS features=feff9f00;maxdumps=1;hostname=rel2;
DUMP hda3 0 OPTIONS |;auth=bsd;compress-fast;
DUMP vg01/lv_data 0 OPTIONS |;auth=bsd;compress-fast;
amandad: time 0.000: sending ack:
Amanda 2.4 ACK HANDLE 000-58790808 SEQ 1055510212
amandad: time 0.001: bsd security: remote host lux1 user amanda local user amanda
amandad: time 0.001: amandahosts security check passed
amandad: time 0.001: running service "/usr/local/libexec/selfcheck"
amandad: time 30.526: got packet:
Amanda 2.4 REQ HANDLE 000-58790808 SEQ 1055510212
SECURITY USER amanda
SERVICE selfcheck
OPTIONS features=feff9f00;maxdumps=1;hostname=rel2;
DUMP hda3 0 OPTIONS |;auth=bsd;compress-fast;
DUMP vg01/lv_data 0 OPTIONS |;auth=bsd;compress-fast;
amandad: time 31.146: received dup P_REQ packet, ACKing it
amandad: time 31.146: sending ack:
Amanda 2.4 ACK HANDLE 000-58790808 SEQ 1055510212
amandad: time 61.141: got packet:
Amanda 2.4 REQ HANDLE 000-58790808 SEQ 1055510212
SECURITY USER amanda
SERVICE selfcheck
OPTIONS features=feff9f00;maxdumps=1;hostname=rel2;
DUMP hda3 0 OPTIONS |;auth=bsd;compress-fast;
DUMP vg01/lv_data 0 OPTIONS |;auth=bsd;compress-fast;
amandad: time 61.141: received dup P_REQ packet, ACKing it
amandad: time 61.141: sending ack:
Amanda 2.4 ACK HANDLE 000-58790808 SEQ 1055510212

--
Steven M. Wilson, Systems and Network Manager
Markey Center for Structural Biology
Purdue University
[EMAIL PROTECTED]    765.496.1946
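Since anything that walks the mount table (df, and apparently selfcheck) can block on a dead NFS server, one way to check a mount point without hanging the calling shell is to do the statvfs call in a child process and give up after a timeout. The following is a rough sketch under that assumption; the function name probe_mount and the 5-second default are made up for illustration, not anything amanda provides.

```python
# Sketch only: probe a mount point from a child process so that a dead
# NFS server blocks the child, not the caller.
import subprocess
import sys

# The child just calls os.statvfs() on the path it is given.
CHILD_CODE = "import os, sys; os.statvfs(sys.argv[1])"


def probe_mount(path, timeout=5.0):
    """Return "ok", "error", or "hung" for the given path."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        returncode = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Best effort: on a hard,nointr mount the child may ignore the
        # signal, so we deliberately do not wait for it afterwards.
        proc.kill()
        return "hung"
    return "ok" if returncode == 0 else "error"


if __name__ == "__main__":
    for mount_point in sys.argv[1:]:
        print(mount_point, probe_mount(mount_point))
```

Run it with the NFS mount points as arguments. A child stuck on a hard mount may linger until the server comes back or the client is rebooted, but the calling process itself returns.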
Re: FW: selfcheck hangs
Jeremy,

That could very well be the problem I'm having, since I just tried a df on the client system and it ground to a halt trying to locate NFS mounts. We rely heavily on NFS here, so I'll need to figure out how to get around this problem in the future. Thanks for the info.

Steve

Jeremy L. Mordkoff wrote:

No, the list was no help. The problem was that the client had NFS-mounted a disk that was no longer on the net, so anything that iterated over the mounts (like df) was hanging. That is probably why a reboot solved it. I don't allow key machines to be NFS clients anymore.

JLM

-----Original Message-----
From: Steven M. Wilson [mailto:[EMAIL PROTECTED]]
Sent: Wed 6/18/2003 2:32 PM
To: Jeremy L. Mordkoff
Cc:
Subject: Re: FW: selfcheck hangs

Jeremy,

Did anyone respond off-list to your posting? I have the same problem here from time to time, and the only way I've been able to correct it is by rebooting the offending client system.

Steve

Jeremy L. Mordkoff wrote:

One system has started refusing to run backups. amcheck reports a timeout. A ps on the client shows several orphaned selfcheck processes. I tried killing all the amandad processes and sending xinetd a SIGHUP, and then I tried amcheck again, to no avail. I then reinstalled amanda and repeated. Still no luck. Here's the debug log. Any ideas would be appreciated.
JLM

--
Steven M. Wilson, Systems and Network Manager
Markey Center for Structural Biology
Purdue University
[EMAIL PROTECTED]    765.496.1946
FW: selfcheck hangs
One system has started refusing to run backups. amcheck reports a timeout. A ps on the client shows several orphaned selfcheck processes. I tried killing all the amandad processes and sending xinetd a SIGHUP, and then I tried amcheck again, to no avail. I then reinstalled amanda and repeated. Still no luck. Here's the debug log. Any ideas would be appreciated.

JLM

-----Original Message-----
From: root [mailto:[EMAIL PROTECTED]]
Sent: Fri 6/13/2003 9:20 AM
To: [EMAIL PROTECTED]
Cc:
Subject:

amandad: debug 1 pid 23823 ruid 527 euid 527: start at Fri Jun 13 09:16:52 2003
amandad: version 2.4.3
amandad: build: VERSION="Amanda-2.4.3"
amandad:        BUILT_DATE="Fri Apr 4 10:37:17 EST 2003"
amandad:        BUILT_MACH="Linux lux1 2.4.18-18.7.xsmp #1 SMP Wed Nov 13 19:01:42 EST 2002 i686 unknown"
amandad:        CC="gcc"
amandad:        CONFIGURE_COMMAND="'./configure' '--with-user=amanda' '--with-group=disk'"
amandad: paths: bindir="/usr/local/bin" sbindir="/usr/local/sbin"
amandad:        libexecdir="/usr/local/libexec" mandir="/usr/local/man"
amandad:        AMANDA_TMPDIR="/tmp/amanda" AMANDA_DBGDIR="/tmp/amanda"
amandad:        CONFIG_DIR="/usr/local/etc/amanda" DEV_PREFIX="/dev/"
amandad:        RDEV_PREFIX="/dev/" DUMP="/sbin/dump"
amandad:        RESTORE="/sbin/restore" SAMBA_CLIENT="/usr/bin/smbclient"
amandad:        GNUTAR="/bin/gtar" COMPRESS_PATH="/bin/gzip"
amandad:        UNCOMPRESS_PATH="/bin/gzip" MAILER="/usr/bin/Mail"
amandad:        listed_incr_dir="/usr/local/var/amanda/gnutar-lists"
amandad: defs:  DEFAULT_SERVER="lux1" DEFAULT_CONFIG="DailySet1"
amandad:        DEFAULT_TAPE_SERVER="lux1" DEFAULT_TAPE_DEVICE="/dev/null"
amandad:        HAVE_MMAP HAVE_SYSVSHM LOCKING=POSIX_FCNTL SETPGRP_VOID
amandad:        DEBUG_CODE AMANDA_DEBUG_DAYS=4 BSD_SECURITY USE_AMANDAHOSTS
amandad:        CLIENT_LOGIN="amanda" FORCE_USERID HAVE_GZIP
amandad:        COMPRESS_SUFFIX=".gz" COMPRESS_FAST_OPT="--fast"
amandad:        COMPRESS_BEST_OPT="--best" UNCOMPRESS_OPT="-dc"
amandad: time 0.000: got packet:
Amanda 2.4 REQ HANDLE 000-58790808 SEQ 1055510212
SECURITY USER amanda
SERVICE selfcheck
OPTIONS features=feff9f00;maxdumps=1;hostname=rel2;
DUMP hda3 0 OPTIONS |;auth=bsd;compress-fast;
DUMP vg01/lv_data 0 OPTIONS |;auth=bsd;compress-fast;
amandad: time 0.000: sending ack:
Amanda 2.4 ACK HANDLE 000-58790808 SEQ 1055510212
amandad: time 0.001: bsd security: remote host lux1 user amanda local user amanda
amandad: time 0.001: amandahosts security check passed
amandad: time 0.001: running service "/usr/local/libexec/selfcheck"
amandad: time 30.526: got packet:
Amanda 2.4 REQ HANDLE 000-58790808 SEQ 1055510212
SECURITY USER amanda
SERVICE selfcheck
OPTIONS features=feff9f00;maxdumps=1;hostname=rel2;
DUMP hda3 0 OPTIONS |;auth=bsd;compress-fast;
DUMP vg01/lv_data 0 OPTIONS |;auth=bsd;compress-fast;
amandad: time 31.146: received dup P_REQ packet, ACKing it
amandad: time 31.146: sending ack:
Amanda 2.4 ACK HANDLE 000-58790808 SEQ 1055510212
amandad: time 61.141: got packet:
Amanda 2.4 REQ HANDLE 000-58790808 SEQ 1055510212
SECURITY USER amanda
SERVICE selfcheck
OPTIONS features=feff9f00;maxdumps=1;hostname=rel2;
DUMP hda3 0 OPTIONS |;auth=bsd;compress-fast;
DUMP vg01/lv_data 0 OPTIONS |;auth=bsd;compress-fast;
amandad: time 61.141: received dup P_REQ packet, ACKing it
amandad: time 61.141: sending ack:
Amanda 2.4 ACK HANDLE 000-58790808 SEQ 1055510212
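In the debug log above, selfcheck is started at 0.001 and nothing is ever sent back; the server repeats the same request roughly every 30 seconds and amandad only ACKs the duplicates. A small sketch like the one below could pull that pattern out of an amandad debug file. It assumes only the "amandad: time N: message" line shape seen in this log; the function name and the shortened sample are for illustration.

```python
# Sketch only: find when the service was started and when duplicate
# requests were ACKed in an amandad debug log of the shape shown above.
import re

SAMPLE_LOG = """\
amandad: time 0.001: running service "/usr/local/libexec/selfcheck"
amandad: time 30.526: got packet:
amandad: time 31.146: received dup P_REQ packet, ACKing it
amandad: time 31.146: sending ack:
amandad: time 61.141: got packet:
amandad: time 61.141: received dup P_REQ packet, ACKing it
"""

# Matches the timestamped entries; other lines (build info, packet
# bodies) are skipped.
ENTRY = re.compile(r"^amandad: time (\d+\.\d+): (.*)$")


def summarize(log_text):
    """Return (service_start_time, [times of duplicate-request ACKs])."""
    service_started = None
    dup_times = []
    for line in log_text.splitlines():
        match = ENTRY.match(line)
        if not match:
            continue
        timestamp, message = float(match.group(1)), match.group(2)
        if message.startswith("running service"):
            service_started = timestamp
        elif message.startswith("received dup P_REQ"):
            dup_times.append(timestamp)
    return service_started, dup_times


if __name__ == "__main__":
    started, dups = summarize(SAMPLE_LOG)
    print("service started at", started)
    print("duplicate requests ACKed at", dups)
```

A service start followed only by duplicate-request ACKs, as here, suggests the service itself is stuck rather than a network or authentication problem, which matches the hung-NFS explanation given in the replies.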