Re: client failure
Jean-Louis, Dustin,

Sorry it took so long, but we put the latest snapshot into place
and it DID resolve the issue with backup of the older amanda client
on the SGI/IRIX box.

Thank you,

Brian

> I'm out the rest of the week and am reluctant to install a
> new version when I wouldn't be here to check the result.
>
> Will compile and install the new release early next week
> (with the patch if it's not included in p1) and will leave
> the everest DLE on the old server for the time being.
>
> Will let you know how I make out with the install/patch
> when I return.
>
> Thank you,
>
> Brian
Re: client failure
I'm out the rest of the week and am reluctant to install a
new version when I wouldn't be here to check the result.

Will compile and install the new release early next week
(with the patch if it's not included in p1) and will leave
the everest DLE on the old server for the time being.

Will let you know how I make out with the install/patch
when I return.

Thank you,

Brian
Re: client failure
On Thu, Apr 09, 2009 at 05:04:21PM -0400, Dustin J. Mitchell wrote:
> On Thu, Apr 9, 2009 at 4:55 PM, Brian Cuttler wrote:
> > # more chunker.20090409164847.debug
>
> That's a chunker debug log -- do you have a dumper debug log?
> dumper.20090409164847.debug or something similar? You may have
> several -- see if you can find one that shows something "unusual" at
> the end (like a traceback).

Sorry, included amdump log, which is not the same as dumper debug...
and when that mentioned the chunker...

From the client - I wonder if I need to rebuild with port restrictions
because my new server has them...

Not dynamically reconfigurable? Requires a rebuild?

verest 66# more sendbackup.debug
sendbackup: debug 1 pid 467288 ruid 0 euid 0 start time Thu Apr 9 18:46:58 2009
/usr/local/libexec/sendbackup: got input request: DUMP /images3 0 1970:1:1:0:0:0 OPTIONS |;bsd-auth;compress-fast;no-record;
parsed request as: program `DUMP' disk `/images3' lev 0 since 1970:1:1:0:0:0 opt `|;bsd-auth;compress-fast;no-record;'
waiting for connect on 857, then 690
got all connections
sendbackup: spawning "/usr/sbin/gzip" in pipeline
sendbackup: argument list: "/usr/sbin/gzip" "--fast"
sendbackup: spawning "/usr/local/libexec/rundump" in pipeline
sendbackup: argument list: "xfsdump" "-J" "-F" "-l" "0" "-" "/dev/rdsk/dks0d3s0"
sendbackup: pid 464097 finish time Thu Apr 9 19:12:33 2009

I ran # amdump curie grifserv, generating these new dumper debug
files in directory /tmp/amanda/server/curie. I don't see anything
exciting here, but I'm not always certain what to look for.

> more dumper.*
::
dumper.20090410113619.debug
::
123939.211485: dumper: pid 24796 ruid 110 euid 110 version 2.6.1: start at Fri Apr 10 11:36:19 2009
123939.214530: dumper: pid 24796 ruid 110 euid 110 version 2.6.1: rename at Fri Apr 10 11:36:19 2009
123939.214802: dumper: getcmd: START 20090410113619
1239378177.267303: dumper: getcmd: PORT-DUMP 00-2 10092 everest 34cbfe811f0100 /images3 NODEVICE 0 1970:1:1:0:0:0 DUMP X X X bsd |;bsd-auth;compress-fast;index;
1239378177.273822: dumper: make_socket opening socket with family 2
1239378177.273911: dumper: connect_port: Try port 10084: available - Success
1239378177.274039: dumper: connected to 127.0.0.1.10092
1239378177.274044: dumper: our side is 0.0.0.0.10084
1239378177.274053: dumper: try_socksize: send buffer size is 65536
::
dumper.20090410113619000.debug
::
123939.211908: dumper: pid 24795 ruid 110 euid 110 version 2.6.1: start at Fri Apr 10 11:36:19 2009
123939.214921: dumper: pid 24795 ruid 110 euid 110 version 2.6.1: rename at Fri Apr 10 11:36:19 2009
123939.215212: dumper: getcmd: START 20090410113619
1239378162.227997: dumper: getcmd: PORT-DUMP 00-1 10093 everest 34cbfe811f0100 /images3 NODEVICE 0 1970:1:1:0:0:0 DUMP X X X bsd |;bsd-auth;compress-fast;index;
1239378162.234231: dumper: make_socket opening socket with family 2
1239378162.234317: dumper: connect_port: Try port 10084: available - Success
1239378162.234448: dumper: connected to 127.0.0.1.10093
1239378162.234453: dumper: our side is 0.0.0.0.10084
1239378162.234461: dumper: try_socksize: send buffer size is 65536
::
dumper.20090410113619001.debug
::
123939.212055: dumper: pid 24797 ruid 110 euid 110 version 2.6.1: start at Fri Apr 10 11:36:19 2009
123939.215056: dumper: pid 24797 ruid 110 euid 110 version 2.6.1: rename at Fri Apr 10 11:36:19 2009
123939.215350: dumper: getcmd: START 20090410113619
1239378177.276916: dumper: getcmd: QUIT ""
1239378177.277106: dumper: pid 24797 finish time Fri Apr 10 11:42:57 2009
::
dumper.20090410113619002.debug
::
123939.223566: dumper: pid 24798 ruid 110 euid 110 version 2.6.1: start at Fri Apr 10 11:36:19 2009
123939.226566: dumper: pid 24798 ruid 110 euid 110 version 2.6.1: rename at Fri Apr 10 11:36:19 2009
123939.226823: dumper: getcmd: START 20090410113619
1239378177.276956: dumper: getcmd: QUIT ""
1239378177.277162: dumper: pid 24798 finish time Fri Apr 10 11:42:57 2009

---
   Brian R Cuttler                 brian.cutt...@wadsworth.org
   Computer Systems Support        (v) 518 486-1697
   Wadsworth Center                (f) 518 473-6384
   NYS Department of Health        Help Desk 518 473-0773

IMPORTANT NOTICE: This e-mail and any attachments may contain
confidential or sensitive information which is, or may be, legally
privileged or otherwise protected by law from further disclosure. It
is intended only for the addressee. If you received this in error or
from someone who was not authorized to send it to you, please do not
distribute, copy or use it or any attachments. Please notify the
sender immediately by reply e-mail and delete this from your
system. Thank you for your cooperation.
Re: client failure
On Thu, Apr 9, 2009 at 7:20 PM, Jean-Louis Martineau wrote:
> Can you try this patch?
> I need this patch to connect to a 2.4.2p1 client, I was not able to compile
> 2.4.1

The patch looks good to me, whether it solves Brian's problem notwithstanding.

Dustin

--
Open Source Storage Engineer
http://www.zmanda.com
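For readers following the one-line change to server-src/dumper.c that Jean-Louis posted in this thread: the point of the guard is that strcasecmp() must not be handed a NULL pointer. Below is a minimal C sketch of that idea, not Amanda code. The helper names krb4_compat_opt and patch_expr are made up for illustration, and the choice to add nothing when authopt is NULL is an assumption; the thread does not say what the server should send in that case. The second function wraps the expression as posted so the two can be compared: because && binds tighter than ?:, the posted expression selects "krb4-auth" when authopt is NULL.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp() is declared here on POSIX systems */

/*
 * Hypothetical helper (not part of Amanda): returns the compatibility
 * option to append for a given auth name.
 * Assumption: a missing (NULL) auth name should add nothing.
 */
const char *krb4_compat_opt(const char *authopt)
{
    /* Test for NULL first; strcasecmp() must not receive a NULL pointer. */
    if (authopt != NULL && strcasecmp(authopt, "krb4") == 0)
        return "krb4-auth";
    return "";
}

/*
 * The expression from the posted patch, wrapped in a function for
 * comparison. It parses as (authopt && strcasecmp(...)) ? "" : "krb4-auth",
 * so a NULL authopt makes the condition false and yields "krb4-auth".
 */
const char *patch_expr(const char *authopt)
{
    return (authopt && strcasecmp(authopt, "krb4")) ? "" : "krb4-auth";
}
```

Both functions avoid the NULL dereference and agree for any non-NULL auth name such as "bsd" or "krb4"; they differ only in what they return for NULL.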
Re: client failure
Brian,

Can you try this patch?
I need this patch to connect to a 2.4.2p1 client; I was not able to compile 2.4.1.

Jean-Louis

Brian Cuttler wrote:
> I'm trying to migrate an amanda client, SGI/IRIX with amanda 2.4.1p1
> from server Solaris 9 with amanda 2.4.4 to a server Solaris 10
> with amanda 2.6.1.
>
> I have an error, but don't see anything standing out in the
> client's /tmp/amanda tree.
>
> FAILURE DUMP SUMMARY:
>   everest /images3 lev 0 FAILED [dumper1 died]
>
> Did I cross a threshold on versioning? You'd think I could easily
> find this on the internet but no.
>
> I'd thought there was a client/server protocol issue at 2.4.0, did I
> misremember or am I looking at a different issue?
>
> This page suggests that I can work with clients older than 2.5.1,
> but I'm not sure it's not a client/server protocol issue rather
> than a communications issue.
>
> http://wiki.zmanda.com/index.php/Selfcheck_request_failed#Backing_Up_Older_Amanda_Clients_.28pre-2.5.1.29
>
> Suggested that auth "bsd" would allow me to back up older clients.
> But that wasn't the solution for me.
>
> thank you,
>
> Brian
> ---
>    Brian R Cuttler                 brian.cutt...@wadsworth.org
>    Computer Systems Support        (v) 518 486-1697
>    Wadsworth Center                (f) 518 473-6384
>    NYS Department of Health        Help Desk 518 473-0773

Index: server-src/dumper.c
===
--- server-src/dumper.c (revision 1854)
+++ server-src/dumper.c (working copy)
@@ -2089,7 +2089,7 @@
     " ", dumpdate,
     " OPTIONS ", options,
     /* compat: if authopt=krb4, send krb4-auth */
-    (strcasecmp(authopt, "krb4") ? "" : "krb4-auth"),
+    (authopt && strcasecmp(authopt, "krb4") ? "" : "krb4-auth"),
     "\n",
     NULL);
 }
Re: client failure
On Thu, Apr 9, 2009 at 4:55 PM, Brian Cuttler wrote:
> # more chunker.20090409164847.debug

That's a chunker debug log -- do you have a dumper debug log?
dumper.20090409164847.debug or something similar? You may have
several -- see if you can find one that shows something "unusual" at
the end (like a traceback).

Dustin

--
Open Source Storage Engineer
http://www.zmanda.com
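Dustin's suggestion (look for a debug log that ends in something unusual) can be roughly approximated by checking how each log ends. In the dumper logs posted elsewhere in this thread, the logs for pids 24797 and 24798 end with a "finish time" line, while the logs for pids 24796 and 24795 stop after the try_socksize line. The C sketch below, in the same language as the patch under discussion, reports whether the last non-blank line of a log contains "finish time". The function name debug_log_finished is hypothetical, and treating a missing "finish time" line as a sign of trouble is an assumption, not documented Amanda behaviour.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Amanda): returns 1 if the last
 * non-blank line read from fp contains "finish time", otherwise 0.
 * Assumes log lines are shorter than the 4096-byte buffer.
 */
int debug_log_finished(FILE *fp)
{
    char line[4096];
    int finished = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        /* Skip lines that contain only whitespace. */
        if (line[strspn(line, " \t\r\n")] == '\0')
            continue;
        /* Remember only the verdict for the most recent non-blank line. */
        finished = (strstr(line, "finish time") != NULL);
    }
    return finished;
}
```

A caller would open each dumper.*.debug file with fopen(), pass the stream to this function, and look more closely at any file for which it returns 0.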
Re: client failure
Post dumper.*.debug files?

Jean-Louis
Re: client failure
Dustin, Jean-Louis,

On Thu, Apr 09, 2009 at 04:37:24PM -0400, Dustin J. Mitchell wrote:
> On Thu, Apr 9, 2009 at 4:21 PM, Brian Cuttler wrote:
> > everest /images3 lev 0 FAILED [dumper1 died]
>
> Check the dumper debug logs on the server, rather than the client.

# more chunker.20090409164847.debug
1239310127.748174: chunker: pid 23052 ruid 110 euid 110 version 2.6.1: start at Thu Apr 9 16:48:47 2009
1239310127.751130: chunker: pid 23052 ruid 110 euid 110 version 2.6.1: rename at Thu Apr 9 16:48:47 2009
1239310127.751381: chunker: getcmd: START 20090409164827
1239310127.751428: chunker: getcmd: PORT-WRITE 00-3 /thump/amanda/work/20090409164827/everest._images3.0 everest 34cbfe811f0100 /images3 0 1970:1:1:0:0:0 1048576 DUMP 1178784 |;bsd-auth;compress-fast;index;
1239310127.752279: chunker: stream_server opening socket with family 2 (requested family was 2)
1239310127.752350: chunker: try_socksize: receive buffer size is 65536
1239310127.758397: chunker: bind_portrange2: Try port 10096: Available - Success
1239310127.758476: chunker: stream_server: waiting for connection: 0.0.0.0.10096
1239310127.758491: chunker: putresult: 23 PORT
1239310127.764993: chunker: stream_accept: connection from 127.0.0.1.10084
1239310127.765019: chunker: try_socksize: receive buffer size is 65536
1239310127.765485: chunker: putresult: 10 FAILED
1239310127.765592: chunker: pid 23052 finish time Thu Apr 9 16:48:47 2009

amdump.1
amdump: start at Thu Apr 9 16:48:27 EDT 2009
amdump: datestamp 20090409
amdump: starttime 20090409164827
amdump: starttime-locale-independent 2009-04-09 16:48:27 EDT
planner: pid 22319 executable /usr/local/libexec/amanda/planner version 2.6.1-20090227
planner: build: VERSION="Amanda-2.6.1-20090227"
planner:        BUILT_DATE="Mon Mar 9 17:02:49 EDT 2009"
planner:        BUILT_MACH="i386-pc-solaris2.10" BUILT_REV="1714"
planner:        BUILT_BRANCH="amanda-261" CC="/opt/SUNWspro/bin/cc"
planner: paths: bindir="/usr/local/bin" sbindir="/usr/local/sbin"
planner:        libexecdir="/usr/local/libexec"
planner:        amlibexecdir="/usr/local/libexec/amanda"
planner:        mandir="/usr/local/share/man" AMANDA_TMPDIR="/tmp/amanda"
planner:        AMANDA_DBGDIR="/tmp/amanda"
planner:        CONFIG_DIR="/usr/local/etc/amanda" DEV_PREFIX="/dev/dsk/"
planner:        RDEV_PREFIX="/dev/rdsk/" DUMP="/usr/sbin/ufsdump"
planner:        RESTORE="/usr/sbin/ufsrestore" VDUMP=UNDEF VRESTORE=UNDEF
planner:        XFSDUMP=UNDEF XFSRESTORE=UNDEF VXDUMP=UNDEF VXRESTORE=UNDEF
planner:        SAMBA_CLIENT="/usr/sfw/bin/smbclient"
planner:        GNUTAR="/usr/sfw/bin/gtar" COMPRESS_PATH="/usr/bin/gzip"
planner:        UNCOMPRESS_PATH="/usr/bin/gzip" LPRCMD="/usr/bin/lpr"
planner:        MAILER=UNDEF
planner:        listed_incr_dir="/usr/local/var/amanda/gnutar-lists"
planner: defs:  DEFAULT_SERVER="curie" DEFAULT_CONFIG="DailySet1"
planner:        DEFAULT_TAPE_SERVER="curie" DEFAULT_TAPE_DEVICE=""
planner:        HAVE_MMAP NEED_STRSTR HAVE_SYSVSHM AMFLOCK_POSIX AMFLOCK_LOCKF
planner:        AMFLOCK_LNLOCK SETPGRP_VOID AMANDA_DEBUG_DAYS=4 BSD_SECURITY
planner:        USE_AMANDAHOSTS CLIENT_LOGIN="amanda" CHECK_USERID HAVE_GZIP
planner:        COMPRESS_SUFFIX=".gz" COMPRESS_FAST_OPT="--fast"
planner:        COMPRESS_BEST_OPT="--best" UNCOMPRESS_OPT="-dc"
READING CONF INFO...
driver: pid 22320 executable /usr/local/libexec/amanda/driver version 2.6.1-20090227
planner: timestamp 20090409164827
planner: time 0.000: startup took 0.000 secs

SENDING FLUSHES...
driver: tape size 822083584
driver: adding holding disk 0 dir /amanda0/work size 469420032 chunksize 1048576
driver: adding holding disk 1 dir /thump/amanda/work size 9148094464 chunksize 1048576
reserving 0 out of 9617514496 for degraded-mode dumps
driver: send-cmd time 0.004 to taper: START-TAPER 20090409164827
FLUSH trel /trel 20090409133424 1 /thump/amanda/work/20090409133424/trel._trel.1
ENDFLUSH

SETTING UP FOR ESTIMATES...
planner: time 0.002: setting up estimates for everest:/images3
everest:/images3 overdue 14344 days for level 0
setup_estimate: everest:/images3: command 0, options: none last_level -1 next_level0 -14344 level_days 0 getting estimates 0 (-2) -1 (-2) -1 (-2)
planner: time 0.002: setting up estimates took 0.000 secs

GETTING ESTIMATES...
driver: started dumper0 pid 22322
driver: send-cmd time 0.005 to dumper0: START 20090409164827
driver: started dumper1 pid 22323
driver: send-cmd time 0.006 to dumper1: START 20090409164827
driver: started dumper2 pid 22324
driver: send-cmd time 0.006 to dumper2: START 20090409164827
driver: started dumper3 pid 22325
driver: send-cmd time 0.007 to dumper3: START 20090409164827
driver: start time 0.007 inparallel 4 bandwidth 800 diskspace 9617514496 dir OBSOLETE datestamp 20090409164827
driver: drain-ends tapeq FIRST big-dumpers ssSS
dumper: pid 22322 executable dumper0 version 2.6.1-20090227
dumper: pid 22323 executable dumper1 version 2.6.1-20090227
dumper: pid
Re: client failure
Brian,

I tested compatibility with 2.4.5, but it should work with 2.4.1.

It's a bug in the server since the driver died; can you get a backtrace of the process?
Also, send me the amdump.1 file.

Jean-Louis
Re: client failure
On Thu, Apr 9, 2009 at 4:21 PM, Brian Cuttler wrote:
> everest /images3 lev 0 FAILED [dumper1 died]

Check the dumper debug logs on the server, rather than the client.

Dustin

--
Open Source Storage Engineer
http://www.zmanda.com
Re: Client failure problem -answer
On Thu, Jan 06, 2005 at 10:46:30AM +, Keith Matthews wrote:
> On Wed, 5 Jan 2005 15:15:01 -0500
> > Did you perchance increase the etimeout and dtimeout values first?
> >
>
> Nope, partly as there is either no documentation on them or it's well
> hidden.

Yup, we keep the man page for amanda well hidden :))

> > ... . As long as the spindle numbers used are
> > assigned to the individual drive per number, I've not even had any
> > disk thrashing problems either.
> >
>
> This was all one spindle.

But did you tell amanda that? If you don't tell it, by indicating
which DLEs are on the same spindle, amanda assumes each is a
separate spindle and might try many simultaneous dumps from
that spindle.

--
Jon H. LaBadie                  [EMAIL PROTECTED]
 JG Computing
 4455 Province Line Road        (609) 252-0159
 Princeton, NJ  08540-4322      (609) 683-7220 (fax)
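The spindle rule Jon describes can be sketched in a few lines of C. This is an illustration of the rule as the disklist documentation states it (a spindle of -1 means no restriction; two entries on the same host with the same non-negative spindle are not dumped at the same time), not Amanda's scheduler code. The struct dle type, the may_dump_together function and the host name client.example.com are all hypothetical.

```c
#include <string.h>

/*
 * Hypothetical model of one disklist entry (DLE). In a disklist the
 * spindle is the optional number after the dumptype, for example:
 *
 *   client.example.com  wd0a  user-tar       1
 *   client.example.com  wd0e  comp-user-tar  1
 *
 * Both entries above share spindle 1, so they would not be dumped
 * concurrently. A spindle of -1 means "no spindle restriction".
 */
struct dle {
    const char *host;
    const char *disk;
    int spindle;
};

/* Returns 1 if the two entries may be dumped at the same time, else 0. */
int may_dump_together(const struct dle *a, const struct dle *b)
{
    if (strcmp(a->host, b->host) != 0)
        return 1;                      /* different clients */
    if (a->spindle == -1 || b->spindle == -1)
        return 1;                      /* no restriction declared */
    return a->spindle != b->spindle;   /* same spindle: one at a time */
}
```

With every entry left at -1, as in Keith's original disklist, nothing in this rule stops several dumps from hitting the same physical disk at once, which is the situation Jon is asking about.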
Re: Client failure problem -answer
Keith Matthews wrote:
> Further
> testing revealed that it was quite happy with as many as 7 entries, more
> would cause the first few to fail, and the whole set would cause the lot
> to fail.

Hadn't there been talk once on the list of UDP packet size problems
too? Something like a specific operating system supporting only so many
bytes in a single UDP packet, and when it got too big, things failed
mysteriously? That would explain why there's so low a limit for this
host. (I have several 10s of DLEs for a few hosts, and things work just
perfectly.)

Alex

--
Alexander Jolk / BUF Compagnie
tel +33-1 42 68 18 28 / fax +33-1 42 68 18 29
Re: Client failure problem -answer
On Wed, 5 Jan 2005 15:15:01 -0500
Gene Heskett <[EMAIL PROTECTED]> wrote:

> >Replacing the above with one entry per filesystem (i.e. wd0a, wd0e,
> > wd0g) where the whole filesystem was needed, and the top level
> > directory (/var) for the other case, with an exclude file to
> > eliminate the unwanted had the whole set dumping correctly. I have
> > no idea if this is a generic Amanda issue or one specific to the
> > OpenBSD port.
> >
> >Debugging was complicated by the disk entries being tried in reverse
> >order, something else that does not seem to be mentioned in the
> >documentation.
> >
> >In case anyone wonders about the effect of 'inparallel' I left it
> > at the default of 4.
>
> Did you perchance increase the etimeout and dtimeout values first?
>

Nope, partly as there is either no documentation on them or it's well
hidden.

I'm not sure it would have had any effect anyway as the original problem
situation had amandad failing very quickly (less than 5 seconds, I never
managed to catch it running with ps) with status 1.

> I have had as high as 53 entries for a single client in my disklist
> without any problems. As long as the spindle numbers used are
> assigned to the individual drive per number, I've not even had any
> disk thrashing problems either.
>

This was all one spindle.
Re: Client failure problem -answer
On Wednesday 05 January 2005 13:35, Keith Matthews wrote:
>In case anyone wonders about the effect of 'inparallel' I left it
> at the default of 4.

Did you perchance increase the etimeout and dtimeout values first?

I have had as high as 53 entries for a single client in my disklist
without any problems. As long as the spindle numbers used are
assigned to the individual drive per number, I've not even had any
disk thrashing problems either.

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.31% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.
Re: Client failure problem -answer
On Sat, 18 Dec 2004 10:01:41 +
Keith Matthews <[EMAIL PROTECTED]> wrote:
> On Sat, 18 Dec 2004 09:17:30 +
> Keith Matthews <[EMAIL PROTECTED]> wrote:
>
> > In the light of messages just posted I'll report this in case it
> > didn't get out. Apologies to those who got it first time, I don't
> > like those who assume that no answer simply means people don't want
> > to answer either.
> >
> > I'm having trouble getting a remote backup to work.
> >
> > The report states that the disk backup failed due to a timeout.
> > Examination of /var/log/messages at the client shows that amandad
> > exited 'status 1' but gives no other indication of the cause of the
> > problem. This is happening with all disks on that client.

OK, for the sake of posterity I'd better post some more for this.

The problem seems to be related to the number of entries in the
disklist for the relevant host.

I originally had

 wd0a user-tar -1
 wd0e comp-user-tar -1
 wd0g comp-user-tar -1
 /var/amanda user-tar -1
 /var/backups user-tar -1
 /var/clamav user-tar -1
 /var/cron user-tar -1
 /var/mysql user-tar -1
 /var/named user-tar -1
 /var/spool comp-user-tar -1
 /var/www user-tar -1

(I've replaced the real, fqdn, hostname for security reasons).

After some considerable amount of cut-and-try testing I discovered
that the system worked quite happily with just one disklist entry.
Further testing revealed that it was quite happy with as many as 7
entries, more would cause the first few to fail, and the whole set
would cause the lot to fail.

Replacing the above with one entry per filesystem (i.e. wd0a, wd0e,
wd0g) where the whole filesystem was needed, and the top level
directory (/var) for the other case, with an exclude file to
eliminate the unwanted had the whole set dumping correctly. I have
no idea if this is a generic Amanda issue or one specific to the
OpenBSD port.

Debugging was complicated by the disk entries being tried in reverse
order, something else that does not seem to be mentioned in the
documentation.

In case anyone wonders about the effect of 'inparallel' I left it at
the default of 4.
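
For anyone wanting to check their own disklist against the behaviour reported in this message, here is a minimal Python sketch that counts entries per host. It rests on several assumptions that are not from the original post: each entry sits on one line with the hostname as the first field, the host names in the sample are made up, and the limit of 7 is only the figure observed in this one report, not a documented Amanda limit. Entries that use brace blocks spanning several lines are not handled.

```python
from collections import Counter

def entries_per_host(disklist_text):
    """Count one-line disklist entries per host, skipping blanks and # comments."""
    counts = Counter()
    for line in disklist_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        counts[line.split()[0]] += 1  # first field is assumed to be the hostname
    return counts

def hosts_over(disklist_text, limit=7):
    """Return hosts with more than `limit` entries (7 is the figure from this thread)."""
    return sorted(h for h, n in entries_per_host(disklist_text).items() if n > limit)

# Hypothetical sample: the host names are invented, the disks are those in the post.
SAMPLE = """\
# tape server itself
server.example.com /home user-tar -1
client.example.com wd0a user-tar -1
client.example.com wd0e comp-user-tar -1
client.example.com wd0g comp-user-tar -1
client.example.com /var/amanda user-tar -1
client.example.com /var/backups user-tar -1
client.example.com /var/clamav user-tar -1
client.example.com /var/cron user-tar -1
client.example.com /var/mysql user-tar -1
client.example.com /var/named user-tar -1
client.example.com /var/spool comp-user-tar -1
client.example.com /var/www user-tar -1
"""

if __name__ == "__main__":
    for host in hosts_over(SAMPLE):
        print(host, entries_per_host(SAMPLE)[host])
```

Running the same functions over the text of a real disklist would show which clients have more entries than the count that worked here.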
Re: Client failure problem -part answer
On Sat, 18 Dec 2004 09:17:30 +
Keith Matthews <[EMAIL PROTECTED]> wrote:
> In the light of messages just posted I'll report this in case it
> didn't get out. Apologies to those who got it first time, I don't like
> those who assume that no answer simply means people don't want to
> answer either.
>
> I'm having trouble getting a remote backup to work.
>
> The report states that the disk backup failed due to a timeout.
> Examination of /var/log/messages at the client shows that amandad
> exited 'status 1' but gives no other indication of the cause of the
> problem. This is happening with all disks on that client.
>
> The failure seems to happen immediately (I was trying to check the
> user that amandad was running under and the process did not last long
> enough to show).
>
> There do not seem to be any logs on the client to give more
> information and I was wondering if there is some sort of debug setting
> I could invoke.
>
> It ran correctly for a test about two weeks ago but has since
> consistently failed as above. Backups of the tape server host work
> correctly.
>
> Server is Slackware 10, client OpenBSD 3.5 if that makes a difference.
>
> Anyone got a clue how I can find out what the client is objecting to?

OK, for the assistance of anyone who uses the archives, I've found a set
of files in /tmp/amanda which give some information. One is saying

amandad:UNCOMPRESS_OPT="-dc"
got packet:
Amanda 2.4 REQ HANDLE 000-E0750608 SEQ 1103362411
SECURITY USER amanda
SERVICE noop
OPTIONS features=feff9ffe0f;
sending nack:
Amanda 2.4 NAK HANDLE 000-E0750608 SEQ 1103362411
ERROR unknown service: noop

If I can work out what has changed between the original run and these
later ones that could have caused that, I'll be able to fix it.
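
For readers searching their own /tmp/amanda debug files for the same symptom, here is a minimal Python sketch that pulls NAK replies out of debug text. It assumes the NAK header and its ERROR text are separated only by whitespace or a line break, as in the excerpt quoted in this message; the function name and the sample layout are illustrative, and real debug files may carry extra framing that this pattern does not account for.

```python
import re

# Matches "NAK HANDLE <handle> SEQ <number>" followed by "ERROR <message>".
NAK_RE = re.compile(r"NAK HANDLE (\S+) SEQ (\d+)\s+ERROR ([^\n]+)")

def find_naks(debug_text):
    """Return (handle, sequence, error message) for each NAK found in the text."""
    return [
        (m.group(1), int(m.group(2)), m.group(3).strip())
        for m in NAK_RE.finditer(debug_text)
    ]

# Sample built from the packet text quoted in this message.
SAMPLE = """\
got packet:
Amanda 2.4 REQ HANDLE 000-E0750608 SEQ 1103362411
SECURITY USER amanda
SERVICE noop
OPTIONS features=feff9ffe0f;
sending nack:
Amanda 2.4 NAK HANDLE 000-E0750608 SEQ 1103362411
ERROR unknown service: noop
"""

if __name__ == "__main__":
    for handle, seq, error in find_naks(SAMPLE):
        print(handle, seq, error)
```

Reading each debug file with `open(path).read()` and passing the text to `find_naks` would list every request the client refused, which narrows down which service name it is rejecting.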