Re: timeout waiting for REP
On Tue, Jun 1, 2010 at 9:39 AM, McGraw, Robert P rmcg...@purdue.edu wrote: I am seeing several of the following dumper: [request failed: timeout waiting for REP](too) errors when I run the amstatus command. Below is a snippet of one of the errors. Is the (too) really part of the error? Can someone tell me what might be causing this error? Timeouts? 1275368274.080161: dumper: (sockaddr_in *)fef206c4 = { 2, 10080, 128.210.3.201 } 1275368334.089824: dumper: security_seterror(handle=807bc10, driver=fef0ad20 (BSD) error=timeout waiting for REP) That's 60 seconds.. as I've said *many* times, the BSD auth is from the stone age. Use something at least slightly newer, like BSDTCP! Dustin P.S. Despite a lot of voiced interest, nobody is working on the ssl auth at the moment. Anyone want to pick that project up? -- Open Source Storage Engineer http://www.zmanda.com
timeout waiting for REP
I am running Amanda 2.6.1p2 on a Sun Solaris 10 I am seeing several of the following dumper: [request failed: timeout waiting for REP](too) errors when I run the amstatus command. Below is a snippet of one of the errors. There were 57 DLE backups of the 57 4 gave the above error. I ran amcheck on the 57 DLE and had no errors. Can someone tell me what might be causing this error? Thanks Robert ./sbin/amstatus daily --date : 20100531231421 hertz:/gauss/export/pkgs 0 dumper: [request failed: timeout waiting for REP](too) : less ./dumper.20100531231421004.debug : : 1275364553.448213: dumper: security_streaminit(stream=807d4d8, driver=fef0ad20 (BSD)) 1275364553.448345: dumper: make_socket opening socket with family 2 1275364553.448412: dumper: connect_port: Try port 1024: available - Success 1275364553.448485: dumper: connected to 128.210.3.201.49165 1275364553.448491: dumper: our side is 0.0.0.0.1024 1275364553.448499: dumper: try_socksize: send buffer size is 65536 1275364553.448507: dumper: try_socksize: receive buffer size is 65536 1275364553.448525: dumper: security_streaminit(stream=8085518, driver=fef0ad20 (BSD)) 1275364553.448611: dumper: make_socket opening socket with family 2 1275364553.448636: dumper: connect_port: Try port 1024: available - Success 1275364553.448665: dumper: connected to 128.210.3.201.49166 1275364553.448672: dumper: our side is 0.0.0.0.1024 1275364553.448679: dumper: try_socksize: send buffer size is 65536 1275364553.448685: dumper: try_socksize: receive buffer size is 65536 1275364553.448704: dumper: security_streaminit(stream=808d558, driver=fef0ad20 (BSD)) 1275364553.448785: dumper: make_socket opening socket with family 2 1275364553.448821: dumper: connect_port: Try port 1024: available - Success 1275364553.448849: dumper: connected to 128.210.3.201.49167 1275364553.448855: dumper: our side is 0.0.0.0.1024 1275364553.448861: dumper: try_socksize: send buffer size is 65536 1275364553.448868: dumper: try_socksize: receive buffer size is 65536 1275364553.448883: dumper: security_close(handle=8077560, driver=fef0ad20 (BSD)) 1275364553.491297: dumper: Building type 3 (FILE) header of size 32768 using: 1275364553.491329: dumper: Contents of *(dumpfile_t *)8069b60: 1275364553.491341: dumper: type = 3 (FILE) 1275364553.491346: dumper: datestamp= '20100531231421'1275364553.491350: dumper: dumplevel= 0 1275364553.491354: dumper: compressed = 0 1275364553.491357: dumper: encrypted= 0 1275364553.491361: dumper: comp_suffix = 'N' 1275364553.491365: dumper: encrypt_suffix = 'N' 1275364553.491368: dumper: name = 'hertz' 1275364553.491372: dumper: disk = '/gauss/export/pkgs' 1275364553.491376: dumper: program = '/usr/sfw/bin/gtar' 1275364553.491380: dumper: application = '' 1275364553.491384: dumper: srvcompprog = '' 1275364553.491387: dumper: clntcompprog = '' 1275364553.491391: dumper: srv_encrypt = '' 1275364553.491394: dumper: clnt_encrypt = '' 1275364553.491398: dumper: recover_cmd = '/usr/sfw/bin/gtar -xpGf - ...' 1275364553.491402: dumper: uncompress_cmd = '' 1275364553.491405: dumper: encrypt_cmd = '' 1275364553.491409: dumper: decrypt_cmd = '' 1275364553.491413: dumper: srv_decrypt_opt = '' 1275364553.491416: dumper: clnt_decrypt_opt = '' 1275364553.491420: dumper: cont_filename= '' 1275364553.491423: dumper: dle_str = dle programGNUTAR/program disk/gauss/export/pkgs/disk diskdevice/zvol/AMBKUPLOFS/gauss/export/pkgs/diskdevice level0/level authBSD/auth recordYES/record indexYES/index /dle 1275364553.491428: dumper: is_partial = 0 1275364553.491432: dumper: partnum = 0 1275364553.491436: dumper: totalparts = 0 1275364553.491439: dumper: blocksize= 32768 1275368153.491794: dumper: security_stream_close(807d4d8) 1275368153.491867: dumper: security_stream_close(8085518) 1275368153.491889: dumper: security_stream_close(808d558) 1275368153.491938: dumper: putresult: 10 FAILED 1275368153.987742: dumper: getcmd: PORT-DUMP 00-00018 53613 hertz 9ffe7f /gauss/export/pkgs /zvol/AMBKUPLOFS/gauss/export/pkgs 0 1970:1:1:0:0:0 GNUTAR X X X BSD | authBSD/auth\n recordYES/record\n indexYES/index\n 1275368153.990600: dumper: make_socket opening socket with family 2 1275368153.990650: dumper: connect_port: Try port 1024: available - Success 1275368153.990731: dumper: connected to 127.0.0.1.53613 1275368153.990737: dumper: our side is 0.0.0.0.1024 1275368153.990746: dumper: try_socksize: send buffer size is 65536 1275368153.990837: dumper: send request: SERVICE sendbackup OPTIONS features=9ffe7f;hostname=hertz;config=daily; dle programGNUTAR/program disk/gauss/export/pkgs/disk diskdevice/zvol/AMBKUPLOFS/gauss/export/pkgs
Re: request failed: timeout waiting for REP
Adjusting timers had no effect. It seems like its actually a problem with the dump utility hanging. There are a number of open bugs on the FreeBSD bug list related to the dump utility. I actually ended up upgraded to 2.6.0p2 and complied with DUMP_SNAPSHOT support and haven't had any problems since. Just thought I'd let the list know if anyone else runs into this problem. Jeremiah Dustin J. Mitchell wrote: It may be a simple timeout, since it only ever gets to 8.93% - have you tried increasing dtimeout? Dustin No virus found in this incoming message. Checked by AVG - http://www.avg.com Version: 8.0.176 / Virus Database: 270.10.9/1902 - Release Date: 1/19/2009 9:37 AM
Re: request failed: timeout waiting for REP
Thanks for the reply. No the DLE is not on a physically separate disk. This particular server is a straight raid 1 mirror with all the partitions I backup (/, /var, /tmp, /usr) all on that single mirror. The only one that is failing is /usr. I'm having a hard time wrapping my brain around why I'm never able to get it to dump. Any other suggestions or ideas are appreciated. Jeremiah Jim Summers wrote: Jeremiah Millay wrote: Greetings, I'm having issues backing up a single partition on one of my servers. I always see this in my amanda emails: FAILURE DUMP SUMMARY: echo /usr lev 0 FAILED [data timeout] echo /usr lev 0 FAILED [too many dumper retry: [request failed: timeout waiting for REP]] echo /usr lev 0 FAILED [cannot read header: got 0 bytes instead of 32768] FAILED DUMP DETAILS: /-- echo /usr lev 0 FAILED [data timeout]me sendbackup: start [echo:/usr level 0] sendbackup: info BACKUP=/sbin/dump sendbackup: info RECOVER_CMD=/sbin/restore -f - ... sendbackup: info end | DUMP: WARNING: should use -L when dumping live read-write filesystems! | DUMP: Date of this level 0 dump: Sun Jan 11 18:52:58 2009 | DUMP: Date of last level 0 dump: the epoch | DUMP: Dumping /dev/da0s1f (/usr) to standard output | DUMP: mapping (Pass I) [regular files] | DUMP: mapping (Pass II) [directories] | DUMP: estimated 9162049 tape blocks. | DUMP: dumping (Pass III) [directories] | DUMP: dumping (Pass IV) [regular files] | DUMP: 1.08% done, finished in 7:36 at Mon Jan 12 02:35:01 2009 | DUMP: 3.03% done, finished in 5:20 at Mon Jan 12 00:23:31 2009 | DUMP: 4.86% done, finished in 4:53 at Mon Jan 12 00:01:31 2009 | DUMP: 6.67% done, finished in 4:39 at Sun Jan 11 23:52:51 2009 | DUMP: 8.38% done, finished in 4:33 at Sun Jan 11 23:51:20 2009 | DUMP: 8.93% done, finished in 5:06 at Mon Jan 12 00:29:05 2009 \ When I log into the machine after the amdump has finished for all of the rest of the partitions on this particular server there are hung dump and restore processes on the server. I always have to kill them the next day to prevent problems with the next backup. As I mentioned I'm able to successfully backup all of my other partitions on this server so there is no issue with firewalls or anything like that. Looking through the list archives I've tried messing with etimeout and dtimeout on my amanda server but it hasn't resulted in anything useful: etimeout 3600 dtimeout 3600 Here is some output from sendbackup on the client: [r...@echo /tmp/amanda/client/DailySet1]# less sendbackup.2009085258.debug sendbackup: debug 1 pid 64539 ruid 2 euid 2: start at Sun Jan 11 18:52:58 2009 sendbackup: version 2.5.1p3 Could not open conf file /usr/local/etc/amanda/amanda-client.conf: No such file or directory Could not open conf file /usr/local/etc/amanda/DailySet1/amanda-client.conf: No such file or directory sendbackup: debug 1 pid 64539 ruid 2 euid 2: rename at Sun Jan 11 18:52:58 2009 sendbackup req: DUMP /usr 0 1970:1:1:0:0:0 OPTIONS |;auth=BSD;srvcomp-best;index; parsed request as: program `DUMP' disk `/usr' device `/usr' level 0 since 1970:1:1:0:0:0 options `|;auth=BSD;srvcomp-best;index;' sendbackup: start: echo:/usr lev 0 sendbackup: time 0.000: dumping device '/dev/da0s1f' with 'ufs' sendbackup: time 0.000: spawning /sbin/dump in pipeline sendbackup: argument list: dump 0ushf 1048576 0 - /dev/da0s1f sendbackup: time 0.000: started backup sendbackup: time 0.001: started index creator: /sbin/restore -tvf - 21 | sed -e ' s/^leaf[]*[0-9]*[ ]*\.// t /^dir[ ]/ { s/^dir[ ]*[0-9]*[ ]*\.// s%$%/% t } d ' sendbackup: time 0.002: 87: normal(|): DUMP: WARNING: should use -L when dumping live read-write filesystems! sendbackup: time 0.002: 87: normal(|): DUMP: Date of this level 0 dump: Sun Jan 11 18:52:58 2009 sendbackup: time 0.002: 87: normal(|): DUMP: Date of last level 0 dump: the epoch sendbackup: time 0.003: 87: normal(|): DUMP: Dumping /dev/da0s1f (/usr) to standard output sendbackup: time 0.044: 87: normal(|): DUMP: mapping (Pass I) [regular files] sendbackup: time 5.234: 87: normal(|): DUMP: mapping (Pass II) [directories] sendbackup: time 5.234: 87: normal(|): DUMP: estimated 9162049 tape blocks. sendbackup: time 8.527: 87: normal(|): DUMP: dumping (Pass III) [directories] sendbackup: time 249.068: 87: normal(|): DUMP: dumping (Pass IV) [regular files] sendbackup: time 304.544: 87: normal(|): DUMP: 1.08% done, finished in 7:36 at Mon Jan 12 02:35:01 2009 sendbackup: time 604.420: 87: normal(|): DUMP: 3.03% done, finished in 5:20 at Mon Jan 12 00:23:31 2009 sendbackup: time 904.583: 87: normal(|): DUMP: 4.86% done, finished in 4:53 at Mon Jan 12 00:01:31 2009 sendbackup: time 1204.442: 87: normal(|): DUMP: 6.67% done, finished in 4:39 at Sun Jan 11
Re: request failed: timeout waiting for REP
On Sat, Jan 17, 2009 at 9:36 PM, Jeremiah Millay mil...@sover.net wrote: Thanks for the reply. No the DLE is not on a physically separate disk. This particular server is a straight raid 1 mirror with all the partitions I backup (/, /var, /tmp, /usr) all on that single mirror. The only one that is failing is /usr. I'm having a hard time wrapping my brain around why I'm never able to get it to dump. It may be a simple timeout, since it only ever gets to 8.93% - have you tried increasing dtimeout? Dustin -- Storage Software Engineer http://www.zmanda.com
Re: request failed: timeout waiting for REP
Jeremiah Millay wrote: Greetings, I'm having issues backing up a single partition on one of my servers. I always see this in my amanda emails: FAILURE DUMP SUMMARY: echo /usr lev 0 FAILED [data timeout] echo /usr lev 0 FAILED [too many dumper retry: [request failed: timeout waiting for REP]] echo /usr lev 0 FAILED [cannot read header: got 0 bytes instead of 32768] FAILED DUMP DETAILS: /-- echo /usr lev 0 FAILED [data timeout]me sendbackup: start [echo:/usr level 0] sendbackup: info BACKUP=/sbin/dump sendbackup: info RECOVER_CMD=/sbin/restore -f - ... sendbackup: info end | DUMP: WARNING: should use -L when dumping live read-write filesystems! | DUMP: Date of this level 0 dump: Sun Jan 11 18:52:58 2009 | DUMP: Date of last level 0 dump: the epoch | DUMP: Dumping /dev/da0s1f (/usr) to standard output | DUMP: mapping (Pass I) [regular files] | DUMP: mapping (Pass II) [directories] | DUMP: estimated 9162049 tape blocks. | DUMP: dumping (Pass III) [directories] | DUMP: dumping (Pass IV) [regular files] | DUMP: 1.08% done, finished in 7:36 at Mon Jan 12 02:35:01 2009 | DUMP: 3.03% done, finished in 5:20 at Mon Jan 12 00:23:31 2009 | DUMP: 4.86% done, finished in 4:53 at Mon Jan 12 00:01:31 2009 | DUMP: 6.67% done, finished in 4:39 at Sun Jan 11 23:52:51 2009 | DUMP: 8.38% done, finished in 4:33 at Sun Jan 11 23:51:20 2009 | DUMP: 8.93% done, finished in 5:06 at Mon Jan 12 00:29:05 2009 \ When I log into the machine after the amdump has finished for all of the rest of the partitions on this particular server there are hung dump and restore processes on the server. I always have to kill them the next day to prevent problems with the next backup. As I mentioned I'm able to successfully backup all of my other partitions on this server so there is no issue with firewalls or anything like that. Looking through the list archives I've tried messing with etimeout and dtimeout on my amanda server but it hasn't resulted in anything useful: etimeout 3600 dtimeout 3600 Here is some output from sendbackup on the client: [r...@echo /tmp/amanda/client/DailySet1]# less sendbackup.2009085258.debug sendbackup: debug 1 pid 64539 ruid 2 euid 2: start at Sun Jan 11 18:52:58 2009 sendbackup: version 2.5.1p3 Could not open conf file /usr/local/etc/amanda/amanda-client.conf: No such file or directory Could not open conf file /usr/local/etc/amanda/DailySet1/amanda-client.conf: No such file or directory sendbackup: debug 1 pid 64539 ruid 2 euid 2: rename at Sun Jan 11 18:52:58 2009 sendbackup req: DUMP /usr 0 1970:1:1:0:0:0 OPTIONS |;auth=BSD;srvcomp-best;index; parsed request as: program `DUMP' disk `/usr' device `/usr' level 0 since 1970:1:1:0:0:0 options `|;auth=BSD;srvcomp-best;index;' sendbackup: start: echo:/usr lev 0 sendbackup: time 0.000: dumping device '/dev/da0s1f' with 'ufs' sendbackup: time 0.000: spawning /sbin/dump in pipeline sendbackup: argument list: dump 0ushf 1048576 0 - /dev/da0s1f sendbackup: time 0.000: started backup sendbackup: time 0.001: started index creator: /sbin/restore -tvf - 21 | sed -e ' s/^leaf[]*[0-9]*[ ]*\.// t /^dir[ ]/ { s/^dir[ ]*[0-9]*[ ]*\.// s%$%/% t } d ' sendbackup: time 0.002: 87: normal(|): DUMP: WARNING: should use -L when dumping live read-write filesystems! sendbackup: time 0.002: 87: normal(|): DUMP: Date of this level 0 dump: Sun Jan 11 18:52:58 2009 sendbackup: time 0.002: 87: normal(|): DUMP: Date of last level 0 dump: the epoch sendbackup: time 0.003: 87: normal(|): DUMP: Dumping /dev/da0s1f (/usr) to standard output sendbackup: time 0.044: 87: normal(|): DUMP: mapping (Pass I) [regular files] sendbackup: time 5.234: 87: normal(|): DUMP: mapping (Pass II) [directories] sendbackup: time 5.234: 87: normal(|): DUMP: estimated 9162049 tape blocks. sendbackup: time 8.527: 87: normal(|): DUMP: dumping (Pass III) [directories] sendbackup: time 249.068: 87: normal(|): DUMP: dumping (Pass IV) [regular files] sendbackup: time 304.544: 87: normal(|): DUMP: 1.08% done, finished in 7:36 at Mon Jan 12 02:35:01 2009 sendbackup: time 604.420: 87: normal(|): DUMP: 3.03% done, finished in 5:20 at Mon Jan 12 00:23:31 2009 sendbackup: time 904.583: 87: normal(|): DUMP: 4.86% done, finished in 4:53 at Mon Jan 12 00:01:31 2009 sendbackup: time 1204.442: 87: normal(|): DUMP: 6.67% done, finished in 4:39 at Sun Jan 11 23:52:51 2009 sendbackup: time 1504.647: 87: normal(|): DUMP: 8.38% done, finished in 4:33 at Sun Jan 11 23:51:20 2009 sendbackup: time 1804.618: 87: normal(|): DUMP: 8.93% done, finished in 5:06 at Mon Jan 12 00:29:05 2009 Output from amandad on the client: amandad: debug 1 pid 64538 ruid 2 euid 2: start at Sun Jan 11 18:52:58 2009 security_getdriver(name=BSD) returns 0x280ac020 amandad