Re: timeout waiting for REP

2010-06-07 Thread Dustin J. Mitchell
On Tue, Jun 1, 2010 at 9:39 AM, McGraw, Robert P rmcg...@purdue.edu wrote:
 I am seeing several of the following dumper: [request failed: timeout 
 waiting for REP](too) errors when I run the amstatus command. Below is a 
 snippet of one of the errors.

Is the (too) really part of the error?

 Can someone tell me what might be causing this error?

Timeouts?

 1275368274.080161: dumper: (sockaddr_in *)fef206c4 = { 2, 10080, 
 128.210.3.201 }
 1275368334.089824: dumper: security_seterror(handle=807bc10, driver=fef0ad20 
 (BSD) error=timeout waiting for REP)

That's 60 seconds..  as I've said *many* times, the BSD auth is from
the stone age.  Use something at least slightly newer, like BSDTCP!

Dustin

P.S. Despite a lot of voiced interest, nobody is working on the ssl
auth at the moment.  Anyone want to pick that project up?

-- 
Open Source Storage Engineer
http://www.zmanda.com


timeout waiting for REP

2010-06-01 Thread McGraw, Robert P
I am running Amanda 2.6.1p2 on a Sun Solaris 10 

I am seeing several of the following dumper: [request failed: timeout waiting 
for REP](too) errors when I run the amstatus command. Below is a snippet of 
one of the errors.

There were 57 DLE backups of the 57 4 gave the above error. 

I ran amcheck on the 57 DLE and had no errors.

Can someone tell me what might be causing this error?

Thanks

Robert


./sbin/amstatus daily --date

:
20100531231421 hertz:/gauss/export/pkgs 0  dumper: [request 
failed: timeout waiting for REP](too)
:

less ./dumper.20100531231421004.debug
:
:
1275364553.448213: dumper: security_streaminit(stream=807d4d8, driver=fef0ad20 
(BSD))
1275364553.448345: dumper: make_socket opening socket with family 2
1275364553.448412: dumper: connect_port: Try  port 1024: available - Success
1275364553.448485: dumper: connected to 128.210.3.201.49165
1275364553.448491: dumper: our side is 0.0.0.0.1024
1275364553.448499: dumper: try_socksize: send buffer size is 65536
1275364553.448507: dumper: try_socksize: receive buffer size is 65536
1275364553.448525: dumper: security_streaminit(stream=8085518, driver=fef0ad20 
(BSD))
1275364553.448611: dumper: make_socket opening socket with family 2
1275364553.448636: dumper: connect_port: Try  port 1024: available - Success
1275364553.448665: dumper: connected to 128.210.3.201.49166
1275364553.448672: dumper: our side is 0.0.0.0.1024
1275364553.448679: dumper: try_socksize: send buffer size is 65536
1275364553.448685: dumper: try_socksize: receive buffer size is 65536
1275364553.448704: dumper: security_streaminit(stream=808d558, driver=fef0ad20 
(BSD))
1275364553.448785: dumper: make_socket opening socket with family 2
1275364553.448821: dumper: connect_port: Try  port 1024: available - Success
1275364553.448849: dumper: connected to 128.210.3.201.49167
1275364553.448855: dumper: our side is 0.0.0.0.1024
1275364553.448861: dumper: try_socksize: send buffer size is 65536
1275364553.448868: dumper: try_socksize: receive buffer size is 65536
1275364553.448883: dumper: security_close(handle=8077560, driver=fef0ad20 (BSD))
1275364553.491297: dumper: Building type 3 (FILE) header of size 32768 using:
1275364553.491329: dumper: Contents of *(dumpfile_t *)8069b60:
1275364553.491341: dumper: type = 3 (FILE)
1275364553.491346: dumper: datestamp= 
'20100531231421'1275364553.491350: dumper: dumplevel= 0
1275364553.491354: dumper: compressed   = 0
1275364553.491357: dumper: encrypted= 0
1275364553.491361: dumper: comp_suffix  = 'N'
1275364553.491365: dumper: encrypt_suffix   = 'N'
1275364553.491368: dumper: name = 'hertz'
1275364553.491372: dumper: disk = '/gauss/export/pkgs'
1275364553.491376: dumper: program  = '/usr/sfw/bin/gtar'
1275364553.491380: dumper: application  = ''
1275364553.491384: dumper: srvcompprog  = ''
1275364553.491387: dumper: clntcompprog = ''
1275364553.491391: dumper: srv_encrypt  = ''
1275364553.491394: dumper: clnt_encrypt = ''
1275364553.491398: dumper: recover_cmd  = '/usr/sfw/bin/gtar -xpGf - 
...'
1275364553.491402: dumper: uncompress_cmd   = ''
1275364553.491405: dumper: encrypt_cmd  = ''
1275364553.491409: dumper: decrypt_cmd  = ''
1275364553.491413: dumper: srv_decrypt_opt  = ''
1275364553.491416: dumper: clnt_decrypt_opt = ''
1275364553.491420: dumper: cont_filename= ''
1275364553.491423: dumper: dle_str  = dle
  programGNUTAR/program
  disk/gauss/export/pkgs/disk
  diskdevice/zvol/AMBKUPLOFS/gauss/export/pkgs/diskdevice
  level0/level
  authBSD/auth
  recordYES/record
  indexYES/index
/dle

1275364553.491428: dumper: is_partial   = 0
1275364553.491432: dumper: partnum  = 0
1275364553.491436: dumper: totalparts   = 0
1275364553.491439: dumper: blocksize= 32768
1275368153.491794: dumper: security_stream_close(807d4d8)
1275368153.491867: dumper: security_stream_close(8085518)
1275368153.491889: dumper: security_stream_close(808d558)
1275368153.491938: dumper: putresult: 10 FAILED
1275368153.987742: dumper: getcmd: PORT-DUMP 00-00018 53613 hertz 
9ffe7f /gauss/export/pkgs /zvol/AMBKUPLOFS/gauss/export/pkgs 0 
1970:1:1:0:0:0 GNUTAR X X X BSD |  authBSD/auth\n  recordYES/record\n  
indexYES/index\n
1275368153.990600: dumper: make_socket opening socket with family 2
1275368153.990650: dumper: connect_port: Try  port 1024: available - Success
1275368153.990731: dumper: connected to 127.0.0.1.53613
1275368153.990737: dumper: our side is 0.0.0.0.1024
1275368153.990746: dumper: try_socksize: send buffer size is 65536
1275368153.990837: dumper: send request:

SERVICE sendbackup
OPTIONS features=9ffe7f;hostname=hertz;config=daily;
dle
  programGNUTAR/program
  disk/gauss/export/pkgs/disk
  diskdevice/zvol/AMBKUPLOFS/gauss/export/pkgs

Re: request failed: timeout waiting for REP

2009-02-03 Thread Jeremiah Millay
Adjusting timers had no effect. It seems like its actually a problem 
with the dump utility hanging. There are a number of open bugs on the 
FreeBSD bug list related to the dump utility. I actually ended up 
upgraded to 2.6.0p2 and complied with DUMP_SNAPSHOT support and haven't 
had any problems since. Just thought I'd let the list know if anyone 
else runs into this problem.

Jeremiah

Dustin J. Mitchell wrote:



It may be a simple timeout, since it only ever gets to 8.93% - have
you tried increasing dtimeout?

Dustin

  




No virus found in this incoming message.
Checked by AVG - http://www.avg.com 
Version: 8.0.176 / Virus Database: 270.10.9/1902 - Release Date: 1/19/2009 9:37 AM


  




Re: request failed: timeout waiting for REP

2009-01-17 Thread Jeremiah Millay
Thanks for the reply. No the DLE is not on a physically separate disk. 
This particular server is a straight raid 1 mirror with all the 
partitions I backup (/, /var, /tmp, /usr) all on that single mirror. The 
only one that is failing is /usr. I'm having a hard time wrapping my 
brain around why I'm never able to get it to dump.


Any other suggestions or ideas are appreciated.
Jeremiah

Jim Summers wrote:



Jeremiah Millay wrote:

Greetings,
I'm having issues backing up a single partition on one of my servers. 
I always see this in my amanda emails:


FAILURE DUMP SUMMARY:
  echo /usr lev 0  FAILED [data timeout]
  echo /usr lev 0  FAILED [too many dumper retry: [request failed: 
timeout waiting for REP]]
  echo /usr lev 0  FAILED [cannot read header: got 0 bytes instead of 
32768]



FAILED DUMP DETAILS:

/--  echo /usr lev 0 FAILED [data timeout]me
sendbackup: start [echo:/usr level 0]
sendbackup: info BACKUP=/sbin/dump
sendbackup: info RECOVER_CMD=/sbin/restore -f - ...
sendbackup: info end
|   DUMP: WARNING: should use -L when dumping live read-write 
filesystems!

|   DUMP: Date of this level 0 dump: Sun Jan 11 18:52:58 2009
|   DUMP: Date of last level 0 dump: the epoch
|   DUMP: Dumping /dev/da0s1f (/usr) to standard output
|   DUMP: mapping (Pass I) [regular files]
|   DUMP: mapping (Pass II) [directories]
|   DUMP: estimated 9162049 tape blocks.
|   DUMP: dumping (Pass III) [directories]
|   DUMP: dumping (Pass IV) [regular files]
|   DUMP: 1.08% done, finished in 7:36 at Mon Jan 12 02:35:01 2009
|   DUMP: 3.03% done, finished in 5:20 at Mon Jan 12 00:23:31 2009
|   DUMP: 4.86% done, finished in 4:53 at Mon Jan 12 00:01:31 2009
|   DUMP: 6.67% done, finished in 4:39 at Sun Jan 11 23:52:51 2009
|   DUMP: 8.38% done, finished in 4:33 at Sun Jan 11 23:51:20 2009
|   DUMP: 8.93% done, finished in 5:06 at Mon Jan 12 00:29:05 2009
\


When I log into the machine after the amdump has finished for all of 
the rest of the partitions on this particular server there are hung 
dump and restore processes on the server. I always have to kill them 
the next day to prevent problems with the next backup.


As I mentioned I'm able to successfully backup all of my other 
partitions on this server so there is no issue with firewalls or 
anything like that. Looking through the list archives I've tried 
messing with etimeout and dtimeout on my amanda server but it hasn't 
resulted in anything useful:


etimeout 3600
dtimeout 3600



Here is some output from sendbackup on the client:

[r...@echo /tmp/amanda/client/DailySet1]# less 
sendbackup.2009085258.debug
sendbackup: debug 1 pid 64539 ruid 2 euid 2: start at Sun Jan 11 
18:52:58 2009

sendbackup: version 2.5.1p3
Could not open conf file /usr/local/etc/amanda/amanda-client.conf: 
No such file or directory
Could not open conf file 
/usr/local/etc/amanda/DailySet1/amanda-client.conf: No such file or 
directory
sendbackup: debug 1 pid 64539 ruid 2 euid 2: rename at Sun Jan 11 
18:52:58 2009
 sendbackup req: DUMP /usr  0 1970:1:1:0:0:0 OPTIONS 
|;auth=BSD;srvcomp-best;index;

 parsed request as: program `DUMP'
disk `/usr'
device `/usr'
level 0
since 1970:1:1:0:0:0
options `|;auth=BSD;srvcomp-best;index;'
sendbackup: start: echo:/usr lev 0
sendbackup: time 0.000: dumping device '/dev/da0s1f' with 'ufs'
sendbackup: time 0.000: spawning /sbin/dump in pipeline
sendbackup: argument list: dump 0ushf 1048576 0 - /dev/da0s1f
sendbackup: time 0.000: started backup
sendbackup: time 0.001: started index creator: /sbin/restore -tvf - 
21 | sed -e '

s/^leaf[]*[0-9]*[   ]*\.//
t
/^dir[  ]/ {
s/^dir[ ]*[0-9]*[   ]*\.//
s%$%/%
t
}
d
'
sendbackup: time 0.002:  87:  normal(|):   DUMP: WARNING: should use 
-L when dumping live read-write filesystems!
sendbackup: time 0.002:  87:  normal(|):   DUMP: Date of this level 0 
dump: Sun Jan 11 18:52:58 2009
sendbackup: time 0.002:  87:  normal(|):   DUMP: Date of last level 0 
dump: the epoch
sendbackup: time 0.003:  87:  normal(|):   DUMP: Dumping /dev/da0s1f 
(/usr) to standard output
sendbackup: time 0.044:  87:  normal(|):   DUMP: mapping (Pass I) 
[regular files]
sendbackup: time 5.234:  87:  normal(|):   DUMP: mapping (Pass II) 
[directories]
sendbackup: time 5.234:  87:  normal(|):   DUMP: estimated 9162049 
tape blocks.
sendbackup: time 8.527:  87:  normal(|):   DUMP: dumping (Pass III) 
[directories]
sendbackup: time 249.068:  87:  normal(|):   DUMP: dumping (Pass IV) 
[regular files]
sendbackup: time 304.544:  87:  normal(|):   DUMP: 1.08% done, 
finished in 7:36 at Mon Jan 12 02:35:01 2009
sendbackup: time 604.420:  87:  normal(|):   DUMP: 3.03% done, 
finished in 5:20 at Mon Jan 12 00:23:31 2009
sendbackup: time 904.583:  87:  normal(|):   DUMP: 4.86% done, 
finished in 4:53 at Mon Jan 12 00:01:31 2009
sendbackup: time 1204.442:  87:  normal(|):   DUMP: 6.67% done, 
finished in 4:39 at Sun Jan 11

Re: request failed: timeout waiting for REP

2009-01-17 Thread Dustin J. Mitchell
On Sat, Jan 17, 2009 at 9:36 PM, Jeremiah Millay mil...@sover.net wrote:
 Thanks for the reply. No the DLE is not on a physically separate disk. This
 particular server is a straight raid 1 mirror with all the partitions I
 backup (/, /var, /tmp, /usr) all on that single mirror. The only one that is
 failing is /usr. I'm having a hard time wrapping my brain around why I'm
 never able to get it to dump.

It may be a simple timeout, since it only ever gets to 8.93% - have
you tried increasing dtimeout?

Dustin

-- 
Storage Software Engineer
http://www.zmanda.com


Re: request failed: timeout waiting for REP

2009-01-16 Thread Jim Summers



Jeremiah Millay wrote:

Greetings,
I'm having issues backing up a single partition on one of my servers. I 
always see this in my amanda emails:


FAILURE DUMP SUMMARY:
  echo /usr lev 0  FAILED [data timeout]
  echo /usr lev 0  FAILED [too many dumper retry: [request failed: 
timeout waiting for REP]]
  echo /usr lev 0  FAILED [cannot read header: got 0 bytes instead of 
32768]



FAILED DUMP DETAILS:

/--  echo /usr lev 0 FAILED [data timeout]me
sendbackup: start [echo:/usr level 0]
sendbackup: info BACKUP=/sbin/dump
sendbackup: info RECOVER_CMD=/sbin/restore -f - ...
sendbackup: info end
|   DUMP: WARNING: should use -L when dumping live read-write filesystems!
|   DUMP: Date of this level 0 dump: Sun Jan 11 18:52:58 2009
|   DUMP: Date of last level 0 dump: the epoch
|   DUMP: Dumping /dev/da0s1f (/usr) to standard output
|   DUMP: mapping (Pass I) [regular files]
|   DUMP: mapping (Pass II) [directories]
|   DUMP: estimated 9162049 tape blocks.
|   DUMP: dumping (Pass III) [directories]
|   DUMP: dumping (Pass IV) [regular files]
|   DUMP: 1.08% done, finished in 7:36 at Mon Jan 12 02:35:01 2009
|   DUMP: 3.03% done, finished in 5:20 at Mon Jan 12 00:23:31 2009
|   DUMP: 4.86% done, finished in 4:53 at Mon Jan 12 00:01:31 2009
|   DUMP: 6.67% done, finished in 4:39 at Sun Jan 11 23:52:51 2009
|   DUMP: 8.38% done, finished in 4:33 at Sun Jan 11 23:51:20 2009
|   DUMP: 8.93% done, finished in 5:06 at Mon Jan 12 00:29:05 2009
\


When I log into the machine after the amdump has finished for all of the 
rest of the partitions on this particular server there are hung dump and 
restore processes on the server. I always have to kill them the next day 
to prevent problems with the next backup.


As I mentioned I'm able to successfully backup all of my other 
partitions on this server so there is no issue with firewalls or 
anything like that. Looking through the list archives I've tried messing 
with etimeout and dtimeout on my amanda server but it hasn't resulted in 
anything useful:


etimeout 3600
dtimeout 3600



Here is some output from sendbackup on the client:

[r...@echo /tmp/amanda/client/DailySet1]# less 
sendbackup.2009085258.debug
sendbackup: debug 1 pid 64539 ruid 2 euid 2: start at Sun Jan 11 
18:52:58 2009

sendbackup: version 2.5.1p3
Could not open conf file /usr/local/etc/amanda/amanda-client.conf: No 
such file or directory
Could not open conf file 
/usr/local/etc/amanda/DailySet1/amanda-client.conf: No such file or 
directory
sendbackup: debug 1 pid 64539 ruid 2 euid 2: rename at Sun Jan 11 
18:52:58 2009
 sendbackup req: DUMP /usr  0 1970:1:1:0:0:0 OPTIONS 
|;auth=BSD;srvcomp-best;index;

 parsed request as: program `DUMP'
disk `/usr'
device `/usr'
level 0
since 1970:1:1:0:0:0
options `|;auth=BSD;srvcomp-best;index;'
sendbackup: start: echo:/usr lev 0
sendbackup: time 0.000: dumping device '/dev/da0s1f' with 'ufs'
sendbackup: time 0.000: spawning /sbin/dump in pipeline
sendbackup: argument list: dump 0ushf 1048576 0 - /dev/da0s1f
sendbackup: time 0.000: started backup
sendbackup: time 0.001: started index creator: /sbin/restore -tvf - 
21 | sed -e '

s/^leaf[]*[0-9]*[   ]*\.//
t
/^dir[  ]/ {
s/^dir[ ]*[0-9]*[   ]*\.//
s%$%/%
t
}
d
'
sendbackup: time 0.002:  87:  normal(|):   DUMP: WARNING: should use -L 
when dumping live read-write filesystems!
sendbackup: time 0.002:  87:  normal(|):   DUMP: Date of this level 0 
dump: Sun Jan 11 18:52:58 2009
sendbackup: time 0.002:  87:  normal(|):   DUMP: Date of last level 0 
dump: the epoch
sendbackup: time 0.003:  87:  normal(|):   DUMP: Dumping /dev/da0s1f 
(/usr) to standard output
sendbackup: time 0.044:  87:  normal(|):   DUMP: mapping (Pass I) 
[regular files]
sendbackup: time 5.234:  87:  normal(|):   DUMP: mapping (Pass II) 
[directories]
sendbackup: time 5.234:  87:  normal(|):   DUMP: estimated 9162049 tape 
blocks.
sendbackup: time 8.527:  87:  normal(|):   DUMP: dumping (Pass III) 
[directories]
sendbackup: time 249.068:  87:  normal(|):   DUMP: dumping (Pass IV) 
[regular files]
sendbackup: time 304.544:  87:  normal(|):   DUMP: 1.08% done, finished 
in 7:36 at Mon Jan 12 02:35:01 2009
sendbackup: time 604.420:  87:  normal(|):   DUMP: 3.03% done, finished 
in 5:20 at Mon Jan 12 00:23:31 2009
sendbackup: time 904.583:  87:  normal(|):   DUMP: 4.86% done, finished 
in 4:53 at Mon Jan 12 00:01:31 2009
sendbackup: time 1204.442:  87:  normal(|):   DUMP: 6.67% done, finished 
in 4:39 at Sun Jan 11 23:52:51 2009
sendbackup: time 1504.647:  87:  normal(|):   DUMP: 8.38% done, finished 
in 4:33 at Sun Jan 11 23:51:20 2009
sendbackup: time 1804.618:  87:  normal(|):   DUMP: 8.93% done, finished 
in 5:06 at Mon Jan 12 00:29:05 2009








Output from amandad on the client:


amandad: debug 1 pid 64538 ruid 2 euid 2: start at Sun Jan 11 18:52:58 2009
security_getdriver(name=BSD) returns 0x280ac020
amandad