Re: Q: 'all estimate timed out' error

2012-08-26 Thread Geert Uytterhoeven
On Thu, Aug 23, 2012 at 11:44 AM, Geert Uytterhoeven
ge...@linux-m68k.org wrote:
> On Mon, Oct 10, 2011 at 9:53 AM, Albrecht Dreß albrecht.dr...@arcor.de wrote:
>> I use amanda 2.5.2p1 on an Ubuntu 8.04 server to back up several machines.
>> The backup of /one/ disk from /one/ machine, which worked flawlessly for
>> years, now regularly throws the message
>>
>> FAILURE AND STRANGE DUMP SUMMARY:
>>   srv-erp3  /mnt1  lev 0  FAILED [disk /mnt1, all estimate timed out]
>>   planner: ERROR Request to srv-erp3 failed: timeout waiting for REP
>>
>> in the report, but the other disks are written properly:
>
> Did this ever get resolved? How?
>
> For the past few days, I've been getting the same error for one of my DLEs,
> which also worked flawlessly for years, and its contents haven't changed
> recently:
>
>  FAILURE DUMP SUMMARY:
>    machine /path lev 0  FAILED Failed reading dump header.
>    machine /path lev 0  FAILED Failed reading dump header.
>    machine /path lev 0  FAILED [too many dumper retry: [request failed: timeout waiting for REP]]

I discovered I had ca. 200 hanging backup processes, like:

backup    3004  0.0  0.0      0     0 ?        Z    Aug22   0:00 [amandad] <defunct>
backup    3034  0.0  0.0      0     0 ?        Z    Aug22   0:00 [sendbackup] <defunct>

and one like this:

backup   23748  0.0  0.0  40184   128 ?        Ss   Aug19   1:00 amandad -auth=bsd amdump amindexd amidxtaped

After killing them, the next backup round completed successfully...
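
For anyone hitting the same thing, here is a rough Python sketch for spotting
such leftovers before the next run. It assumes the Amanda user is called
"backup", as in the ps output above, and that ps accepts
"-eo user,pid,ppid,stat,args" (procps on Linux does). The function names are
made up for illustration. Entries marked defunct are already dead and only
disappear once their parent reaps them or exits, so the parent PID is the
thing worth looking at.

```python
import subprocess

# Process names taken from the ps output above.
AMANDA_COMMANDS = ("amandad", "sendbackup")


def current_ps():
    """Return the output of ps with user, pid, ppid, state and command line."""
    result = subprocess.run(
        ["ps", "-eo", "user,pid,ppid,stat,args"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def find_stale(ps_text, user="backup"):
    """Return (pid, ppid, stat, command) for the user's Amanda processes."""
    found = []
    for line in ps_text.splitlines()[1:]:  # first line is the ps header
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue
        owner, pid, ppid, stat, command = fields
        if owner != user:
            continue
        if any(name in command for name in AMANDA_COMMANDS):
            found.append((int(pid), int(ppid), stat, command))
    return found


def zombies(entries):
    """Keep only entries whose state starts with Z (defunct)."""
    return [entry for entry in entries if entry[2].startswith("Z")]
```

Running find_stale(current_ps()) before amdump starts and getting a non-empty
list would be a hint that an earlier run never finished cleanly.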

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say programmer or something like that.
-- Linus Torvalds



Re: Q: 'all estimate timed out' error

2012-08-23 Thread Geert Uytterhoeven
Hi Albrecht,

On Mon, Oct 10, 2011 at 9:53 AM, Albrecht Dreß albrecht.dr...@arcor.de wrote:
> I use amanda 2.5.2p1 on an Ubuntu 8.04 server to back up several machines.
> The backup of /one/ disk from /one/ machine, which worked flawlessly for
> years, now regularly throws the message
>
> FAILURE AND STRANGE DUMP SUMMARY:
>   srv-erp3  /mnt1  lev 0  FAILED [disk /mnt1, all estimate timed out]
>   planner: ERROR Request to srv-erp3 failed: timeout waiting for REP
>
> in the report, but the other disks are written properly:

Did this ever get resolved? How?

For the past few days, I've been getting the same error for one of my DLEs,
which also worked flawlessly for years, and its contents haven't changed
recently:

 FAILURE DUMP SUMMARY:
   machine /path lev 0  FAILED Failed reading dump header.
   machine /path lev 0  FAILED Failed reading dump header.
   machine /path lev 0  FAILED [too many dumper retry: [request failed: timeout waiting for REP]]

The dumper log shows:

1345678162.592411: dumper: security_getdriver(name=BSD) returns 0x7f5180959900
1345678162.592421: dumper: security_handleinit(handle=0x1874000, driver=0x7f5180959900 (BSD))
1345678162.592676: dumper: dgram_send_addr(addr=0x1874040, dgram=0x7f5180964188)
1345678162.592686: dumper: (sockaddr_in *)0x1874040 = { 2, 10080, 127.0.1.1 }
1345678162.592692: dumper: dgram_send_addr: 0x7f5180964188->socket = 4
1345678162.593013: dumper: dgram_recv(dgram=0x7f5180964188, timeout=0, fromaddr=0x7f5180974180)
1345678162.593027: dumper: (sockaddr_in *)0x7f5180974180 = { 2, 10080, 127.0.1.1 }
1345678222.653129: dumper: dgram_send_addr(addr=0x1874040, dgram=0x7f5180964188)
1345678222.653163: dumper: (sockaddr_in *)0x1874040 = { 2, 10080, 127.0.1.1 }
1345678222.653170: dumper: dgram_send_addr: 0x7f5180964188->socket = 4
1345678222.653542: dumper: dgram_recv(dgram=0x7f5180964188, timeout=0, fromaddr=0x7f5180974180)
1345678222.653556: dumper: (sockaddr_in *)0x7f5180974180 = { 2, 10080, 127.0.1.1 }
1345678282.671245: dumper: dgram_send_addr(addr=0x1874040, dgram=0x7f5180964188)
1345678282.671280: dumper: (sockaddr_in *)0x1874040 = { 2, 10080, 127.0.1.1 }
1345678282.671287: dumper: dgram_send_addr: 0x7f5180964188->socket = 4
1345678282.671645: dumper: dgram_recv(dgram=0x7f5180964188, timeout=0, fromaddr=0x7f5180974180)
1345678282.671683: dumper: (sockaddr_in *)0x7f5180974180 = { 2, 10080, 127.0.1.1 }
1345678342.674842: dumper: security_seterror(handle=0x1874000, driver=0x7f5180959900 (BSD) error=timeout waiting for REP)
1345678342.674897: dumper: security_close(handle=0x1874000, driver=0x7f5180959900 (BSD))
1345678342.674938: dumper: putresult: 11 TRY-AGAIN

Amanda 2.6.1p1-2 on Ubuntu 10.04.4 LTS.
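
The timestamps in the dumper log tell most of the story, so here is a small
Python sketch that pulls them out. It relies only on the line format visible
in the log above (epoch seconds, then "dumper:", then the call). The function
name and the shortened excerpt are for illustration only, and it makes no
claim about which Amanda setting controls the intervals.

```python
import re

# Epoch timestamp at the start of every dumper debug line.
STAMP = re.compile(r"^(\d+\.\d+): dumper: (.*)$")

# Shortened excerpt from the log above: three sends, then the error.
EXCERPT = """\
1345678162.592676: dumper: dgram_send_addr(addr=0x1874040, dgram=0x7f5180964188)
1345678222.653129: dumper: dgram_send_addr(addr=0x1874040, dgram=0x7f5180964188)
1345678282.671245: dumper: dgram_send_addr(addr=0x1874040, dgram=0x7f5180964188)
1345678342.674842: dumper: security_seterror(handle=0x1874000, driver=0x7f5180959900 (BSD) error=timeout waiting for REP)
"""


def summarize(log_text):
    """Return (send timestamps, gaps between sends, seconds until the error)."""
    sends = []
    error_at = None
    for line in log_text.splitlines():
        match = STAMP.match(line)
        if not match:
            continue  # continuation of a wrapped line, or unrelated text
        stamp, rest = float(match.group(1)), match.group(2)
        if rest.startswith("dgram_send_addr("):
            sends.append(stamp)  # the request, or a resend of it
        elif rest.startswith("security_seterror("):
            error_at = stamp  # in this excerpt: timeout waiting for REP
    gaps = [later - earlier for earlier, later in zip(sends, sends[1:])]
    total = error_at - sends[0] if sends and error_at is not None else None
    return sends, gaps, total
```

For the log above, summarize(EXCERPT) finds three sends about 60 seconds
apart, with the error reported about 180 seconds after the first send. In
other words, the request to port 10080 on 127.0.1.1 went out three times and
never got a reply.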

Thanks!

Gr{oetje,eeting}s,

Geert
