Re: failure "estimate of level x timed out"
Today I put the parameter "estimate" back to "client", as it always had been, while etimeout is still at the new value of 14400. The backup failed again, even on 4 DLEs. I had changed "estimate" back because, although the backup succeeded yesterday, it showed the strange behaviour of leaving 200 MB to be flushed while it had written only 15% in total to the 20 GB DDS4 backup tape, so there was ample space. Back to "estimate calcsize"; we will see if it at least runs correctly tomorrow.

Regards, Charles

On Tue, 11 Dec 2012 21:14:57 +0300 Alan Orth wrote:

> Charles,
>
> Ah, I was mistaken that the error was not fatal, due to the fact that
> the summary email says "output size" (listing around 1 TB of data over
> 6 hours of backup time!) and "these dumps were to tape xxx."
>
> Well, if they are indeed failures, then the FAIL classification is
> right indeed. Good to know! I really need to investigate my
> estimate timeouts then...
>
> Cheers,
>
> Alan
>
> On 12/11/2012 12:56 PM, Charles Stroom wrote:
> > The planner gets an "ERROR" making the estimate, but then later the
> > dump itself FAILs as well. So no backup is made of that particular
> > DLE.
> >
> > Regards, Charles
> >
> > On Tue, 11 Dec 2012 09:04:46 +0300
> > Alan Orth wrote:
> >
> >> Hi, All.
> >>
> >> It's good that you brought this up on the mailing list; I was just
> >> about to ask! I've been having problems with estimate timeouts
> >> lately too, so I'll try some of these tips to fix it.
> >>
> >> What confused me initially was why estimate failures are
> >> classified as "FAIL". It's quite worrying to wake up in the
> >> morning and find last night's backups have FAILED. Shouldn't the
> >> classification be more similar to something like the STRANGE
> >> errors (where files have changed during backup, for example)?
> >>
> >> Cheers,
> >>
> >> Alan
> >>
> >> On 12/10/2012 06:05 PM, Charles Stroom wrote:
> >>> Hi, the forwarded email below was meant to go to the list, but I
> >>> noticed later it went to only 1 recipient. Hence the forward.
> >>>
> >>> Regards, Charles
> >>>
> >>> Begin forwarded message:
> >>>
> >>> Date: Sat, 8 Dec 2012 22:08:48 +0100
> >>> From: Charles Stroom
> >>> To: Jens Berg
> >>> Subject: Re: failure "estimate of level x timed out"
> >>>
> >>> So far, so good. Since I increased etimeout to 14400 AND set
> >>> "estimate calcsize", I have had 2 backups without failures. The
> >>> only thing I don't know yet is which parameter did the trick. I
> >>> keep my fingers crossed.
> >>>
> >>> Thanks, both of you.
> >>>
> >>> Charles
> >>>
> >>> On Thu, 06 Dec 2012 09:49:28 +0100
> >>> Jens Berg wrote:
> >>>
> >>> > I would suggest increasing etimeout to a much bigger value,
> >>> > let's say 14400 or so, and seeing if the estimates finish at
> >>> > all then. If they still fail, I would take a closer look at the
> >>> > health of the hard discs... Another option could be to change
> >>> > the estimate method for the dump type you are using, e.g. if
> >>> > you are using "dumptype user-tar" for the DLEs, put an
> >>> > "estimate calcsize" in the definition of "dumptype user-tar".
> >>> > The results of that estimate method will be less accurate than
> >>> > the ones from the default method, but it executes faster.
> >>> >
> >>> > Best
> >>> > Jens
>
> --
> Alan Orth
> alan.o...@gmail.com
> http://alaninkenya.org
> http://mjanja.co.ke
> "I have always wished for my computer to be as easy to use as my
> telephone; my wish has come true because I can no longer figure out
> how to use my telephone." -Bjarne Stroustrup, inventor of C++

--
Charles Stroom
email: charles at no-spam.stremen.xs4all.nl (remove the "no-spam.")
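For readers following along, the two changes discussed in this thread (raising etimeout and switching the estimate method) would look roughly like this in amanda.conf. This is a sketch, not Charles's actual configuration: the dumptype name "user-tar" is taken from Jens's example, and the "global", "program", and "comment" lines are assumed boilerplate.

```
# amanda.conf (sketch) -- give the planner more time for estimates;
# 14400 seconds = 4 hours, the value tried in this thread
etimeout 14400

# Switch the DLEs' dumptype to the faster, less accurate
# calcsize estimate method
define dumptype user-tar {
    global                  # assumed parent dumptype
    program "GNUTAR"
    comment "user partitions dumped with tar"
    estimate calcsize       # instead of the default client estimate
}
```

Note that either change alone may be enough, which is exactly what Charles is still trying to determine by toggling them one at a time.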
amrecover fails if DLE was compressed
Hi all. I've just put amanda 2.6.1p2 and my existing (and long-working) config files onto a new machine, and tested that it worked for both backup and recover. Then I uninstalled 2.6.1p2 and installed amanda 3.3.2.

Now I get a broken pipe in the amrecover log, exactly after I answer the "set owner/mode?" question, and the amrecover window hangs until I control-C out of it, but only if the DLE includes compression. If I turn off compression and redo the backups, a recover will succeed, even a recover which involves 2 tapes.

versions:
tar (GNU tar) 1.23
gzip 1.3.12

Any idea what the problem is?

Deb

Here are the amrecover and amandad logs, since they contain errors. I have other logs too if you need them, but I don't see any complaints in them.

=== amrecover.debug
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: pid 29473 ruid 0 euid 0 version 3.3.2: start at Tue Dec 11 13:11:11 2012
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: config_overrides: conf daily
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: pid 29473 ruid 0 euid 0 version 3.3.2: rename at Tue Dec 11 13:11:11 2012
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: security_getdriver(name=bsd) returns 0x7fc6596bd2e0
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: security_handleinit(handle=0xfd0260, driver=0x7fc6596bd2e0 (BSD))
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: dgram_bind: setting up a socket with family 2
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: bind_portrange2: Skip port 848: Owned by gdoi.
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: bind_portrange2: Try port 849: Available - Success
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: dgram_bind: socket 3 bound to 0.0.0.0:849
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: dgram_send_addr(addr=0xfd02a0, dgram=0x7fc6596c9da8)
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: (sockaddr_in *)0xfd02a0 = { 2, 10080, 131.225.121.103 }
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: dgram_send_addr: 0x7fc6596c9da8->socket = 3
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: dgram_recv(dgram=0x7fc6596c9da8, timeout=0, fromaddr=0x7fc6596d9da0)
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: (sockaddr_in *)0x7fc6596d9da0 = { 2, 10080, 131.225.121.103 }
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: dgram_recv(dgram=0x7fc6596c9da8, timeout=0, fromaddr=0x7fc6596d9da0)
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: (sockaddr_in *)0x7fc6596d9da0 = { 2, 10080, 131.225.121.103 }
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: dgram_send_addr(addr=0xfd02a0, dgram=0x7fc6596c9da8)
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: (sockaddr_in *)0xfd02a0 = { 2, 10080, 131.225.121.103 }
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: dgram_send_addr: 0x7fc6596c9da8->socket = 3
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: security_streaminit(stream=0xfd7840, driver=0x7fc6596bd2e0 (BSD))
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: make_socket opening socket with family 2
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: connect_port: Try port 5: available - Success
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: connected to 131.225.121.103:50006
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: our side is 0.0.0.0:5
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: try_socksize: send buffer size is 65536
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: try_socksize: receive buffer size is 65536
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: security_close(handle=0xfd0260, driver=0x7fc6596bd2e0 (BSD))
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: sending: FEATURES 9efefbff1f
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: sending: DATE 2012-12-11
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: sending: SCNF daily
Tue Dec 11 13:11:11 2012: thd-0xfc4490: amrecover: sending: HOST mynode.fqdn
Tue Dec 11 13:11:19 2012: thd-0xfc4490: amrecover: user command: 'setdate 2012-12-06'
Tue Dec 11 13:11:19 2012: thd-0xfc4490: amrecover: sending: DATE 2012-12-06
Tue Dec 11 13:11:24 2012: thd-0xfc4490: amrecover: user command: 'setdisk /var'
Tue Dec 11 13:11:24 2012: thd-0xfc4490: amrecover: sending: DISK /var
Tue Dec 11 13:11:24 2012: thd-0xfc4490: amrecover: sending: OISD /
Tue Dec 11 13:11:24 2012: thd-0xfc4490: amrecover: sending: OLSD /
Tue Dec 11 13:11:24 2012: thd-0xfc4490: amrecover: add_dir_list_item: Adding "2012-12-06-12-56-36" "0" "adUXdaily-0daily-827test:3" "3" "/."
Tue Dec 11 13:11:24 2012: thd-0xfc4490: amrecover: add_dir_list_item: Adding "2012-12-06-12-56-36" "0" "adUXdaily-0daily-827test:3" "3" "/account/"
Tue Dec 11 13:11:24 2012: thd-0xfc4490: amrecover: add_dir_list_item: Adding "2012-12-06-12-56-36" "0" "adUXdaily-0daily-827test:3" "3" "/adm/"
Tue Dec 11 13:11:24 2012: thd-0xfc4490: amrecover: add_dir_list_item: Adding "2012-12-06-12-56-36" "0" "adUXdaily-0daily-827test:3" "3" "/cach
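Since Deb reports that recoveries succeed once compression is off, one way to isolate the problem while it is investigated is to keep two otherwise identical dumptypes and flip only the compression setting. This is only a sketch: the dumptype names below are made up for illustration and do not come from the original post, and the parent "global"/"program" lines are assumed boilerplate.

```
# Sketch only: dumptype names are hypothetical.
define dumptype comp-test {
    global                  # assumed parent dumptype
    program "GNUTAR"
    compress client fast    # the configuration whose recover breaks
}

define dumptype nocomp-test {
    comp-test               # inherit everything above
    compress none           # the workaround that recovers cleanly
}
```

Backing up the same DLE once under each dumptype and retrying amrecover against both would confirm whether compression alone triggers the broken pipe in 3.3.2.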
Re: failure "estimate of level x timed out"
Charles,

Ah, I was mistaken that the error was not fatal, due to the fact that the summary email says "output size" (listing around 1 TB of data over 6 hours of backup time!) and "these dumps were to tape xxx."

Well, if they are indeed failures, then the FAIL classification is right indeed. Good to know! I really need to investigate my estimate timeouts then...

Cheers,

Alan

On 12/11/2012 12:56 PM, Charles Stroom wrote:
> The planner gets an "ERROR" making the estimate, but then later the
> dump itself FAILs as well. So no backup is made of that particular
> DLE.
>
> Regards, Charles
>
> On Tue, 11 Dec 2012 09:04:46 +0300
> Alan Orth wrote:
>
>> Hi, All.
>>
>> It's good that you brought this up on the mailing list; I was just
>> about to ask! I've been having problems with estimate timeouts
>> lately too, so I'll try some of these tips to fix it.
>>
>> What confused me initially was why estimate failures are classified
>> as "FAIL". It's quite worrying to wake up in the morning and find
>> last night's backups have FAILED. Shouldn't the classification be
>> more similar to something like the STRANGE errors (where files have
>> changed during backup, for example)?
>>
>> Cheers,
>>
>> Alan
>>
>> On 12/10/2012 06:05 PM, Charles Stroom wrote:
>>> Hi, the forwarded email below was meant to go to the list, but I
>>> noticed later it went to only 1 recipient. Hence the forward.
>>>
>>> Regards, Charles
>>>
>>> Begin forwarded message:
>>>
>>> Date: Sat, 8 Dec 2012 22:08:48 +0100
>>> From: Charles Stroom
>>> To: Jens Berg
>>> Subject: Re: failure "estimate of level x timed out"
>>>
>>> So far, so good. Since I increased etimeout to 14400 AND set
>>> "estimate calcsize", I have had 2 backups without failures. The
>>> only thing I don't know yet is which parameter did the trick. I
>>> keep my fingers crossed.
>>>
>>> Thanks, both of you.
>>>
>>> Charles
>>>
>>> On Thu, 06 Dec 2012 09:49:28 +0100
>>> Jens Berg wrote:
>>>
>>>> I would suggest increasing etimeout to a much bigger value, let's
>>>> say 14400 or so, and seeing if the estimates finish at all then.
>>>> If they still fail, I would take a closer look at the health of
>>>> the hard discs... Another option could be to change the estimate
>>>> method for the dump type you are using, e.g. if you are using
>>>> "dumptype user-tar" for the DLEs, put an "estimate calcsize" in
>>>> the definition of "dumptype user-tar". The results of that
>>>> estimate method will be less accurate than the ones from the
>>>> default method, but it executes faster.
>>>>
>>>> Best
>>>> Jens

--
Alan Orth
alan.o...@gmail.com
http://alaninkenya.org
http://mjanja.co.ke
"I have always wished for my computer to be as easy to use as my telephone; my wish has come true because I can no longer figure out how to use my telephone." -Bjarne Stroustrup, inventor of C++
Re: failure "estimate of level x timed out"
The planner gets an "ERROR" making the estimate, but then later the dump itself FAILs as well. So no backup is made of that particular DLE.

Regards, Charles

On Tue, 11 Dec 2012 09:04:46 +0300 Alan Orth wrote:

> Hi, All.
>
> It's good that you brought this up on the mailing list; I was just
> about to ask! I've been having problems with estimate timeouts
> lately too, so I'll try some of these tips to fix it.
>
> What confused me initially was why estimate failures are classified
> as "FAIL". It's quite worrying to wake up in the morning and find
> last night's backups have FAILED. Shouldn't the classification be
> more similar to something like the STRANGE errors (where files have
> changed during backup, for example)?
>
> Cheers,
>
> Alan
>
> On 12/10/2012 06:05 PM, Charles Stroom wrote:
> > Hi, the forwarded email below was meant to go to the list, but I
> > noticed later it went to only 1 recipient. Hence the forward.
> >
> > Regards, Charles
> >
> > Begin forwarded message:
> >
> > Date: Sat, 8 Dec 2012 22:08:48 +0100
> > From: Charles Stroom
> > To: Jens Berg
> > Subject: Re: failure "estimate of level x timed out"
> >
> > So far, so good. Since I increased etimeout to 14400 AND set
> > "estimate calcsize", I have had 2 backups without failures. The
> > only thing I don't know yet is which parameter did the trick. I
> > keep my fingers crossed.
> >
> > Thanks, both of you.
> >
> > Charles
> >
> > On Thu, 06 Dec 2012 09:49:28 +0100
> > Jens Berg wrote:
> >
> >> I would suggest increasing etimeout to a much bigger value, let's
> >> say 14400 or so, and seeing if the estimates finish at all then.
> >> If they still fail, I would take a closer look at the health of
> >> the hard discs... Another option could be to change the estimate
> >> method for the dump type you are using, e.g. if you are using
> >> "dumptype user-tar" for the DLEs, put an "estimate calcsize" in
> >> the definition of "dumptype user-tar". The results of that
> >> estimate method will be less accurate than the ones from the
> >> default method, but it executes faster.
> >>
> >> Best
> >> Jens
>
> --
> Alan Orth
> alan.o...@gmail.com
> http://alaninkenya.org
> http://mjanja.co.ke
> "I have always wished for my computer to be as easy to use as my
> telephone; my wish has come true because I can no longer figure out
> how to use my telephone." -Bjarne Stroustrup, inventor of C++

--
Charles Stroom
email: charles at no-spam.stremen.xs4all.nl (remove the "no-spam.")
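Jens's suggestion changes the estimate method for every DLE that uses user-tar. If only some DLEs time out, an alternative is to derive a second dumptype and apply it per DLE in the disklist, keeping the more accurate default estimate everywhere else. A sketch only: the host name and path below are placeholders, not taken from this thread.

```
# amanda.conf (sketch): derive a variant of user-tar that only
# changes the estimate method, inheriting everything else
define dumptype user-tar-calcsize {
    user-tar
    estimate calcsize
}

# disklist (sketch): apply the faster estimate only to the slow DLE
# (host and path are placeholders)
slowhost.example.com  /export/home  user-tar-calcsize
```

This keeps the change reversible per DLE, which also makes it easier to tell whether etimeout or the estimate method fixed a given timeout.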