Re: Question about "data timeout".
On Tue, 2005-08-23 at 11:24 -0400, Matt Hyclak wrote:
> On Tue, Aug 23, 2005 at 05:04:02PM +0200, Erik P. Olsen enlightened us:
> > > On Tue, Aug 23, 2005 at 11:19:59AM +0200, Erik P. Olsen wrote:
> > > > I have recently added a set of disks (file systems) to my back-up set
> > > > and that ended up with a failure due to "data timeout". I didn't even
> > > > know there was a dtimeout value to be specified in amanda.conf. I have
> > > > learnt that it is an idle time measured against the disks in question.
> > > >
> > > > My question is now, how is this idle time measured and where is it
> > > > reported?
> > > >
> > > > Only by knowing what amanda sees of the idle time am I able to specify a
> > > > reasonable dtimeout value.
> > >
> > > I may be totally wrong here, but I don't think it is tracking "idle" time.
> > > I believe it is total time to dump. This would take care of "stuck" or
> > > "runaway" dump scenarios.
> >
> > The documentation says: dtimeout int Default: 1800 seconds. Amount of
> > idle time per disk on a given client that a dumper running from within
> > amdump will wait before it fails with a data timeout error.
>
> Yes, and that "per disk" is important. If you have a machine with 3 Disklist
> Entries (DLEs), it will wait 5400 seconds (90 minutes) for that machine.
> Another machine with 1 DLE will only get 30 minutes to complete.

I read it differently: each disk gets 1800 seconds of idle (wait?) time
before a timeout. That is, if disk 1 uses 1 second of that time, the
remaining 1799 seconds are "lost" and are not added to the idle time of
the two remaining disks.

I have 13 DLEs, which should give me 6H 30M if your theory is true, yet
my data timeout happened after 3H 19M!

I had hoped that amanda would report how much idle time had occurred for
each disk.

--
Regards,
Erik P. Olsen
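The figures quoted in this exchange can be checked with a short sketch. It assumes, purely for illustration, Matt's reading that the allowance for a client is dtimeout multiplied by its number of DLEs; that rule is a hypothesis from this thread, not confirmed Amanda behaviour, and `per_host_timeout` and `as_h_m` are hypothetical helper names.

```python
# Sketch of the "dtimeout times number of DLEs" reading discussed above.
# The multiplication rule is the thread's hypothesis, not confirmed Amanda
# behaviour; all names here are made up for illustration.

DTIMEOUT_DEFAULT = 1800  # seconds, the default quoted from the documentation


def per_host_timeout(dle_count, dtimeout=DTIMEOUT_DEFAULT):
    """Total allowance for one client if dtimeout were multiplied per DLE."""
    return dle_count * dtimeout


def as_h_m(seconds):
    """Format a number of seconds as hours and minutes, e.g. '6H 30M'."""
    hours, rest = divmod(seconds, 3600)
    return f"{hours}H {rest // 60:02d}M"


for dles in (1, 3, 13):
    total = per_host_timeout(dles)
    print(f"{dles:2d} DLE(s): {total:5d} s = {as_h_m(total)}")

# The failure Erik observed, 3H 19M after the start, expressed in seconds.
observed = 3 * 3600 + 19 * 60
print(f"observed: {observed} s = {as_h_m(observed)}")
```

With 13 DLEs the hypothetical allowance is 23400 seconds (6H 30M), while the observed failure came after 11940 seconds, well short of that figure.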
Re: Question about "data timeout".
Jon LaBadie wrote:
> On Tue, Aug 23, 2005 at 11:19:59AM +0200, Erik P. Olsen wrote:
> > I have recently added a set of disks (file systems) to my back-up set
> > and that ended up with a failure due to "data timeout". I didn't even
> > know there was a dtimeout value to be specified in amanda.conf. I have
> > learnt that it is an idle time measured against the disks in question.
> >
> > My question is now, how is this idle time measured and where is it
> > reported?
> >
> > Only by knowing what amanda sees of the idle time am I able to specify a
> > reasonable dtimeout value.
>
> I may be totally wrong here, but I don't think it is tracking "idle" time.
> I believe it is total time to dump. This would take care of "stuck" or
> "runaway" dump scenarios.

Correct me if I'm wrong -- the coffee machine is broken here, writing
this on a diet of pure fresh water!

Reading through the sources, it seems that dtimeout is used as the
timeout value on a select() call in dumper.c, around line 1356
(amanda 2.4.5 sources). The select waits for activity on the data
stream or on the messages stream. That means that if no traffic is
received on either of those streams within dtimeout seconds, you get
a "data timeout".

The default of 1800 seconds seems more than reasonable to me in that
case.

A pathological case could be a sequence of very compressible data (all
"aaa"s or zeros, like an empty database file). When such a sequence is
compressed, and with some buffering on client and server, it could well
take a long time before any bytes come out of such a pipe. But 1800
seconds seems to me more than enough even for those cases.

There is also one of the latest "enhancements" in gnutar for handling
sparse files, which could result in a long time without emitting any
data (and some systems create sparse files with 64 bit sizes...):

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=154882
http://lists.gnu.org/archive/html/bug-tar/2005-07/msg00025.html

But that is only when doing estimates, or does it also affect the backup
itself?

And of course firewall timeouts come into play too, blocking one of the
streams (e.g. the messages stream usually has almost no traffic), so
that the end-of-file indication on that stream is never received. After
dtimeout seconds that results in a "data timeout" too.

--
Paul Bijnens, Xplanation
Tel +32 16 397.511, Fax +32 16 397.512
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM
http://www.xplanation.com/ email: [EMAIL PROTECTED]
***
* I think I've got the hang of it now: exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
* F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt, abort, hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e, kill -1 $$, shutdown, *
* init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ... "Are you sure?" ... YES ... Phew ... I'm out *
***
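The select() behaviour Paul describes can be illustrated with a small stand-alone sketch. It is written in Python rather than C and is not Amanda's code: `read_until_idle`, `slow_sender`, and the shortened timeouts are assumptions made for illustration. It shows a wait that restarts after every bit of traffic, so a transfer may run longer than the timeout without failing, while a stream that goes silent without closing does fail.

```python
import select
import socket
import threading
import time


def read_until_idle(streams, idle_timeout):
    """Read until every stream reaches EOF, or give up when no stream is
    readable for idle_timeout seconds. Returns (status, bytes_received)."""
    open_streams = list(streams)
    received = 0
    while open_streams:
        # The timeout applies to this single wait, so it measures idle
        # time since the last activity, not total elapsed time.
        readable, _, _ = select.select(open_streams, [], [], idle_timeout)
        if not readable:
            return "data timeout", received
        for stream in readable:
            chunk = stream.recv(4096)
            if chunk:
                received += len(chunk)
            else:
                open_streams.remove(stream)  # empty read means end-of-file
    return "done", received


def slow_sender(sock, chunks, gap):
    """Send small chunks with a pause before each one, then close."""
    for _ in range(chunks):
        time.sleep(gap)
        sock.sendall(b"x" * 10)
    sock.close()


# Case 1: slow but steady. Total time is about 0.75 s, longer than the
# 0.5 s timeout, but no single gap exceeds it.
a, b = socket.socketpair()
sender = threading.Thread(target=slow_sender, args=(b, 5, 0.15))
sender.start()
result_slow = read_until_idle([a], idle_timeout=0.5)
sender.join()
a.close()

# Case 2: the peer sends once, then goes silent without closing, much
# like a stream cut off by a firewall.
c, d = socket.socketpair()
d.sendall(b"x" * 10)
result_stalled = read_until_idle([c], idle_timeout=0.5)
c.close()
d.close()

print(result_slow)
print(result_stalled)
```

The first case completes even though the whole transfer outlasts the timeout. The second never sees an end-of-file, so the wait expires, which matches the firewall scenario described above.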
Re: Question about "data timeout".
Jon LaBadie wrote:
> > The documentation says: dtimeout int Default: 1800 seconds. Amount of
> > idle time per disk on a given client that a dumper running from within
> > amdump will wait before it fails with a data timeout error.
>
> Glad I said I may be totally wrong :(

Even though the documentation reads that way, I've found it to *behave*
the way you described, Jon. When I recently added a new disk of over
200GB to a server, I had to increase the timeout; otherwise the dump
itself would trigger the timeout and abort.

Is this expected behavior? If so, should the docs be modified?

Graeme
Re: Question about "data timeout".
On Tue, Aug 23, 2005 at 05:04:02PM +0200, Erik P. Olsen wrote:
> On Tue, 2005-08-23 at 09:38 -0400, Jon LaBadie wrote:
> >
> > I may be totally wrong here, but I don't think it is tracking "idle" time.
> > I believe it is total time to dump. This would take care of "stuck" or
> > "runaway" dump scenarios.
>
> The documentation says: dtimeout int Default: 1800 seconds. Amount of
> idle time per disk on a given client that a dumper running from within
> amdump will wait before it fails with a data timeout error.
>

Glad I said I may be totally wrong :(

Thanks,
--
Jon H. LaBadie [EMAIL PROTECTED]
JG Computing
4455 Province Line Road (609) 252-0159
Princeton, NJ 08540-4322 (609) 683-7220 (fax)
Re: Question about "data timeout".
On Tue, Aug 23, 2005 at 05:04:02PM +0200, Erik P. Olsen enlightened us:
> > On Tue, Aug 23, 2005 at 11:19:59AM +0200, Erik P. Olsen wrote:
> > > I have recently added a set of disks (file systems) to my back-up set
> > > and that ended up with a failure due to "data timeout". I didn't even
> > > know there was a dtimeout value to be specified in amanda.conf. I have
> > > learnt that it is an idle time measured against the disks in question.
> > >
> > > My question is now, how is this idle time measured and where is it
> > > reported?
> > >
> > > Only by knowing what amanda sees of the idle time am I able to specify a
> > > reasonable dtimeout value.
> >
> > I may be totally wrong here, but I don't think it is tracking "idle" time.
> > I believe it is total time to dump. This would take care of "stuck" or
> > "runaway" dump scenarios.
>
> The documentation says: dtimeout int Default: 1800 seconds. Amount of
> idle time per disk on a given client that a dumper running from within
> amdump will wait before it fails with a data timeout error.

Yes, and that "per disk" is important. If you have a machine with 3
Disklist Entries (DLEs), it will wait 5400 seconds (90 minutes) for that
machine. Another machine with 1 DLE will only get 30 minutes to
complete.

Matt

--
Matt Hyclak
Department of Mathematics
Department of Social Work
Ohio University
(740) 593-1263
Re: Question about "data timeout".
On Tue, 2005-08-23 at 09:38 -0400, Jon LaBadie wrote:
> On Tue, Aug 23, 2005 at 11:19:59AM +0200, Erik P. Olsen wrote:
> > I have recently added a set of disks (file systems) to my back-up set
> > and that ended up with a failure due to "data timeout". I didn't even
> > know there was a dtimeout value to be specified in amanda.conf. I have
> > learnt that it is an idle time measured against the disks in question.
> >
> > My question is now, how is this idle time measured and where is it
> > reported?
> >
> > Only by knowing what amanda sees of the idle time am I able to specify a
> > reasonable dtimeout value.
>
> I may be totally wrong here, but I don't think it is tracking "idle" time.
> I believe it is total time to dump. This would take care of "stuck" or
> "runaway" dump scenarios.

The documentation says: dtimeout int Default: 1800 seconds. Amount of
idle time per disk on a given client that a dumper running from within
amdump will wait before it fails with a data timeout error.

--
Regards,
Erik P. Olsen
Re: Question about "data timeout".
On Tue, Aug 23, 2005 at 11:19:59AM +0200, Erik P. Olsen wrote:
> I have recently added a set of disks (file systems) to my back-up set
> and that ended up with a failure due to "data timeout". I didn't even
> know there was a dtimeout value to be specified in amanda.conf. I have
> learnt that it is an idle time measured against the disks in question.
>
> My question is now, how is this idle time measured and where is it
> reported?
>
> Only by knowing what amanda sees of the idle time am I able to specify a
> reasonable dtimeout value.

I may be totally wrong here, but I don't think it is tracking "idle"
time. I believe it is total time to dump. This would take care of
"stuck" or "runaway" dump scenarios.

--
Jon H. LaBadie [EMAIL PROTECTED]
JG Computing
4455 Province Line Road (609) 252-0159
Princeton, NJ 08540-4322 (609) 683-7220 (fax)
Question about "data timeout".
I have recently added a set of disks (file systems) to my back-up set
and that ended up with a failure due to "data timeout". I didn't even
know there was a dtimeout value to be specified in amanda.conf. I have
learnt that it is an idle time measured against the disks in question.

My question is now, how is this idle time measured and where is it
reported?

Only by knowing what amanda sees of the idle time am I able to specify a
reasonable dtimeout value.

--
Regards,
Erik P. Olsen