Re: Question about "data timeout".

2005-08-23 Thread Erik P. Olsen
On Tue, 2005-08-23 at 11:24 -0400, Matt Hyclak wrote:
> On Tue, Aug 23, 2005 at 05:04:02PM +0200, Erik P. Olsen enlightened us:
> > > On Tue, Aug 23, 2005 at 11:19:59AM +0200, Erik P. Olsen wrote:
> > > > I have recently added a set of disks (file systems) to my back-up set
> > > > and that ended up with a failure due to "data timeout". I didn't even
> > > > know there was a dtimeout value to be specified in amanda.conf. I have
> > > > learnt that it is an idle time measured against the disks in question.
> > > > 
> > > > My question is now, how is this idle time measured and where is it
> > > > reported? 
> > > > 
> > > > Only by knowing what amanda sees of the idle time am I able to specify a
> > > > reasonable dtimeout value.
> > > 
> > > I may be totally wrong here, but I don't think it is tracking "idle" time.
> > > I believe it is total time to dump.  This would take care of "stuck" or
> > > "runaway" dump scenarios.
> > 
> > The documentation says: dtimeout int Default: 1800 seconds. Amount of
> > idle time per disk on a given client that a dumper running from within
> > amdump will wait before it fails with a data timeout error.
> 
> Yes, and that "per disk" is important. If you have a machine with 3 Disklist
> Entries (DLEs), it will wait 5400 seconds (90 minutes) for that machine.
> Another machine with 1 DLE will only get 30 minutes to complete.

I read it to mean that each disk gets 1800 seconds of idle (wait?) time
before a timeout. That is, if disk 1 uses 1 second of that time, the
remaining 1799 seconds are "lost" and are not added to the idle time of
the two remaining disks. I have 13 DLEs, which should give me 6H 30M if
this theory is true, yet my data timeout happened after 3H 19M!
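
The figures in this exchange can be checked with a small sketch. It assumes the "dtimeout multiplied by the number of DLEs" reading that Matt proposes, which nothing in this thread confirms; the function names are made up for illustration and are not part of Amanda.

```python
DTIMEOUT = 1800  # default, in seconds, from the documentation quoted above


def per_client_budget(num_dles, dtimeout=DTIMEOUT):
    """Seconds allowed for a client if dtimeout is simply multiplied by
    its number of disklist entries (an assumption, not confirmed)."""
    return num_dles * dtimeout


def as_hours_minutes(seconds):
    """Split a number of seconds into whole hours and whole minutes."""
    hours, rest = divmod(seconds, 3600)
    return hours, rest // 60


three_dles = per_client_budget(3)        # Matt's example: 5400 seconds
thirteen_dles = per_client_budget(13)    # 23400 seconds
observed = 3 * 3600 + 19 * 60            # the reported 3H 19M, in seconds
```

Under that reading 13 DLEs allow 6 hours 30 minutes, so a timeout after 3 hours 19 minutes does not fit the multiplication theory.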

I had hoped that amanda would report how much idle time had occurred for
each disk.
-- 
Regards,
Erik P. Olsen



Re: Question about "data timeout".

2005-08-23 Thread Paul Bijnens

Jon LaBadie wrote:
> On Tue, Aug 23, 2005 at 11:19:59AM +0200, Erik P. Olsen wrote:
> > I have recently added a set of disks (file systems) to my back-up set
> > and that ended up with a failure due to "data timeout". I didn't even
> > know there was a dtimeout value to be specified in amanda.conf. I have
> > learnt that it is an idle time measured against the disks in question.
> > 
> > My question is now, how is this idle time measured and where is it
> > reported? 
> > 
> > Only by knowing what amanda sees of the idle time am I able to specify a
> > reasonable dtimeout value.
> 
> I may be totally wrong here, but I don't think it is tracking "idle" time.
> I believe it is total time to dump.  This would take care of "stuck" or
> "runaway" dump scenarios.

Correct me if I'm wrong -- the coffee machine is broken here, writing
this on a diet of pure fresh water!

Reading through the sources, it seems that dtimeout is used as
timeout value on a select() call in dumper.c, around line 1356 (amanda
2.4.5 sources).  The select waits for activity on the data stream or
on the messages stream.
That means that if there is no traffic received within dtimeout seconds
on one of those streams, you get a "data timeout".
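
A minimal sketch of that pattern, in Python rather than C and not taken from the Amanda sources: the names drain and DataTimeout are hypothetical, and socket pairs stand in for the data and messages streams. It only illustrates how a timeout passed to select() bounds the longest silent gap rather than the total transfer time.

```python
import select
import socket


class DataTimeout(Exception):
    """Raised when no stream shows activity within the idle limit."""


def drain(streams, dtimeout):
    """Read from all streams until each one reaches end-of-file.

    The timer restarts on every select() call, so dtimeout limits the
    longest period without traffic, not the duration of the whole dump.
    Returns the total number of bytes received.
    """
    open_streams = list(streams)
    total = 0
    while open_streams:
        readable, _, _ = select.select(open_streams, [], [], dtimeout)
        if not readable:
            # Nothing arrived on any stream for dtimeout seconds.
            raise DataTimeout("data timeout")
        for sock in readable:
            chunk = sock.recv(65536)
            if chunk:
                total += len(chunk)
            else:
                # An empty read means end-of-file on this stream.
                open_streams.remove(sock)
    return total
```

With this loop, a stream that stays open but silent raises DataTimeout after dtimeout seconds, even if the other stream has already delivered all of its data.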

The default 1800 seconds seems more than reasonable to me in that case.

A pathological case could be a sequence of very compressible data (all
"aaa"s or zeros, like an empty database file). Compressing
such a sequence, together with some buffering on client and server,
it could well take a long time before any bytes come out of such pipe.
But 1800 seconds seems to me more than enough even for those cases.
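
To get a feel for how little output such data produces, here is a rough sketch using Python's zlib module as a stand-in for whatever compressor the client actually runs (an assumption; the helper name compress_zeros is made up). It feeds a run of zero bytes through a streaming compressor and counts how many calls return nothing at all.

```python
import zlib


def compress_zeros(total_bytes, chunk_size=64 * 1024):
    """Feed a run of zero bytes through zlib chunk by chunk.

    Returns (bytes_out, empty_calls, calls): the total compressed size,
    the number of compress() calls that produced no output, and the
    number of compress() calls made.
    """
    compressor = zlib.compressobj()
    chunk = bytes(chunk_size)  # chunk_size zero bytes
    bytes_out = 0
    empty_calls = 0
    calls = 0
    for _ in range(total_bytes // chunk_size):
        out = compressor.compress(chunk)
        calls += 1
        bytes_out += len(out)
        if not out:
            empty_calls += 1
    # flush() emits whatever the compressor was still holding back.
    bytes_out += len(compressor.flush())
    return bytes_out, empty_calls, calls
```

Calling compress_zeros(16 * 1024 * 1024) shows a large input shrinking to a tiny output, with some calls emitting no bytes; how long the resulting silence lasts in practice depends on disk speed and buffering, which this sketch does not model.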

There is also one of the last "enhancements" in gnutar for handling
sparse files, which could result in a long time without emitting any
data (and some systems create sparse files with 64 bit sizes...):


https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=154882
http://lists.gnu.org/archive/html/bug-tar/2005-07/msg00025.html

But that is only when doing estimates, or does it also affect the
backup itself?

And of course firewall timeouts come into play too, blocking one of
the streams (e.g. the messages stream has almost no traffic usually)
resulting in never receiving the end-of-file indication on that stream.
That, too, results in a "data timeout" after dtimeout seconds.

--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
* F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out  *
***




Re: Question about "data timeout".

2005-08-23 Thread Graeme Humphries




Jon LaBadie wrote:
> > The documentation says: dtimeout int Default: 1800 seconds. Amount of
> > idle time per disk on a given client that a dumper running from within
> > amdump will wait before it fails with a data timeout error.
> 
> Glad I said I may be totally wrong :(

Even though the documentation reads that way, I've found it to *behave* the
way you described, Jon. When I added a new disk to a server recently
that was over 200GB, I had to increase the timeout, otherwise the dump
itself would trigger the timeout and cause it to abort. Is this
expected behavior? If so, should the docs be modified?

Graeme




Re: Question about "data timeout".

2005-08-23 Thread Jon LaBadie
On Tue, Aug 23, 2005 at 05:04:02PM +0200, Erik P. Olsen wrote:
> On Tue, 2005-08-23 at 09:38 -0400, Jon LaBadie wrote:
> > 
> > I may be totally wrong here, but I don't think it is tracking "idle" time.
> > I believe it is total time to dump.  This would take care of "stuck" or
> > "runaway" dump scenarios.
> 
> The documentation says: dtimeout int Default: 1800 seconds. Amount of
> idle time per disk on a given client that a dumper running from within
> amdump will wait before it fails with a data timeout error.
> 

Glad I said I may be totally wrong :(

Thanks,
-- 
Jon H. LaBadie  [EMAIL PROTECTED]
 JG Computing
 4455 Province Line Road    (609) 252-0159
 Princeton, NJ  08540-4322  (609) 683-7220 (fax)


Re: Question about "data timeout".

2005-08-23 Thread Matt Hyclak
On Tue, Aug 23, 2005 at 05:04:02PM +0200, Erik P. Olsen enlightened us:
> > On Tue, Aug 23, 2005 at 11:19:59AM +0200, Erik P. Olsen wrote:
> > > I have recently added a set of disks (file systems) to my back-up set
> > > and that ended up with a failure due to "data timeout". I didn't even
> > > know there was a dtimeout value to be specified in amanda.conf. I have
> > > learnt that it is an idle time measured against the disks in question.
> > > 
> > > My question is now, how is this idle time measured and where is it
> > > reported? 
> > > 
> > > Only by knowing what amanda sees of the idle time am I able to specify a
> > > reasonable dtimeout value.
> > 
> > I may be totally wrong here, but I don't think it is tracking "idle" time.
> > I believe it is total time to dump.  This would take care of "stuck" or
> > "runaway" dump scenarios.
> 
> The documentation says: dtimeout int Default: 1800 seconds. Amount of
> idle time per disk on a given client that a dumper running from within
> amdump will wait before it fails with a data timeout error.

Yes, and that "per disk" is important. If you have a machine with 3 Disklist
Entries (DLEs), it will wait 5400 seconds (90 minutes) for that machine.
Another machine with 1 DLE will only get 30 minutes to complete.

Matt

-- 
Matt Hyclak
Department of Mathematics 
Department of Social Work
Ohio University
(740) 593-1263




Re: Question about "data timeout".

2005-08-23 Thread Erik P. Olsen
On Tue, 2005-08-23 at 09:38 -0400, Jon LaBadie wrote:
> On Tue, Aug 23, 2005 at 11:19:59AM +0200, Erik P. Olsen wrote:
> > I have recently added a set of disks (file systems) to my back-up set
> > and that ended up with a failure due to "data timeout". I didn't even
> > know there was a dtimeout value to be specified in amanda.conf. I have
> > learnt that it is an idle time measured against the disks in question.
> > 
> > My question is now, how is this idle time measured and where is it
> > reported? 
> > 
> > Only by knowing what amanda sees of the idle time am I able to specify a
> > reasonable dtimeout value.
> 
> I may be totally wrong here, but I don't think it is tracking "idle" time.
> I believe it is total time to dump.  This would take care of "stuck" or
> "runaway" dump scenarios.

The documentation says: dtimeout int Default: 1800 seconds. Amount of
idle time per disk on a given client that a dumper running from within
amdump will wait before it fails with a data timeout error.

-- 
Regards,
Erik P. Olsen



Re: Question about "data timeout".

2005-08-23 Thread Jon LaBadie
On Tue, Aug 23, 2005 at 11:19:59AM +0200, Erik P. Olsen wrote:
> I have recently added a set of disks (file systems) to my back-up set
> and that ended up with a failure due to "data timeout". I didn't even
> know there was a dtimeout value to be specified in amanda.conf. I have
> learnt that it is an idle time measured against the disks in question.
> 
> My question is now, how is this idle time measured and where is it
> reported? 
> 
> Only by knowing what amanda sees of the idle time am I able to specify a
> reasonable dtimeout value.

I may be totally wrong here, but I don't think it is tracking "idle" time.
I believe it is total time to dump.  This would take care of "stuck" or
"runaway" dump scenarios.

-- 
Jon H. LaBadie  [EMAIL PROTECTED]
 JG Computing
 4455 Province Line Road    (609) 252-0159
 Princeton, NJ  08540-4322  (609) 683-7220 (fax)


Question about "data timeout".

2005-08-23 Thread Erik P. Olsen
I have recently added a set of disks (file systems) to my back-up set
and that ended up with a failure due to "data timeout". I didn't even
know there was a dtimeout value to be specified in amanda.conf. I have
learnt that it is an idle time measured against the disks in question.

My question is now, how is this idle time measured and where is it
reported? 

Only by knowing what amanda sees of the idle time am I able to specify a
reasonable dtimeout value.

-- 
Regards,
Erik P. Olsen