what does etimeout really represent

2002-03-25 Thread Uncle George

Does etimeout specify the time that sendsize will use to estimate the
time needed to do 'its thing', or does it represent the sum total of
time (estimate + dumping) for a filesystem?



Re: what does etimeout really represent

2002-03-25 Thread Joshua Baker-LePain

On Mon, 25 Mar 2002 at 11:53am, Uncle George wrote

> does etimeout specify the time that sendsize will use to estimate the
> time needed to do 'its thing', or does it represent the sum total of
> time ( estimate + dumping ) of a filesystem ?

etimeout specifies the amount of time amdump will wait to hear back after
sending a sendsize request to a particular host.  At least, I think it's
per-host.

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University




Re: what does etimeout really represent

2002-03-25 Thread John R. Jackson

> etimeout specifies the amount of time amdump will wait to hear back after
> sending a sendsize request to a particular host.  At least, I think it's
> per-host.

This is covered in the man page:

  Default:  300 seconds.  Amount of time per  disk  on  a
  given  client that the planner step of amdump will wait
  to get the dump size estimates.  For instance, with the
  default  of  300  seconds  and  four disks on client A,
  planner will wait up to 20 minutes for that machine.  A
  negative value will be interpreted as a total amount of
  time, instead of a per-disk value.

Note that, when positive, it is per disk.  When negative, it is per
host (and the man page needs a minor tweak to make that point).
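
As a rough illustration of that rule, here is a minimal Python sketch of the arithmetic only. It is not Amanda's planner code; the function name and parameters are made up for the example.

```python
def planner_estimate_wait(etimeout: int, disks_on_client: int) -> int:
    """Total seconds the planner would wait for one client's estimates.

    Hypothetical helper: it only mirrors the man page wording quoted above.
    """
    if disks_on_client < 1:
        raise ValueError("a client needs at least one disk")
    if etimeout < 0:
        # Negative value: the magnitude is a total for the whole host.
        return -etimeout
    # Positive value: a per-disk figure, multiplied by the number of disks.
    return etimeout * disks_on_client


# Man page example: default of 300 seconds and four disks on client A.
print(planner_estimate_wait(300, 4) // 60, "minutes")
```

With a positive value the budget grows with the number of disks on the client; with a negative value it stays fixed no matter how many disks are listed.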

John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]



Re: what does etimeout really represent

2002-03-25 Thread Uncle George

Then my current impression is that the feature does not work - exactly
as stated.  There are a few file systems on one 'errant' system, where
one filesystem has taken over 224 minutes (wall time) to complete just
the estimate. I had changed the time to 3600 (an hour) which,
if one believes the man page, would leave some 8 hours (wall time) for
all 8 partitions to complete. I think the log had 2 timeout errors of
some 3800. The longest and the next-to-longest partitions did not
complete :-{

I don't think that anyone wants to know who the 'planner' is, or
'sendsize', or even their relationship, at a user or administrative
level. But I suspect that one has to give the 'highest' possible
estimate on a per-partition basis.

It's not too rational for the observer program ('planner'?) to multiply the
estimate if you can have multiple (or even a single) 'sendsize's running
on the client machine (now set at 8 * 6 hrs). It's just too long to wait
for some failed communication! (It also ruins the concept of a daily
backup.)





Re: what does etimeout really represent

2002-03-25 Thread John R. Jackson

> ... I had changed the time to be some 3600 ( an hour ) which,
> if one believes the man page, would leave some 8 hours ( wall time ) for
> all 8 partitions to complete.  ...

Right.  That's the way it's supposed to work, and the way it has worked
for myself and others.

Do you have the corresponding amandad*.debug and sendsize*.debug files
still laying around?

> It's not too rational for the observer program 'planner?' to multiply the
> estimate if you can have multiple ( or even a single ) 'sendsize's running
> on the client machine ( now set at 8 * 6hrs ) It's just too long to wait
> for some failed communication! ( it also ruins the concept of a daily
> backup )

So what do you suggest?

John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]



Re: what does etimeout really represent

2002-03-25 Thread Uncle George

Sorry, for every NEW run, I would like a set of new logs just for that
run, so that I know they are from just that run. To my 'novice' eyes,
it's just too much data to figure out where the previous run completed
and the new one began.

But if I (ever) get a complete backup (after setting it to -6 hrs) I
will go back and put it back to 3600.

/gat

John R. Jackson wrote:
>
> ... I had changed the time to be some 3600 ( an hour ) which,
> if one believes the man page, would leave some 8 hours ( wall time ) for
> all 8 partitions to complete.  ...
>
> Right.  That's the way it's supposed to work, and the way it has worked
> for myself and others.
>
> So what do you suggest?

From an admin point of view I would like every disk on my disklist to be
backed up. The time that it takes to do it is irrelevant. Time becomes
relevant if the avenues of communication are severed between the
parent/children ( maxdumps  1 ) of sendsize, and the communication
between client and server becomes severed. How do you know that
communication has been lost? Would a 'ping' or keep-alive concept have
any use here?
But there are also times when the 'sizer' program may be stuck, spinning
to no useful end. I suppose that in this unusual scenario it would
be up to the administrator to take the extraordinary action to determine
what is causing the 'sizer' failure (i.e. is it a bug? is tar backing up
/dev/zero?)

dtimeout represents idle time; why can't etimeout also represent idle
time?



Re: what does etimeout really represent

2002-03-25 Thread John R. Jackson

> Sorry, for every NEW run, I would like a set of new logs just for that
> run, so that I know they are from just that run.  ...

Which log files are you talking about?  As of 2.4.2p2, every file should
have a unique name, most of them based on a datestamp.

> But if I ( ever ) get a complete backup ( after setting it for -6hrs ) I
> will go back and put it back to 3600.

You might consider commenting out some disklist entries and doing a
smaller subset to get things going.  Then gradually add things back in.

Have you looked into why the estimates are taking so long for that client?
That would seem to be the root of the issue.

> From an admin point of view I would like every disk on my disklist to be
> backed up.  ...

Makes sense :-).

> The time that it takes to do it is irrelevant.  ...

I don't necessarily agree, but moving on ...

> Time becomes
> relevant if the avenues of communication are severed between the
> parent/children ... How do you know that
> communication has been lost ? would a 'ping' or keep-alive concept have
> any use here ?

Possibly.

> dtimeout represents idle time, why can't etimeout also represent idle
> time?

Because during dtimeout there should be data moving all the time.
During estimates, there isn't anything going back to the server until
everything is done and then there's a single response packet.

But I sort of get your point.  You'd like ping values and as long as
the client is still responding, even though it isn't sending any data
(be it the estimates or the actual dump image), just keep waiting for
it to get its act together.  Does that sum it up?

Based on my experience on this list, I think not putting an upper bound
on the length of time to wait would be a problem.  Clients just plain go
silly sometimes, and that would cause the whole run to come to a halt
which goes against the "everything should get backed up" principle.
You've got to decide when to cut your losses, which is what the timeout
values are supposed to do.
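
To make the trade-off concrete, here is a small Python sketch of the idea being discussed: keep waiting while a liveness check succeeds, but still enforce an upper bound. It is purely hypothetical and not how planner or sendsize work; every name in it (`wait_for_estimate`, `reply_ready`, `client_alive`, `FakeClock`) is invented for illustration.

```python
from typing import Callable


def wait_for_estimate(
    reply_ready: Callable[[], bool],
    client_alive: Callable[[], bool],
    now: Callable[[], float],
    sleep: Callable[[float], None],
    hard_limit: float,
    poll_interval: float = 60.0,
) -> str:
    """Wait for a single reply, probing liveness, never past hard_limit."""
    deadline = now() + hard_limit
    while True:
        if reply_ready():
            return "done"
        if now() >= deadline:
            return "timeout"  # upper bound reached: cut your losses
        if not client_alive():
            return "lost"     # keep-alive failed: give up early
        # Sleep until the next probe, but never beyond the deadline.
        sleep(min(poll_interval, deadline - now()))


class FakeClock:
    """Simulated clock so the sketch can be exercised without real delays."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


clock = FakeClock()
# The reply shows up after three simulated minutes, well inside the bound.
print(wait_for_estimate(lambda: clock.t >= 180, lambda: True,
                        clock.now, clock.sleep, hard_limit=3600))
```

The liveness probe only shortens the wait when the client disappears; a client that answers pings but never finishes is still cut off at `hard_limit`.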

John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]