what does etimeout really represent
Does etimeout specify the time that sendsize will use to estimate the time needed to do 'its thing', or does it represent the sum total of time (estimate plus dumping) for a filesystem?
Re: what does etimeout really represent
On Mon, 25 Mar 2002 at 11:53am, Uncle George wrote

> does etimeout specify the time that sendsize will use to estimate the time needed to do 'its thing', or does it represent the sum total of time ( estimate dumping ) of a filesystem ?

etimeout specifies the amount of time amdump will wait to hear back after sending a sendsize request to a particular host. At least, I think it's per-host.

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University
Re: what does etimeout really represent
> etimeout specifies the amount of time amdump will wait to hear back after sending a sendsize request to a particular host. At least, I think it's per-host.

This is covered in the man page:

    Default: 300 seconds. Amount of time per disk on a given client that the planner step of amdump will wait to get the dump size estimates. For instance, with the default of 300 seconds and four disks on client A, planner will wait up to 20 minutes for that machine. A negative value will be interpreted as a total amount of time, instead of a per-disk value.

Note that, when positive, it is per disk. When negative, it is per host (and the man page needs a minor tweak to make that point).

> Joshua Baker-LePain

John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]
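The rule quoted from the man page can be written out as a short calculation. This is only a sketch of the rule as described above, not Amanda's actual code; the function name `planner_wait_seconds` and its parameters are made up for illustration, and treating zero as a per-disk value is an assumption the man page excerpt does not address.

```python
def planner_wait_seconds(etimeout: int, disk_count: int) -> int:
    """Upper bound, in seconds, that planner would wait for one client's estimates.

    Hypothetical helper following the man page wording quoted above:
    a positive etimeout is per disk, a negative one is a total for the host.
    """
    if disk_count < 0:
        raise ValueError("disk_count must not be negative")
    if etimeout < 0:
        # Negative value: total amount of time for the whole client.
        return -etimeout
    # Positive value (zero handled the same way, by assumption): per disk.
    return etimeout * disk_count


# The man page example: default of 300 seconds, four disks on client A.
default_wait = planner_wait_seconds(300, 4)      # 1200 seconds, i.e. 20 minutes

# An illustrative negative setting: six hours in total, however many disks.
total_wait = planner_wait_seconds(-21600, 8)     # 21600 seconds, i.e. 6 hours
```

Read this way, a positive value scales with the number of disklist entries for the client, while a negative value stays fixed no matter how many disks the client has.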
Re: what does etimeout really represent
Then my current impression is that the feature does not work exactly as stated. There are a few file systems on one 'errant' system, where one filesystem has taken over 224 minutes (wall time) to complete just the estimate. I had changed the time to be some 3600 (an hour) which, if one believes the man page, would leave some 8 hours (wall time) for all 8 partitions to complete. I think the log had 2 timeout errors of some 3800. The longest and the next-to-longest partitions did not complete :-{

I don't think that anyone wants to know who the 'planner' is, or 'sendsize', or even their relationship, at a user or administrative level. But I suspect that one has to give the 'highest' possible estimate on a per-partition basis. It's not too rational for the observer program ('planner'?) to multiply the estimate if you can have multiple (or even a single) 'sendsize's running on the client machine (now set at 8 * 6hrs). It's just too long to wait for some failed communication! (It also ruins the concept of a daily backup.)
Re: what does etimeout really represent
> ... I had changed the time to be some 3600 (an hour) which, if one believes the man page, would leave some 8 hours (wall time) for all 8 partitions to complete. ...

Right. That's the way it's supposed to work, and the way it has worked for myself and others.

Do you have the corresponding amandad*.debug and sendsize*.debug files still lying around?

> It's not too rational for the observer program ('planner'?) to multiply the estimate if you can have multiple (or even a single) 'sendsize's running on the client machine (now set at 8 * 6hrs). It's just too long to wait for some failed communication! (It also ruins the concept of a daily backup.)

So what do you suggest?

John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]
Re: what does etimeout really represent
Sorry, for every NEW run, I would like a set of new logs just for that run, so that I know they are from just that run. To my 'novice' eyes, it's just too much data to figure out where the previous run completed and the new one began.

But if I (ever) get a complete backup (after setting it for -6hrs) I will go back and put it back to 3600.

/gat

John R. Jackson wrote:

>> ... I had changed the time to be some 3600 (an hour) which, if one believes the man page, would leave some 8 hours (wall time) for all 8 partitions to complete. ...
>
> Right. That's the way it's supposed to work, and the way it has worked for myself and others.

> So what do you suggest?

From an admin point of view I would like every disk on my disklist to be backed up. The time that it takes to do it is irrelevant. Time becomes relevant if the avenues of communication are severed between the parent/children (maxdumps 1) of sendsize, and the communication between client and server becomes severed. How do you know that communication has been lost? Would a 'ping' or keep-alive concept have any use here?

But there are also times when the 'sizer' program may be stuck, spinning to no useful end. I suppose that in this unusual scenario it would be up to the administrator to take the extraordinary action to determine what is causing the 'sizer' failure (i.e. is it a bug? is tar backing up /dev/zero?).

dtimeout represents idle time; why can't etimeout also represent idle time?
Re: what does etimeout really represent
> Sorry, for every NEW run, I would like a set of new logs just for that run, so that I know they are from just that run. ...

Which log files are you talking about? As of 2.4.2p2, every file should have a unique name, most of them based on a datestamp.

> But if I (ever) get a complete backup (after setting it for -6hrs) I will go back and put it back to 3600.

You might consider commenting out some disklist entries and doing a smaller subset to get things going. Then gradually add things back in.

Have you looked into why the estimates are taking so long for that client? That would seem to be the root of the issue.

> From an admin point of view I would like every disk on my disklist to be backed up. ...

Makes sense :-).

> The time that it takes to do it is irrelevant. ...

I don't necessarily agree, but moving on ...

> Time becomes relevant if the avenues of communication are severed between the parent/children ... How do you know that communication has been lost? Would a 'ping' or keep-alive concept have any use here?

Possibly.

> dtimeout represents idle time; why can't etimeout also represent idle time?

Because during dtimeout there should be data moving all the time. During estimates, there isn't anything going back to the server until everything is done, and then there's a single response packet.

But I sort of get your point. You'd like ping values and, as long as the client is still responding, even though it isn't sending any data (be it the estimates or the actual dump image), just keep waiting for it to get its act together. Does that sum it up?

Based on my experience on this list, I think not putting an upper bound on the length of time to wait would be a problem. Clients just plain go silly sometimes, and that would cause the whole run to come to a halt, which goes against the "everything should get backed up" principle. You've got to decide when to cut your losses, which is what the timeout values are supposed to do.

> /gat

John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]
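To make the trade-off in this exchange concrete, here is a minimal sketch of the idea being discussed: keep waiting for the single estimate reply as long as the client answers a keep-alive, but still give up at a hard upper bound. This illustrates the proposal only and is not how Amanda behaves; `wait_for_estimate`, `poll_reply`, `ping_client` and the other parameter names are all hypothetical, and the clock and sleep functions are passed in so the logic can be exercised without real waiting.

```python
from typing import Callable, Optional


def wait_for_estimate(
    poll_reply: Callable[[], Optional[str]],
    ping_client: Callable[[], bool],
    hard_limit: float,
    ping_interval: float,
    now: Callable[[], float],
    sleep: Callable[[float], None],
) -> str:
    """Wait for one estimate reply, pinging the client while waiting.

    Returns the reply. Raises TimeoutError if the client stops answering
    pings, or if hard_limit seconds pass without a reply.
    """
    deadline = now() + hard_limit
    while True:
        reply = poll_reply()
        if reply is not None:
            return reply
        if now() >= deadline:
            # Upper bound reached: cut losses so the rest of the run proceeds.
            raise TimeoutError("hard limit reached while waiting for estimate")
        if not ping_client():
            # Client no longer responds at all: no point in waiting longer.
            raise TimeoutError("client stopped answering pings")
        sleep(ping_interval)
```

In this sketch the ping only detects a client that has gone away; a 'sizer' that is alive but spinning uselessly would still be caught only by `hard_limit`, which is the case the upper bound is meant to cover.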