Re: Amanda 3.5.3-1.fc38 aborts dump on EOF error to planner
Jose,

Indeed I would be interested! Certainly worth a try.

Thanks,
Lou

On 2023-07-25 9:34 a.m., Jose M Calhariz wrote:
> Hi,
>
> If I understand your problem correctly, I ran into it in 3.5.1 and I
> have a patch that fixes it, from the previous owner of amanda. The
> patch has been in use by amanda in Debian for several years. I can
> publish the patch here if you are interested.
>
> Kind regards
> Jose M Calhariz
Re: Amanda 3.5.3-1.fc38 aborts dump on EOF error to planner
Nuno,

Thanks for the reply! And apologies for not being quite clear. I'm quite sure the offending hosts are powered down, so there's no chance of a partial response.

When I look at the planner..debug log, I can see sendsize requests going out to the hosts that are powered up and responsive, and I can see their responses arrive. There are two hosts powered down, gallifrey and jpt. The requests go out to gallifrey, then jpt. When the request to gallifrey times out, planner sees a 255 status from SSH and aborts with the EOF error. It doesn't even wait around for the timeout on jpt.

If I go back and look at some old logs, I can see planner continue past the `EOF on read' error. So I'm really starting to think this is a new bug in 3.5.3.

For what it's worth, I'd interpret your error,

  ERROR Request to MACHINE failed: Connection refused

as meaning the machine was powered up and responsive but actively refused the connection for some reason.

I'm puzzled by another thing: we're using the same version of amanda (3.5.3) and I run backups to disk, no tape drive involved, but I've never seen the error you mention in April: backup aborts after first machine/disk in disklist. The obvious difference is Fedora 37 versus Fedora 38, but really that shouldn't cause this much difference in behaviour.

Bah! Sometimes staying up-to-date is a bit painful. I'll see if anyone else chimes in before I report this as a bug.

Spent two weeks in Porto and the Douro Valley in Fall 2022. Loved the country!

Lou
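For anyone comparing notes on these failures, here is a minimal Python sketch that pulls the planner request errors out of pasted log or report text and groups them by host. It assumes only the line format quoted in this thread (`planner: ERROR Request to <host> failed: <reason>`); the function name `planner_failures` is made up for illustration, and nothing here is part of amanda itself.

```python
import re
from collections import defaultdict

# Matches planner error lines of the form quoted in this thread:
#   planner: ERROR Request to <host> failed: <reason>
PLANNER_ERROR = re.compile(r"planner: ERROR Request to (\S+) failed: (.+)")


def planner_failures(text):
    """Return {host: [reason, ...]} for each planner request failure in text."""
    failures = defaultdict(list)
    for line in text.splitlines():
        match = PLANNER_ERROR.search(line)
        if match:
            host, reason = match.groups()
            failures[host].append(reason.strip())
    return dict(failures)


# The two error lines reported in this thread.
sample = """\
planner: ERROR Request to gallifrey.ivriel failed: EOF on read from gallifrey.ivriel
planner: ERROR Request to MACHINE failed: Connection refused
"""

for host, reasons in planner_failures(sample).items():
    print(host, "->", "; ".join(reasons))
```

Grouping by host makes it easier to see whether a run hit `EOF on read` (no usable reply) or `Connection refused` (host up, connection rejected), which is the distinction being discussed above.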
Re: Amanda 3.5.3-1.fc38 aborts dump on EOF error to planner
Hi Lou,

I'm using the same version as you, although on Fedora 37 (amanda-3.5.3-1.fc37.x86_64), and I don't see that behaviour. I have some machines that are down, and the rest of the backups were still made. In my case I get this:

  planner: ERROR Request to MACHINE failed: Connection refused

From what you wrote, it seems gallifrey.ivriel is not down and is responding, but has some problem reporting the size. Maybe this page will help:

https://www.zmanda.com/knowledge-base/eof-on-read-error-from-a-client/

Although if it is aborting the whole planner run, that seems like a bug, or there are other reasons for the abort. It may be worth checking that etimeout is not set very low.

Cheers,
Nuno

--
Nuno Dias
LIP
Amanda 3.5.3-1.fc38 aborts dump on EOF error to planner
Folks,

I've been using amanda for several years on a simple home network. Hosts are often powered down. Up through amanda 3.5.2, this worked like a charm. If the host didn't respond, it was simply skipped. Hosts that responded were properly backed up.

With amanda 3.5.3, the behaviour has changed. If a host doesn't respond to the planner size request, the planner aborts the entire backup with the error

  planner: ERROR Request to gallifrey.ivriel failed: EOF on read from gallifrey.ivriel

I've confirmed that my configuration is generally correct --- as long as all hosts in the disklist respond to the size request, the backup succeeds.

Is this a bug? Do I need to change some parameter in my configuration to persuade planner to soldier on? Any thoughts would be appreciated.

As context, this problem came about with an upgrade from Fedora 37 to Fedora 38, with a matching upgrade from amanda 3.5.2 to amanda 3.5.3.

Thanks,
Lou
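To make the expected behaviour concrete, here is a toy Python sketch of the "skip hosts that don't answer, carry on with the rest" logic described above. It is not amanda's planner code and makes no claim about how amanda implements this; `plan`, `fake_request_size`, and the host `up.example` are hypothetical names used only for illustration.

```python
def plan(hosts, request_size):
    """Collect size estimates, skipping hosts whose request fails."""
    estimates, skipped = {}, {}
    for host in hosts:
        try:
            estimates[host] = request_size(host)
        except OSError as err:
            # An unreachable host is recorded and skipped; the run continues.
            skipped[host] = str(err)
    return estimates, skipped


def fake_request_size(host):
    """Hypothetical stand-in for a size request; one host is powered down."""
    if host == "gallifrey.ivriel":
        raise ConnectionError(f"EOF on read from {host}")
    return 1024


estimates, skipped = plan(["up.example", "gallifrey.ivriel"], fake_request_size)
print("estimates:", estimates)
print("skipped:", skipped)
```

The behaviour reported with 3.5.3 corresponds to the failure escaping the loop and ending the whole run, rather than being recorded per host as it is here.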