>Up to and including 171 disklist entries of type root-tar, everything is
>ok.  ...
>If I add some more disklist entries of the same type, amcheck hangs for
>a minute (ctimeout 60) and then reports "selfcheck request timed out. 
>Host down?"

Wow.  If it's what I think it is, that bug has been around forever.
Sheesh!  :-)

Please give the following patch a try and let me know if it solves the
problem.

Basically, there is a deadlock between amandad and the child process
it starts (selfcheck, in your case).  Amandad gets the request packet,
creates one pipe to write the request and one to read the result, then
forks the child.  It then writes the whole packet to the child, and
that's where the problem lies.  If the pipe cannot hold that much data,
the write loop blocks.  Amandad is then stuck there and never drains the
read pipe, so once that fills up the child blocks writing its result and
stops reading the request.  Neither side can make progress.
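
To get a feel for how little data it takes to block a writer, here is
a rough sketch in Python (not amandad code, which is C).  It fills an
empty pipe that nobody reads and counts the bytes accepted before a write
would block.  The names pipe_capacity and chunk are made up for this
illustration, and the number you get depends on the operating system.

```python
import os


def pipe_capacity(chunk=1024):
    """Count how many bytes fit in an unread pipe before a write would block.

    Hypothetical helper for illustration only; Unix-like systems assumed.
    """
    rfd, wfd = os.pipe()
    os.set_blocking(wfd, False)  # a full pipe raises instead of hanging
    total = 0
    try:
        while True:
            total += os.write(wfd, b"\0" * chunk)
    except BlockingIOError:
        pass  # the pipe is full: a blocking writer would stall right here
    finally:
        os.close(rfd)
        os.close(wfd)
    return total


if __name__ == "__main__":
    print("bytes accepted before the writer would block:", pipe_capacity())
```

Any request bigger than that number cannot be written in one go unless
the other end is reading at the same time.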

This patch moves the write loop into the select loop that was already set
up to read the child result.  I did a minimal test and it didn't seem to
break anything.  Well, it didn't after I got it to stop dropping core :-).
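
The general idea, writing the request and reading the result from the
same select loop, can be sketched roughly as follows.  This is Python
rather than the C in the patch, and it is only an illustration under
assumptions: exchange_with_child, chunk and the echoing child command are
invented for the example and are not taken from amandad.

```python
import os
import select
import subprocess
import sys


def exchange_with_child(argv, request, chunk=4096):
    """Feed `request` to a child's stdin while draining its stdout.

    Hypothetical sketch: writes and reads share one select loop, so a
    full pipe in either direction cannot stall the parent.
    """
    child = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    wfd = child.stdin.fileno()
    rfd = child.stdout.fileno()
    os.set_blocking(wfd, False)  # never block on a full request pipe

    pending = memoryview(request)
    reply = bytearray()
    writing = True
    while True:
        wlist = [wfd] if writing else []
        readable, writable, _ = select.select([rfd], wlist, [])
        if writable:
            try:
                sent = os.write(wfd, pending[:chunk])
                pending = pending[sent:]
            except BlockingIOError:
                pass  # pipe filled up again; retry on the next pass
            except BrokenPipeError:
                pending = pending[:0]  # child stopped reading; give up writing
            if not pending:
                child.stdin.close()  # signal end of request to the child
                writing = False
        if readable:
            data = os.read(rfd, chunk)
            if not data:
                break  # child closed its output: the result is complete
            reply += data
    child.stdout.close()
    child.wait()
    return bytes(reply)


if __name__ == "__main__":
    # Assumed stand-in for the real child: copies stdin to stdout.
    echo = [sys.executable, "-c",
            "import sys, shutil; shutil.copyfileobj(sys.stdin.buffer, sys.stdout.buffer)"]
    payload = b"x" * (1 << 20)
    print(exchange_with_child(echo, payload) == payload)
```

With a child that echoes its input, a parent that wrote the whole
payload before reading anything would be exposed to exactly the deadlock
described above once the payload outgrows the pipes.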

I haven't looked at this w.r.t. 2.5 yet.  I suspect things are much
different there.

John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]

amandad.diff
