Sebastian, would you be willing to try the current patches in svn? This might solve the problem.
M

Sebastian Hetze wrote:
> Hi *,
>
> at least one sort of "MAXFD, check for defunct children" problem still
> exists in version 2.2.8 of cfengine.
> Here is what I found:
>
> cfengine wants to limit the number of parallel pipes to a hardcoded
> number (MAXFD=20). To achieve this, every new pipe is checked for its
> fileno, and if that is higher than 20, the error message appears and
> the pipe is somehow ignored (cfengine does not close the pipe properly
> in that case, which is a bug of its own).
>
> Now look at this real-life example:
> [EMAIL PROTECTED]:~# ls -l /proc/13406/fd
> lr-x------ 1 root root 64 2008-08-24 20:39 0 -> /dev/null
> l-wx------ 1 root root 64 2008-08-24 20:39 1 -> /var/log/cfengine/cfrun.13400
> l-wx------ 1 root root 64 2008-08-24 20:39 2 -> /dev/null
> lr-x------ 1 root root 64 2008-08-24 20:39 3 -> /dev/urandom
> lr-x------ 1 root root 64 2008-08-24 20:39 4 -> /etc/cfengine/cfagent.conf
> lrwx------ 1 root root 64 2008-08-24 20:39 5 -> socket:[18203]
> lrwx------ 1 root root 64 2008-08-24 20:39 6 -> socket:[18204]
> lr-x------ 1 root root 64 2008-08-24 20:39 7 -> /proc/loadavg
> lrwx------ 1 root root 64 2008-08-24 20:39 8 -> socket:[18954]
> lrwx------ 1 root root 64 2008-08-24 20:39 9 -> socket:[18207]
> lrwx------ 1 root root 64 2008-08-24 20:39 10 -> socket:[18974]
> lrwx------ 1 root root 64 2008-08-24 20:39 11 -> socket:[18209]
> lrwx------ 1 root root 64 2008-08-24 20:39 12 -> socket:[18215]
> lrwx------ 1 root root 64 2008-08-24 20:39 13 -> socket:[18221]
> lrwx------ 1 root root 64 2008-08-24 20:39 14 -> socket:[18217]
> lrwx------ 1 root root 64 2008-08-24 20:39 15 -> socket:[18227]
> lrwx------ 1 root root 64 2008-08-24 20:39 16 -> socket:[18223]
> lrwx------ 1 root root 64 2008-08-24 20:39 17 -> socket:[18234]
> lrwx------ 1 root root 64 2008-08-24 20:39 18 -> socket:[18229]
> lr-x------ 1 root root 64 2008-08-24 20:39 19 -> pipe:[149711]
> lrwx------ 1 root root 64 2008-08-24 20:39 20 -> socket:[18236]
> lr-x------ 1 root root 64 2008-08-24 20:39 21 -> pipe:[150154]
> lr-x------ 1 root root 64 2008-08-24 20:39 22 -> pipe:[150163]
> lr-x------ 1 root root 64 2008-08-24 20:39 23 -> pipe:[150173]
> lr-x------ 1 root root 64 2008-08-24 20:39 24 -> pipe:[150184]
> lr-x------ 1 root root 64 2008-08-24 20:39 25 -> pipe:[150194]
> lr-x------ 1 root root 64 2008-08-24 20:39 26 -> pipe:[150205]
> lr-x------ 1 root root 64 2008-08-24 20:39 27 -> pipe:[150215]
> lr-x------ 1 root root 64 2008-08-24 20:39 28 -> pipe:[150225]
> lr-x------ 1 root root 64 2008-08-24 20:39 29 -> pipe:[150236]
> lr-x------ 1 root root 64 2008-08-24 20:39 30 -> pipe:[150249]
> lr-x------ 1 root root 64 2008-08-24 20:39 31 -> pipe:[150269]
> lr-x------ 1 root root 64 2008-08-24 20:39 32 -> pipe:[150283]
> lr-x------ 1 root root 64 2008-08-24 20:39 33 -> pipe:[150293]
> lr-x------ 1 root root 64 2008-08-24 20:39 34 -> pipe:[150302]
> lr-x------ 1 root root 64 2008-08-24 20:39 35 -> pipe:[150396]
>
> As you can see, there are 14 sockets open for cfagent. In this
> particular case, these sockets belong to heartbeat, which happens
> to have started this instance of cfagent. Maybe not the most
> common case, but definitely something cfagent should work with.
> Since these sockets all count toward the fileno, there is simply no
> fileno left for popen. Or, even worse, there is only one
> fileno left and the bug hits only occasionally, when one pipe
> does not return fast enough.
>
> As a workaround, I changed MAXFD to 40. But I think using a
> proper counter for open pipes would be more appropriate.
> It looks like you already started to use the CHILD[] array
> to keep track of free pipe slots?!
> If you point me in the direction you want to go, I will happily
> help you with coding and testing.
>
> Best regards,
>
> Sebastian Hetze
> _______________________________________________
> Bug-cfengine mailing list
> [email protected]
> https://cfengine.org/mailman/listinfo/bug-cfengine

--
Mark Burgess

Web: http://www.iu.hio.no/~mark
Tlf: +47 22453272
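To make Sebastian's "proper counter for open pipes" idea concrete, here is a minimal C sketch, assuming a plain POSIX popen()/pclose() environment. It is not cfengine code: MAX_PIPES, slot_popen(), slot_pclose(), open_pipe_count() and fd_number_check_ok() are hypothetical names, and the slot table only plays the role Sebastian suggests for CHILD[]; it makes no claim about how CHILD[] is actually used in svn. The limit counts pipes rather than comparing descriptor numbers, so descriptors inherited from a parent such as heartbeat no longer eat into it. When every slot is busy, no pipe is opened at all, so nothing is left unclosed.

```c
#define _POSIX_C_SOURCE 200809L  /* expose popen()/pclose()/fileno() in strict C modes */
#include <assert.h>
#include <stdio.h>

/* Hypothetical limit: how many pipes may be open at the same time.
   It bounds a COUNT of pipes, not the numeric value of a descriptor. */
#define MAX_PIPES 20

/* Hypothetical slot table; NULL marks a free slot. */
static FILE *pipe_slots[MAX_PIPES];

/* The fileno-based test described in the report, kept only for comparison.
   Descriptors inherited from the parent raise fileno() even when no other
   pipe is open. Returns 1 if the stream would be accepted, 0 otherwise. */
int fd_number_check_ok(FILE *fp, int maxfd)
{
    return fileno(fp) <= maxfd;
}

/* Run cmd through popen() only if a slot is free.
   On success, stores the stream in *out and returns the slot index.
   Returns -1 if all slots are busy (no pipe is created) or popen() fails. */
int slot_popen(const char *cmd, const char *mode, FILE **out)
{
    assert(cmd != NULL && mode != NULL && out != NULL);
    for (int i = 0; i < MAX_PIPES; i++) {
        if (pipe_slots[i] == NULL) {
            FILE *fp = popen(cmd, mode);
            if (fp == NULL)
                return -1;
            pipe_slots[i] = fp;
            *out = fp;
            return i;
        }
    }
    return -1;
}

/* Close the pipe held in a slot and mark the slot free again.
   Returns pclose()'s wait status, or -1 for an invalid or empty slot. */
int slot_pclose(int slot)
{
    if (slot < 0 || slot >= MAX_PIPES || pipe_slots[slot] == NULL)
        return -1;
    int status = pclose(pipe_slots[slot]);
    pipe_slots[slot] = NULL;
    return status;
}

/* Number of pipes currently open through slot_popen(). */
int open_pipe_count(void)
{
    int n = 0;
    for (int i = 0; i < MAX_PIPES; i++)
        if (pipe_slots[i] != NULL)
            n++;
    return n;
}
```

In this sketch, a caller that gets -1 from slot_popen() would collect a finished child with slot_pclose() and then retry. For comparison, fd_number_check_ok() rejects a pipe whenever enough other descriptors are already open, even if no other pipe is running, which matches the behaviour in the /proc listing above.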
