Re: Disabling the SIGCHLD handler

2008-01-15 Thread Marc Lehmann
On Tue, Jan 15, 2008 at 10:47:33AM -0500, Chris Shoemaker [EMAIL PROTECTED] 
wrote:
 It's O(N) in the number of child watchers, and runs only upon SIGCHLD.

Also, it just occured to me that this is totally misleading:

Your algorithm is O(c²) where c is the number of children in the number
of waitpid calls, where libev is O(c). If you take into account the number
of watchers, then your approach is O(w c²) where w is the number of
watchers, while libev's approach is O(w c) with a much smaller constant
factor due to the hashing.

As an example, with just 50 children, in the worst case, you require over
1200 waitpid calls where libev only requires 100, and each waitpid call,
if implemented correctly, would have to check all active watchers in your
case (leading to over 3 iterations overall), and only the potentially
matching ones with libev (leading to roughly 150 iterations overall).

Performance quickly degrades from that point on.

It makes sense to implement it that way for you because you are required
to emulate a non-event-based API (waitpid) and there is little to do, but
punishing programs written in an event-based way makes no sense to me,
especially in an event-library: It is only natural that non-event-based
approaches might require less efficient algorithms.

Sure, 50 children is not typical, but it happens with some regularity,
just as 1000 or more file descriptors is not typical, but it happens with
some regularity.

 Is that really so bad?  I'd love to know of a more efficient way, but
 efficiency is secondard to correctness,
   
libev *is* correct, it just isn't suitable for emulating waitpid, as it is a
generic event library. When talking about correctness, I'd say the correct
approach is to deliver data to the registered watchers, and not just some of
them.

(thinking about all this, I might be able to reduce the number of waitpid
calls further).

In any case, libev is commited to support event-based programing
efficiently, it is not meant to emulate the traditional UNIX api in a
different way: if you want waitpid, you know where to find it.

-- 
The choice of a   Deliantra, the free code+content MORPG
  -==- _GNU_  http://www.deliantra.net
  ==-- _   generation
  ---==---(_)__  __   __  Marc Lehmann
  --==---/ / _ \/ // /\ \/ /  [EMAIL PROTECTED]
  -=/_/_//_/\_,_/ /_/\_\

___
libev mailing list
libev@lists.schmorp.de
http://lists.schmorp.de/cgi-bin/mailman/listinfo/libev


Re: Disabling the SIGCHLD handler

2008-01-15 Thread Marc Lehmann
On Tue, Jan 15, 2008 at 08:28:34PM -0700, Tony Arcieri [EMAIL PROTECTED] 
wrote:
 Marc, could you check out the Rubinius code?  I know when you looked at Rev
 you spotted some obvious problems right away.  If there's a problem with the
 way Rubinius is doing SIGCHLD maybe you could correct it and move it on to
 ev_child.  Or, perhaps you'll discover some irreconcilable or otherwise
 difficult problem.

I do not think I have the time for that: this has nothing to do with libev,
so my personal interest is low, and I am very low on time in general. Sorry.

(I only looked at Rev to see how one would do that in ruby, out of interest,
and because seeing bugs might hint at what could be improved, for example,
in the documentation).

-- 
The choice of a   Deliantra, the free code+content MORPG
  -==- _GNU_  http://www.deliantra.net
  ==-- _   generation
  ---==---(_)__  __   __  Marc Lehmann
  --==---/ / _ \/ // /\ \/ /  [EMAIL PROTECTED]
  -=/_/_//_/\_,_/ /_/\_\

___
libev mailing list
libev@lists.schmorp.de
http://lists.schmorp.de/cgi-bin/mailman/listinfo/libev


Re: Disabling the SIGCHLD handler

2008-01-14 Thread Chris Shoemaker
I think you might be taking this personally.  Please don't.  I'd
really like a productive dialog here.  Perhaps we're talking past each
other.  Perhaps we're not looking at the same version of libev.  (I'm
looking at 2.01.)  Perhaps I'm an idiot.  However, even if I'm an
idiot, I'm not attacking you.  A little politeness doesn't cost you
much.

On Mon, Jan 14, 2008 at 03:06:41AM +0100, Marc Lehmann wrote:
 On Sun, Jan 13, 2008 at 06:05:18PM -0500, Chris Shoemaker [EMAIL PROTECTED] 
 wrote:
   As I explained, if for whatever reason it is inconvinient to use it,
   don't. Libev doesn't force the use of its child watcher.
  
  Well, it doesn't force the use of ev_child, but it does force an
  application to choose between either:
  
a) not having access to the exit status of children that exited before a
  ev_child was started, OR
 
 As I said earlier, this is untrue. Why do you kepe repeating it? libev
 *explicitly* allows registering an ev_child handler after the child has
 exited already, anythign else would be pointless indeed.

I will try to be more explicit about why I believe that this doesn't work.

 This is actually used in practise, so if it doesn't work, I'd like to know
 why.

I will also try to explain why it doesn't work.

b) not being able to use ev_signal
 
 That is also untrue. Nothing precludes libev users form using ev_signal
 but install their own sigchld handler, as was explained before.

I guess I've failed to effectively communicate the problem, so I will
try some different approaches.

  I'd like to use ev_signal (for unrelated purposes, mostly) AND have access
  to exit status of children that exited before I knew I would want their
  exit status, (for which I know I need to use my old SIGCHLD handler, that
  I used with libevent).
 
 I will never understand how people can know factually wrong things.
 
 You made this claim repeatedly, please back it up, or take it back. Thanks.

Okay.  I guess the simplest proof is in the code:

ev.c has a childcb SIGCHLD handler containing:
  if (0 = (pid = waitpid (-1, status, WNOHANG | WUNTRACED | WCONTINUED)))
if (!WCONTINUED
|| errno != EINVAL
|| 0 = (pid = waitpid (-1, status, WNOHANG | WUNTRACED)))
  return;
...
  child_reap (EV_A_ sw, pid, pid, status);

Notice that nothing prevents the waitpid from reaping any child at all.

child_reap() contains:
  for (w = (ev_child *)childs [chain  (EV_PID_HASHSIZE - 1)]; w; w = (ev_child 
*)((WL)w)-next)
if (w-pid == pid || !w-pid)
  {
 ... [ feed the event ]
  }

Notice that if the conditional (w-pid == pid || !w-pid) is false for all
ev_childs, then
  a) we have just reaped a child, and have its status.
  b) we will not be generating any event, since no ev_child matches.
  c) we will never again see the status for this child

All that is then required is for someone to start a ev_child for the
child we just reaped.  That event will never trigger.


If you'd prefer a more empirical approach, you can run this program:
*
#include ev.h
#include stdio.h
#include unistd.h
#include signal.h
#include stdlib.h
#include sys/wait.h

struct ev_child child_watcher;

struct ev_signal signal_watcher;

static void signal_cb (EV_P_ struct ev_signal *w, int revents) {
  printf (signal\n);
  ev_unloop (EV_A_ EVUNLOOP_ONE); /* leave one loop call */
}

static void cb (EV_P_ struct ev_child *w, int revents) {
  printf (w-pid = %d; w-rstatus = %d\n, w-rpid,  WEXITSTATUS(w-rstatus));
  ev_unloop (EV_A_ EVUNLOOP_ONE); /* leave one loop call */
}

int main (int args, char *argv[]) {
  struct ev_loop *loop = ev_default_loop(0/*EV_FORK_ENABLE*/);
  pid_t pid;

  ev_signal_init (signal_watcher, signal_cb, SIGHUP);
  ev_signal_start (loop, signal_watcher);

  if ((pid = fork()) == 0) {
exit(99);
  }
  printf(spawned %d\n, pid);
  if (fork() == 0) {
sleep(5);
kill(pid, SIGHUP);
exit(98);
  }

  ev_loop(loop, EVLOOP_NONBLOCK);

  sleep(1);
  printf(STARTING LOOP %d\n, pid);
  ev_child_init (child_watcher, cb, pid);
  ev_child_start (loop, child_watcher);

  ev_loop(loop, 0);

  printf(done\n);
  return 0;
}


If I understand what you're claiming, then that program should print the
exit status 99, and then terminate.  In fact, it does neither.

An strace of the process shows why, and it agrees with the source code:

...
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, 
child_tidptr=0x2aac7770) = 21046
--- SIGCHLD (Child exited) @ 0 (0) ---
write(4, \21, 1)  = 1
write(1, spawned 21046\n, 14) = 14
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, 
child_tidptr=0x2aac7770) = 21047
epoll_ctl(5, EPOLL_CTL_ADD, 3, {EPOLLIN, {u32=3, u64=3}}) = 0
epoll_wait(5, {{EPOLLIN, {u32=3, u64=3}}}, 64, 0) = 1
read(3, \21, 1)   = 1
wait4(-1, [{WIFEXITED(s)  WEXITSTATUS(s) == 99}], 
WNOHANG|WSTOPPED|WCONTINUED, NULL) = 21046
wait4(-1, 0x7fffd639771c,