Re: [PATCH 6/9] MEDIUM: mworker: workers exit when the master leaves

Willy Tarreau Tue, 30 May 2017 03:40:21 -0700

So all the series looks quite good, and I must confess I'm impatient to
merge it so that we turn the page of the wrapper, and also because being
able to use nbproc in foreground during development can be nice.

But I have two comments first :

On Mon, May 29, 2017 at 05:42:09PM +0200, William Lallemand wrote:
> +void mworker_pipe_handler(int fd)
> +{
> +     char c;
> +
> +     if (read(fd, &c, 1) > -1) {
> +             fd_delete(fd);
> +             deinit();
> +             exit(EXIT_FAILURE);
> +     } else {
> +             /* should never happened */
> +             fd_delete(fd);
> +     }
> +
> +     return;
> +}

The test above is dangerous, it assumes that there's zero valid reason
for read() to return -1 but in fact there are eventhough they are corner
cases.

First, a good rule of thumb to consider is that a poller is not rocket
science and that it always relies on the callee to check for the reality
of event readiness. We have one well-known example in Linux in association
with UDP datagram reception. It's possible for poll() to say "there's
something to read" and when read()/recv() try to read, the packet checksum
is computed on the fly during the user_copy(), found to be bad, the packet
is destroyed and read() returns EAGAIN. Of course it's not what you have
above, but it's to say that whenever you have a poller, EAGAIN must be
tested to be completely safe.

The second (more likely) case is a signal causing read() to return -1 EINTR.
And in fact in the master worker, it could theorically happen like this :

    killall -SOMESIG haproxy

The master, not intercepting this signal, would die, closing the pipe.
The worker would be woken up on the detection of this closure, and while
trying to perform the read() would get the signal in turn, causing the
read() to return EINTR and to stop polling on this fd instead of exiting.

A much safer way to deal with this situation is to loop on EINTR and
return without doing anything on EAGAIN, approximately like this :

    while (read(fd) == -1) {
       if (errno == EINTR)
          continue;
       if (errno == EAGAIN) {
          fd_cant_recv(fd);
          return;
       }
       /* otherwise probably die ? */
       break;
    }

    deinit();
    exit();

> @@ -2478,6 +2508,28 @@ int main(int argc, char **argv)
>                               exit(0);
>               }
>  
> +             if (global.mode & MODE_MWORKER) {
> +                     if ((getenv(REEXEC_FLAG) == NULL)) {
> +                             char *msg = NULL;
> +                             /* master pipe to ensure the master is still 
> alive  */
> +                             ret = pipe(mworker_pipe);
> +                             if (ret < 0) {
> +                                     Warning("[%s.main()] Cannot create 
> master pipe.\n", argv[0]);
> +                             } else {
> +                                     memprintf(&msg, "%d", mworker_pipe[0]);
> +                                     setenv("MWORKER_PIPE_RD", msg, 1);
> +                                     memprintf(&msg, "%d", mworker_pipe[1]);
> +                                     setenv("MWORKER_PIPE_WR", msg, 1);

Regarding the environment variable names, it's preferable to prepend
"HAPROXY" in front of them as we've been doing for all other ones to
avoid namespace conflicts with anything used in other environments (I've
ween places where you had to run "env|grep" to find your variable). I've
seen another one called "WAIT_ONLY" in one of the first patches and
which should equally be renamed.

Otherwise the series looks quite good, it would be nice to get some
feedback especially from systemd hostages^Wusers (my comments above
will not affect their experience unless they're really unlucky).

Thanks!
Willy

Re: [PATCH 6/9] MEDIUM: mworker: workers exit when the master leaves

Reply via email to