Hi Cyril,

I have a few comments below:

> diff --git a/src/cfgparse.c b/src/cfgparse.c
> index 2eb25edb..9681e06b 100644
> --- a/src/cfgparse.c
> +++ b/src/cfgparse.c
> @@ -1011,6 +1011,23 @@ int cfg_parse_global(const char *file, int linenum, char **args, int kwm)
>               }
>       }
>       /* end of user/group name handling*/
> +     else if (strcmp(args[0], "hard-stop-after") == 0) {
> +             const char *res;
> +
> +             if (!*args[1]) {
> +                     Alert("parsing [%s:%d] : '%s' expects <time> as 
> argument.\n",
> +                             file, linenum, args[0]);
> +                     err_code |= ERR_ALERT | ERR_FATAL;
> +                     goto out;
> +             }
> +             res = parse_time_err(args[1], &global.hard_stop_after, TIME_UNIT_MS);
> +             if (res) {
> +                     Alert("parsing [%s:%d]: unexpected character '%c' in 
> argument to <%s>.\n",
> +                             file, linenum, *res, args[0]);
> +                     err_code |= ERR_ALERT | ERR_FATAL;
> +                     goto out;
> +             }
> +     }

You can place this directly as a keyword parser in proxy.c. For this
you can simply add the following line to cfg_kws:

   { CFG_GLOBAL, "hard-stop-after", proxy_parse_hard_stop_after },

This makes the parser more modular, and one day we may even be able to
convert all remaining keywords in order to dump the config language.
But that's not a hard requirement, it's just a good practice to adopt
for future keywords.
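
For illustration, the parser itself could look more or less like this
(not tested, it simply transposes your cfgparse.c code above to the
cfg_keyword parse prototype and reports errors via memprintf) :

static int proxy_parse_hard_stop_after(char **args, int section_type,
                                       struct proxy *curpx, struct proxy *defpx,
                                       const char *file, int line, char **err)
{
        const char *res;

        if (!*args[1]) {
                memprintf(err, "'%s' expects <time> as argument.", args[0]);
                return -1;
        }

        res = parse_time_err(args[1], &global.hard_stop_after, TIME_UNIT_MS);
        if (res) {
                memprintf(err, "unexpected character '%c' in argument to <%s>.",
                          *res, args[0]);
                return -1;
        }
        return 0;
}

The generic keyword handler in cfgparse.c then takes care of printing
the Alert() and setting err_code from the message returned via <err>,
so you don't have to duplicate that part.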

> diff --git a/src/haproxy.c b/src/haproxy.c
> index 559b4811..71015cc1 100644
> --- a/src/haproxy.c
> +++ b/src/haproxy.c
> @@ -117,6 +117,7 @@ int  relative_pid = 1;            /* process id starting at 1 */
>  
>  /* global options */
>  struct global global = {
> +     .hard_stop_after = TICK_ETERNITY,
>       .nbproc = 1,
>       .req_count = 0,
>       .logsrvs = LIST_HEAD_INIT(global.logsrvs),
> @@ -1225,7 +1226,7 @@ static void deinit_stick_rules(struct list *rules)
>       }
>  }
>  
> -static void deinit(void)
> +void deinit(void)
>  {
>       struct proxy *p = proxy, *p0;
>       struct cap_hdr *h,*h_next;
> diff --git a/src/proxy.c b/src/proxy.c
> index 19eddcac..8a8b5406 100644
> --- a/src/proxy.c
> +++ b/src/proxy.c
> @@ -914,6 +914,29 @@ struct task *manage_proxy(struct task *t)
>  }
>  
>  
> +struct task *hard_stop(struct task *t)
> +{
> +     struct proxy *p;
> +     struct stream *s;
> +
> +     Warning("soft-stop running for too long, performing a hard-stop.\n");
> +     send_log(NULL, LOG_WARNING, "soft-stop running for too long, performing a hard-stop.\n");
> +     p = proxy;
> +     while (p) {
> +             if (!(p->cap & PR_CAP_FE))
> +                     continue;
> +
> +             list_for_each_entry(s, &streams, list) {
> +                     stream_shutdown(s, SF_ERR_KILLED);
> +             }
> +             p = p->next;
> +     }

You don't use "p" in this loop, I think you started using it initially
and now you don't need it anymore, so in fact when you visit the first
proxy you shut down all streams and then nothing is done when looping
over the remaining ones. So all the streams are visited once per
frontend.

That said, I'm not convinced by the benefit of killing all streams like
this, given that at the moment it achieves nothing: it only wakes up
their tasks, and the process dies a few lines later in exit() without
giving them any chance to run. However I admit that if we later want to
give these tasks a chance to complete (eg: by waiting one extra second
and coming back here again to exit for real), then it could be useful.

I'd be tempted to suggest that we set a flag and re-arm the timer to
come back here to exit, so that tasks can stop cleanly. This would give
an opportunity to get a log for each of them; but on the other hand,
almost all the tasks being killed this late will be purely TCP, and
most of the time users don't log pure TCP connections, so the benefit
becomes very low. Otherwise we can simply exit and not even kill them.

Hmmm, just thinking, in fact there could be a benefit in letting them
run: ensuring that we close outgoing connections with an RST to avoid
accumulating TIME_WAITs.

What do you think about trying something like this (not even compile-tested):

        static int killed;
        struct stream *s;

        if (killed) {
                Warning("Some tasks resisted to hard-stop, exiting now.\n");
                send_log(NULL, LOG_WARNING, "Some tasks resisted to hard-stop, 
exiting now.\n");
                /* Do some cleanup and explicitely quit */
                deinit();
                exit(0);
        }

        Warning("soft-stop running for too long, performing a hard-stop.\n");
        send_log(NULL, LOG_WARNING, "soft-stop running for too long, performing 
a hard-stop.\n");

        /* kill the tasks and give them one second to terminate cleanly */
        list_for_each_entry(s, &streams, list) {
                stream_shutdown(s, SF_ERR_KILLED);
        }

        killed = 1;
        t->expire = tick_add(now_ms, MS_TO_TICKS(1000));
        return t;

Normally, if all tasks properly die, we'll automatically break out of
run_poll_loops(). If we have to come back here, it means we didn't
quit, so exiting here is just a safety belt.
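
Just for context, I'm assuming the part of your patch not quoted here
creates and arms this task from soft_stop() when the option is set,
along these lines (a rough sketch based on the existing task API, not
taken from your patch, so adjust to what you actually did) :

        /* in soft_stop(), only when a hard-stop-after delay is configured */
        if (tick_isset(global.hard_stop_after)) {
                struct task *task = task_new();

                if (task) {
                        task->process = hard_stop;
                        /* hard_stop_after was parsed in milliseconds */
                        task_schedule(task, tick_add(now_ms, global.hard_stop_after));
                }
        }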

Additionally, since it helps improve the reliability of the service
during reloads without adding any particular feature, and the patch is
reasonably small and isolated, I was thinking we could backport it to
1.7 and possibly even 1.6. What do you and others think?

Thanks!
Willy
