Re: Late(r) stop of children processes on restart

2021-08-25 Thread Rainer Jung

Thanks for the headroom explanation, Yann, good reading!

Rainer

On 25.08.2021 at 13:23, Yann Ylavic wrote:

On Tue, Jun 29, 2021 at 3:00 PM Rainer Jung  wrote:


On 29.06.2021 at 14:31, Stefan Eissing wrote:

Can't really comment on the diff, but totally agree on the goal to minimize the
unresponsive time and make graceful restarts less disruptive.

So +1 for that.


+1 on the intention as well.


Checked in trunk (r1892587 + r1892595).



Not sure whether that means people would need more headroom in the
scoreboard (which would probably warrant a sentence in CHANGES or docs
about that) or whether it just means the duration during which that
headroom is used changes (which I wouldn't care about).


The restart delay between stop and start is now minimal (no reload in
between), but the headroom needed does not change AIUI.
We still have the situation where connections (worker threads) are
active for both the new and old generations of children processes, and
its duration depends mainly on the actual lifetime of the connections.
So the current tunings still hold I think.

What changes now is that for both graceful and ungraceful restarts the
main process fully consumes one CPU (to reload) while the children are
actively running (the old generation keeps accepting/processing
connections during the reload), whereas before the children were tearing
down, thus easing the CPUs (but filling the socket backlogs,
potentially until exhaustion...).
So there might be a greater load spike (overall) than before on reload.

A note on the headroom while at it:
mpm_event possibly consumes fewer children (hence scoreboard slots) on
restart, because when a child is dying it stops (and thus no longer
accounts for) the worker threads above its remaining number of
connections, which lets children of the new generation be created
accurately to scale. mpm_worker never stops threads (this improvement
never made it there AFAICT), so by accounting for inactive threads as
active it will end up creating more children of the new generation as
connections arrive (eventually reaching the limits earlier, or
blocking/waiting on worker threads in the new generation of children
overflowed by incoming connections, which the main process thinks are
evenly distributed across all the children, including the old
generation's).
I don't know how hard/worthwhile it is to align mpm_worker with
mpm_event on this, just a note.


Cheers;
Yann.


Re: Late(r) stop of children processes on restart

2021-08-25 Thread Yann Ylavic
On Tue, Jun 29, 2021 at 3:00 PM Rainer Jung  wrote:
>
> On 29.06.2021 at 14:31, Stefan Eissing wrote:
> > Can't really comment on the diff, but totally agree on the goal to minimize
> > the unresponsive time and make graceful restarts less disruptive.
> >
> > So +1 for that.
>
> +1 on the intention as well.

Checked in trunk (r1892587 + r1892595).

>
> Not sure whether that means people would need more headroom in the
> scoreboard (which would probably warrant a sentence in CHANGES or docs
> about that) or whether it just means the duration during which that
> headroom is used changes (which I wouldn't care about).

The restart delay between stop and start is now minimal (no reload in
between), but the headroom needed does not change AIUI.
We still have the situation where connections (worker threads) are
active for both the new and old generations of children processes, and
its duration depends mainly on the actual lifetime of the connections.
So the current tunings still hold I think.
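
[Editorial illustration of the tuning point above, not part of the original
mail. A hedged mpm_event sizing sketch with invented numbers: 400 / 25 = 16
children at full load, and ServerLimit is kept above that so old-generation
children still draining connections across a graceful restart keep their
scoreboard slots while the new generation starts.]

# Illustrative only; numbers are invented, not a recommendation.
<IfModule mpm_event_module>
    ThreadsPerChild      25
    MaxRequestWorkers   400
    ServerLimit          24
</IfModule>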

What changes now is that for both graceful and ungraceful restarts the
main process fully consumes one CPU (to reload) while the children are
actively running (the old generation keeps accepting/processing
connections during the reload), whereas before the children were tearing
down, thus easing the CPUs (but filling the socket backlogs,
potentially until exhaustion...).
So there might be a greater load spike (overall) than before on reload.

A note on the headroom while at it:
mpm_event possibly consumes fewer children (hence scoreboard slots) on
restart, because when a child is dying it stops (and thus no longer
accounts for) the worker threads above its remaining number of
connections, which lets children of the new generation be created
accurately to scale. mpm_worker never stops threads (this improvement
never made it there AFAICT), so by accounting for inactive threads as
active it will end up creating more children of the new generation as
connections arrive (eventually reaching the limits earlier, or
blocking/waiting on worker threads in the new generation of children
overflowed by incoming connections, which the main process thinks are
evenly distributed across all the children, including the old
generation's).
I don't know how hard/worthwhile it is to align mpm_worker with
mpm_event on this, just a note.


Cheers;
Yann.


Re: Late(r) stop of children processes on restart

2021-06-29 Thread Rainer Jung

On 29.06.2021 at 14:31, Stefan Eissing wrote:

Can't really comment on the diff, but totally agree on the goal to minimize the
unresponsive time and make graceful restarts less disruptive.

So +1 for that.


+1 on the intention as well.

Not sure whether that means people would need more headroom in the
scoreboard (which would probably warrant a sentence in CHANGES or docs 
about that) or whether it just means the duration during which that 
headroom is used changes (which I wouldn't care about).


Thanks and regards,

Rainer


On 28.06.2021 at 16:25, Yann Ylavic wrote:

When the MPM event/worker is restarting, it first signals the
children processes to stop (via POD), then reloads the configuration,
and finally starts the new generation.

This may be problematic when the reload takes some time to complete,
because incoming connections are no longer processed.
A module at my day $job loads quite a few regexes and JSON schemas
for each vhost, and I have seen restarts take tens of seconds to
complete with a large number of vhosts. I suppose this can happen with
many RewriteRules too.

How about we wait for the reload to complete before stopping the old
generation, like in the attached patch (MPM event only for now,
changes in worker would be quite similar)?

This is achieved by creating the PODs and listeners buckets from a
generation pool (gen_pool), with a different lifetime than pconf.
gen_pool survives restarts and is created/cleared after the old
generation is stopped, entirely in the run_mpm hook, so the stop and
PODs and buckets handling is moved there (most changes are cut/paste).

WDYT?

Regards;
Yann.



Re: Late(r) stop of children processes on restart

2021-06-29 Thread Stefan Eissing
Can't really comment on the diff, but totally agree on the goal to minimize the
unresponsive time and make graceful restarts less disruptive.

So +1 for that.

> On 28.06.2021 at 16:25, Yann Ylavic wrote:
> 
> When the MPM event/worker is restarting, it first signals the
> children processes to stop (via POD), then reloads the configuration,
> and finally starts the new generation.
> 
> This may be problematic when the reload takes some time to complete,
> because incoming connections are no longer processed.
> A module at my day $job loads quite a few regexes and JSON schemas
> for each vhost, and I have seen restarts take tens of seconds to
> complete with a large number of vhosts. I suppose this can happen with
> many RewriteRules too.
> 
> How about we wait for the reload to complete before stopping the old
> generation, like in the attached patch (MPM event only for now,
> changes in worker would be quite similar)?
> 
> This is achieved by creating the PODs and listeners buckets from a
> generation pool (gen_pool), with a different lifetime than pconf.
> gen_pool survives restarts and is created/cleared after the old
> generation is stopped, entirely in the run_mpm hook, so the stop and
> PODs and buckets handling is moved there (most changes are cut/paste).
> 
> WDYT?
> 
> Regards;
> Yann.
> 



Late(r) stop of children processes on restart

2021-06-28 Thread Yann Ylavic
When the MPM event/worker is restarting, it first signals the
children processes to stop (via POD), then reloads the configuration,
and finally starts the new generation.

This may be problematic when the reload takes some time to complete,
because incoming connections are no longer processed.
A module at my day $job loads quite a few regexes and JSON schemas
for each vhost, and I have seen restarts take tens of seconds to
complete with a large number of vhosts. I suppose this can happen with
many RewriteRules too.

How about we wait for the reload to complete before stopping the old
generation, like in the attached patch (MPM event only for now,
changes in worker would be quite similar)?
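
[Editorial sketch, not part of the original mail. A minimal standalone C
program contrasting the two restart orderings; the function names are
hypothetical placeholders, not the httpd/MPM API.]

/* Illustrative only: prints the two sequences being compared. */
#include <stdio.h>

static void signal_children_stop(void) { puts("signal old generation to stop (POD)"); }
static void reload_configuration(void) { puts("reload configuration (may take long)"); }
static void start_new_generation(void) { puts("start new generation of children"); }

int main(void)
{
    puts("-- current behavior: nothing serves connections during the reload --");
    signal_children_stop();
    reload_configuration();
    start_new_generation();

    puts("-- proposed: the old generation keeps serving during the reload --");
    reload_configuration();
    signal_children_stop();
    start_new_generation();
    return 0;
}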

This is achieved by creating the PODs and listeners buckets from a
generation pool (gen_pool), with a different lifetime than pconf.
gen_pool survives restarts and is created/cleared after the old
generation is stopped, entirely in the run_mpm hook, so the stop and
PODs and buckets handling is moved there (most changes are cut/paste).
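
[Editorial sketch of the gen_pool idea, not the actual patch: the struct,
field types and helper name below are placeholders; only the APR pool calls
are real.]

#include <apr_pools.h>

/* Placeholder for the per-child data that must outlive pconf but not
 * the generation: POD and listeners bucket. */
typedef struct {
    void *pod;        /* stands in for ap_pod_t * */
    void *listeners;  /* stands in for ap_listen_rec * */
} child_bucket;

static apr_pool_t *gen_pool;    /* survives restarts, unlike pconf */
static child_bucket *buckets;

/* Called from the MPM run hook once the previous generation has been
 * stopped: recycle the generation pool and allocate the new generation's
 * PODs/listener buckets from it. */
static void prepare_generation(apr_pool_t *retained_parent, int num_buckets)
{
    if (!gen_pool) {
        apr_pool_create(&gen_pool, retained_parent);
    }
    else {
        apr_pool_clear(gen_pool);   /* frees the old generation's buckets */
    }
    buckets = apr_pcalloc(gen_pool, num_buckets * sizeof(*buckets));
    /* ... create PODs and duplicate the listeners into buckets[i] ... */
}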

WDYT?

Regards;
Yann.
Index: server/listen.c
===
--- server/listen.c	(revision 1890695)
+++ server/listen.c	(working copy)
@@ -853,7 +853,7 @@ AP_DECLARE(apr_status_t) ap_duplicate_listeners(ap
 if (val > 1) {
 *num_buckets = val;
 }
-ap_log_perror(APLOG_MARK, APLOG_INFO, 0, p, APLOGNO(02819)
+ap_log_error(APLOG_MARK, APLOG_INFO, 0, s, APLOGNO(02819)
   "Using %i listeners bucket(s) based on %i "
   "online CPU cores and a ratio of %i",
   *num_buckets, num_online_cores,
@@ -862,7 +862,7 @@ AP_DECLARE(apr_status_t) ap_duplicate_listeners(ap
 else
 #endif
 if (!warn_once) {
-ap_log_perror(APLOG_MARK, APLOG_WARNING, 0, p, APLOGNO(02820)
+ap_log_error(APLOG_MARK, APLOG_INFO, 0, s, APLOGNO(02820)
   "ListenCoresBucketsRatio ignored without "
   "SO_REUSEPORT and _SC_NPROCESSORS_ONLN "
   "support: using a single listeners bucket");
Index: server/mpm/event/event.c
===
--- server/mpm/event/event.c	(revision 1890695)
+++ server/mpm/event/event.c	(working copy)
@@ -390,6 +390,11 @@ typedef struct socket_callback_baton
 unsigned int signaled :1;
 } socket_callback_baton_t;
 
+typedef struct event_child_bucket {
+ap_pod_t *pod;
+ap_listen_rec *listeners;
+} event_child_bucket;
+
 /* data retained by event across load/unload of the module
  * allocated on first call to pre-config hook; located on
  * subsequent calls to pre-config hook
@@ -397,6 +402,9 @@ typedef struct socket_callback_baton
 typedef struct event_retained_data {
 ap_unixd_mpm_retained_data *mpm;
 
+apr_pool_t *gen_pool; /* generation pool (start->stop children lifetime) */
+event_child_bucket *buckets; /* children buckets (reset per generation) */
+
 int first_server_limit;
 int first_thread_limit;
 int sick_child_detected;
@@ -431,12 +439,7 @@ typedef struct event_retained_data {
 } event_retained_data;
 static event_retained_data *retained;
 
-typedef struct event_child_bucket {
-ap_pod_t *pod;
-ap_listen_rec *listeners;
-} event_child_bucket;
-static event_child_bucket *all_buckets, /* All listeners buckets */
-  *my_bucket;   /* Current child bucket */
+static event_child_bucket *my_bucket;  /* Current child bucket */
 
 struct event_srv_cfg_s {
 struct timeout_queue *wc_q,
@@ -2774,8 +2777,8 @@ static void child_main(int child_num_arg, int chil
 /* close unused listeners and pods */
 for (i = 0; i < retained->mpm->num_buckets; i++) {
 if (i != child_bucket) {
-ap_close_listeners_ex(all_buckets[i].listeners);
-ap_mpm_podx_close(all_buckets[i].pod);
+ap_close_listeners_ex(retained->buckets[i].listeners);
+ap_mpm_podx_close(retained->buckets[i].pod);
 }
 }
 
@@ -2800,6 +2803,9 @@ static void child_main(int child_num_arg, int chil
 clean_child_exit(APEXIT_CHILDFATAL);
 }
 
+/* For rand() users (e.g. skiplist). */
+srand((unsigned int)apr_time_now());
+
 ap_run_child_init(pchild, ap_server_conf);
 
 if (ap_max_requests_per_child) {
@@ -2944,7 +2950,7 @@ static int make_child(server_rec * s, int slot, in
 }
 
 if (one_process) {
-my_bucket = &all_buckets[0];
+my_bucket = &retained->buckets[0];
 
 event_note_child_started(slot, getpid());
 child_main(slot, 0);
@@ -2973,7 +2979,7 @@ static int make_child(server_rec * s, int slot, in
 }
 
 if (!pid) {
-my_bucket = &all_buckets[bucket];
+my_bucket = &retained->buckets[bucket];
 
 #ifdef HAVE_BINDPROCESSOR
 /* By default, AIX binds to a single