On Fri, Apr 12, 2013 at 7:36 AM, Arkadiusz Miśkiewicz <[email protected]>wrote:
> On Friday 12 of April 2013, Eric Covener wrote:
> > > diff -ur httpd-2.4.4.org/server/mpm_unix.chttpd-2.4.4/server/mpm_unix.c
> > > --- httpd-2.4.4.org/server/mpm_unix.c 2012-07-03 21:38:58.000000000
> > > +0200 +++ httpd-2.4.4/server/mpm_unix.c 2013-04-12
> > > 09:14:58.282929959 +0200 @@ -604,7 +604,17 @@
> > >
> > > len = strlen(data);
> > >
> > > }
> > >
> > > - apr_socket_send(sock, data, &len);
> > > + rv = apr_socket_send(sock, data, &len);
> > > + if (rv == APR_SUCCESS) {
> > > + char *buffer[10]; // dummy buffer to see if child is alive
> > > +
> > > + len = sizeof(buffer);
> > > + rv = apr_socket_recv(sock, buffer, &len);
> > > + if (rv != APR_SUCCESS) {
> > > + ap_log_error(APLOG_MARK, APLOG_DEBUG, rv, ap_server_conf,
> > > APLOGNO(00057) + "receiving response to dummy
> > > data to listener on %pI", lp->bind_addr); + }
> > > + }
> > >
> > > apr_socket_close(sock);
> > > apr_pool_destroy(p);
>
> End of this is:
>
> apr_socket_close(sock);
> apr_pool_destroy(p);
>
> return rv;
> >
> > Interesting analysis, I'm confused on how the patch helps though if it
> > doesn't actually do anything different when there is no reply (which
> > is more what I expected from the bullets)
>
> It returns error in rv
The implication of the recv was quite confusing to me initially when I
reviewed this again. Rüdiger had reviewed an strace and determined that
connect() was taking longer and longer over time (presumably as the queue
in the kernel built up). So with a timeout of 3 seconds you might have
connects that took increasing amounts of time all under 3 seconds that
could add up to extremely long delays. Add the recv call and you cap the
total delay at essentially 3 seconds since getting connect+recv to work
within the timeout forces there to be a non-zombie, non-busy child on the
other end.
I don't see that Jim's proposal for the 2.4 branch matches Arkadiusz's
exact scenario/PR or suggested fix though I do understand that it could
quicken graceful restart for some scenarios. I'll remove my note from
2.4.x/STATUS.
> which is propagated up to ap_mpm_pod_killpg
> which does
> for (i = 0; i < num && rv == APR_SUCCESS; i++) {
> rv = dummy_connection(pod);
> }
>
> and on error will break the loop and stop doing dummy connections as there
> are
> no children anyway. graceful restart then can proceed without a delay.
>
> --
> Arkadiusz Miśkiewicz, arekm / maven.pl
>
--
Born in Roswell... married an alien...
http://emptyhammock.com/