On Fri, Nov 4, 2011 at 4:14 PM, Jim Jagielski <j...@jagunet.com> wrote:
>
> On Nov 4, 2011, at 4:23 AM, Rüdiger Plüm wrote:
>
>>
>>
>> On 03.11.2011 20:00, Jim Jagielski wrote:
>>>
>>> On Nov 3, 2011, at 2:37 PM, Jeff Trawick wrote:
>>>>
>>>> I'm not disputing that there is some undiagnosed situation where
>>>> APR_ETIMEUP is seen.
>>>>
>>>> I am looking for confirmation that APR_ETIMEUP is the expected value.
>>>>
>>>
>>> It's hard to diagnose what the value should be... all I know
>>> is that what is being returned thru APR is EAGAIN, and this
>>> causes issues during the prefetch phase.
>>
>> But I agree with Jeff that this looks like a bug in APR that should be fixed
>> there. We should NOT get an EAGAIN here. Only a timeout or something more
>> fatal (like a closed socket).
>>
>
> I agree with that... But again looking at APR as an "external"
> dependency, we know that APR does, at times, return an EAGAIN
> when it shouldn't, and so httpd should work around that.

You have a reproducible testcase, right? Can you put some sort of
tracing somewhere in

    apr_status_t apr_socket_recv(apr_socket_t *sock, char *buf, apr_size_t *len)
    {
    ...
        do {
            rv = read(sock->socketdes, buf, (*len));
        } while (rv == -1 && errno == EINTR);

        while ((rv == -1) && (errno == EAGAIN || errno == EWOULDBLOCK)
                          && (sock->timeout > 0)) {
    do_select:
            arv = apr_wait_for_io_or_timeout(NULL, sock, 1);
            if (arv != APR_SUCCESS) {
                *len = 0;
                return arv;
            }
            else {
                do {
                    rv = read(sock->socketdes, buf, (*len));
                } while (rv == -1 && errno == EINTR);
            }
        }

to see if sock->timeout isn't what we think (perhaps httpd's fault) or
if it is returning EAGAIN for some other reason?

>
> It all depends on how tightly we, as the httpd pmc, wish to
> be "bound" by the APR pmc, if you get my meaning.
>
>
>>>
>>> For sure, even if we allow EAGAIN, if the underlying condition
>>> still causes a read error, we'll hit it when we really start
>>> reading in the body.
>>>
>>> I guess the main idea is that if we're going to prefetch, and
>>> I'm trying to remember why we do, then we should be more
>>> lenient on what we determine as an "unrecoverable" error. If
>>> we hit EAGAIN and/or TIMEUP, I'm fine with logging it and then
>>> breaking out of that loop, even without any retries.
>>
>> Fine with me for TIMEUP and as a temporary fix for EAGAIN. But we
>> should find the root cause for EAGAIN.
>>
>
> +1... (obviously ;) )
>
>
--
Born in Roswell... married an alien...
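
To make the tracing idea above concrete, here is a minimal standalone sketch. It is not APR source: it only mimics the shape of the quoted read/retry loop with plain POSIX calls. The names traced_recv, wait_readable and make_nonblocking_pair are hypothetical, the timeout is a plain millisecond int standing in for sock->timeout, and ETIMEDOUT stands in for whatever status the real wait helper returns on timeout. The point it illustrates is that when the timeout is not positive, the retry loop is skipped entirely and EAGAIN goes straight back to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for the wait helper in the quoted loop:
 * returns 0 when fd is readable, ETIMEDOUT on timeout, else errno. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc == -1 && errno == EINTR);

    if (rc == 0)
        return ETIMEDOUT;
    return rc < 0 ? errno : 0;
}

/* Same shape as the quoted loop, with a trace line after each read and
 * each wait. Returns 0 on success (bytes read stored in *len), otherwise
 * an errno-style code with *len set to 0. */
static int traced_recv(int fd, char *buf, size_t *len, int timeout_ms)
{
    ssize_t rv;
    int err;

    assert(buf != NULL && len != NULL);

    do {
        rv = read(fd, buf, *len);
    } while (rv == -1 && errno == EINTR);
    err = (rv == -1) ? errno : 0;   /* save errno before fprintf can change it */
    fprintf(stderr, "trace: first read rv=%zd errno=%d timeout_ms=%d\n",
            rv, err, timeout_ms);

    while (rv == -1 && (err == EAGAIN || err == EWOULDBLOCK) && timeout_ms > 0) {
        int arv = wait_readable(fd, timeout_ms);
        fprintf(stderr, "trace: wait returned %d\n", arv);
        if (arv != 0) {
            *len = 0;
            return arv;
        }
        do {
            rv = read(fd, buf, *len);
        } while (rv == -1 && errno == EINTR);
        err = (rv == -1) ? errno : 0;
        fprintf(stderr, "trace: retry read rv=%zd errno=%d\n", rv, err);
    }

    if (rv == -1) {
        *len = 0;
        return err;   /* EAGAIN lands here when timeout_ms <= 0 */
    }
    *len = (size_t)rv;
    return 0;
}

/* Hypothetical helper for experiments: a connected, non-blocking socket pair. */
static int make_nonblocking_pair(int sv[2])
{
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return errno;
    for (int i = 0; i < 2; i++) {
        int fl = fcntl(sv[i], F_GETFL, 0);
        if (fl == -1 || fcntl(sv[i], F_SETFL, fl | O_NONBLOCK) == -1)
            return errno;
    }
    return 0;
}
```

Calling traced_recv on an idle non-blocking socket with a timeout of 0 hands EAGAIN back to the caller after a single trace line, while a positive timeout ends in the timeout status instead. If the real trace shows a timeout that is not positive at the moment EAGAIN is returned, that would point at how the timeout was set rather than at the read loop itself.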