On 06/14/2008 10:42 PM, William A. Rowe, Jr. wrote:
Guys, if anyone is looking at this, I'll hold off from tagging a bit longer,
as I'd rather have apr-1.3.1 address all the platform quirks we identified
in preparing 2.2.9 for release. But if I hear nothing, I'll have to just move ahead :)

Bill

Paul Querna wrote:

On aurora.apache.org, shortly after installing the new version, we hit a problem with apr_pollset_poll:

[Thu Jun 12 05:36:51 2008] [error] (70007)The timeout specified has expired: apr_pollset_poll: (listen)
[Thu Jun 12 05:36:52 2008] [notice] caught SIGTERM, shutting down

If you look in worker.c, around line 687, you can see that we do a graceful shutdown if we get an unexpected error from apr_pollset_poll.

This appears to be a regression caused by r641661:
https://svn.apache.org/viewvc?view=rev&revision=641661

Which was a fix for PR 42580: https://issues.apache.org/bugzilla/show_bug.cgi?id=42580

This appears to be a relative edge case on Solaris 10 -- it hasn't happened again, and it is a regression in APR, but a relatively small one, so I am still +1 for httpd-2.2.9 shipping.

Is this really a regression in APR or were we just as lucky before as we
were after?

Code from httpd

                rv = apr_pollset_poll(pollset, -1, &numdesc, &pdesc);
                if (rv != APR_SUCCESS) {
                    if (APR_STATUS_IS_EINTR(rv)) {
                        continue;
                    }

                    /* apr_pollset_poll() will only return errors in catastrophic
                     * circumstances. Let's try exiting gracefully, for now. */
                    ap_log_error(APLOG_MARK, APLOG_ERR, rv,
                                 (const server_rec *) ap_server_conf,
                                 "apr_pollset_poll: (listen)");


So we get the error message logged if apr_pollset_poll returns anything other
than APR_SUCCESS or APR_EINTR.

So let's have a look at r641661:

--- apr/apr/trunk/poll/unix/port.c      2008/03/27 00:31:21     641660
+++ apr/apr/trunk/poll/unix/port.c      2008/03/27 00:46:05     641661
@@ -295,12 +295,7 @@

     if (ret == -1) {
         (*num) = 0;
-        if (errno == ETIME || errno == EINTR) {
-            rv = APR_TIMEUP;
-        }
-        else {
-            rv = APR_EGENERAL;
-        }
+        rv = apr_get_netos_error();
     }
     else if (nget == 0) {
         rv = APR_TIMEUP;

So the code before said that if port_getn returns -1 (i.e. fails), we return
APR_TIMEUP if the error is ETIME or EINTR, and APR_EGENERAL otherwise.
So IMHO the error message (in this case IMHO the same one) would have been
shown with the old code as well.
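
To make the comparison concrete, here is a minimal sketch of the two mappings
and of the check in worker.c. It does not use APR itself: the MODEL_* values
and the function names are hypothetical stand-ins (only 70007 is taken from the
log line above), and it assumes that on Unix apr_get_netos_error() simply
passes errno through.

```c
#include <errno.h>

/* Stand-in status values for this sketch only. 70007 matches the code in
 * the log line; the EGENERAL value is just a placeholder. */
enum {
    MODEL_SUCCESS  = 0,
    MODEL_TIMEUP   = 70007,
    MODEL_EGENERAL = 20014
};

/* Mapping before r641661: ETIME and EINTR both become a timeout,
 * every other errno becomes a general error. */
static int map_errno_old(int err)
{
    if (err == ETIME || err == EINTR) {
        return MODEL_TIMEUP;
    }
    return MODEL_EGENERAL;
}

/* Mapping after r641661, ASSUMING the OS error is passed through as is. */
static int map_errno_new(int err)
{
    return err;
}

/* Model of the caller: anything other than success or EINTR gets logged
 * and triggers the graceful shutdown. */
static int caller_logs_error(int rv)
{
    if (rv == MODEL_SUCCESS) {
        return 0;
    }
    if (rv == EINTR) {
        return 0;
    }
    return 1;
}
```

Under these assumptions caller_logs_error(map_errno_old(ETIME)) and
caller_logs_error(map_errno_new(ETIME)) are both 1, so an ETIME from port_getn
ends up in the error log with either version. Only the EINTR case differs: the
old mapping hides it behind a timeout status, so the caller's EINTR check never
sees it.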
What is more strange to me is that we get a timeout error ((70007)The timeout
specified has expired: apr_pollset_poll:) even though we called
apr_pollset_poll with -1 as the timeout, which means wait indefinitely, i.e. no
timeout. The implementation of apr_pollset_poll seems to be correct, as it
ensures that we supply NULL to port_getn in this case. But OTOH the man page
for port_get / port_getn documents the timeout behaviour only for port_get
(setting the timeout parameter to NULL means no timeout), not for port_getn.
So couldn't this be a Solaris bug?
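
For reference, the timeout handling described above can be sketched roughly
like this. This is not the APR source: timeout_to_timespec is a hypothetical
helper name, and the sketch only assumes that the timeout is given in
microseconds and that a negative value means "wait forever".

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: turn a timeout in microseconds into the pointer
 * that would be passed as the timeout argument of port_getn.
 * A negative timeout means "no timeout" and yields NULL; otherwise the
 * caller-provided storage is filled in and returned. */
static struct timespec *timeout_to_timespec(long long timeout_usec,
                                            struct timespec *storage)
{
    if (timeout_usec < 0) {
        return NULL;
    }
    storage->tv_sec  = (time_t)(timeout_usec / 1000000);
    storage->tv_nsec = (long)(timeout_usec % 1000000) * 1000;
    return storage;
}
```

With a timeout of -1, as passed by worker.c, this yields NULL, so a timeout
status coming back from port_getn would have to originate below this layer.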

Regards

Rüdiger

