I had a bit more time, so here is a patch that should work on Unix systems that have "apr_wait_for_io_or_timeout" available. I can't test on Windows or other platforms, which is the reason for the #ifdef.
ProxyPass / balance://hotcluster/
<Proxy balance://hotcluster>
    # defaultish tomcat
    BalancerMember ajp://10.136.130.111:7009 loadfactor=1 connectiontimeout=2
    # below IP is not reachable, acts like a down box
    BalancerMember ajp://192.168.0.7:7010 loadfactor=1 connectiontimeout=2
</Proxy>

Index: modules/proxy/proxy_util.c
===================================================================
--- modules/proxy/proxy_util.c (revision 703219)
+++ modules/proxy/proxy_util.c (working copy)
@@ -2358,9 +2358,18 @@
                      "proxy: %s: fam %d socket created to connect to %s",
                      proxy_function, backend_addr->family, worker->hostname);
 
+        /* use non blocking for connect timeouts to work. The ifdef
+           limits to unix systems which have apr_wait_for_io_or_timeout.
+           TODO: remove the ifdef and see what works/breaks */
+#ifdef USE_WAIT_FOR_IO
+        apr_socket_opt_set(newsock, APR_SO_NONBLOCK, 1);
+#endif
         /* make the connection out of the socket */
         rv = apr_socket_connect(newsock, backend_addr);
 
+#ifdef USE_WAIT_FOR_IO
+        apr_socket_opt_set(newsock, APR_SO_NONBLOCK, 0);
+#endif
         /* if an error occurred, loop round and try again */
         if (rv != APR_SUCCESS) {
             apr_socket_close(newsock);

Regards,
Matt

----- Original Message ----
From: Matt Stevenson <[EMAIL PROTECTED]>
To: dev@httpd.apache.org
Sent: Wednesday, October 8, 2008 9:55:43 PM
Subject: proxy_ajp connect timeout fix.

Hi,

I've used mod_jk (1/2) for years, and I've always had an issue when a backend server goes down: not when Tomcat/JBoss is stopped, but when the host itself is dead. Recently some people I work with have been using mod_proxy and mod_proxy_ajp, and these seem to have the same issue.

The code in proxy_util.c assumes that apr_socket_timeout_set applies to every connect. I don't believe it does unless the socket is in non-blocking mode.

I wrote the code below for ap_proxy_connect_backend before I looked deeper into the APR libraries (sorry it's not in diff format, and sorry for the hard-coded 2-second timeout). It seems to work fine on Linux (and probably on other Unix systems).
I've basically reimplemented what apr_wait_for_io_or_timeout already does (I should have dug deeper first). Anyway, the current release code doesn't seem to handle down boxes for me (to test, point an AJP proxy at a live server and at a nonexistent IP on the network). I think that if you put the socket into non-blocking mode first, with a timeout set, APR will try to handle the connect timeout itself (I haven't had a chance to try it); then switch back to blocking mode after the connect.

Regards,
Matt

    if (worker->keepalive) {
        if ((rv = apr_socket_opt_set(newsock,
                        APR_SO_KEEPALIVE, 1)) != APR_SUCCESS) {
            ap_log_error(APLOG_MARK, APLOG_ERR, rv, s,
                         "apr_socket_opt_set(SO_KEEPALIVE): Failed to set"
                         " Keepalive");
        }
    }
    ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, s,
                 "proxy: %s: fam %d socket created to connect to %s",
                 proxy_function, backend_addr->family, worker->hostname);

    /* non-blocking with a zero timeout, so connect returns EINPROGRESS */
    apr_socket_opt_set(newsock, APR_SO_NONBLOCK, 1);
    apr_socket_timeout_set(newsock, 0);
    rv = apr_socket_connect(newsock, backend_addr);
    if (rv != APR_SUCCESS && APR_STATUS_IS_EINPROGRESS(rv)) {
        apr_pollfd_t pfds[1];
        apr_int32_t nfds;

        /* wait up to 2 seconds for the socket to become writable;
           apr_poll returns APR_TIMEUP if nothing happens in time */
        pfds[0].reqevents = APR_POLLOUT;
        pfds[0].desc_type = APR_POLL_SOCKET;
        pfds[0].desc.s = newsock;
        rv = apr_poll(&pfds[0], 1, &nfds, apr_time_from_sec(2));
    }
    /* if an error occurred, loop round and try again */
    if (rv != APR_SUCCESS) {
        apr_socket_close(newsock);
        loglevel = backend_addr->next ? APLOG_DEBUG : APLOG_ERR;
        ap_log_error(APLOG_MARK, loglevel, rv, s,
                     "proxy: %s: attempt to connect to %pI (%s) failed",
                     proxy_function, backend_addr, worker->hostname);
        backend_addr = backend_addr->next;
        continue;
    }
    /* back to blocking mode for the rest of the connection */
    apr_socket_opt_set(newsock, APR_SO_NONBLOCK, 0);
    /* Set a timeout on the socket */
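For anyone who wants to try the technique outside httpd, here is a minimal sketch of the same non-blocking connect with a timeout in plain POSIX C. It is an assumption on my part to swap APR's apr_socket_opt_set/apr_poll for fcntl()/poll(), and connect_with_timeout and open_loopback_listener are hypothetical names, not httpd or APR functions. One difference from the APR fragment above: after poll() reports the socket writable, the sketch reads SO_ERROR, because a writable socket can also mean the connect failed (for example, connection refused).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not an httpd/APR function): connect fd to addr,
 * giving up after timeout_ms milliseconds.
 * Returns 0 on success, otherwise an errno value (ETIMEDOUT on timeout). */
int connect_with_timeout(int fd, const struct sockaddr *addr,
                         socklen_t addrlen, int timeout_ms)
{
    /* A negative poll() timeout would wait forever, defeating the point. */
    assert(timeout_ms >= 0);

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return errno;
    /* Non-blocking: connect() returns EINPROGRESS instead of waiting for
     * the kernel's own, usually much longer, connect timeout. */
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return errno;

    int err = 0;
    if (connect(fd, addr, addrlen) < 0) {
        if (errno != EINPROGRESS) {
            err = errno;                    /* failed immediately */
        } else {
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            int n = poll(&pfd, 1, timeout_ms);
            if (n == 0) {
                err = ETIMEDOUT;            /* no answer within timeout_ms */
            } else if (n < 0) {
                err = errno;
            } else {
                /* Writable does not mean connected: ask for the result. */
                socklen_t len = sizeof err;
                if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
                    err = errno;
            }
        }
    }

    /* Switch back to blocking mode, as the patch does after connecting. */
    fcntl(fd, F_SETFL, flags);
    return err;
}

/* Hypothetical demo helper: opens a listening TCP socket on an ephemeral
 * loopback port and stores its address in *out. Returns the fd or -1. */
int open_loopback_listener(struct sockaddr_in *out)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    memset(out, 0, sizeof *out);
    out->sin_family = AF_INET;
    out->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    out->sin_port = 0;                      /* let the kernel pick a port */
    socklen_t len = sizeof *out;
    if (bind(lfd, (struct sockaddr *)out, sizeof *out) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)out, &len) < 0) {
        close(lfd);
        return -1;
    }
    return lfd;
}
```

Against a live listener this should return 0; once the listener is closed, the same port should give ECONNREFUSED rather than a timeout. Pointed at an unreachable address like the 192.168.0.7 box in the config, it would typically return ETIMEDOUT after timeout_ms, or EHOSTUNREACH if a router reports the host as unreachable; that depends on the network, so treat it as an expectation rather than a guarantee.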