Re: 2 second delays in mod_jk while "maintaining workers"
Hello, I've spent a week looking into mod_jk, Apache, etc., and the problem appears to be due to a bug in the AJP connector within JBoss: https://jira.jboss.org/browse/JBPAPP-366 This bug is not fixed in the latest release of JBoss 4.2.3, which is rather disappointing. After downloading the jbossweb-2.0.1-GA source from subversion, patching, recompiling with Java 1.5 (if you want it to work on the Java 1.5 and 1.6 versions of JBoss 4.2.3), the problem disappears. While solving this problem, I identified the need for a debug statement to be generated from mod_jk when the poll(..) fails, i.e. waited 2 seconds and discovered the socket was not being shut properly. Mladen has kindly added this to the next release. Thanks to everyone who's contributed and if I have any further updates, I will let you know. John - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
RE: 2 second delays in mod_jk while "maintaining workers"
> From: John Baker [mailto:jba...@javasystemsolutions.com] > Subject: Re: 2 second delays in mod_jk while "maintaining workers" > some of our JBoss instances are showing a large number > ajp threads that seem to be in keep alive mode but are > well beyond the connectionTimeout defined in server.xml My understanding is that connectionTimeout only applies to the period between the connection being established and the arrival of the *first* request. Tomcat 6.0 has a keepAliveTimeout attribute, but older Tomcats do not, and I don't know what they default to. - Chuck THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers. - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On Thursday 09 September 2010 16:08:04 you wrote: > On 09/09/2010 04:48 PM, John Baker wrote: > > > > > > On Thursday 09 September 2010 15:45:44 you wrote: > >> Nice. > > > > I spoke too soon. I'm now trying to figure out how to print out the IP > > address of the socket (I don't really do C) so I can log the socket that > > caused the poll to timeout, and compare with a tcpdump. > > > > Try with getsockname, but it's not a trivial task. > > Perhaps using /proc/pid/fd > and matching the socket# we have with the actual info > eg. > 32 -> socket:[port] Isn't the address stored with jk_sock_t so I can use jk_dump_hinfo, as I can see is in the open socket method? - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On 09/09/2010 04:48 PM, John Baker wrote: On Thursday 09 September 2010 15:45:44 you wrote: Nice. I spoke too soon. I'm now trying to figure out how to print out the IP address of the socket (I don't really do C) so I can log the socket that caused the poll to timeout, and compare with a tcpdump. Try with getsockname, but it's not a trivial task. Perhaps using /proc/pid/fd and matching the socket# we have with the actual info eg. 32 -> socket:[port] Regards -- ^TM - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
Interestingly, some of our JBoss instances are showing a large number ajp threads that seem to be in keep alive mode but are well beyond the connectionTimeout defined in server.xml (which is set to 9): Max threads: 40 Current thread count: 40 Current thread busy: 40 Max processing time: 5563 ms Processing time: 2552.538 s Request count: 75522 Error count: 6 Bytes received: 0.00 MB Bytes sent: 879.42 MB Stage TimeB Sent B Recv Client VHost Request K 508896 ms ? ? 10.113.168.4? ? K 93386 ms? ? 10.113.168.4? ? K 7147 ms ? ? 10.113.168.4? ? K 3994 ms ? ? 10.113.168.4? ? K 660 ms ? ? 10.113.168.4? ? K 735822 ms ? ? 10.113.168.4? ? K 71565 ms? ? 10.113.168.4? ? Almost as if connectionTimeout is being ignored in some cases - this is JBoss 4.2.3, so an oldish version of Tomcat. - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
I would be happy to share all my evidence and write a report once we get to the bottom of this problem. Any hints on printing out the socket IP (i.e. of Tomcat)? On Thursday 09 September 2010 15:47:33 you wrote: > On 09/09/2010 03:22 PM, John Baker wrote: > > > > Do you fancy putting that change into the next release of mod_jk? > > > > BTW, do you fancy opening BZ report fill in what we > tried so far and the actual solution, so we can track that down > for the next releases and patches? > > Regards - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On Thursday 09 September 2010 15:45:44 you wrote: > Nice. I spoke too soon. I'm now trying to figure out how to print out the IP address of the socket (I don't really do C) so I can log the socket that caused the poll to timeout, and compare with a tcpdump. - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On 09/09/2010 03:22 PM, John Baker wrote: Do you fancy putting that change into the next release of mod_jk? BTW, do you fancy opening BZ report fill in what we tried so far and the actual solution, so we can track that down for the next releases and patches? Regards -- ^TM - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On 09/09/2010 03:22 PM, John Baker wrote: On Thursday 09 September 2010 13:59:50 you wrote: If that doesn't help, it's obvious the Tomcat doesn't close the socket, so should be investigated why. Like said before, either the Tomcat doesn't respond to shutdown or the shutdown's FIN packet isn't send to the Tomcat or back to mod_jk, due to some weird OS networking issues. I think there's been rather a lot of speculation on this problem :-) The change to fds.events appears to be working! Thanks. Nice. I'm somewhat surprised this hasn't been picked up anytime before? We can't be the only people deploying hundreds of worker threads/JBoss instances, and there are no firewalls between mod_jk/JBoss. I guess it's possible the AJP thread on one sever was 'stuck'. It depends on the OS networking layer thought. POLLIN should be just enough to fire the poll if the socket was closed. Do you fancy putting that change into the next release of mod_jk? Sure. I'll have to make some faulty AJP connector in Tomcat that will close the socket after some timeout, and see if that event wasn't propagated from our socket. Regards -- ^TM - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On Thursday 09 September 2010 13:59:50 you wrote: > > > > If that doesn't help, it's obvious the Tomcat > > doesn't close the socket, so should be investigated > > why. Like said before, either the Tomcat doesn't > > respond to shutdown or the shutdown's FIN packet > > isn't send to the Tomcat or back to mod_jk, > > due to some weird OS networking issues. > > I think there's been rather a lot of speculation on this problem :-) The change to fds.events appears to be working! Thanks. I'm somewhat surprised this hasn't been picked up anytime before? We can't be the only people deploying hundreds of worker threads/JBoss instances, and there are no firewalls between mod_jk/JBoss. I guess it's possible the AJP thread on one sever was 'stuck'. Do you fancy putting that change into the next release of mod_jk? - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On Sep 9, 2010, at 8:34 AM, Mladen Turk wrote: > On 09/09/2010 02:09 PM, John Baker wrote: >> Thanks for the feedback. >> >> Can you tell me why this if statement exists: >> >> if (poll(&fds, 1, timeout)> 0) >>{ >>... >>} >>else >> break; >> >> It appears to be at fault. >> > > poll returns positive number if there is some data to > read on the socket. > It returns 0 in case of timeout (two seconds in this case). > > > Try to change > fds.events = POLLIN; > to > fds.events = POLLIN | POLLHUP | POLLERR; > > > If that doesn't help, it's obvious the Tomcat > doesn't close the socket, so should be investigated > why. Like said before, either the Tomcat doesn't > respond to shutdown or the shutdown's FIN packet > isn't send to the Tomcat or back to mod_jk, > due to some weird OS networking issues. > +1 - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On 09/09/2010 02:09 PM, John Baker wrote: Thanks for the feedback. Can you tell me why this if statement exists: if (poll(&fds, 1, timeout)> 0) { ... } else break; It appears to be at fault. poll returns positive number if there is some data to read on the socket. It returns 0 in case of timeout (two seconds in this case). Try to change fds.events = POLLIN; to fds.events = POLLIN | POLLHUP | POLLERR; If that doesn't help, it's obvious the Tomcat doesn't close the socket, so should be investigated why. Like said before, either the Tomcat doesn't respond to shutdown or the shutdown's FIN packet isn't send to the Tomcat or back to mod_jk, due to some weird OS networking issues. Regards -- ^TM - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
Thanks for the feedback. Can you tell me why this if statement exists: if (poll(&fds, 1, timeout) > 0) { ... } else break; It appears to be at fault. John - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On 09/08/2010 05:51 PM, John Baker wrote: I think it would be helpful if you could walk us through the code. It is very simple actually. 1. shutdown write end of our connection to Tomcat This should cause the soket.read() in Tomcat to throw exception which is used to close the socket 2. use poll with 2 second timeout for POLLIN event poll() will return immediately if either the socket is closed by Tomcat or there is a data to read in which case read to a dummy buffer (drain) and wait for more data until the poll returns error or read returns error 3. close the socket So the reason why the poll breaks after 2 seconds is the timeout which shouldn't happen because we informed the Tomcat the our side of connection is closed. So it should eventually (after some latency depending on the CPU throttle close his side). Now, since this works most of the times, it's either that poll loop is running but the read returns 0 A simple debug message after rc = read(sd, &dummy[0], sizeof(dummy)); eg. jk_log(l, JK_LOG_DEBUG, "readed %d bytes from socked %d", rc, sd); Would help eliminate that situation. The fix would be to add if (rc == 0) { rc = -1; break; } Also you can add jk_log(l, JK_LOG_DEBUG, "socked %d shutdown write side", sd); just before that 'do {' so we can have the info how long the shutdown call took. It would be helpful if you can apply that code and recompile. Regards -- ^TM - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
I think it would be helpful if you could walk us through the code. The shutdown function provides three error codes - the jk function doesn't check them, and the most likely is 'socket closed' (ie jboss closed it?), hence I'm unsure what the drain code is doing. --Original Message-- From: John Baker To: Tomcat Users List ReplyTo: Tomcat Users List Subject: Re: 2 second delays in mod_jk while "maintaining workers" Sent: 8 Sep 2010 16:41 > On 09/08/2010 05:08 PM, John Baker wrote: > The code *is* required. > It is used when the client disconnects while the backend > still has some data in the AJP buffer. Drain is needed > to read that excess data. Why does it always report 0 bytes read? > If you can compile mod_jk and test, try adding some debug messages inside > the poll loop. Anyhow, seems to me there is some sort of network > problem you are facing. That's next on my list. > Have you been able to see the traffic using Wireshark? Not yet. > > > Regards > -- > ^TM > > - > To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org > For additional commands, e-mail: users-h...@tomcat.apache.org > > - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
> On 09/08/2010 05:08 PM, John Baker wrote: > The code *is* required. > It is used when the client disconnects while the backend > still has some data in the AJP buffer. Drain is needed > to read that excess data. Why does it always report 0 bytes read? > If you can compile mod_jk and test, try adding some debug messages inside > the poll loop. Anyhow, seems to me there is some sort of network > problem you are facing. That's next on my list. > Have you been able to see the traffic using Wireshark? Not yet. > > > Regards > -- > ^TM > > - > To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org > For additional commands, e-mail: users-h...@tomcat.apache.org > > - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
please can you remove me off the mailing list many thanks On Wed, Sep 8, 2010 at 4:29 PM, Mladen Turk wrote: > On 09/08/2010 05:08 PM, John Baker wrote: > >> Hello, >> >> I don't thikn the shutdown call is to blame - I think it's the large pile >> of code below. i.e. everything below the if (shutdown(..)) below. The >> question is, what does it all do and does it actually work? It appears to >> be the 'drain' code, but given it often results in this message: >> >> "Shutdown socket .. and read 0 lingering bytes in 2 sec." >> >> then it would suggest it doesn't work. The comments also suggest the 2 >> seconds is "guess work". >> >> I've sent SECONDS_TO_LINGER and MAX_SECS_TO_LINGER to 0 and the problem >> goes away, and I've seen this discussed elsewhere on the forum. >> >> So, if this code isn't needed, why is it still there? And if it is >> needed, why does it seem not to work and what is the correct figure for >> SECONDS_TO_LINGER? If this is being performed inline then I'd suggest 2s >> is far too long - clients don't want to wait 2s for a request to complete >> (ideally, socket clean up should not be done in this thread if any linger >> is required). >> >> > The code *is* required. > It is used when the client disconnects while the backend > still has some data in the AJP buffer. Drain is needed > to read that excess data. > > If you can compile mod_jk and test, try adding some debug messages inside > the poll loop. Anyhow, seems to me there is some sort of network > problem you are facing. > Have you been able to see the traffic using Wireshark? > > > Regards > -- > ^TM > > > - > To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org > For additional commands, e-mail: users-h...@tomcat.apache.org > >
Re: 2 second delays in mod_jk while "maintaining workers"
On 09/08/2010 05:08 PM, John Baker wrote: Hello, I don't thikn the shutdown call is to blame - I think it's the large pile of code below. i.e. everything below the if (shutdown(..)) below. The question is, what does it all do and does it actually work? It appears to be the 'drain' code, but given it often results in this message: "Shutdown socket .. and read 0 lingering bytes in 2 sec." then it would suggest it doesn't work. The comments also suggest the 2 seconds is "guess work". I've sent SECONDS_TO_LINGER and MAX_SECS_TO_LINGER to 0 and the problem goes away, and I've seen this discussed elsewhere on the forum. So, if this code isn't needed, why is it still there? And if it is needed, why does it seem not to work and what is the correct figure for SECONDS_TO_LINGER? If this is being performed inline then I'd suggest 2s is far too long - clients don't want to wait 2s for a request to complete (ideally, socket clean up should not be done in this thread if any linger is required). The code *is* required. It is used when the client disconnects while the backend still has some data in the AJP buffer. Drain is needed to read that excess data. If you can compile mod_jk and test, try adding some debug messages inside the poll loop. Anyhow, seems to me there is some sort of network problem you are facing. Have you been able to see the traffic using Wireshark? Regards -- ^TM - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
Hello, I don't thikn the shutdown call is to blame - I think it's the large pile of code below. i.e. everything below the if (shutdown(..)) below. The question is, what does it all do and does it actually work? It appears to be the 'drain' code, but given it often results in this message: "Shutdown socket .. and read 0 lingering bytes in 2 sec." then it would suggest it doesn't work. The comments also suggest the 2 seconds is "guess work". I've sent SECONDS_TO_LINGER and MAX_SECS_TO_LINGER to 0 and the problem goes away, and I've seen this discussed elsewhere on the forum. So, if this code isn't needed, why is it still there? And if it is needed, why does it seem not to work and what is the correct figure for SECONDS_TO_LINGER? If this is being performed inline then I'd suggest 2s is far too long - clients don't want to wait 2s for a request to complete (ideally, socket clean up should not be done in this thread if any linger is required). John .. MAX_SECS_TO_LINGER= 30 SECONDS_TO_LINGER= 2 ... int jk_shutdown_socket(jk_sock_t sd, jk_logger_t *l) { ... /* Shut down the socket for write, which will send a FIN * to the peer. */ if (shutdown(sd, SHUT_WR)) { rc = jk_close_socket(sd, l); if (JK_IS_DEBUG_LEVEL(l)) jk_log(l, JK_LOG_DEBUG, "Failed sending SHUT_WR for socket %d", sd); errno = save_errno; JK_TRACE_EXIT(l); return rc; } do { #ifdef HAVE_POLL struct pollfd fds; fds.fd = sd; fds.events = POLLIN; #else fd_set rs; FD_ZERO(&rs); /* Read all data from the peer until we reach "end-of-file" * (FIN from peer) or we've exceeded our overall timeout. If the * backend does not send us bytes within 2 seconds * (a value pulled from Apache 1.3 which seems to work well), * close the connection. */ FD_SET(sd, &rs); tv.tv_sec = timeout / 1000; tv.tv_usec = (timeout % 1000) * 1000; #endif rp = 0; #ifdef HAVE_POLL if (poll(&fds, 1, timeout) > 0) #else if (select((int)sd + 1, &rs, NULL, NULL, &tv) > 0) #endif { do { #if defined(WIN32) || (defined(NETWARE) && defined(__NOVELL_LIBC__)) rc = recv(sd, &dummy[0], sizeof(dummy), 0); if (JK_IS_SOCKET_ERROR(rc)) JK_GET_SOCKET_ERRNO(); #else rc = read(sd, &dummy[0], sizeof(dummy)); #endif if (rc > 0) rp += rc; } while (JK_IS_SOCKET_ERROR(rc) && (errno == EINTR || errno == EAGAIN)); if (rc < 0) break; } else break; rd += rp; if (rp < sizeof(dummy)) { if (timeout > SECONDS_TO_LINGER) { /* Try once again with 1000 times smaller timeout * In our case 2 msec. */ timeout = SECONDS_TO_LINGER; continue; } /* We have read less then size of buffer * It's a good chance there will be no more data * to read. */ if ((rc = sononblock(sd))) { rc = jk_close_socket(sd, l); if (JK_IS_DEBUG_LEVEL(l)) jk_log(l, JK_LOG_DEBUG, "error setting socket %d to nonblocking", sd); errno = save_errno; JK_TRACE_EXIT(l); return rc; } if (JK_IS_DEBUG_LEVEL(l)) jk_log(l, JK_LOG_DEBUG, "shutting down the read side of socket %d", sd); shutdown(sd, SHUT_RD); break; } timeout = SECONDS_TO_LINGER * 1000; } while (difftime(time(NULL), start) < MAX_SECS_TO_LINGER); rc = jk_close_socket(sd, l); if (JK_IS_DEBUG_LEVEL(l)) jk_log(l, JK_LOG_DEBUG, "Shutdown socket %d and read %d lingering bytes in %d sec.", sd, rd, (int)difftime(time(NULL), start)); errno = save_errno; JK_TRACE_EXIT(l); return rc; } - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
Hello We've just noticed that the maintenance mode operates on all workers, so having one worker run maintenance for the rest is making tracing the problem difficult. Reading down the logs, we can see it finds a worker and iterates through workers perforing maintenance. Is this correct? Moving on, consider the following debug: [Wed Sep 08 10:36:16.907 2010] [2205:47701852022112] [trace] ajp_abort_endpoint::jk_ajp_common.c (739): enter [Wed Sep 08 10:36:16.907 2010] [2205:47701852022112] [trace] jk_shutdown_socket::jk_connect.c (713): enter [Wed Sep 08 10:36:16.907 2010] [2205:47701852022112] [debug] jk_shutdown_socket::jk_connect.c (722): About to shutdown socket The next entry in the log with [2205:47701852022112] is the following: [Wed Sep 08 10:36:18.907 2010] [2205:47701852022112] [trace] jk_close_socket::jk_connect.c (650): enter [Wed Sep 08 10:36:18.908 2010] [2205:47701852022112] [trace] jk_close_socket::jk_connect.c (665): exit [Wed Sep 08 10:36:18.908 2010] [2205:47701852022112] [debug] jk_shutdown_socket::jk_connect.c (813): Shutdown socket 156 and read 0 lingering bytes in 2 sec. [Wed Sep 08 10:36:18.908 2010] [2205:47701852022112] [trace] jk_shutdown_socket::jk_connect.c (817): exit Looking at the code in jk_connect.c: int jk_shutdown_socket(jk_sock_t sd, jk_logger_t *l) { ... if (JK_IS_DEBUG_LEVEL(l)) jk_log(l, JK_LOG_DEBUG, "About to shutdown socket %d", sd); /* Shut down the socket for write, which will send a FIN * to the peer. */ if (shutdown(sd, SHUT_WR)) { rc = jk_close_socket(sd, l); And the jk_close_socket function: int jk_close_socket(jk_sock_t sd, jk_logger_t *l) { int rc; int save_errno; JK_TRACE_ENTER(l); Tieing up with the logs suggests the shutdown(sd, SHUT_WR) method is blocking. Any thoughts, John - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On 07.09.2010 19:46, John Baker wrote: Where can I find documentation on JkWatchdog? General documntation on mod_jk: http://tomcat.apache.org/connectors-doc/ JK directives: http://tomcat.apache.org/connectors-doc/reference/apache.html workers.properties: http://tomcat.apache.org/connectors-doc/reference/workers.html Timeouts: http://tomcat.apache.org/connectors-doc/generic_howto/timeouts.html Regards, Rainer - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
Where can I find documentation on JkWatchdog? On Tuesday 07 September 2010 13:51:23 you wrote: > - As a workaround: using a JkWatchdog moves the maintain into a separate > thread. But during the socket closing a lock is held, which blocks other > threads from accessing the same worker. - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On 07.09.2010 12:40, Rainer Jung wrote: On 07.09.2010 12:32, John Baker wrote: Rainer, On Tuesday 07 September 2010 11:09:46 you wrote: I don't like "socket_timeout" ... worker.basic.socket_timeout=90 but I would like socket_connect_timeout. The next two are possibly a bit short, because if the backend e.g. does a Java Garbage Collection which miht take longer than 1 second, tose timeouts will fire and take the node out of load balancing. worker.basic.connect_timeout=1000 worker.basic.prepost_timeout=1000 I've change socket_timeout=90 to socket_connect_timeout=9000, and connect_timeout/prepost_timeout to 5000, and the problem persists: [Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug] jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 152 [Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug] jk_shutdown_socket::jk_connect.c (813): Shutdown socket 152 and read 0 lingering bytes in 2 sec. [Tue Sep 07 11:29:24.868 2010] [364:47921619424608] [debug] wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration OK, so it would be helpful to see a more complete log, e.g. grep 364:47921619424608 LOGFILE | egrep -e '11:29:2[012345]' If there's data in it which is not meant for public posting, you can also send it to me va direct mail. Thanks for the log snippets. I can now see: - the socket close *does* happen during maintain - this wasn't obvious from the previous log snippert, because more "Maintain" lines for other workers were left out Some relevant lines are: [Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug] wc_maintain::jk_worker.c (339): Maintaining worker lb-... ... [Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug] ajp_reset_endpoint::jk_ajp_common.c (757): (worker1) resetting endpoint with sd = 152 (socket shutdown) ... [Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug] jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 152 ... [Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug] jk_shutdown_socket::jk_connect.c (813): Shutdown socket 152 and read 0 lingering bytes in 2 sec. ... [Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug] ajp_maintain::jk_ajp_common.c (3190): cleaning pool slot=0 elapsed 185 in 2 [Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug] ajp_maintain::jk_ajp_common.c (3197): reached pool min size 1 from 1 cache slots [Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug] ajp_maintain::jk_ajp_common.c (3237): recycled 1 sockets in 2 seconds from 1 pool slots ... So now we can discuss, how to fix this. - Can we detect more reliably, whether we want to drain before closing or not - As a workaround: using a JkWatchdog moves the maintain into a separate thread. But during the socket closing a lock is held, which blocks other threads from accessing the same worker. We would need to rewrite the logic in ajp_maintain() to only hold the lock (critical section) while pulling sockets out of the ep_cache or entering them back, not while actually using the sockets. I still don't know, *why* the socket close blocks, i.e. why the backend doesn't respond to us wanting to close the socket. Here a network sniff like indicated by Mladen would help. Regards, Rainer - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On 07.09.2010 12:32, John Baker wrote: Rainer, On Tuesday 07 September 2010 11:09:46 you wrote: I don't like "socket_timeout" ... worker.basic.socket_timeout=90 but I would like socket_connect_timeout. The next two are possibly a bit short, because if the backend e.g. does a Java Garbage Collection which miht take longer than 1 second, tose timeouts will fire and take the node out of load balancing. worker.basic.connect_timeout=1000 worker.basic.prepost_timeout=1000 I've change socket_timeout=90 to socket_connect_timeout=9000, and connect_timeout/prepost_timeout to 5000, and the problem persists: [Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug] jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 152 [Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug] jk_shutdown_socket::jk_connect.c (813): Shutdown socket 152 and read 0 lingering bytes in 2 sec. [Tue Sep 07 11:29:24.868 2010] [364:47921619424608] [debug] wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration OK, so it would be helpful to see a more complete log, e.g. grep 364:47921619424608 LOGFILE | egrep -e '11:29:2[012345]' If there's data in it which is not meant for public posting, you can also send it to me va direct mail. Regards, Rainer - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On 09/07/2010 12:20 PM, John Baker wrote: On Tuesday 07 September 2010 11:13:07 you wrote: It's obvious that shutdown(socket, SHUT_WR) poll(socket, 2 seconds) close(socket) caused poll call to time out, meaning that the JBoss side didn't respond to the shutdown(socket, SHUT_WR) call by closing it's side of the connection. BTW, do you have "lingering bytes in 0 sec" messages in your log or lower then 2 sec? Yes. There's plenty of lingering 0 sec. Nice, it means that for most of the times the maintain works as it should. JBoss closes the socket and poll returns immediately cause of closed socket. You can use Wireshark and see what's actually going on in those cases. Regards -- ^TM - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On 07.09.2010 12:29, Mladen Turk wrote: On 09/07/2010 12:09 PM, Rainer Jung wrote: Are we sure, that those log lines are correct? This was generated by mod_jk 1.2.30 without any code changes, right? The other option would be to set JkWatchdogInterval 60 In that case wc_maintain would happen in separate thread without observable request delay. However the JK_LOCK would still block the connection. Think we should fix that by just invalidating the sockets, and then closing them in the watchdog thread. WDYT? I'm going to answer his other mail. Since he has trace log, I'm still interested to understand, why the socket close was logged *before* the maintain. After this is understood, moving the whole thing to the maintain could be a good solution (and not holding the lock while shutting down the socket). Regards, Rainer - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
Rainer, On Tuesday 07 September 2010 11:09:46 you wrote: > I don't like "socket_timeout" ... > > > worker.basic.socket_timeout=90 > > but I would like socket_connect_timeout. > > The next two are possibly a bit short, because if the backend e.g. does > a Java Garbage Collection which miht take longer than 1 second, tose > timeouts will fire and take the node out of load balancing. > > > worker.basic.connect_timeout=1000 > > worker.basic.prepost_timeout=1000 I've change socket_timeout=90 to socket_connect_timeout=9000, and connect_timeout/prepost_timeout to 5000, and the problem persists: [Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug] jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 152 [Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug] jk_shutdown_socket::jk_connect.c (813): Shutdown socket 152 and read 0 lingering bytes in 2 sec. [Tue Sep 07 11:29:24.868 2010] [364:47921619424608] [debug] wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration John - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On 09/07/2010 12:09 PM, Rainer Jung wrote: Are we sure, that those log lines are correct? This was generated by mod_jk 1.2.30 without any code changes, right? The other option would be to set JkWatchdogInterval 60 In that case wc_maintain would happen in separate thread without observable request delay. However the JK_LOCK would still block the connection. Think we should fix that by just invalidating the sockets, and then closing them in the watchdog thread. WDYT? Regards -- ^TM - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On Tuesday 07 September 2010 11:13:07 you wrote: > It's obvious that > > shutdown(socket, SHUT_WR) > poll(socket, 2 seconds) > close(socket) > > caused poll call to time out, meaning that > the JBoss side didn't respond to the > shutdown(socket, SHUT_WR) call by > closing it's side of the connection. > > BTW, do you have "lingering bytes in 0 sec" messages > in your log or lower then 2 sec? Yes. There's plenty of lingering 0 sec. - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
Rainer, On Tuesday 07 September 2010 11:09:46 you wrote: > > [Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] > > find_match::jk_uri_worker_map.c (850): Attempting to map context URI > > '/*=lb-jboss51-integration' source 'JkMount' > > [Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] > > find_match::jk_uri_worker_map.c (863): Found a wildchar match > > '/*=lb-jboss51-integration' > > [Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] > > jk_handler::mod_jk.c (2462): Into handler jakarta-servlet > > worker=lb-jboss51-integration r->proxyreq=0 > > [Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] > > wc_get_worker_for_name::jk_worker.c (116): found a worker > > lb-jboss51-integration > > [Tue Sep 07 10:20:20.618 2010] [18806:46962404156768] [debug] > > jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 154 > > [Tue Sep 07 10:20:22.619 2010] [18806:46962404156768] [debug] > > jk_shutdown_socket::jk_connect.c (813): Shutdown socket 154 and read 0 > > lingering bytes in 2 sec. > > Hmmm, that's strange. Why is there a connection close between > wc_get_worker_for_name() and wc_maintain()? The first call doing > something with the connections would be wc_maintain(). And if it decides > to close sockets, it would log addiional statements. Nevertheless we can > see, that closing the socket really took 2 seconds (which is in fact a > builtin mod_jk timeout). That's interesting, because I thought this bug: https://issues.apache.org/bugzilla/show_bug.cgi?id=48169 Was addressed in 1.2.30 and this discusses a 2 second problem. > Are we sure, that those log lines are correct? This was generated by > mod_jk 1.2.30 without any code changes, right? I compiled it fresh from source without any change: [Tue Sep 07 10:07:56.683 2010] [17424:46962404156768] [info] init_jk::mod_jk.c (3189): mod_jk/1.2.30 initialized > Log level trace would show us more precisely, how we came to close the > socket, but it blows up log volume a lot. I'm using trace and greping for whatever you need to see. if you want to see more debug, tell me what keyword to grep (ideally, so I don't catch the hundreds of other lines of debug being generated by other worker threads). > > To confirm the settings: > > > > worker.basic.connection_pool_timeout=90 > > You don't usually want to set a pool_size. "1" would be appropriate when > using the prefork Apache MPM, but in that case mod_jk automatically sets > it to "1". I can remove it if you want - I only set it to 1 to ensure it was set (yes, I am using prefork). > > > worker.basic.connection_pool_size=1 > > worker.basic.socket_keepalive=1 > > I don't like "socket_timeout" ... Why? > > worker.basic.socket_timeout=90 > > but I would like socket_connect_timeout. I can change it. > The next two are possibly a bit short, because if the backend e.g. does > a Java Garbage Collection which miht take longer than 1 second, tose > timeouts will fire and take the node out of load balancing. > > > worker.basic.connect_timeout=1000 > > worker.basic.prepost_timeout=1000 > > You might want to add max_reply_timeouts, otherwise one single reply > timeout can take a node out of load balancing. > > > worker.basic.reply_timeout=9 > > # Stop recoery attempts when JBoss instances do not respond. > > worker.basic.retries=1 > > worker.basic.recovery_options=27 > Don't want sticky sessions? Nope. The app clusters. Although it's set for other workers. John - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On 09/07/2010 11:53 AM, John Baker wrote: TM, On Tuesday 07 September 2010 09:47:17 you wrote: If you have enough resources, try to disable connectionTimeout on AJP connector and see weather the same will happen again. I've removed the connectionTimeout attribute and the problem persists - I can see the "lingering bytes in 2 sec" messages as before. OK. The next one would be the tcpdump during that interval to see actually the status of the sockets both on mod_jk and JBoss boxes. It's obvious that shutdown(socket, SHUT_WR) poll(socket, 2 seconds) close(socket) caused poll call to time out, meaning that the JBoss side didn't respond to the shutdown(socket, SHUT_WR) call by closing it's side of the connection. BTW, do you have "lingering bytes in 0 sec" messages in your log or lower then 2 sec? Regards -- ^TM - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On 07.09.2010 11:36, John Baker wrote: Rainer, Is this acceptable? I am using a tail and an egrep to match the various statements you wish to see. if it's missing anything, plesae let me know what to add to the grep. Yes, that's better! [Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] find_match::jk_uri_worker_map.c (850): Attempting to map context URI '/*=lb-jboss51-integration' source 'JkMount' [Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] find_match::jk_uri_worker_map.c (863): Found a wildchar match '/*=lb-jboss51-integration' [Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] jk_handler::mod_jk.c (2462): Into handler jakarta-servlet worker=lb-jboss51-integration r->proxyreq=0 [Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] wc_get_worker_for_name::jk_worker.c (116): found a worker lb-jboss51-integration [Tue Sep 07 10:20:20.618 2010] [18806:46962404156768] [debug] jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 154 [Tue Sep 07 10:20:22.619 2010] [18806:46962404156768] [debug] jk_shutdown_socket::jk_connect.c (813): Shutdown socket 154 and read 0 lingering bytes in 2 sec. Hmmm, that's strange. Why is there a connection close between wc_get_worker_for_name() and wc_maintain()? The first call doing something with the connections would be wc_maintain(). And if it decides to close sockets, it would log addiional statements. Nevertheless we can see, that closing the socket really took 2 seconds (which is in fact a builtin mod_jk timeout). Are we sure, that those log lines are correct? This was generated by mod_jk 1.2.30 without any code changes, right? Log level trace would show us more precisely, how we came to close the socket, but it blows up log volume a lot. [Tue Sep 07 10:20:22.625 2010] [18806:46962404156768] [debug] wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration [Tue Sep 07 10:20:22.628 2010] [18806:46962404156768] [debug] get_most_suitable_worker::jk_lb_worker.c (1001): found best worker jboss51-integration1 (jboss51-integration1) using method 'Request' [Tue Sep 07 10:20:22.628 2010] [18806:46962404156768] [debug] service::jk_lb_worker.c (1161): service worker=jboss51-integration1 route=jboss51-integration1 [Tue Sep 07 10:20:22.628 2010] [18806:46962404156768] [debug] ajp_service::jk_ajp_common.c (2376): processing jboss51-integration1 with 1 retries [Tue Sep 07 10:20:22.628 2010] [18806:46962404156768] [debug] ajp_send_request::jk_ajp_common.c (1579): (jboss51-integration1) all endpoints are disconnected. [Tue Sep 07 10:20:22.631 2010] [18806:46962404156768] [debug] ajp_send_request::jk_ajp_common.c (1639): (jboss51-integration1) request body to send 0 - request body to resend 0 [Tue Sep 07 10:20:22.637 2010] [18806:46962404156768] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 00305F 2E 6A 62 6F 73 73 35 31 2D 69 6E 74 65 67 72 - _.jboss51-integr [Tue Sep 07 10:20:22.638 2010] [18806:46962404156768] [debug] ajp_unmarshal_response::jk_ajp_common.c (723): Header[0] [Set-Cookie] = [JSESSIONID=cM2w+W7dIfDERfQpzO6BDw__.jboss51-integration1; Path=/xx] [Tue Sep 07 10:20:22.640 2010] [18806:46962404156768] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 01605F 2E 6A 62 6F 73 73 35 31 2D 69 6E 74 65 67 72 - _.jboss51-integr .. [Tue Sep 07 10:20:22.651 2010] [18806:46962404156768] [debug] ajp_reset_endpoint::jk_ajp_common.c (757): (jboss51-integration1) resetting endpoint with sd = 154 [Tue Sep 07 10:20:22.651 2010] [18806:46962404156768] [debug] ajp_done::jk_ajp_common.c (3010): recycling connection pool slot=0 for worker jboss51-integration1 [Tue Sep 07 10:20:22.651 2010] [18806:46962404156768] [debug] jk_handler::mod_jk.c (2602): Service finished with status=200 for worker=lb-jboss51-integration To confirm the settings: worker.basic.connection_pool_timeout=90 You don't usually want to set a pool_size. "1" would be appropriate when using the prefork Apache MPM, but in that case mod_jk automatically sets it to "1". worker.basic.connection_pool_size=1 worker.basic.socket_keepalive=1 I don't like "socket_timeout" ... worker.basic.socket_timeout=90 but I would like socket_connect_timeout. The next two are possibly a bit short, because if the backend e.g. does a Java Garbage Collection which miht take longer than 1 second, tose timeouts will fire and take the node out of load balancing. worker.basic.connect_timeout=1000 worker.basic.prepost_timeout=1000 You might want to add max_reply_timeouts, otherwise one single reply timeout can take a node out of load balancing. worker.basic.reply_timeout=9 # Stop recoery attempts when JBoss instances do not respond. worker.basic.retries=1 worker.basic.recovery_options=27 worker.jboss51-integration1.reference=worker.basic worker.jboss51-integration1.host=xx worker.jboss51-integration1.port=13802 worker.jboss51-integration1.type=ajp13 worker.jbo
Re: 2 second delays in mod_jk while "maintaining workers"
TM, On Tuesday 07 September 2010 09:47:17 you wrote: > If you have enough resources, try to disable > connectionTimeout on AJP connector and see weather > the same will happen again. I've removed the connectionTimeout attribute and the problem persists - I can see the "lingering bytes in 2 sec" messages as before. John - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
Rainer, Is this acceptable? I am using a tail and an egrep to match the various statements you wish to see. if it's missing anything, plesae let me know what to add to the grep. [Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] find_match::jk_uri_worker_map.c (850): Attempting to map context URI '/*=lb-jboss51-integration' source 'JkMount' [Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] find_match::jk_uri_worker_map.c (863): Found a wildchar match '/*=lb-jboss51-integration' [Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] jk_handler::mod_jk.c (2462): Into handler jakarta-servlet worker=lb-jboss51-integration r->proxyreq=0 [Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] wc_get_worker_for_name::jk_worker.c (116): found a worker lb-jboss51-integration [Tue Sep 07 10:20:20.618 2010] [18806:46962404156768] [debug] jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 154 [Tue Sep 07 10:20:22.619 2010] [18806:46962404156768] [debug] jk_shutdown_socket::jk_connect.c (813): Shutdown socket 154 and read 0 lingering bytes in 2 sec. [Tue Sep 07 10:20:22.625 2010] [18806:46962404156768] [debug] wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration [Tue Sep 07 10:20:22.628 2010] [18806:46962404156768] [debug] get_most_suitable_worker::jk_lb_worker.c (1001): found best worker jboss51-integration1 (jboss51-integration1) using method 'Request' [Tue Sep 07 10:20:22.628 2010] [18806:46962404156768] [debug] service::jk_lb_worker.c (1161): service worker=jboss51-integration1 route=jboss51-integration1 [Tue Sep 07 10:20:22.628 2010] [18806:46962404156768] [debug] ajp_service::jk_ajp_common.c (2376): processing jboss51-integration1 with 1 retries [Tue Sep 07 10:20:22.628 2010] [18806:46962404156768] [debug] ajp_send_request::jk_ajp_common.c (1579): (jboss51-integration1) all endpoints are disconnected. [Tue Sep 07 10:20:22.631 2010] [18806:46962404156768] [debug] ajp_send_request::jk_ajp_common.c (1639): (jboss51-integration1) request body to send 0 - request body to resend 0 [Tue Sep 07 10:20:22.637 2010] [18806:46962404156768] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 00305F 2E 6A 62 6F 73 73 35 31 2D 69 6E 74 65 67 72 - _.jboss51-integr [Tue Sep 07 10:20:22.638 2010] [18806:46962404156768] [debug] ajp_unmarshal_response::jk_ajp_common.c (723): Header[0] [Set-Cookie] = [JSESSIONID=cM2w+W7dIfDERfQpzO6BDw__.jboss51-integration1; Path=/xx] [Tue Sep 07 10:20:22.640 2010] [18806:46962404156768] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 01605F 2E 6A 62 6F 73 73 35 31 2D 69 6E 74 65 67 72 - _.jboss51-integr .. [Tue Sep 07 10:20:22.651 2010] [18806:46962404156768] [debug] ajp_reset_endpoint::jk_ajp_common.c (757): (jboss51-integration1) resetting endpoint with sd = 154 [Tue Sep 07 10:20:22.651 2010] [18806:46962404156768] [debug] ajp_done::jk_ajp_common.c (3010): recycling connection pool slot=0 for worker jboss51-integration1 [Tue Sep 07 10:20:22.651 2010] [18806:46962404156768] [debug] jk_handler::mod_jk.c (2602): Service finished with status=200 for worker=lb-jboss51-integration To confirm the settings: worker.basic.connection_pool_timeout=90 worker.basic.connection_pool_size=1 worker.basic.socket_keepalive=1 worker.basic.socket_timeout=90 worker.basic.connect_timeout=1000 worker.basic.prepost_timeout=1000 worker.basic.reply_timeout=9 # Stop recoery attempts when JBoss instances do not respond. worker.basic.retries=1 worker.basic.recovery_options=27 worker.jboss51-integration1.reference=worker.basic worker.jboss51-integration1.host=xx worker.jboss51-integration1.port=13802 worker.jboss51-integration1.type=ajp13 worker.jboss51-integration1.lbfactor=1 worker.jboss51-integration2.reference=worker.basic worker.jboss51-integration2.host=yy worker.jboss51-integration2.port=13802 worker.jboss51-integration2.type=ajp13 worker.jboss51-integration2.lbfactor=1 worker.lb-jboss51-integration.type=lb worker.lb-jboss51-integration.balance_workers=jboss51-integration1,jboss51-integration2 worker.lb-jboss51-integration.sticky_session=0 worker.list=lb-jboss51-integration Also, I've tried setting: worker.basic.lock=0 but that does not solve the problem. Thanks John - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On 09/07/2010 10:11 AM, John Baker wrote: On Tuesday 07 September 2010 08:59:27 you wrote: It means that socket shutdown failed. Do you have firewall between mod_jk and JBoss or some non-standard network driver (e.g running under some VM)? We are using VMs but there should be no firewall. I should probably re-iterate that the problem is occassional. See Rainer's comment on howto provide a useful log section. Hard to tell what's actually going underneath. If you have enough resources, try to disable connectionTimeout on AJP connector and see weather the same will happen again. Shooting blanks here cause transient errors are pretty hard to debug. It might be that the AJP connection is closed by JBoss side by connectionTimeout just at the time when maintain is about to do the same on the mod_jk side. Eliminating connectionTimeout would share some more light. Regards -- ^TM - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On Tuesday 07 September 2010 08:59:27 you wrote: > It means that socket shutdown failed. > Do you have firewall between mod_jk and JBoss or some non-standard > network driver (e.g running under some VM)? We are using VMs but there should be no firewall. I should probably re-iterate that the problem is occassional. > This can cause network FIN packet not getting transmitted over it > since we issued socket shutdown, but the other side didn't react on > that message. > > What kind of connector is on the JBoss side. Java or APR? APR? It's a standard JBoss 5.1.2 installation. - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On 07.09.2010 09:05, John Baker wrote: On Tuesday 07 September 2010 06:28:33 you wrote: On 09/06/2010 11:59 PM, John Baker wrote: What's jk_maintain? Function that maintains the workers (closes excess connections inactive for a long time) Anyhow, like Rainer said, if that's the case you should have something like this in the log (DEBUG) ... Shutdown socket 123 and read 123 lingering bytes in 2 sec. Is this what we're looking for? There's a lot of it in the logs during the period in which I switched on 'trace'. [Mon Sep 06 15:05:18.128 2010] [24061:1404667232] [debug] jk_shutdown_socket::jk_connect.c (732): Shutdown socket 158 and rea 0 lingering bytes If so, what does this mean and how do I solve the problem? You gave us the following log lines: [Mon Sep 06 15:05:14.128 2010] [24061:1404667232] [debug] wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration [Mon Sep 06 15:05:14.241 2010] [23945:1404667232] [debug] wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration [Mon Sep 06 15:05:16.644 2010] [23944:1404667232] [debug] wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration [Mon Sep 06 15:05:18.131 2010] [24061:1404667232] [debug] init_ws_service::mod_jk.c (977): Service protocol=HTTP/1.1 method=GET ssl=false host=(null) addr=10.117.112.1 name=xxx port=80 auth=(null) user=(null) laddr=10.117.112.1 raddr=10.117.112.1 uri=/index.jsp [Mon Sep 06 15:05:18.131 2010] [24061:1404667232] [debug] get_most_suitable_worker::jk_lb_worker.c (997): found best worker jboss51-integration2 (jboss51-integration2) using method 'Request' Those are written by three different processes and threads (see the IDS [PID:TID] behind the timestamp) and thus three different requests. Sorting by request: [Mon Sep 06 15:05:14.128 2010] [24061:1404667232] [debug] wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration [Mon Sep 06 15:05:18.131 2010] [24061:1404667232] [debug] init_ws_service::mod_jk.c (977): Service protocol=HTTP/1.1 method=GET ssl=false host=(null) addr=10.117.112.1 name=xxx port=80 auth=(null) user=(null) laddr=10.117.112.1 raddr=10.117.112.1 uri=/index.jsp [Mon Sep 06 15:05:18.131 2010] [24061:1404667232] [debug] get_most_suitable_worker::jk_lb_worker.c (997): found best worker jboss51-integration2 (jboss51-integration2) using method 'Request' [Mon Sep 06 15:05:14.241 2010] [23945:1404667232] [debug] wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration [Mon Sep 06 15:05:16.644 2010] [23944:1404667232] [debug] wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration So the first block shows something, the other two blocks only consist of one line, so we can not make any statements about durations. In the first block, there is a 4 seconds gap between the start of the worker maintenance and the start of the service. This is the gap you are referring to? The log message Mladen cited is written in line 813 (mod_jk version 1.2.30) of jk_connect.c with level DEBUG: "Shutdown socket ... and read ... lingering bytes in ... sec." Before the socket closing, there should also be messages like "About to shutdown socket ...". If those are not present, then the reason for the time gap is something else. Note that version 1.2.30 also ncludes the duration of the socket shutdown procedure, so it's better to use that version. If you can reproduce with trace log level (very high log volume!), then one log block from "Maintaining" until "Service protocol=" would be helpful. Regards, Rainer - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On 09/07/2010 09:05 AM, John Baker wrote: On Tuesday 07 September 2010 06:28:33 you wrote: On 09/06/2010 11:59 PM, John Baker wrote: What's jk_maintain? Function that maintains the workers (closes excess connections inactive for a long time) Anyhow, like Rainer said, if that's the case you should have something like this in the log (DEBUG) ... Shutdown socket 123 and read 123 lingering bytes in 2 sec. Is this what we're looking for? There's a lot of it in the logs during the period in which I switched on 'trace'. [Mon Sep 06 15:05:18.128 2010] [24061:1404667232] [debug] jk_shutdown_socket::jk_connect.c (732): Shutdown socket 158 and rea 0 lingering bytes If so, what does this mean and how do I solve the problem? It means that socket shutdown failed. Do you have firewall between mod_jk and JBoss or some non-standard network driver (e.g running under some VM)? This can cause network FIN packet not getting transmitted over it since we issued socket shutdown, but the other side didn't react on that message. What kind of connector is on the JBoss side. Java or APR? Regards -- ^TM - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On Tuesday 07 September 2010 06:28:33 you wrote: > On 09/06/2010 11:59 PM, John Baker wrote: > > What's jk_maintain? > > > > Function that maintains the workers > (closes excess connections inactive for a long time) > > Anyhow, like Rainer said, if that's the case > you should have something like this in the log (DEBUG) > > ... Shutdown socket 123 and read 123 lingering bytes in 2 sec. Is this what we're looking for? There's a lot of it in the logs during the period in which I switched on 'trace'. [Mon Sep 06 15:05:18.128 2010] [24061:1404667232] [debug] jk_shutdown_socket::jk_connect.c (732): Shutdown socket 158 and rea 0 lingering bytes If so, what does this mean and how do I solve the problem? - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On 09/06/2010 11:59 PM, John Baker wrote: On Monday 06 September 2010 22:57:21 you wrote: I didn't look at the code now, but the 2 seconds remind me of the connection draining during socket shutdown, which could be related to jk_maintain? What's jk_maintain? Function that maintains the workers (closes excess connections inactive for a long time) Anyhow, like Rainer said, if that's the case you should have something like this in the log (DEBUG) ... Shutdown socket 123 and read 123 lingering bytes in 2 sec. Regards -- ^TM - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On Monday 06 September 2010 22:57:21 you wrote: > I didn't look at the code now, but the 2 seconds remind me of the > connection draining during socket shutdown, which could be related to > jk_maintain? What's jk_maintain? - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On 06.09.2010 19:56, Mladen Turk wrote: On 09/06/2010 04:16 PM, John Baker wrote: I've set the Jk logging to trace and you can see the debug statements and the 2s delays: Do you use NFS share by any chance to store the mod_jk log directory data? I didn't look at the code now, but the 2 seconds remind me of the connection draining during socket shutdown, which could be related to jk_maintain? Regards, Rainer - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On Monday 06 September 2010 18:56:20 you wrote: > On 09/06/2010 04:16 PM, John Baker wrote: > > > > I've set the Jk logging to trace and you can see the debug statements and > > the 2s delays: > > > > Do you use NFS share by any chance to store the > mod_jk log directory data? Nope. All log files are written to local disc, and they are very quick discs.. > What happens if you set > worker.lb-jboss51-integration.lock=O I'll get back to you and let you know. By the way, there's a lot of activity on these Apache servers - do you suspect a file lock timeout? - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: 2 second delays in mod_jk while "maintaining workers"
On 09/06/2010 04:16 PM, John Baker wrote: I've set the Jk logging to trace and you can see the debug statements and the 2s delays: Do you use NFS share by any chance to store the mod_jk log directory data? What happens if you set worker.lb-jboss51-integration.lock=O Regards -- ^TM - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org