Re: 2 second delays in mod_jk while maintaining workers

2010-09-15 Thread John Baker
Hello,

I've spent a week looking into mod_jk, Apache, etc., and the problem appears to 
be due to a bug in the AJP connector within JBoss:

https://jira.jboss.org/browse/JBPAPP-366

This bug is not fixed in the latest release of JBoss 4.2.3, which is rather 
disappointing.  After downloading the jbossweb-2.0.1-GA source from subversion, 
patching, recompiling with Java 1.5 (if you want it to work on the Java 1.5 and 
1.6 versions of JBoss 4.2.3), the problem disappears.

While solving this problem, I identified the need for a debug statement to be 
generated from mod_jk when the poll(..) fails, i.e. waited 2 seconds and 
discovered the socket was not being shut properly.  Mladen has kindly added 
this to the next release.

Thanks to everyone who's contributed and if I have any further updates, I will 
let you know.


John

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-09 Thread John Baker
Thanks for the feedback.

Can you tell me why this if statement exists:

if (poll(fds, 1, timeout)  0) 
   {
   ...
   }
   else 
  break;

It appears to be at fault.


John

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-09 Thread Mladen Turk

On 09/09/2010 02:09 PM, John Baker wrote:

Thanks for the feedback.

Can you tell me why this if statement exists:

 if (poll(fds, 1, timeout)  0)
{
...
}
else
   break;

It appears to be at fault.



poll returns positive number if there is some data to
read on the socket.
It returns 0 in case of timeout (two seconds in this case).


Try to change
fds.events = POLLIN;
to
fds.events = POLLIN | POLLHUP | POLLERR;


If that doesn't help, it's obvious the Tomcat
doesn't close the socket, so should be investigated
why. Like said before, either the Tomcat doesn't
respond to shutdown or the shutdown's FIN packet
isn't send to the Tomcat or back to mod_jk,
due to some weird OS networking issues.


Regards
--
^TM

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-09 Thread Jim Jagielski

On Sep 9, 2010, at 8:34 AM, Mladen Turk wrote:

 On 09/09/2010 02:09 PM, John Baker wrote:
 Thanks for the feedback.
 
 Can you tell me why this if statement exists:
 
 if (poll(fds, 1, timeout)  0)
{
...
}
else
   break;
 
 It appears to be at fault.
 
 
 poll returns positive number if there is some data to
 read on the socket.
 It returns 0 in case of timeout (two seconds in this case).
 
 
 Try to change
 fds.events = POLLIN;
 to
 fds.events = POLLIN | POLLHUP | POLLERR;
 
 
 If that doesn't help, it's obvious the Tomcat
 doesn't close the socket, so should be investigated
 why. Like said before, either the Tomcat doesn't
 respond to shutdown or the shutdown's FIN packet
 isn't send to the Tomcat or back to mod_jk,
 due to some weird OS networking issues.
 

+1


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-09 Thread John Baker
On Thursday 09 September 2010 13:59:50 you wrote:
  
  If that doesn't help, it's obvious the Tomcat
  doesn't close the socket, so should be investigated
  why. Like said before, either the Tomcat doesn't
  respond to shutdown or the shutdown's FIN packet
  isn't send to the Tomcat or back to mod_jk,
  due to some weird OS networking issues.
  

I think there's been rather a lot of speculation on this problem :-)

The change to fds.events appears to be working!  Thanks.

I'm somewhat surprised this hasn't been picked up anytime before?  We can't be 
the only people deploying hundreds of worker threads/JBoss instances, and there 
are no firewalls between mod_jk/JBoss.  I guess it's possible the AJP thread on 
one sever was 'stuck'.

Do you fancy putting that change into the next release of mod_jk?

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-09 Thread Mladen Turk

On 09/09/2010 03:22 PM, John Baker wrote:

On Thursday 09 September 2010 13:59:50 you wrote:


If that doesn't help, it's obvious the Tomcat
doesn't close the socket, so should be investigated
why. Like said before, either the Tomcat doesn't
respond to shutdown or the shutdown's FIN packet
isn't send to the Tomcat or back to mod_jk,
due to some weird OS networking issues.



I think there's been rather a lot of speculation on this problem :-)

The change to fds.events appears to be working!  Thanks.



Nice.


I'm somewhat surprised this hasn't been picked up anytime before?  We can't be 
the only people deploying hundreds of worker threads/JBoss instances, and there 
are no firewalls between mod_jk/JBoss.  I guess it's possible the AJP thread on 
one sever was 'stuck'.



It depends on the OS networking layer thought.
POLLIN should be just enough to fire the poll if the socket was closed.


Do you fancy putting that change into the next release of mod_jk?



Sure.
I'll have to make some faulty AJP connector in Tomcat
that will close the socket after some timeout,
and see if that event wasn't propagated from our
socket.


Regards
--
^TM

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-09 Thread Mladen Turk

On 09/09/2010 03:22 PM, John Baker wrote:


Do you fancy putting that change into the next release of mod_jk?



BTW, do you fancy opening BZ report fill in what we
tried so far and the actual solution, so we can track that down
for the next releases and patches?

Regards
--
^TM

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-09 Thread John Baker


On Thursday 09 September 2010 15:45:44 you wrote:
 Nice.

I spoke too soon.  I'm now trying to figure out how to print out the IP address 
of the socket (I don't really do C) so I can log the socket that caused the 
poll to timeout, and compare with a tcpdump.
 

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-09 Thread John Baker
I would be happy to share all my evidence and write a report once we get to the 
bottom of this problem.

Any hints on printing out the socket IP (i.e. of Tomcat)?

On Thursday 09 September 2010 15:47:33 you wrote:
 On 09/09/2010 03:22 PM, John Baker wrote:
 
  Do you fancy putting that change into the next release of mod_jk?
 
 
 BTW, do you fancy opening BZ report fill in what we
 tried so far and the actual solution, so we can track that down
 for the next releases and patches?
 
 Regards

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-09 Thread John Baker
Interestingly, some of our JBoss instances are showing a large number ajp 
threads that seem to be in keep alive mode but are well beyond the 
connectionTimeout defined in server.xml (which is set to 9):

Max threads: 40 Current thread count: 40 Current thread busy: 40
Max processing time: 5563 ms Processing time: 2552.538 s Request count: 75522 
Error count: 6 Bytes received: 0.00 MB Bytes sent: 879.42 MB
Stage   TimeB Sent  B Recv  Client  VHost   Request
K   508896 ms   ?   ?   10.113.168.4?   ?
K   93386 ms?   ?   10.113.168.4?   ?
K   7147 ms ?   ?   10.113.168.4?   ?
K   3994 ms ?   ?   10.113.168.4?   ?
K   660 ms  ?   ?   10.113.168.4?   ?
K   735822 ms   ?   ?   10.113.168.4?   ?
K   71565 ms?   ?   10.113.168.4?   ?

Almost as if connectionTimeout is being ignored in some cases - this is JBoss 
4.2.3, so an oldish version of Tomcat.

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-09 Thread Mladen Turk

On 09/09/2010 04:48 PM, John Baker wrote:



On Thursday 09 September 2010 15:45:44 you wrote:

Nice.


I spoke too soon.  I'm now trying to figure out how to print out the IP address 
of the socket (I don't really do C) so I can log the socket that caused the 
poll to timeout, and compare with a tcpdump.



Try with getsockname, but it's not a trivial task.

Perhaps using /proc/pid/fd
and matching the socket# we have with the actual info
eg.
32 - socket:[port]



Regards
--
^TM

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-09 Thread John Baker


On Thursday 09 September 2010 16:08:04 you wrote:
 On 09/09/2010 04:48 PM, John Baker wrote:
 
 
  On Thursday 09 September 2010 15:45:44 you wrote:
  Nice.
 
  I spoke too soon.  I'm now trying to figure out how to print out the IP 
  address of the socket (I don't really do C) so I can log the socket that 
  caused the poll to timeout, and compare with a tcpdump.
 
 
 Try with getsockname, but it's not a trivial task.
 
 Perhaps using /proc/pid/fd
 and matching the socket# we have with the actual info
 eg.
 32 - socket:[port]

Isn't the address stored with jk_sock_t so I can use jk_dump_hinfo, as I can 
see is in the open socket method?

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



RE: 2 second delays in mod_jk while maintaining workers

2010-09-09 Thread Caldarale, Charles R
 From: John Baker [mailto:jba...@javasystemsolutions.com] 
 Subject: Re: 2 second delays in mod_jk while maintaining workers

 some of our JBoss instances are showing a large number 
 ajp threads that seem to be in keep alive mode but are
 well beyond the connectionTimeout defined in server.xml

My understanding is that connectionTimeout only applies to the period between 
the connection being established and the arrival of the *first* request.  
Tomcat 6.0 has a keepAliveTimeout attribute, but older Tomcats do not, and I 
don't know what they default to.

 - Chuck


THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY 
MATERIAL and is thus for use only by the intended recipient. If you received 
this in error, please contact the sender and delete the e-mail and its 
attachments from all computers.



-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-08 Thread John Baker
Hello

We've just noticed that the maintenance mode operates on all workers, so
having one worker run maintenance for the rest is making tracing the
problem difficult.  Reading down the logs, we can see it finds a worker
and iterates through workers perforing maintenance.

Is this correct?

Moving on, consider the following debug:

[Wed Sep 08 10:36:16.907 2010] [2205:47701852022112] [trace]
ajp_abort_endpoint::jk_ajp_common.c (739): enter
[Wed Sep 08 10:36:16.907 2010] [2205:47701852022112] [trace]
jk_shutdown_socket::jk_connect.c (713): enter
[Wed Sep 08 10:36:16.907 2010] [2205:47701852022112] [debug]
jk_shutdown_socket::jk_connect.c (722): About to shutdown socket

The next entry in the log with [2205:47701852022112] is the following:

[Wed Sep 08 10:36:18.907 2010] [2205:47701852022112] [trace]
jk_close_socket::jk_connect.c (650): enter
[Wed Sep 08 10:36:18.908 2010] [2205:47701852022112] [trace]
jk_close_socket::jk_connect.c (665): exit
[Wed Sep 08 10:36:18.908 2010] [2205:47701852022112] [debug]
jk_shutdown_socket::jk_connect.c (813): Shutdown socket 156 and read 0
lingering bytes in 2 sec.
[Wed Sep 08 10:36:18.908 2010] [2205:47701852022112] [trace]
jk_shutdown_socket::jk_connect.c (817): exit

Looking at the code in jk_connect.c:

int jk_shutdown_socket(jk_sock_t sd, jk_logger_t *l)
{
...
if (JK_IS_DEBUG_LEVEL(l))
jk_log(l, JK_LOG_DEBUG, About to shutdown socket %d, sd);

/* Shut down the socket for write, which will send a FIN
 * to the peer.
 */
if (shutdown(sd, SHUT_WR)) {
rc = jk_close_socket(sd, l);


And the jk_close_socket function:

int jk_close_socket(jk_sock_t sd, jk_logger_t *l)
{
int rc;
int save_errno;

JK_TRACE_ENTER(l);

Tieing up with the logs suggests the shutdown(sd, SHUT_WR) method is
blocking.

Any thoughts,


John









-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-08 Thread John Baker
Hello,

I don't thikn the shutdown call is to blame - I think it's the large pile
of code below.  i.e. everything below the if (shutdown(..)) below.  The
question is, what does it all do and does it actually work?  It appears to
be the 'drain' code, but given it often results in this message:

Shutdown socket .. and read 0 lingering bytes in 2 sec.

then it would suggest it doesn't work.  The comments also suggest the 2
seconds is guess work.

I've sent SECONDS_TO_LINGER and MAX_SECS_TO_LINGER to 0 and the problem
goes away, and I've seen this discussed elsewhere on the forum.

So, if this code isn't needed, why is it still there?  And if it is
needed, why does it seem not to work and what is the correct figure for
SECONDS_TO_LINGER?  If this is being performed inline then I'd suggest 2s
is far too long - clients don't want to wait 2s for a request to complete
(ideally, socket clean up should not be done in this thread if any linger
is required).


John


..
MAX_SECS_TO_LINGER= 30
SECONDS_TO_LINGER= 2
...
int jk_shutdown_socket(jk_sock_t sd, jk_logger_t *l)
{
...

/* Shut down the socket for write, which will send a FIN
 * to the peer.
 */
if (shutdown(sd, SHUT_WR)) {
rc = jk_close_socket(sd, l);
if (JK_IS_DEBUG_LEVEL(l))
jk_log(l, JK_LOG_DEBUG, Failed sending SHUT_WR for socket
%d, sd);
errno = save_errno;
JK_TRACE_EXIT(l);
return rc;
}

do {
#ifdef HAVE_POLL
struct pollfd fds;

fds.fd = sd;
fds.events = POLLIN;
#else
fd_set rs;

FD_ZERO(rs);
/* Read all data from the peer until we reach end-of-file
 * (FIN from peer) or we've exceeded our overall timeout. If the
 * backend does not send us bytes within 2 seconds
 * (a value pulled from Apache 1.3 which seems to work well),
 * close the connection.
 */
FD_SET(sd, rs);
tv.tv_sec  = timeout / 1000;
tv.tv_usec = (timeout % 1000) * 1000;
#endif
rp = 0;
#ifdef HAVE_POLL
if (poll(fds, 1, timeout)  0)
#else
if (select((int)sd + 1, rs, NULL, NULL, tv)  0)
#endif
{
do {
#if defined(WIN32) || (defined(NETWARE)  defined(__NOVELL_LIBC__))
rc = recv(sd, dummy[0], sizeof(dummy), 0);
if (JK_IS_SOCKET_ERROR(rc))
JK_GET_SOCKET_ERRNO();
#else
rc = read(sd, dummy[0], sizeof(dummy));
#endif
if (rc  0)
rp += rc;
} while (JK_IS_SOCKET_ERROR(rc)  (errno == EINTR || errno ==
EAGAIN));

if (rc  0)
break;
}
else
break;
rd += rp;
if (rp  sizeof(dummy)) {
if (timeout  SECONDS_TO_LINGER) {
/* Try once again with 1000 times smaller timeout
 * In our case 2 msec.
 */
timeout = SECONDS_TO_LINGER;
continue;
}
/* We have read less then size of buffer
 * It's a good chance there will be no more data
 * to read.
 */
if ((rc = sononblock(sd))) {
rc = jk_close_socket(sd, l);
if (JK_IS_DEBUG_LEVEL(l))
jk_log(l, JK_LOG_DEBUG,
   error setting socket %d to nonblocking, sd);
errno = save_errno;
JK_TRACE_EXIT(l);
return rc;
}
if (JK_IS_DEBUG_LEVEL(l))
jk_log(l, JK_LOG_DEBUG,
   shutting down the read side of socket %d, sd);
shutdown(sd, SHUT_RD);
break;
}
timeout = SECONDS_TO_LINGER * 1000;
} while (difftime(time(NULL), start)  MAX_SECS_TO_LINGER);

rc = jk_close_socket(sd, l);
if (JK_IS_DEBUG_LEVEL(l))
jk_log(l, JK_LOG_DEBUG,
   Shutdown socket %d and read %d lingering bytes in %d sec.,
   sd, rd, (int)difftime(time(NULL), start));
errno = save_errno;
JK_TRACE_EXIT(l);
return rc;
}



-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-08 Thread Mladen Turk

On 09/08/2010 05:08 PM, John Baker wrote:

Hello,

I don't thikn the shutdown call is to blame - I think it's the large pile
of code below.  i.e. everything below the if (shutdown(..)) below.  The
question is, what does it all do and does it actually work?  It appears to
be the 'drain' code, but given it often results in this message:

Shutdown socket .. and read 0 lingering bytes in 2 sec.

then it would suggest it doesn't work.  The comments also suggest the 2
seconds is guess work.

I've sent SECONDS_TO_LINGER and MAX_SECS_TO_LINGER to 0 and the problem
goes away, and I've seen this discussed elsewhere on the forum.

So, if this code isn't needed, why is it still there?  And if it is
needed, why does it seem not to work and what is the correct figure for
SECONDS_TO_LINGER?  If this is being performed inline then I'd suggest 2s
is far too long - clients don't want to wait 2s for a request to complete
(ideally, socket clean up should not be done in this thread if any linger
is required).



The code *is* required.
It is used when the client disconnects while the backend
still has some data in the AJP buffer. Drain is needed
to read that excess data.

If you can compile mod_jk and test, try adding some debug messages inside
the poll loop. Anyhow, seems to me there is some sort of network
problem you are facing.
Have you been able to see the traffic using Wireshark?


Regards
--
^TM

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-08 Thread Timothy Grocott
please can you remove me off the mailing list

many thanks

On Wed, Sep 8, 2010 at 4:29 PM, Mladen Turk mt...@apache.org wrote:

 On 09/08/2010 05:08 PM, John Baker wrote:

 Hello,

 I don't thikn the shutdown call is to blame - I think it's the large pile
 of code below.  i.e. everything below the if (shutdown(..)) below.  The
 question is, what does it all do and does it actually work?  It appears to
 be the 'drain' code, but given it often results in this message:

 Shutdown socket .. and read 0 lingering bytes in 2 sec.

 then it would suggest it doesn't work.  The comments also suggest the 2
 seconds is guess work.

 I've sent SECONDS_TO_LINGER and MAX_SECS_TO_LINGER to 0 and the problem
 goes away, and I've seen this discussed elsewhere on the forum.

 So, if this code isn't needed, why is it still there?  And if it is
 needed, why does it seem not to work and what is the correct figure for
 SECONDS_TO_LINGER?  If this is being performed inline then I'd suggest 2s
 is far too long - clients don't want to wait 2s for a request to complete
 (ideally, socket clean up should not be done in this thread if any linger
 is required).


 The code *is* required.
 It is used when the client disconnects while the backend
 still has some data in the AJP buffer. Drain is needed
 to read that excess data.

 If you can compile mod_jk and test, try adding some debug messages inside
 the poll loop. Anyhow, seems to me there is some sort of network
 problem you are facing.
 Have you been able to see the traffic using Wireshark?


 Regards
 --
 ^TM


 -
 To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
 For additional commands, e-mail: users-h...@tomcat.apache.org




Re: 2 second delays in mod_jk while maintaining workers

2010-09-08 Thread John Baker
 On 09/08/2010 05:08 PM, John Baker wrote:

 The code *is* required.
 It is used when the client disconnects while the backend
 still has some data in the AJP buffer. Drain is needed
 to read that excess data.

Why does it always report 0 bytes read?

 If you can compile mod_jk and test, try adding some debug messages inside
 the poll loop. Anyhow, seems to me there is some sort of network
 problem you are facing.

That's next on my list.

 Have you been able to see the traffic using Wireshark?

Not yet.


 Regards
 --
 ^TM

 -
 To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
 For additional commands, e-mail: users-h...@tomcat.apache.org





-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-08 Thread John Baker
I think it would be helpful if you could walk us through the code. 

The shutdown function provides three error codes - the jk function doesn't 
check them, and the most likely is 'socket closed' (ie jboss closed it?), hence 
I'm unsure what the drain code is doing. 
--Original Message--
From: John Baker
To: Tomcat Users List
ReplyTo: Tomcat Users List
Subject: Re: 2 second delays in mod_jk while maintaining workers
Sent: 8 Sep 2010 16:41

 On 09/08/2010 05:08 PM, John Baker wrote:

 The code *is* required.
 It is used when the client disconnects while the backend
 still has some data in the AJP buffer. Drain is needed
 to read that excess data.

Why does it always report 0 bytes read?

 If you can compile mod_jk and test, try adding some debug messages inside
 the poll loop. Anyhow, seems to me there is some sort of network
 problem you are facing.

That's next on my list.

 Have you been able to see the traffic using Wireshark?

Not yet.


 Regards
 --
 ^TM

 -
 To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
 For additional commands, e-mail: users-h...@tomcat.apache.org





-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org




Re: 2 second delays in mod_jk while maintaining workers

2010-09-08 Thread Mladen Turk

On 09/08/2010 05:51 PM, John Baker wrote:

I think it would be helpful if you could walk us through the code.



It is very simple actually.

1. shutdown write end of our connection to Tomcat
   This should cause the soket.read() in Tomcat
   to throw exception which is used to close the socket
2. use poll with 2 second timeout for POLLIN event
   poll() will return immediately if either the
   socket is closed by Tomcat or there is a data
   to read in which case read to a dummy buffer (drain)
   and wait for more data until the poll returns
   error or read returns error
3. close the socket

So the reason why the poll breaks after 2 seconds
is the timeout which shouldn't happen because
we informed the Tomcat the our side of connection
is closed. So it should eventually (after some latency
depending on the CPU throttle close his side).

Now, since this works most of the times, it's either
that poll loop is running but the read returns 0

A simple debug message after
rc = read(sd, dummy[0], sizeof(dummy));
eg.
jk_log(l, JK_LOG_DEBUG, readed %d bytes from socked %d, rc, sd);

Would help eliminate that situation.
The fix would be to add
if (rc == 0) {
   rc = -1;
   break;
}

Also you can add
jk_log(l, JK_LOG_DEBUG, socked %d shutdown write side, sd);
just before that 'do {' so we can have the info how long
the shutdown call took.

It would be helpful  if you can apply that code and
recompile.


Regards
--
^TM

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-07 Thread John Baker
On Tuesday 07 September 2010 06:28:33 you wrote:
 On 09/06/2010 11:59 PM, John Baker wrote:
  What's jk_maintain?
 
 
 Function that maintains the workers
 (closes excess connections inactive for a long time)
 
 Anyhow, like Rainer said, if that's the case
 you should have something like this in the log (DEBUG)
 
 ... Shutdown socket 123 and read 123 lingering bytes in 2 sec.

Is this what we're looking for?  There's a lot of it in the logs during the 
period in which I switched on 'trace'.

[Mon Sep 06 15:05:18.128 2010] [24061:1404667232] [debug] 
jk_shutdown_socket::jk_connect.c (732): Shutdown socket 158 and rea  0 
lingering bytes

If so, what does this mean and how do I solve the problem?

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-07 Thread Mladen Turk

On 09/07/2010 09:05 AM, John Baker wrote:

On Tuesday 07 September 2010 06:28:33 you wrote:

On 09/06/2010 11:59 PM, John Baker wrote:

What's jk_maintain?



Function that maintains the workers
(closes excess connections inactive for a long time)

Anyhow, like Rainer said, if that's the case
you should have something like this in the log (DEBUG)

... Shutdown socket 123 and read 123 lingering bytes in 2 sec.


Is this what we're looking for?  There's a lot of it in the logs during the 
period in which I switched on 'trace'.

[Mon Sep 06 15:05:18.128 2010] [24061:1404667232] [debug] 
jk_shutdown_socket::jk_connect.c (732): Shutdown socket 158 and rea  0 
lingering bytes

If so, what does this mean and how do I solve the problem?



It means that socket shutdown failed.
Do you have firewall between mod_jk and JBoss or some non-standard
network driver (e.g running under some VM)?
This can cause network FIN packet not getting transmitted over it
since we issued socket shutdown, but the other side didn't react on
that message.

What kind of connector is on the JBoss side. Java or APR?


Regards
--
^TM

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-07 Thread Rainer Jung

On 07.09.2010 09:05, John Baker wrote:

On Tuesday 07 September 2010 06:28:33 you wrote:

On 09/06/2010 11:59 PM, John Baker wrote:

What's jk_maintain?



Function that maintains the workers
(closes excess connections inactive for a long time)

Anyhow, like Rainer said, if that's the case
you should have something like this in the log (DEBUG)

... Shutdown socket 123 and read 123 lingering bytes in 2 sec.


Is this what we're looking for?  There's a lot of it in the logs during the 
period in which I switched on 'trace'.

[Mon Sep 06 15:05:18.128 2010] [24061:1404667232] [debug] 
jk_shutdown_socket::jk_connect.c (732): Shutdown socket 158 and rea  0 
lingering bytes

If so, what does this mean and how do I solve the problem?


You gave us the following log lines:

[Mon Sep 06 15:05:14.128 2010] [24061:1404667232] [debug]
wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration
[Mon Sep 06 15:05:14.241 2010] [23945:1404667232] [debug]
wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration
[Mon Sep 06 15:05:16.644 2010] [23944:1404667232] [debug]
wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration
[Mon Sep 06 15:05:18.131 2010] [24061:1404667232] [debug]
init_ws_service::mod_jk.c (977): Service protocol=HTTP/1.1 method=GET
ssl=false host=(null) addr=10.117.112.1 name=xxx port=80 auth=(null)
user=(null) laddr=10.117.112.1 raddr=10.117.112.1 uri=/index.jsp
[Mon Sep 06 15:05:18.131 2010] [24061:1404667232] [debug]
get_most_suitable_worker::jk_lb_worker.c (997): found best worker
jboss51-integration2 (jboss51-integration2) using method 'Request'

Those are written by three different processes and threads (see the IDS 
[PID:TID] behind the timestamp) and thus three different requests. 
Sorting by request:


[Mon Sep 06 15:05:14.128 2010] [24061:1404667232] [debug]
wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration
[Mon Sep 06 15:05:18.131 2010] [24061:1404667232] [debug]
init_ws_service::mod_jk.c (977): Service protocol=HTTP/1.1 method=GET
ssl=false host=(null) addr=10.117.112.1 name=xxx port=80 auth=(null)
user=(null) laddr=10.117.112.1 raddr=10.117.112.1 uri=/index.jsp
[Mon Sep 06 15:05:18.131 2010] [24061:1404667232] [debug]
get_most_suitable_worker::jk_lb_worker.c (997): found best worker
jboss51-integration2 (jboss51-integration2) using method 'Request'

[Mon Sep 06 15:05:14.241 2010] [23945:1404667232] [debug]
wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration

[Mon Sep 06 15:05:16.644 2010] [23944:1404667232] [debug]
wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration

So the first block shows something, the other two blocks only consist of 
one line, so we can not make any statements about durations.


In the first block, there is a 4 seconds gap between the start of the 
worker maintenance and the start of the service. This is the gap you are 
referring to?


The log message Mladen cited is written in line 813 (mod_jk version 
1.2.30) of jk_connect.c with level DEBUG: Shutdown socket ... and read 
... lingering bytes in ... sec. Before the socket closing, there should 
also be messages like About to shutdown socket  If those are not 
present, then the reason for the time gap is something else. Note that 
version 1.2.30 also ncludes the duration of the socket shutdown 
procedure, so it's better to use that version.


If you can reproduce with trace log level (very high log volume!), then 
one log block from Maintaining until Service protocol= would be helpful.


Regards,

Rainer

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-07 Thread John Baker
On Tuesday 07 September 2010 08:59:27 you wrote:
 It means that socket shutdown failed.
 Do you have firewall between mod_jk and JBoss or some non-standard
 network driver (e.g running under some VM)?

We are using VMs but there should be no firewall.  I should probably re-iterate 
that the problem is occassional.

 This can cause network FIN packet not getting transmitted over it
 since we issued socket shutdown, but the other side didn't react on
 that message.
 
 What kind of connector is on the JBoss side. Java or APR?

APR?  It's a standard JBoss 5.1.2 installation.


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-07 Thread Mladen Turk

On 09/07/2010 10:11 AM, John Baker wrote:

On Tuesday 07 September 2010 08:59:27 you wrote:

It means that socket shutdown failed.
Do you have firewall between mod_jk and JBoss or some non-standard
network driver (e.g running under some VM)?


We are using VMs but there should be no firewall.  I should probably re-iterate 
that the problem is occassional.



See Rainer's comment on howto provide a useful log section.
Hard to tell what's actually going underneath.

If you have enough resources, try to disable
connectionTimeout on AJP connector and see weather
the same will happen again.
Shooting blanks here cause transient errors are pretty
hard to debug. It might be that the AJP connection is
closed by JBoss side by connectionTimeout just at the
time when maintain is about to do the same on the
mod_jk side. Eliminating connectionTimeout would
share some more light.


Regards
--
^TM

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-07 Thread John Baker
Rainer,  

Is this acceptable?  I am using a tail and an egrep to match the various 
statements you wish to see.  if it's missing anything, plesae let me know what 
to add to the grep.

[Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] 
find_match::jk_uri_worker_map.c (850): Attempting to map context URI 
'/*=lb-jboss51-integration' source 'JkMount'
[Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] 
find_match::jk_uri_worker_map.c (863): Found a wildchar match 
'/*=lb-jboss51-integration'
[Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] 
jk_handler::mod_jk.c (2462): Into handler jakarta-servlet 
worker=lb-jboss51-integration r-proxyreq=0
[Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] 
wc_get_worker_for_name::jk_worker.c (116): found a worker lb-jboss51-integration
[Tue Sep 07 10:20:20.618 2010] [18806:46962404156768] [debug] 
jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 154
[Tue Sep 07 10:20:22.619 2010] [18806:46962404156768] [debug] 
jk_shutdown_socket::jk_connect.c (813): Shutdown socket 154 and read 0 
lingering bytes in 2 sec.
[Tue Sep 07 10:20:22.625 2010] [18806:46962404156768] [debug] 
wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration
[Tue Sep 07 10:20:22.628 2010] [18806:46962404156768] [debug] 
get_most_suitable_worker::jk_lb_worker.c (1001): found best worker 
jboss51-integration1 (jboss51-integration1) using method 'Request'
[Tue Sep 07 10:20:22.628 2010] [18806:46962404156768] [debug] 
service::jk_lb_worker.c (1161): service worker=jboss51-integration1 
route=jboss51-integration1
[Tue Sep 07 10:20:22.628 2010] [18806:46962404156768] [debug] 
ajp_service::jk_ajp_common.c (2376): processing jboss51-integration1 with 1 
retries
[Tue Sep 07 10:20:22.628 2010] [18806:46962404156768] [debug] 
ajp_send_request::jk_ajp_common.c (1579): (jboss51-integration1) all endpoints 
are disconnected.
[Tue Sep 07 10:20:22.631 2010] [18806:46962404156768] [debug] 
ajp_send_request::jk_ajp_common.c (1639): (jboss51-integration1) request body 
to send 0 - request body to resend 0
[Tue Sep 07 10:20:22.637 2010] [18806:46962404156768] [debug] 
ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 00305F 2E 6A 62 6F 
73 73 35 31 2D 69 6E 74 65 67 72  - _.jboss51-integr
[Tue Sep 07 10:20:22.638 2010] [18806:46962404156768] [debug] 
ajp_unmarshal_response::jk_ajp_common.c (723): Header[0] [Set-Cookie] = 
[JSESSIONID=cM2w+W7dIfDERfQpzO6BDw__.jboss51-integration1; Path=/xx]
[Tue Sep 07 10:20:22.640 2010] [18806:46962404156768] [debug] 
ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 01605F 2E 6A 62 6F 
73 73 35 31 2D 69 6E 74 65 67 72  - _.jboss51-integr
..
[Tue Sep 07 10:20:22.651 2010] [18806:46962404156768] [debug] 
ajp_reset_endpoint::jk_ajp_common.c (757): (jboss51-integration1) resetting 
endpoint with sd = 154
[Tue Sep 07 10:20:22.651 2010] [18806:46962404156768] [debug] 
ajp_done::jk_ajp_common.c (3010): recycling connection pool slot=0 for worker 
jboss51-integration1
[Tue Sep 07 10:20:22.651 2010] [18806:46962404156768] [debug] 
jk_handler::mod_jk.c (2602): Service finished with status=200 for 
worker=lb-jboss51-integration

To confirm the settings:

worker.basic.connection_pool_timeout=90
worker.basic.connection_pool_size=1
worker.basic.socket_keepalive=1
worker.basic.socket_timeout=90
worker.basic.connect_timeout=1000
worker.basic.prepost_timeout=1000
worker.basic.reply_timeout=9
# Stop recoery attempts when JBoss instances do not respond.
worker.basic.retries=1
worker.basic.recovery_options=27

worker.jboss51-integration1.reference=worker.basic
worker.jboss51-integration1.host=xx
worker.jboss51-integration1.port=13802
worker.jboss51-integration1.type=ajp13
worker.jboss51-integration1.lbfactor=1
worker.jboss51-integration2.reference=worker.basic
worker.jboss51-integration2.host=yy
worker.jboss51-integration2.port=13802
worker.jboss51-integration2.type=ajp13
worker.jboss51-integration2.lbfactor=1
worker.lb-jboss51-integration.type=lb
worker.lb-jboss51-integration.balance_workers=jboss51-integration1,jboss51-integration2
worker.lb-jboss51-integration.sticky_session=0
worker.list=lb-jboss51-integration

Also, I've tried setting:

worker.basic.lock=0

but that does not solve the problem.

Thanks 


John

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-07 Thread John Baker
TM,

On Tuesday 07 September 2010 09:47:17 you wrote:
 If you have enough resources, try to disable
 connectionTimeout on AJP connector and see weather
 the same will happen again.

I've removed the connectionTimeout attribute and the problem persists - I can 
see the lingering bytes in 2 sec messages as before.


John

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-07 Thread Rainer Jung

On 07.09.2010 11:36, John Baker wrote:

Rainer,

Is this acceptable?  I am using a tail and an egrep to match the various 
statements you wish to see.  if it's missing anything, plesae let me know what 
to add to the grep.


Yes, that's better!


[Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] 
find_match::jk_uri_worker_map.c (850): Attempting to map context URI 
'/*=lb-jboss51-integration' source 'JkMount'
[Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] 
find_match::jk_uri_worker_map.c (863): Found a wildchar match 
'/*=lb-jboss51-integration'
[Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] jk_handler::mod_jk.c 
(2462): Into handler jakarta-servlet worker=lb-jboss51-integration r-proxyreq=0
[Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] 
wc_get_worker_for_name::jk_worker.c (116): found a worker lb-jboss51-integration
[Tue Sep 07 10:20:20.618 2010] [18806:46962404156768] [debug] 
jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 154
[Tue Sep 07 10:20:22.619 2010] [18806:46962404156768] [debug] 
jk_shutdown_socket::jk_connect.c (813): Shutdown socket 154 and read 0 
lingering bytes in 2 sec.


Hmmm, that's strange. Why is there a connection close between 
wc_get_worker_for_name() and wc_maintain()? The first call doing 
something with the connections would be wc_maintain(). And if it decides 
to close sockets, it would log addiional statements. Nevertheless we can 
see, that closing the socket really took 2 seconds (which is in fact a 
builtin mod_jk timeout).


Are we sure, that those log lines are correct? This was generated by 
mod_jk 1.2.30 without any code changes, right?


Log level trace would show us more precisely, how we came to close the 
socket, but it blows up log volume a lot.



[Tue Sep 07 10:20:22.625 2010] [18806:46962404156768] [debug] 
wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration
[Tue Sep 07 10:20:22.628 2010] [18806:46962404156768] [debug] 
get_most_suitable_worker::jk_lb_worker.c (1001): found best worker 
jboss51-integration1 (jboss51-integration1) using method 'Request'
[Tue Sep 07 10:20:22.628 2010] [18806:46962404156768] [debug] 
service::jk_lb_worker.c (1161): service worker=jboss51-integration1 
route=jboss51-integration1
[Tue Sep 07 10:20:22.628 2010] [18806:46962404156768] [debug] 
ajp_service::jk_ajp_common.c (2376): processing jboss51-integration1 with 1 
retries
[Tue Sep 07 10:20:22.628 2010] [18806:46962404156768] [debug] 
ajp_send_request::jk_ajp_common.c (1579): (jboss51-integration1) all endpoints 
are disconnected.
[Tue Sep 07 10:20:22.631 2010] [18806:46962404156768] [debug] 
ajp_send_request::jk_ajp_common.c (1639): (jboss51-integration1) request body 
to send 0 - request body to resend 0
[Tue Sep 07 10:20:22.637 2010] [18806:46962404156768] [debug] 
ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 00305F 2E 6A 62 6F 
73 73 35 31 2D 69 6E 74 65 67 72  - _.jboss51-integr
[Tue Sep 07 10:20:22.638 2010] [18806:46962404156768] [debug] 
ajp_unmarshal_response::jk_ajp_common.c (723): Header[0] [Set-Cookie] = 
[JSESSIONID=cM2w+W7dIfDERfQpzO6BDw__.jboss51-integration1; Path=/xx]
[Tue Sep 07 10:20:22.640 2010] [18806:46962404156768] [debug] 
ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 01605F 2E 6A 62 6F 
73 73 35 31 2D 69 6E 74 65 67 72  - _.jboss51-integr
..
[Tue Sep 07 10:20:22.651 2010] [18806:46962404156768] [debug] 
ajp_reset_endpoint::jk_ajp_common.c (757): (jboss51-integration1) resetting 
endpoint with sd = 154
[Tue Sep 07 10:20:22.651 2010] [18806:46962404156768] [debug] 
ajp_done::jk_ajp_common.c (3010): recycling connection pool slot=0 for worker 
jboss51-integration1
[Tue Sep 07 10:20:22.651 2010] [18806:46962404156768] [debug] 
jk_handler::mod_jk.c (2602): Service finished with status=200 for 
worker=lb-jboss51-integration

To confirm the settings:

worker.basic.connection_pool_timeout=90


You don't usually want to set a pool_size. 1 would be appropriate when 
using the prefork Apache MPM, but in that case mod_jk automatically sets 
it to 1.



worker.basic.connection_pool_size=1
worker.basic.socket_keepalive=1


I don't like socket_timeout ...


worker.basic.socket_timeout=90


but I would like socket_connect_timeout.

The next two are possibly a bit short, because if the backend e.g. does 
a Java Garbage Collection which miht take longer than 1 second, tose 
timeouts will fire and take the  node out of load balancing.



worker.basic.connect_timeout=1000
worker.basic.prepost_timeout=1000


You might want to add max_reply_timeouts, otherwise one single reply 
timeout can take a node out of load balancing.



worker.basic.reply_timeout=9
# Stop recoery attempts when JBoss instances do not respond.
worker.basic.retries=1
worker.basic.recovery_options=27

worker.jboss51-integration1.reference=worker.basic
worker.jboss51-integration1.host=xx
worker.jboss51-integration1.port=13802
worker.jboss51-integration1.type=ajp13

Re: 2 second delays in mod_jk while maintaining workers

2010-09-07 Thread Mladen Turk

On 09/07/2010 11:53 AM, John Baker wrote:

TM,

On Tuesday 07 September 2010 09:47:17 you wrote:

If you have enough resources, try to disable
connectionTimeout on AJP connector and see weather
the same will happen again.


I've removed the connectionTimeout attribute and the problem persists - I can see the 
lingering bytes in 2 sec messages as before.



OK. The next one would be the tcpdump during that
interval to see actually the status of the sockets
both on mod_jk and JBoss boxes.

It's obvious that

shutdown(socket, SHUT_WR)
poll(socket, 2 seconds)
close(socket)

caused poll call to time out, meaning that
the JBoss side didn't respond to the
shutdown(socket, SHUT_WR) call by
closing it's side of the connection.

BTW, do you have lingering bytes in 0 sec messages
in your log or lower then 2 sec?



Regards
--
^TM

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-07 Thread John Baker
Rainer,

On Tuesday 07 September 2010 11:09:46 you wrote:
  [Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] 
  find_match::jk_uri_worker_map.c (850): Attempting to map context URI 
  '/*=lb-jboss51-integration' source 'JkMount'
  [Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] 
  find_match::jk_uri_worker_map.c (863): Found a wildchar match 
  '/*=lb-jboss51-integration'
  [Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] 
  jk_handler::mod_jk.c (2462): Into handler jakarta-servlet 
  worker=lb-jboss51-integration r-proxyreq=0
  [Tue Sep 07 10:20:20.617 2010] [18806:46962404156768] [debug] 
  wc_get_worker_for_name::jk_worker.c (116): found a worker 
  lb-jboss51-integration
  [Tue Sep 07 10:20:20.618 2010] [18806:46962404156768] [debug] 
  jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 154
  [Tue Sep 07 10:20:22.619 2010] [18806:46962404156768] [debug] 
  jk_shutdown_socket::jk_connect.c (813): Shutdown socket 154 and read 0 
  lingering bytes in 2 sec.
 
 Hmmm, that's strange. Why is there a connection close between 
 wc_get_worker_for_name() and wc_maintain()? The first call doing 
 something with the connections would be wc_maintain(). And if it decides 
 to close sockets, it would log addiional statements. Nevertheless we can 
 see, that closing the socket really took 2 seconds (which is in fact a 
 builtin mod_jk timeout).

That's interesting, because I thought this bug:

https://issues.apache.org/bugzilla/show_bug.cgi?id=48169

Was addressed in 1.2.30 and this discusses a 2 second problem.

 Are we sure, that those log lines are correct? This was generated by 
 mod_jk 1.2.30 without any code changes, right?

I compiled it fresh from source without any change:

[Tue Sep 07 10:07:56.683 2010] [17424:46962404156768] [info] init_jk::mod_jk.c 
(3189): mod_jk/1.2.30 initialized

 Log level trace would show us more precisely, how we came to close the 
 socket, but it blows up log volume a lot.

I'm using trace and greping for whatever you need to see.  if you want to see 
more debug, tell me what keyword to grep (ideally, so I don't catch the 
hundreds of other lines of debug being generated by other worker threads).

  To confirm the settings:
 
  worker.basic.connection_pool_timeout=90
 
 You don't usually want to set a pool_size. 1 would be appropriate when 
 using the prefork Apache MPM, but in that case mod_jk automatically sets 
 it to 1.

I can remove it if you want - I only set it to 1 to ensure it was set (yes, I 
am using prefork).
 
  worker.basic.connection_pool_size=1
  worker.basic.socket_keepalive=1
 
 I don't like socket_timeout ...

Why?

  worker.basic.socket_timeout=90
 
 but I would like socket_connect_timeout.

I can change it.

 The next two are possibly a bit short, because if the backend e.g. does 
 a Java Garbage Collection which miht take longer than 1 second, tose 
 timeouts will fire and take the  node out of load balancing.
 
  worker.basic.connect_timeout=1000
  worker.basic.prepost_timeout=1000
 
 You might want to add max_reply_timeouts, otherwise one single reply 
 timeout can take a node out of load balancing.
 
  worker.basic.reply_timeout=9
  # Stop recoery attempts when JBoss instances do not respond.
  worker.basic.retries=1
  worker.basic.recovery_options=27
 Don't want sticky sessions?

Nope.  The app clusters.  Although it's set for other workers.


John

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-07 Thread John Baker


On Tuesday 07 September 2010 11:13:07 you wrote:
 It's obvious that
 
 shutdown(socket, SHUT_WR)
 poll(socket, 2 seconds)
 close(socket)
 
 caused poll call to time out, meaning that
 the JBoss side didn't respond to the
 shutdown(socket, SHUT_WR) call by
 closing it's side of the connection.
 
 BTW, do you have lingering bytes in 0 sec messages
 in your log or lower then 2 sec?

Yes.  There's plenty of lingering 0 sec.

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-07 Thread Mladen Turk

On 09/07/2010 12:09 PM, Rainer Jung wrote:


Are we sure, that those log lines are correct? This was generated by mod_jk 
1.2.30 without any code changes, right?



The other option would be to set JkWatchdogInterval 60
In that case wc_maintain would happen in separate thread
without observable request delay.
However the JK_LOCK would still block the connection.
Think we should fix that by just invalidating the
sockets, and then closing them in the watchdog thread. WDYT?



Regards
--
^TM

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-07 Thread John Baker
Rainer,

On Tuesday 07 September 2010 11:09:46 you wrote:
 I don't like socket_timeout ...
 
  worker.basic.socket_timeout=90
 
 but I would like socket_connect_timeout.
 
 The next two are possibly a bit short, because if the backend e.g. does 
 a Java Garbage Collection which miht take longer than 1 second, tose 
 timeouts will fire and take the  node out of load balancing.
 
  worker.basic.connect_timeout=1000
  worker.basic.prepost_timeout=1000

I've change socket_timeout=90 to socket_connect_timeout=9000, and 
connect_timeout/prepost_timeout to 5000, and the problem persists:

[Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug] 
jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 152
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug] 
jk_shutdown_socket::jk_connect.c (813): Shutdown socket 152 and read 0 
lingering bytes in 2 sec.
[Tue Sep 07 11:29:24.868 2010] [364:47921619424608] [debug] 
wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration


John

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-07 Thread Rainer Jung

On 07.09.2010 12:29, Mladen Turk wrote:

On 09/07/2010 12:09 PM, Rainer Jung wrote:


Are we sure, that those log lines are correct? This was generated by
mod_jk 1.2.30 without any code changes, right?



The other option would be to set JkWatchdogInterval 60
In that case wc_maintain would happen in separate thread
without observable request delay.
However the JK_LOCK would still block the connection.
Think we should fix that by just invalidating the
sockets, and then closing them in the watchdog thread. WDYT?


I'm going to answer his other mail. Since he has trace log, I'm still 
interested to understand, why the socket close was logged *before* the 
maintain. After this is understood, moving the whole thing to the 
maintain could be a good solution (and not holding the lock while 
shutting down the socket).


Regards,

Rainer

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-07 Thread Mladen Turk

On 09/07/2010 12:20 PM, John Baker wrote:



On Tuesday 07 September 2010 11:13:07 you wrote:

It's obvious that

shutdown(socket, SHUT_WR)
poll(socket, 2 seconds)
close(socket)

caused poll call to time out, meaning that
the JBoss side didn't respond to the
shutdown(socket, SHUT_WR) call by
closing it's side of the connection.

BTW, do you have lingering bytes in 0 sec messages
in your log or lower then 2 sec?


Yes.  There's plenty of lingering 0 sec.



Nice, it means that for most of the times the maintain
works as it should. JBoss closes the socket and
poll returns immediately cause of closed socket.

You can use Wireshark and see what's actually going
on in those cases.


Regards
--
^TM

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-07 Thread Rainer Jung

On 07.09.2010 12:32, John Baker wrote:

Rainer,

On Tuesday 07 September 2010 11:09:46 you wrote:

I don't like socket_timeout ...


worker.basic.socket_timeout=90


but I would like socket_connect_timeout.

The next two are possibly a bit short, because if the backend e.g. does
a Java Garbage Collection which miht take longer than 1 second, tose
timeouts will fire and take the  node out of load balancing.


worker.basic.connect_timeout=1000
worker.basic.prepost_timeout=1000


I've change socket_timeout=90 to socket_connect_timeout=9000, and 
connect_timeout/prepost_timeout to 5000, and the problem persists:

[Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug] 
jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 152
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug] 
jk_shutdown_socket::jk_connect.c (813): Shutdown socket 152 and read 0 
lingering bytes in 2 sec.
[Tue Sep 07 11:29:24.868 2010] [364:47921619424608] [debug] 
wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration


OK, so it would be helpful to see a more complete log, e.g.

grep 364:47921619424608 LOGFILE | egrep -e '11:29:2[012345]'

If there's data in it which is not meant for public posting, you can 
also send it to me va direct mail.


Regards,

Rainer

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-07 Thread Rainer Jung

On 07.09.2010 12:40, Rainer Jung wrote:

On 07.09.2010 12:32, John Baker wrote:

Rainer,

On Tuesday 07 September 2010 11:09:46 you wrote:

I don't like socket_timeout ...


worker.basic.socket_timeout=90


but I would like socket_connect_timeout.

The next two are possibly a bit short, because if the backend e.g. does
a Java Garbage Collection which miht take longer than 1 second, tose
timeouts will fire and take the node out of load balancing.


worker.basic.connect_timeout=1000
worker.basic.prepost_timeout=1000


I've change socket_timeout=90 to socket_connect_timeout=9000, and
connect_timeout/prepost_timeout to 5000, and the problem persists:

[Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug]
jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 152
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug]
jk_shutdown_socket::jk_connect.c (813): Shutdown socket 152 and read 0
lingering bytes in 2 sec.
[Tue Sep 07 11:29:24.868 2010] [364:47921619424608] [debug]
wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration


OK, so it would be helpful to see a more complete log, e.g.

grep 364:47921619424608 LOGFILE | egrep -e '11:29:2[012345]'

If there's data in it which is not meant for public posting, you can
also send it to me va direct mail.


Thanks for the log snippets. I can now see:

- the socket close *does* happen during maintain
- this wasn't obvious from the previous log snippert, because more 
Maintain lines for other workers were left out


Some relevant lines are:

[Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug] 
wc_maintain::jk_worker.c (339): Maintaining worker lb-...

...
[Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug] 
ajp_reset_endpoint::jk_ajp_common.c (757): (worker1) resetting endpoint 
with sd = 152 (socket shutdown)

...
[Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug] 
jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 152

...
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug] 
jk_shutdown_socket::jk_connect.c (813): Shutdown socket 152 and read 0 
lingering bytes in 2 sec.

...
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug] 
ajp_maintain::jk_ajp_common.c (3190): cleaning pool slot=0 elapsed 185 in 2
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug] 
ajp_maintain::jk_ajp_common.c (3197): reached pool min size 1 from 1 
cache slots
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug] 
ajp_maintain::jk_ajp_common.c (3237): recycled 1 sockets in 2 seconds 
from 1 pool slots

...

So now we can discuss, how to fix this.

- Can we detect more reliably, whether we want to drain before closing 
or not


- As a workaround: using a JkWatchdog moves the maintain into a separate 
thread. But during the socket closing a lock is held, which blocks other 
threads from accessing the same worker.


We would need to rewrite the logic in ajp_maintain() to only hold the 
lock (critical section) while pulling sockets out of the ep_cache or 
entering them back, not while actually using the sockets.


I still don't know, *why* the socket close blocks, i.e. why the backend 
doesn't respond to us wanting to close the socket. Here a network sniff 
like indicated by Mladen would help.


Regards,

Rainer

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-07 Thread John Baker
Where can I find documentation on JkWatchdog?


On Tuesday 07 September 2010 13:51:23 you wrote:
 - As a workaround: using a JkWatchdog moves the maintain into a separate 
 thread. But during the socket closing a lock is held, which blocks other 
 threads from accessing the same worker.

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-07 Thread Rainer Jung

On 07.09.2010 19:46, John Baker wrote:

Where can I find documentation on JkWatchdog?


General documntation on mod_jk:

http://tomcat.apache.org/connectors-doc/

JK directives: http://tomcat.apache.org/connectors-doc/reference/apache.html

workers.properties: 
http://tomcat.apache.org/connectors-doc/reference/workers.html


Timeouts: 
http://tomcat.apache.org/connectors-doc/generic_howto/timeouts.html


Regards,

Rainer

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-06 Thread Mladen Turk

On 09/06/2010 04:16 PM, John Baker wrote:


I've set the Jk logging to trace and you can see the debug statements and
the 2s delays:



Do you use NFS share by any chance to store the
mod_jk log directory data?

What happens if you set
worker.lb-jboss51-integration.lock=O


Regards
--
^TM

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-06 Thread John Baker


On Monday 06 September 2010 18:56:20 you wrote:
 On 09/06/2010 04:16 PM, John Baker wrote:
 
  I've set the Jk logging to trace and you can see the debug statements and
  the 2s delays:
 
 
 Do you use NFS share by any chance to store the
 mod_jk log directory data?

Nope.  All log files are written to local disc, and they are very quick discs..

 What happens if you set
 worker.lb-jboss51-integration.lock=O

I'll get back to you and let you know.

By the way, there's a lot of activity on these Apache servers - do you suspect 
a file lock timeout?

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-06 Thread Rainer Jung

On 06.09.2010 19:56, Mladen Turk wrote:

On 09/06/2010 04:16 PM, John Baker wrote:


I've set the Jk logging to trace and you can see the debug statements and
the 2s delays:



Do you use NFS share by any chance to store the
mod_jk log directory data?


I didn't look at the code now, but the 2 seconds remind me of the 
connection draining during socket shutdown, which could be related to 
jk_maintain?


Regards,

Rainer

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-06 Thread John Baker
On Monday 06 September 2010 22:57:21 you wrote:
 I didn't look at the code now, but the 2 seconds remind me of the 
 connection draining during socket shutdown, which could be related to 
 jk_maintain?

What's jk_maintain?

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 2 second delays in mod_jk while maintaining workers

2010-09-06 Thread Mladen Turk

On 09/06/2010 11:59 PM, John Baker wrote:

On Monday 06 September 2010 22:57:21 you wrote:

I didn't look at the code now, but the 2 seconds remind me of the
connection draining during socket shutdown, which could be related to
jk_maintain?


What's jk_maintain?



Function that maintains the workers
(closes excess connections inactive for a long time)

Anyhow, like Rainer said, if that's the case
you should have something like this in the log (DEBUG)

... Shutdown socket 123 and read 123 lingering bytes in 2 sec.



Regards
--
^TM

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org