Hi all.

We've been doing some rather extreme JBoss/Tomcat stress-testing on both Windows and Linux, APR and non-APR. We played with maxThreads and acceptCount and tried to make sense of the various behaviors that we saw.

In a nutshell, we discovered that under Windows, the maximum size of the backlog on a listening socket is hard-limited to 200, and there is nothing you can do to overcome this at the O/S level. This has implications for high-load (peaky) situations where a flood of new connection requests hits the server all at once. (Let's put to one side the SYN-flood DoS protection that the O/S might also have that might get in the way of things.)
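To make the point concrete, here's a minimal Java sketch (class name and backlog value are mine, just for illustration): the backlog you ask for is only a hint, and the O/S is free to silently clamp it — which is exactly what Windows does at 200.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class BacklogDemo {
    public static void main(String[] args) throws IOException {
        // Ask for a large backlog; the O/S may silently reduce the
        // effective queue (Windows clamps it to ~200 regardless).
        int requestedBacklog = 4096; // hypothetical value for illustration
        try (ServerSocket server = new ServerSocket(0, requestedBacklog,
                InetAddress.getLoopbackAddress())) {
            System.out.println("Listening on port " + server.getLocalPort());
            // Connections arriving beyond the *effective* backlog are
            // refused/reset by the kernel before accept() ever sees them.
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(),
                    server.getLocalPort());
                 Socket accepted = server.accept()) {
                System.out.println("Accepted: " + accepted.isConnected());
            }
        }
    }
}
```

Note there is no portable way from Java to even find out what the effective backlog ended up being.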

Now, unfortunately, APR does not appear to alleviate this problem: you can have thousands of open connections, all idling away with CPU at zero, and still have lots of new clients turned away at the door if a burst of them arrives.

I dug into the AprEndpoint code to see how it worked. It would seem that when a new connection is accepted, it requires a thread from the worker thread pool to help process the new connection. This is the same pool of threads that is servicing requests from the existing connection pool. If there is a reasonable amount of activity across the existing connections, contention for these threads will be very high, and the rate at which new connection requests can be serviced is therefore quite low. Thus with a burst of new connection requests, the backlog queue fills quickly and (under Windows) the ones that don't make it into the queue have their connections unceremoniously and immediately reset. (Which is rather ugly in itself, since it does not allow for TCP recovery attempts on the connection request over the next 20 secs or so, as happens under Linux.)
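The shape of the problem (this is a hedged sketch of the pattern as I read it, not the actual AprEndpoint source — class and method names are mine) looks roughly like this: the accept loop must first obtain a slot from the same bounded worker pool that is busy servicing existing connections, so under load the loop itself stalls while SYNs pile up in the small kernel backlog.

```java
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.Semaphore;

public class SharedPoolAcceptLoop {
    private final Semaphore workerSlots; // one permit per worker thread
    private final ServerSocket listener;

    SharedPoolAcceptLoop(ServerSocket listener, int maxThreads) {
        this.listener = listener;
        this.workerSlots = new Semaphore(maxThreads);
    }

    void acceptLoop() throws Exception {
        while (true) {
            // Blocks while every worker is busy with existing connections;
            // during that time new connection requests can only queue in
            // the (Windows: max 200) listen backlog.
            workerSlots.acquire();
            Socket s = listener.accept();
            new Thread(() -> {
                try {
                    handleFirstRequest(s); // hypothetical handler
                } finally {
                    workerSlots.release();
                }
            }).start();
        }
    }

    void handleFirstRequest(Socket s) { /* process, then hand to poller */ }
}
```

The key point is that the rate of draining the backlog is coupled to worker availability, not to how fast accept() itself could run.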

So what I was wondering is whether the acceptor could adopt a slightly different strategy. Firstly, can it not use a separate (perhaps internal) small pool of threads for handling new connection requests, and in addition, hand them straight over to the poller without processing the first request that might be associated with it. [I think that the relatively new connector setting deferAccept might have some relevance here?] Basically the idea would be to simply get the new connections out of the backlog as quickly as possible and over to the poller to deal with. This would mean that the listen backlog is likely to be kept to an absolute minimum even in flooding situations. (Until of course, you hit the other limit of how many sockets the poller can handle).
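Using java.nio as a stand-in for the APR poller (so this is only an analogy sketch, not a patch — names are mine), the proposed strategy would look something like this: a dedicated acceptor drains the backlog as fast as it can and registers each socket with the poller without reading the first request, so no worker thread is consumed until data actually arrives.

```java
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class DrainFastAcceptor {
    public static void main(String[] args) throws Exception {
        Selector poller = Selector.open();
        ServerSocketChannel listener = ServerSocketChannel.open();
        listener.bind(new InetSocketAddress(0), 200); // backlog hint
        listener.configureBlocking(false);
        listener.register(poller, SelectionKey.OP_ACCEPT);

        // Single-threaded event loop for illustration; the real design
        // could use a small dedicated acceptor pool, kept separate from
        // the request-processing workers.
        while (poller.select(100) >= 0) {
            for (SelectionKey key : poller.selectedKeys()) {
                if (key.isAcceptable()) {
                    SocketChannel c = listener.accept();
                    if (c != null) {
                        c.configureBlocking(false);
                        // Hand straight to the poller: no worker thread
                        // is touched until the client sends data.
                        c.register(poller, SelectionKey.OP_READ);
                    }
                } else if (key.isReadable()) {
                    // Only here would a worker from the main pool be
                    // dispatched to process the request.
                }
            }
            poller.selectedKeys().clear();
            break; // demo only: exit after one select pass
        }
        listener.close();
        poller.close();
    }
}
```

The backlog then only has to absorb the gap between the kernel completing handshakes and one tight accept loop, rather than the gap until a busy worker frees up.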

I'd be interested in your thoughts.

Cheers,
MT