Hi Mark, I am the colleague of Sun Qi who reported bug57943 <https://bz.apache.org/bugzilla/show_bug.cgi?id=57943> one month ago. I believe we managed to find out why ConcurrentModificationException happens in our scenario.
Here’s the scenario, our application uses async servlet to write the response back to the client. The operation we put in async servlet which run in another thread is quite time-consuming. When the incoming traffic is heavy, which usually happens when the instances of tomcat server reboots, we hit this issue very easily. When the ClientPoller thread run into timeout method (in NioEndpoint) and execute the following code: } else if (!ka.isAsync() || ka.getTimeout() > 0) { // Async requests with a timeout of 0 or less never timeout long delta = now - ka.getLastAccess(); long timeout = (ka.getTimeout()==-1)?((long)socketProperties.getSoTimeout()):(ka.getTimeout()); boolean isTimedout = delta > timeout; if (isTimedout) { // Prevent subsequent timeouts if the timeout event takes a while to process ka.access(Long.MAX_VALUE); processSocket(ka.getChannel(), SocketStatus.TIMEOUT, true, 6); }} Eventually it calls processSocket(ka.getChannel()) and puts back the NioChannel instance to the cache nioChannels for the late re-use: if (ka!=null) ka.setComet(false); socket.getPoller().cancelledKey(key, SocketStatus.ERROR, false);if (running && !paused) { nioChannels.offer(socket);} Unfortunately in our scenarios, when QPS is greater than 1000req/sec, async servlet threads have very large chance to start async-complete and begin to call processSocket (in Http11NioProcessor) at the same time: } else if (actionCode == ActionCode.ASYNC_COMPLETE) { if (asyncStateMachine.asyncComplete()) { ((NioEndpoint)endpoint).processSocket(socketWrapper.getSocket(), SocketStatus.OPEN_READ, true, 10);} This leads to cache pollution in nioChannels since there’re chances to offer the same object into the cache multiple times. When the Acceptor thread polls the object and hands it over to the ClientPollers, the same object could be pass into the different ClientPoller threads, then leads to very serious problem. You may notice the code I show here is the code from tomcat 7.0.54, and claim the similar issue has already been addressed, but this is not true. We noticed the similar issue was reported on Bug57340 <https://bz.apache.org/bugzilla/show_bug.cgi?id=57340>, and the solution has already been checked into the later release. But this solution is not a complete solution, in our scenario, there’s chance to offer the duplicated object to the cache in else if clause where the solution for Bug57340 doesn’t cover: } else if (handshake == -1 ) { if (key != null) { socket.getPoller().cancelledKey(key, SocketStatus.DISCONNECT, false); } nioChannels.offer(socket); socket = null; if ( ka!=null ) keyCache.offer(ka); ka = null;} This is easily to understand. When handshake equals -1, one possibility is the key passed into the doRun method is null, and when key is null, it means some other thread has already finished processing on the same socket: public void run() { SelectionKey key = socket.getIOChannel().keyFor( socket.getPoller().getSelector()); We propose the change below in order to address this issue throughly. In this case, I think we can simply drop the object instead of offering it since it looks there’s no other ideal way to not use lock. } else if (handshake == -1 ) { if (key != null) { if (socket.getPoller().cancelledKey(key, SocketStatus.DISCONNECT, false) != null) { nioChannels.offer(socket); if (ka != null) keyCache.offer(ka); } } socket = null; ka = null; Best Regards, -Ian. On Fri, May 22, 2015 at 9:29 PM, <bugzi...@apache.org> wrote: > https://bz.apache.org/bugzilla/show_bug.cgi?id=57943 > > --- Comment #1 from Mark Thomas <ma...@apache.org> --- > This is the third report we have received about this issue. > > The other two are: > http://markmail.org/message/xgxblpk4v4ykgi5y > http://markmail.org/message/eypk42i6gdpztkpy > > I have looked through the NIO Poller code again and I still can't see how > this > can be happening. I've also dug into the JRE code and I can't see any > issues > there either (although I'll not this has been reported on a variety of > JVMs and > I haven't checked the specific ones mentions). > > I think the best thing to do at this point is catch the error, log a > warning > and maybe ask folks to report it if they can repeat it. > > -- > You are receiving this mail because: > You are the assignee for the bug. > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org > For additional commands, e-mail: dev-h...@tomcat.apache.org > >