Ian, This is a nice follow-up. Could you attach it directly to the bug report in Bugzilla, so the commentary is all in one place?
https://bz.apache.org/bugzilla/show_bug.cgi?id=57943 Thanks, -chris On 7/2/15 6:29 AM, Ian Luo wrote: > Hi Mark, > > I am the colleague of Sun Qi who reported bug57943 > <https://bz.apache.org/bugzilla/show_bug.cgi?id=57943> one month ago. I > believe we managed to find out why ConcurrentModificationException happens > in our scenario. > > Here’s the scenario, our application uses async servlet to write the > response back to the client. The operation we put in async servlet which > run in another thread is quite time-consuming. When the incoming traffic is > heavy, which usually happens when the instances of tomcat server reboots, > we hit this issue very easily. > > When the ClientPoller thread run into timeout method (in NioEndpoint) and > execute the following code: > > } else if (!ka.isAsync() || ka.getTimeout() > 0) { > // Async requests with a timeout of 0 or less never timeout > long delta = now - ka.getLastAccess(); > long timeout = > (ka.getTimeout()==-1)?((long)socketProperties.getSoTimeout()):(ka.getTimeout()); > boolean isTimedout = delta > timeout; > if (isTimedout) { > // Prevent subsequent timeouts if the timeout event takes a > while to process > ka.access(Long.MAX_VALUE); > processSocket(ka.getChannel(), SocketStatus.TIMEOUT, true, 6); > }} > > Eventually it calls processSocket(ka.getChannel()) and puts back the > NioChannel instance to the cache nioChannels for the late re-use: > > if (ka!=null) ka.setComet(false); > socket.getPoller().cancelledKey(key, SocketStatus.ERROR, false);if > (running && !paused) { > nioChannels.offer(socket);} > > Unfortunately in our scenarios, when QPS is greater than 1000req/sec, async > servlet threads have very large chance to start async-complete and begin to > call processSocket (in Http11NioProcessor) at the same time: > > } else if (actionCode == ActionCode.ASYNC_COMPLETE) { > if (asyncStateMachine.asyncComplete()) { > ((NioEndpoint)endpoint).processSocket(socketWrapper.getSocket(), > SocketStatus.OPEN_READ, true, 10);} > > This leads to cache pollution in nioChannels since there’re chances to > offer the same object into the cache multiple times. When the Acceptor > thread polls the object and hands it over to the ClientPollers, the same > object could be pass into the different ClientPoller threads, then leads to > very serious problem. > > You may notice the code I show here is the code from tomcat 7.0.54, and > claim the similar issue has already been addressed, but this is not true. > We noticed the similar issue was reported on Bug57340 > <https://bz.apache.org/bugzilla/show_bug.cgi?id=57340>, and the solution > has already been checked into the later release. > > But this solution is not a complete solution, in our scenario, there’s > chance to offer the duplicated object to the cache in else if clause where > the solution for Bug57340 doesn’t cover: > > } else if (handshake == -1 ) { > if (key != null) { > socket.getPoller().cancelledKey(key, SocketStatus.DISCONNECT, false); > } > nioChannels.offer(socket); > socket = null; > if ( ka!=null ) keyCache.offer(ka); > ka = null;} > > This is easily to understand. When handshake equals -1, one possibility is > the key passed into the doRun method is null, and when key is null, it > means some other thread has already finished processing on the same socket: > > public void run() { > SelectionKey key = socket.getIOChannel().keyFor( > socket.getPoller().getSelector()); > > We propose the change below in order to address this issue throughly. In > this case, I think we can simply drop the object instead of offering it > since it looks there’s no other ideal way to not use lock. > > } else if (handshake == -1 ) { > if (key != null) { > if (socket.getPoller().cancelledKey(key, > SocketStatus.DISCONNECT, false) != null) { > nioChannels.offer(socket); > if (ka != null) keyCache.offer(ka); > } > } > socket = null; > ka = null; > > Best Regards, > -Ian. > > On Fri, May 22, 2015 at 9:29 PM, <bugzi...@apache.org> wrote: > >> https://bz.apache.org/bugzilla/show_bug.cgi?id=57943 >> >> --- Comment #1 from Mark Thomas <ma...@apache.org> --- >> This is the third report we have received about this issue. >> >> The other two are: >> http://markmail.org/message/xgxblpk4v4ykgi5y >> http://markmail.org/message/eypk42i6gdpztkpy >> >> I have looked through the NIO Poller code again and I still can't see how >> this >> can be happening. I've also dug into the JRE code and I can't see any >> issues >> there either (although I'll not this has been reported on a variety of >> JVMs and >> I haven't checked the specific ones mentions). >> >> I think the best thing to do at this point is catch the error, log a >> warning >> and maybe ask folks to report it if they can repeat it. >> >> -- >> You are receiving this mail because: >> You are the assignee for the bug. >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org >> For additional commands, e-mail: dev-h...@tomcat.apache.org >> >> >
signature.asc
Description: OpenPGP digital signature