Ian,

This is a nice follow-up. Could you attach it directly to the bug report
in Bugzilla, so the commentary is all in one place?

https://bz.apache.org/bugzilla/show_bug.cgi?id=57943

Thanks,
-chris

On 7/2/15 6:29 AM, Ian Luo wrote:
> Hi Mark,
> 
> I am the colleague of Sun Qi who reported bug57943
> <https://bz.apache.org/bugzilla/show_bug.cgi?id=57943> one month ago. I
> believe we managed to find out why ConcurrentModificationException happens
> in our scenario.
> 
> Here’s the scenario, our application uses async servlet to write the
> response back to the client. The operation we put in async servlet which
> run in another thread is quite time-consuming. When the incoming traffic is
> heavy, which usually happens when the instances of tomcat server reboots,
> we hit this issue very easily.
> 
> When the ClientPoller thread run into timeout method (in NioEndpoint) and
> execute the following code:
> 
> } else if (!ka.isAsync() || ka.getTimeout() > 0) {
>     // Async requests with a timeout of 0 or less never timeout
>     long delta = now - ka.getLastAccess();
>     long timeout =
> (ka.getTimeout()==-1)?((long)socketProperties.getSoTimeout()):(ka.getTimeout());
>     boolean isTimedout = delta > timeout;
>     if (isTimedout) {
>         // Prevent subsequent timeouts if the timeout event takes a
> while to process
>         ka.access(Long.MAX_VALUE);
>         processSocket(ka.getChannel(), SocketStatus.TIMEOUT, true, 6);
>     }}
> 
> Eventually it calls processSocket(ka.getChannel()) and puts back the
> NioChannel instance to the cache nioChannels for the late re-use:
> 
> if (ka!=null) ka.setComet(false);
> socket.getPoller().cancelledKey(key, SocketStatus.ERROR, false);if
> (running && !paused) {
>     nioChannels.offer(socket);}
> 
> Unfortunately in our scenarios, when QPS is greater than 1000req/sec, async
> servlet threads have very large chance to start async-complete and begin to
> call processSocket (in Http11NioProcessor) at the same time:
> 
> } else if (actionCode == ActionCode.ASYNC_COMPLETE) {
>     if (asyncStateMachine.asyncComplete()) {
>     ((NioEndpoint)endpoint).processSocket(socketWrapper.getSocket(),
> SocketStatus.OPEN_READ, true, 10);}
> 
> This leads to cache pollution in nioChannels since there’re chances to
> offer the same object into the cache multiple times. When the Acceptor
> thread polls the object and hands it over to the ClientPollers, the same
> object could be pass into the different ClientPoller threads, then leads to
> very serious problem.
> 
> You may notice the code I show here is the code from tomcat 7.0.54, and
> claim the similar issue has already been addressed, but this is not true.
> We noticed the similar issue was reported on Bug57340
> <https://bz.apache.org/bugzilla/show_bug.cgi?id=57340>, and the solution
> has already been checked into the later release.
> 
> But this solution is not a complete solution, in our scenario, there’s
> chance to offer the duplicated object to the cache in else if clause where
> the solution for Bug57340 doesn’t cover:
> 
> } else if (handshake == -1 ) {
>     if (key != null) {
>         socket.getPoller().cancelledKey(key, SocketStatus.DISCONNECT, false);
>     }
>     nioChannels.offer(socket);
>     socket = null;
>     if ( ka!=null ) keyCache.offer(ka);
>     ka = null;}
> 
> This is easily to understand. When handshake equals -1, one possibility is
> the key passed into the doRun method is null, and when key is null, it
> means some other thread has already finished processing on the same socket:
> 
> public void run() {
>     SelectionKey key = socket.getIOChannel().keyFor(
>         socket.getPoller().getSelector());
> 
> We propose the change below in order to address this issue throughly. In
> this case, I think we can simply drop the object instead of offering it
> since it looks there’s no other ideal way to not use lock.
> 
> } else if (handshake == -1 ) {
>     if (key != null) {
>         if (socket.getPoller().cancelledKey(key,
> SocketStatus.DISCONNECT, false) != null) {
>             nioChannels.offer(socket);
>             if (ka != null) keyCache.offer(ka);
>         }
>     }
>     socket = null;
>     ka = null;
> 
> Best Regards,
> -Ian.
> 
> On Fri, May 22, 2015 at 9:29 PM, <bugzi...@apache.org> wrote:
> 
>> https://bz.apache.org/bugzilla/show_bug.cgi?id=57943
>>
>> --- Comment #1 from Mark Thomas <ma...@apache.org> ---
>> This is the third report we have received about this issue.
>>
>> The other two are:
>> http://markmail.org/message/xgxblpk4v4ykgi5y
>> http://markmail.org/message/eypk42i6gdpztkpy
>>
>> I have looked through the NIO Poller code again and I still can't see how
>> this
>> can be happening. I've also dug into the JRE code and I can't see any
>> issues
>> there either (although I'll not this has been reported on a variety of
>> JVMs and
>> I haven't checked the specific ones mentions).
>>
>> I think the best thing to do at this point is catch the error, log a
>> warning
>> and maybe ask folks to report it if they can repeat it.
>>
>> --
>> You are receiving this mail because:
>> You are the assignee for the bug.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
>> For additional commands, e-mail: dev-h...@tomcat.apache.org
>>
>>
> 

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to