Jason, Thanks for your contribution! Would you mind to open a JIRA bug report and attach the diff file to it, checking the ASF license check box?
Thanks a lot, Bernd On Sat, Apr 3, 2010 at 17:10, Jason Resch <[email protected]> wrote: > All, > > I think I've discovered a bug in Mina's TLS code. While I encountered this > bug in Mina 1.1.7 it seems the same problem code exists in Mina 2.0. The > code in question is the flushScheduledEvents() method in SSLHandler.java: > > public void flushScheduledEvents() { > // Fire events only when no lock is hold for this handler. > if (Thread.holdsLock(this)) { > return; > } > > Event e; > // We need synchronization here inevitably because filterWrite can > be > // called simultaneously and cause 'bad record MAC' integrity error. > synchronized (this) { > while ((e = filterWriteEventQueue.poll()) != null) { > e.nextFilter.filterWrite(session, (WriteRequest) e.data); > } > } > > while ((e = messageReceivedEventQueue.poll()) != null) { > e.nextFilter.messageReceived(session, e.data); > } > } > > This method is called both by threads which handle writes, and threads that > handle reads. Therefore, as the comments suggest, multiple threads may go > through this code simultaneously. However, since there is no > synchronization around processing of the messageReceivedEventQueue, it is > possible that the received messages will be sent to the next filter out of > order, should there be more than one message in the queue and a context > switch happen at the wrong time. > > The bug would manifest in our application as a failure of our protocol layer > to decode a message, we believe, due to a re-ordering. It only occurred in > environments with a large amount of contention and network traffic and when > using TLS. The fix I have tested was to move the closing brace of the > synchronized block to extend to cover both while loops. I've attached a > patch representing that change. Since making that change we have not > encountered the bug again after about 30 hours of testing and 1.5 TB of > traffic, whereas before the change we could reproduce it after a few > minutes. > > > Jason > >
