[ 
https://issues.apache.org/jira/browse/DIRMINA-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmanuel Lecharny resolved DIRMINA-779.
---------------------------------------
    Resolution: Fixed

Using a lock instead of a synchronized (this) block does not break the
ssltest and should provide the right protection against concurrent access.

Fixed with commit 56a6e58004ea4a6af5640f03d1f6796a073c5d62
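
For readers without the commit at hand, here is a minimal sketch of what swapping synchronized (this) for an explicit lock can look like. It is not the code from the commit above: the class LockedFlusher, its String queue, and the delivered list are hypothetical stand-ins, and only java.util.concurrent.locks.ReentrantLock is a real API. The isHeldByCurrentThread() check plays the role that Thread.holdsLock(this) plays in the quoted method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the handler; not MINA's actual class.
class LockedFlusher {
    private final ReentrantLock lock = new ReentrantLock();
    private final Queue<String> receivedQueue = new ConcurrentLinkedQueue<>();
    private final List<String> delivered = new ArrayList<>();

    void scheduleReceived(String message) {
        receivedQueue.add(message);
    }

    void flush() {
        // Skip the flush if this thread already holds the lock,
        // mirroring the Thread.holdsLock(this) guard in the quoted code.
        if (lock.isHeldByCurrentThread()) {
            return;
        }
        lock.lock();
        try {
            String m;
            // Polling and delivering under one lock keeps queue order.
            while ((m = receivedQueue.poll()) != null) {
                delivered.add(m);
            }
        } finally {
            lock.unlock();
        }
    }

    List<String> delivered() {
        lock.lock();
        try {
            return new ArrayList<>(delivered);
        } finally {
            lock.unlock();
        }
    }
}
```

An explicit lock also leaves room for tryLock() or a fair lock later, which a synchronized block does not offer; whether the actual fix uses either is not stated here.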

> SSLHandler can re-order data that it reads
> ------------------------------------------
>
>                 Key: DIRMINA-779
>                 URL: https://issues.apache.org/jira/browse/DIRMINA-779
>             Project: MINA
>          Issue Type: Bug
>          Components: Filter
>    Affects Versions: 1.1.7, 2.0.0-M1, 2.0.0-M2, 2.0.0-M3, 2.0.0-M4, 2.0.0-M5, 
> 2.0.0-M6, 2.0.0-RC1
>            Reporter: Jason Resch
>             Fix For: 2.0.8
>
>         Attachments: ssl_reodering_fix.diff
>
>
> The code in question is the flushScheduledEvents() method in SSLHandler.java:
> {code}
>  public void flushScheduledEvents() {
>    // Fire events only when no lock is held for this handler.
>    if (Thread.holdsLock(this)) {
>        return;
>    }
>    Event e;
>    // We need synchronization here inevitably because filterWrite can be
>    // called simultaneously and cause 'bad record MAC' integrity error.
>    synchronized (this) {
>        while ((e = filterWriteEventQueue.poll()) != null) {
>            e.nextFilter.filterWrite(session, (WriteRequest) e.data);
>        }
>    }
>    while ((e = messageReceivedEventQueue.poll()) != null) {
>        e.nextFilter.messageReceived(session, e.data);
>    }
>  }
> {code}
>  This method is called both by threads that handle writes and by threads
>  that handle reads.  Therefore, as the comments suggest, multiple threads may
>  run this code simultaneously.  However, because nothing synchronizes the
>  processing of messageReceivedEventQueue, received messages can be passed to
>  the next filter out of order if the queue holds more than one message and a
>  context switch happens at the wrong time.
>  In our application the bug manifested as a failure of our protocol layer to
>  decode a message, which we believe was caused by this re-ordering.  It only
>  occurred when using TLS in environments with heavy contention and network
>  traffic.  The fix I have tested moves the closing brace of the synchronized
>  block so that the block covers both while loops.  I've attached a patch with
>  that change.  Since making it, we have not encountered the bug again in
>  about 30 hours of testing and 1.5 TB of traffic, whereas before the change
>  we could reproduce it within a few minutes.
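
The change described in the report, extending the synchronized block over both loops, can be sketched roughly as follows. This is an illustration only and not the attached patch: SynchronizedFlusher, its String queues, and the written/received lists are hypothetical simplifications of the handler, the Event objects, and the nextFilter calls in the quoted method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical simplification of the handler; not MINA's actual class.
class SynchronizedFlusher {
    private final Queue<String> filterWriteEventQueue = new ConcurrentLinkedQueue<>();
    private final Queue<String> messageReceivedEventQueue = new ConcurrentLinkedQueue<>();
    private final List<String> written = new ArrayList<>();
    private final List<String> received = new ArrayList<>();

    void scheduleWrite(String data) {
        filterWriteEventQueue.add(data);
    }

    void scheduleReceived(String data) {
        messageReceivedEventQueue.add(data);
    }

    void flushScheduledEvents() {
        // Same guard as the quoted method.
        if (Thread.holdsLock(this)) {
            return;
        }
        String e;
        synchronized (this) {
            while ((e = filterWriteEventQueue.poll()) != null) {
                written.add(e); // stands in for nextFilter.filterWrite(...)
            }
            // The second loop now runs inside the same synchronized block,
            // so one thread polls and forwards each message before another
            // thread can take the next one.
            while ((e = messageReceivedEventQueue.poll()) != null) {
                received.add(e); // stands in for nextFilter.messageReceived(...)
            }
        }
    }

    synchronized List<String> written() {
        return new ArrayList<>(written);
    }

    synchronized List<String> received() {
        return new ArrayList<>(received);
    }
}
```

With the second loop outside the block, two threads could each poll one message and then forward them in either order; holding the monitor across poll and forward removes that window.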



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
