[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838898#comment-16838898 ]
Emmanuel Lecharny commented on DIRMINA-1107:
--------------------------------------------

There are a few bad things being done here. In the {{SslFilter.messageReceived()}} method, we queue any incoming message if the {{SSL}} layer has not yet been fully installed:

{code:java}
synchronized (sslHandler) {
    if (!isSslStarted(session) && sslHandler.isInboundDone()) {
        // The SSL session must be established first before we
        // can push data to the application. Store the incoming
        // data into a queue for a later processing
        sslHandler.scheduleMessageReceived(nextFilter, message);
    } else {
{code}

This does not make a lot of sense. There is no way the {{SslFilter}} {{messageReceived}} method could be called if the filter has not been fully set up. The check actually covers the case where the {{SSL}} layer has been closed, but in that case, what is the point of passing the received message to the {{IoHandler}}?

Otherwise, incoming messages are processed in the {{SslHandler.messageReceived}} method: each message is decrypted, then pushed into the queue. The bizarre thing is that we also write encrypted data during this call:

{code:java}
private void handleSslData(NextFilter nextFilter, SslHandler sslHandler) throws SSLException {
    ...
    // Write encrypted data to be written (if any)
    sslHandler.writeNetBuffer(nextFilter);

    // handle app. data read (if any)
    handleAppDataRead(nextFilter, sslHandler);
}
{code}

I don't see any reason to do so at this point. Why should we write *anything* back to the remote peer while we are processing an incoming message? This part of the code needs a serious review, IMHO.


> SslHandler flushScheduledEvents race condition, redux
> -----------------------------------------------------
>
>                 Key: DIRMINA-1107
>                 URL: https://issues.apache.org/jira/browse/DIRMINA-1107
>             Project: MINA
>          Issue Type: Bug
>    Affects Versions: 2.1.2
>            Reporter: Guus der Kinderen
>            Priority: Major
>             Fix For: 2.1.3
>
>
> DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally replaces it with another multithreading issue.
> The fix for DIRMINA-1019 introduces a counter that contains the number of events to be processed. A simplified version of the code is included below.
> {code:java}
> private final AtomicInteger scheduledEvents = new AtomicInteger(0);
>
> void flushScheduledEvents() {
>     scheduledEvents.incrementAndGet();
>
>     if (sslLock.tryLock()) {
>         try {
>             do {
>                 while ((event = filterWriteEventQueue.poll()) != null) {
>                     // ...
>                 }
>
>                 while ((event = messageReceivedEventQueue.poll()) != null) {
>                     // ...
>                 }
>             } while (scheduledEvents.decrementAndGet() > 0);
>         } finally {
>             sslLock.unlock();
>         }
>     }
> }
> {code}
> We have observed occasions where the value of {{scheduledEvents}} becomes negative, while at the same time entries in {{filterWriteEventQueue}} go unprocessed.
> We suspect that this is caused by the first thread decrementing the counter after a second thread has incremented it, but before that second thread attempts to acquire the lock.
> This allows the first thread to empty the queues, decrement the counter to zero and release the lock, after which the second thread acquires the lock successfully. The second thread then processes any elements in {{filterWriteEventQueue}}, followed by any elements in {{messageReceivedEventQueue}}. If in between these two checks yet another thread adds a new element to {{filterWriteEventQueue}}, this element can go unprocessed: the second thread does not loop again, since the counter is zero or negative, and the third thread can fail to acquire the lock.
> It's a seemingly unlikely scenario, but we are observing this behavior when our systems are under high load.
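As an illustration of the interleaving described above, here is a minimal, self-contained model of the counter-based flush (hypothetical class name, field types and event handling; not the actual MINA 2.1.2 source), with comments marking the two windows in which the reporter's scenario strands an event:

{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Simplified stand-in for the DIRMINA-1019 flush logic quoted above.
class CounterBasedFlusher {
    private final Queue<Runnable> filterWriteEventQueue = new ConcurrentLinkedQueue<>();
    private final Queue<Runnable> messageReceivedEventQueue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger scheduledEvents = new AtomicInteger(0);
    private final ReentrantLock sslLock = new ReentrantLock();

    void flushScheduledEvents() {
        scheduledEvents.incrementAndGet();

        // Window 1: a thread already inside the do/while below can consume
        // this increment with its decrementAndGet() before we reach
        // tryLock(), so we may enter the loop with the counter back at zero.
        if (sslLock.tryLock()) {
            try {
                Runnable event;
                do {
                    while ((event = filterWriteEventQueue.poll()) != null) {
                        event.run();
                    }

                    // Window 2: a third thread that offers an event to
                    // filterWriteEventQueue here, increments the counter and
                    // then fails tryLock() can leave that event stranded:
                    // our decrementAndGet() below returns zero or less, so
                    // we do not loop again.
                    while ((event = messageReceivedEventQueue.poll()) != null) {
                        event.run();
                    }
                } while (scheduledEvents.decrementAndGet() > 0);
            } finally {
                sslLock.unlock();
            }
        }
    }
}
{code}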
> We've applied a code change after which this problem is no longer observed. We've removed the counter, and check whether the queues are empty instead:
> {code:java}
> void flushScheduledEvents() {
>     if (sslLock.tryLock()) {
>         try {
>             do {
>                 while ((event = filterWriteEventQueue.poll()) != null) {
>                     // ...
>                 }
>
>                 while ((event = messageReceivedEventQueue.poll()) != null) {
>                     // ...
>                 }
>             } while (!filterWriteEventQueue.isEmpty() || !messageReceivedEventQueue.isEmpty());
>         } finally {
>             sslLock.unlock();
>         }
>     }
> }
> {code}
> This code change, as illustrated above, does introduce a new potential problem. Theoretically, an event could be added to the queues and {{flushScheduledEvents}} be called with {{sslLock.tryLock()}} returning {{false}}, exactly after another thread has finished the {{while}} loop but before it has released the lock. This again would cause events to go unprocessed.
> We've not observed this problem in the wild yet, but we're uncomfortable applying this change as-is.
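The closing concern above (an event is enqueued and {{sslLock.tryLock()}} fails just after another thread has finished draining the queues, but before that thread releases the lock) is a classic lost-wakeup window. One common way to close it is to keep the queue-emptiness check but perform it again after the lock has been released, retrying the {{tryLock()}} while the queues are non-empty. The sketch below is purely illustrative, with hypothetical class and method names; it is not the actual MINA code, nor necessarily the fix that shipped in 2.1.3:

{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of a "drain, unlock, re-check, retry" flush. The thread that last
// held the lock takes responsibility for events enqueued while it held it.
class RecheckingFlusher {
    private final Queue<Runnable> filterWriteEventQueue = new ConcurrentLinkedQueue<>();
    private final Queue<Runnable> messageReceivedEventQueue = new ConcurrentLinkedQueue<>();
    private final ReentrantLock sslLock = new ReentrantLock();

    void scheduleFilterWrite(Runnable event) {
        filterWriteEventQueue.offer(event);
        flushScheduledEvents();
    }

    void scheduleMessageReceived(Runnable event) {
        messageReceivedEventQueue.offer(event);
        flushScheduledEvents();
    }

    void flushScheduledEvents() {
        // Retry for as long as we can take the lock and work may remain.
        while (sslLock.tryLock()) {
            try {
                Runnable event;

                while ((event = filterWriteEventQueue.poll()) != null) {
                    event.run();
                }

                while ((event = messageReceivedEventQueue.poll()) != null) {
                    event.run();
                }
            } finally {
                sslLock.unlock();
            }

            // Re-check after unlocking: a producer may have enqueued an event
            // and lost the tryLock() race while we still held the lock. If the
            // queues are empty at this point, any later producer will trigger
            // its own flush, so it is safe to stop.
            if (filterWriteEventQueue.isEmpty() && messageReceivedEventQueue.isEmpty()) {
                return;
            }
        }
    }
}
{code}

The trade-off is that the last lock holder may loop more than once under contention, but a producer that loses the {{tryLock()}} race has its event picked up by the post-unlock re-check of whichever thread still holds, or last held, the lock.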