[jira] [Resolved] (SSHD-917) Add support for SSH2 public key file format
[ https://issues.apache.org/jira/browse/SSHD-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Goldstein Lyor resolved SSHD-917. - Resolution: Fixed Fix Version/s: 2.3.0 > Add support for SSH2 public key file format > --- > > Key: SSHD-917 > URL: https://issues.apache.org/jira/browse/SSHD-917 > Project: MINA SSHD > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Goldstein Lyor >Assignee: Goldstein Lyor >Priority: Minor > Fix For: 2.3.0 > > > [https://tools.ietf.org/html/rfc4716] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841505#comment-16841505 ] Emmanuel Lecharny commented on DIRMINA-1107: FTR, I just deleted a comment and an attached file, as I realized 5 mins after having pushed them I was wrong. The *only* reason we have a write queue is for when we have messages to be written while the handshake is not completed: we need to wait and keep those messages. My deleted comment was that we can bypass the {{SslHandler.scheduleFilterWrite()}} and push the writes to the next filter. That works, assuming no messages are written while the handshake is being processed (this is possibly what could happen when using {{startTLS}}). We can leverage the newly added {{event}} in {{2.1}} to start flushing those pending messages when the session has been secured, but in {{2.0}} it's going a bit more complex. > SslHandler flushScheduledEvents race condition, redux > - > > Key: DIRMINA-1107 > URL: https://issues.apache.org/jira/browse/DIRMINA-1107 > Project: MINA > Issue Type: Bug >Affects Versions: 2.1.2 >Reporter: Guus der Kinderen >Priority: Major > Fix For: 2.1.3 > > > DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally > replaces it with another multithreading issue. > The fix for DIRMINA-1019 introduces a counter that contains the number of > events to be processed. A simplified version of the code is included below. > {code:java} > private final AtomicInteger scheduledEvents = new AtomicInteger(0); > void flushScheduledEvents() { > scheduledEvents.incrementAndGet(); > if (sslLock.tryLock()) { > try { > do { > while ((event = filterWriteEventQueue.poll()) != null) { > // ... > } > > while ((event = messageReceivedEventQueue.poll()) != null){ > // ... > } > } while (scheduledEvents.decrementAndGet() > 0); > } finally { > sslLock.unlock(); > } > } > }{code} > We have observed occasions where the value of {{scheduledEvents}} becomes a > negative value, while at the same time {{filterWriteEventQueue}} go > unprocessed. > We suspect that this issue is triggered by a concurrency issue caused by the > first thread decrementing the counter after a second thread incremented it, > but before it attempted to acquire the lock. > This allows the the first thread to empty the queues, decrementing the > counter to zero and release the lock, after which the second thread acquires > the lock successfully. Now, the second thread processes any elements in > {{filterWriteEventQueue}}, and then processes any elements in > {{messageReceivedEventQueue}}. If in between these two checks yet another > thread adds a new element to {{filterWriteEventQueue}}, this element can go > unprocessed (as the second thread does not loop, since the counter is zero or > negative, and the third thread can fail to acquire the lock). > It's a seemingly unlikely scenario, but we are observing the behavior when > our systems are under high load. > We've applied a code change after which this problem is no longer observed. > We've removed the counter, and check on the size of the queues instead: > {code:java} > void flushScheduledEvents() { > if (sslLock.tryLock()) { > try { > do { > while ((event = filterWriteEventQueue.poll()) != null) { > // ... > } > > while ((event = messageReceivedEventQueue.poll()) != null){ > // ... > } > } while (!filterWriteEventQueue.isEmpty() || > !messageReceivedEventQueue.isEmpty()); > } finally { > sslLock.unlock(); > } > } > }{code} > This code change, as illustrated above, does introduce a new potential > problem. Theoretically, an event could be added to the queues and > {{flushScheduledEvents}} be called returning {{false}} for > {{sslLock.tryLock()}}, exactly after another thread just finished the > {{while}} loop, but before releasing the lock. This again would cause events > to go unprocessed. > We've not observed this problem in the wild yet, but we're uncomfortable > applying this change as-is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emmanuel Lecharny updated DIRMINA-1107: --- Attachment: (was: ssl.diff) > SslHandler flushScheduledEvents race condition, redux > - > > Key: DIRMINA-1107 > URL: https://issues.apache.org/jira/browse/DIRMINA-1107 > Project: MINA > Issue Type: Bug >Affects Versions: 2.1.2 >Reporter: Guus der Kinderen >Priority: Major > Fix For: 2.1.3 > > > DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally > replaces it with another multithreading issue. > The fix for DIRMINA-1019 introduces a counter that contains the number of > events to be processed. A simplified version of the code is included below. > {code:java} > private final AtomicInteger scheduledEvents = new AtomicInteger(0); > void flushScheduledEvents() { > scheduledEvents.incrementAndGet(); > if (sslLock.tryLock()) { > try { > do { > while ((event = filterWriteEventQueue.poll()) != null) { > // ... > } > > while ((event = messageReceivedEventQueue.poll()) != null){ > // ... > } > } while (scheduledEvents.decrementAndGet() > 0); > } finally { > sslLock.unlock(); > } > } > }{code} > We have observed occasions where the value of {{scheduledEvents}} becomes a > negative value, while at the same time {{filterWriteEventQueue}} go > unprocessed. > We suspect that this issue is triggered by a concurrency issue caused by the > first thread decrementing the counter after a second thread incremented it, > but before it attempted to acquire the lock. > This allows the the first thread to empty the queues, decrementing the > counter to zero and release the lock, after which the second thread acquires > the lock successfully. Now, the second thread processes any elements in > {{filterWriteEventQueue}}, and then processes any elements in > {{messageReceivedEventQueue}}. If in between these two checks yet another > thread adds a new element to {{filterWriteEventQueue}}, this element can go > unprocessed (as the second thread does not loop, since the counter is zero or > negative, and the third thread can fail to acquire the lock). > It's a seemingly unlikely scenario, but we are observing the behavior when > our systems are under high load. > We've applied a code change after which this problem is no longer observed. > We've removed the counter, and check on the size of the queues instead: > {code:java} > void flushScheduledEvents() { > if (sslLock.tryLock()) { > try { > do { > while ((event = filterWriteEventQueue.poll()) != null) { > // ... > } > > while ((event = messageReceivedEventQueue.poll()) != null){ > // ... > } > } while (!filterWriteEventQueue.isEmpty() || > !messageReceivedEventQueue.isEmpty()); > } finally { > sslLock.unlock(); > } > } > }{code} > This code change, as illustrated above, does introduce a new potential > problem. Theoretically, an event could be added to the queues and > {{flushScheduledEvents}} be called returning {{false}} for > {{sslLock.tryLock()}}, exactly after another thread just finished the > {{while}} loop, but before releasing the lock. This again would cause events > to go unprocessed. > We've not observed this problem in the wild yet, but we're uncomfortable > applying this change as-is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Issue Comment Deleted] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emmanuel Lecharny updated DIRMINA-1107: --- Comment: was deleted (was: So here is my first proposed change in {{SslFilter}} (see attachement) that directly pushes writes to the next filter (aka {{HeadFilter}}) without having to use the {{SslHander.flushScheduledEvents()}} method (I haven't yet modified the {{SslHandler}} class to get rid of the write queue, but will do soon). Tests are passing green. More to come.) > SslHandler flushScheduledEvents race condition, redux > - > > Key: DIRMINA-1107 > URL: https://issues.apache.org/jira/browse/DIRMINA-1107 > Project: MINA > Issue Type: Bug >Affects Versions: 2.1.2 >Reporter: Guus der Kinderen >Priority: Major > Fix For: 2.1.3 > > > DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally > replaces it with another multithreading issue. > The fix for DIRMINA-1019 introduces a counter that contains the number of > events to be processed. A simplified version of the code is included below. > {code:java} > private final AtomicInteger scheduledEvents = new AtomicInteger(0); > void flushScheduledEvents() { > scheduledEvents.incrementAndGet(); > if (sslLock.tryLock()) { > try { > do { > while ((event = filterWriteEventQueue.poll()) != null) { > // ... > } > > while ((event = messageReceivedEventQueue.poll()) != null){ > // ... > } > } while (scheduledEvents.decrementAndGet() > 0); > } finally { > sslLock.unlock(); > } > } > }{code} > We have observed occasions where the value of {{scheduledEvents}} becomes a > negative value, while at the same time {{filterWriteEventQueue}} go > unprocessed. > We suspect that this issue is triggered by a concurrency issue caused by the > first thread decrementing the counter after a second thread incremented it, > but before it attempted to acquire the lock. > This allows the the first thread to empty the queues, decrementing the > counter to zero and release the lock, after which the second thread acquires > the lock successfully. Now, the second thread processes any elements in > {{filterWriteEventQueue}}, and then processes any elements in > {{messageReceivedEventQueue}}. If in between these two checks yet another > thread adds a new element to {{filterWriteEventQueue}}, this element can go > unprocessed (as the second thread does not loop, since the counter is zero or > negative, and the third thread can fail to acquire the lock). > It's a seemingly unlikely scenario, but we are observing the behavior when > our systems are under high load. > We've applied a code change after which this problem is no longer observed. > We've removed the counter, and check on the size of the queues instead: > {code:java} > void flushScheduledEvents() { > if (sslLock.tryLock()) { > try { > do { > while ((event = filterWriteEventQueue.poll()) != null) { > // ... > } > > while ((event = messageReceivedEventQueue.poll()) != null){ > // ... > } > } while (!filterWriteEventQueue.isEmpty() || > !messageReceivedEventQueue.isEmpty()); > } finally { > sslLock.unlock(); > } > } > }{code} > This code change, as illustrated above, does introduce a new potential > problem. Theoretically, an event could be added to the queues and > {{flushScheduledEvents}} be called returning {{false}} for > {{sslLock.tryLock()}}, exactly after another thread just finished the > {{while}} loop, but before releasing the lock. This again would cause events > to go unprocessed. > We've not observed this problem in the wild yet, but we're uncomfortable > applying this change as-is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emmanuel Lecharny updated DIRMINA-1107: --- Attachment: ssl.diff > SslHandler flushScheduledEvents race condition, redux > - > > Key: DIRMINA-1107 > URL: https://issues.apache.org/jira/browse/DIRMINA-1107 > Project: MINA > Issue Type: Bug >Affects Versions: 2.1.2 >Reporter: Guus der Kinderen >Priority: Major > Fix For: 2.1.3 > > Attachments: ssl.diff > > > DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally > replaces it with another multithreading issue. > The fix for DIRMINA-1019 introduces a counter that contains the number of > events to be processed. A simplified version of the code is included below. > {code:java} > private final AtomicInteger scheduledEvents = new AtomicInteger(0); > void flushScheduledEvents() { > scheduledEvents.incrementAndGet(); > if (sslLock.tryLock()) { > try { > do { > while ((event = filterWriteEventQueue.poll()) != null) { > // ... > } > > while ((event = messageReceivedEventQueue.poll()) != null){ > // ... > } > } while (scheduledEvents.decrementAndGet() > 0); > } finally { > sslLock.unlock(); > } > } > }{code} > We have observed occasions where the value of {{scheduledEvents}} becomes a > negative value, while at the same time {{filterWriteEventQueue}} go > unprocessed. > We suspect that this issue is triggered by a concurrency issue caused by the > first thread decrementing the counter after a second thread incremented it, > but before it attempted to acquire the lock. > This allows the the first thread to empty the queues, decrementing the > counter to zero and release the lock, after which the second thread acquires > the lock successfully. Now, the second thread processes any elements in > {{filterWriteEventQueue}}, and then processes any elements in > {{messageReceivedEventQueue}}. If in between these two checks yet another > thread adds a new element to {{filterWriteEventQueue}}, this element can go > unprocessed (as the second thread does not loop, since the counter is zero or > negative, and the third thread can fail to acquire the lock). > It's a seemingly unlikely scenario, but we are observing the behavior when > our systems are under high load. > We've applied a code change after which this problem is no longer observed. > We've removed the counter, and check on the size of the queues instead: > {code:java} > void flushScheduledEvents() { > if (sslLock.tryLock()) { > try { > do { > while ((event = filterWriteEventQueue.poll()) != null) { > // ... > } > > while ((event = messageReceivedEventQueue.poll()) != null){ > // ... > } > } while (!filterWriteEventQueue.isEmpty() || > !messageReceivedEventQueue.isEmpty()); > } finally { > sslLock.unlock(); > } > } > }{code} > This code change, as illustrated above, does introduce a new potential > problem. Theoretically, an event could be added to the queues and > {{flushScheduledEvents}} be called returning {{false}} for > {{sslLock.tryLock()}}, exactly after another thread just finished the > {{while}} loop, but before releasing the lock. This again would cause events > to go unprocessed. > We've not observed this problem in the wild yet, but we're uncomfortable > applying this change as-is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841490#comment-16841490 ] Emmanuel Lecharny commented on DIRMINA-1107: So here is my first proposed change in {{SslFilter}} (see attachement) that directly pushes writes to the next filter (aka {{HeadFilter}}) without having to use the {{SslHander.flushScheduledEvents()}} method (I haven't yet modified the {{SslHandler}} class to get rid of the write queue, but will do soon). Tests are passing green. More to come. > SslHandler flushScheduledEvents race condition, redux > - > > Key: DIRMINA-1107 > URL: https://issues.apache.org/jira/browse/DIRMINA-1107 > Project: MINA > Issue Type: Bug >Affects Versions: 2.1.2 >Reporter: Guus der Kinderen >Priority: Major > Fix For: 2.1.3 > > Attachments: ssl.diff > > > DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally > replaces it with another multithreading issue. > The fix for DIRMINA-1019 introduces a counter that contains the number of > events to be processed. A simplified version of the code is included below. > {code:java} > private final AtomicInteger scheduledEvents = new AtomicInteger(0); > void flushScheduledEvents() { > scheduledEvents.incrementAndGet(); > if (sslLock.tryLock()) { > try { > do { > while ((event = filterWriteEventQueue.poll()) != null) { > // ... > } > > while ((event = messageReceivedEventQueue.poll()) != null){ > // ... > } > } while (scheduledEvents.decrementAndGet() > 0); > } finally { > sslLock.unlock(); > } > } > }{code} > We have observed occasions where the value of {{scheduledEvents}} becomes a > negative value, while at the same time {{filterWriteEventQueue}} go > unprocessed. > We suspect that this issue is triggered by a concurrency issue caused by the > first thread decrementing the counter after a second thread incremented it, > but before it attempted to acquire the lock. > This allows the the first thread to empty the queues, decrementing the > counter to zero and release the lock, after which the second thread acquires > the lock successfully. Now, the second thread processes any elements in > {{filterWriteEventQueue}}, and then processes any elements in > {{messageReceivedEventQueue}}. If in between these two checks yet another > thread adds a new element to {{filterWriteEventQueue}}, this element can go > unprocessed (as the second thread does not loop, since the counter is zero or > negative, and the third thread can fail to acquire the lock). > It's a seemingly unlikely scenario, but we are observing the behavior when > our systems are under high load. > We've applied a code change after which this problem is no longer observed. > We've removed the counter, and check on the size of the queues instead: > {code:java} > void flushScheduledEvents() { > if (sslLock.tryLock()) { > try { > do { > while ((event = filterWriteEventQueue.poll()) != null) { > // ... > } > > while ((event = messageReceivedEventQueue.poll()) != null){ > // ... > } > } while (!filterWriteEventQueue.isEmpty() || > !messageReceivedEventQueue.isEmpty()); > } finally { > sslLock.unlock(); > } > } > }{code} > This code change, as illustrated above, does introduce a new potential > problem. Theoretically, an event could be added to the queues and > {{flushScheduledEvents}} be called returning {{false}} for > {{sslLock.tryLock()}}, exactly after another thread just finished the > {{while}} loop, but before releasing the lock. This again would cause events > to go unprocessed. > We've not observed this problem in the wild yet, but we're uncomfortable > applying this change as-is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (SSHD-918) Implement UMAC message authentication code
[ https://issues.apache.org/jira/browse/SSHD-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Goldstein Lyor updated SSHD-918: Description: https://tools.ietf.org/html/draft-miller-secsh-umac-01 https://tools.ietf.org/html/rfc4418 was:https://tools.ietf.org/html/draft-miller-secsh-umac-01 > Implement UMAC message authentication code > -- > > Key: SSHD-918 > URL: https://issues.apache.org/jira/browse/SSHD-918 > Project: MINA SSHD > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Goldstein Lyor >Priority: Minor > > https://tools.ietf.org/html/draft-miller-secsh-umac-01 > https://tools.ietf.org/html/rfc4418 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841103#comment-16841103 ] Guus der Kinderen commented on DIRMINA-1107: \o/ For what it's worth: since I wrote that comment, we have successfully applied the solution that I've outlined in https://issues.apache.org/jira/browse/DIRMINA-1107?focusedCommentId=16838709&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16838709 > SslHandler flushScheduledEvents race condition, redux > - > > Key: DIRMINA-1107 > URL: https://issues.apache.org/jira/browse/DIRMINA-1107 > Project: MINA > Issue Type: Bug >Affects Versions: 2.1.2 >Reporter: Guus der Kinderen >Priority: Major > Fix For: 2.1.3 > > > DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally > replaces it with another multithreading issue. > The fix for DIRMINA-1019 introduces a counter that contains the number of > events to be processed. A simplified version of the code is included below. > {code:java} > private final AtomicInteger scheduledEvents = new AtomicInteger(0); > void flushScheduledEvents() { > scheduledEvents.incrementAndGet(); > if (sslLock.tryLock()) { > try { > do { > while ((event = filterWriteEventQueue.poll()) != null) { > // ... > } > > while ((event = messageReceivedEventQueue.poll()) != null){ > // ... > } > } while (scheduledEvents.decrementAndGet() > 0); > } finally { > sslLock.unlock(); > } > } > }{code} > We have observed occasions where the value of {{scheduledEvents}} becomes a > negative value, while at the same time {{filterWriteEventQueue}} go > unprocessed. > We suspect that this issue is triggered by a concurrency issue caused by the > first thread decrementing the counter after a second thread incremented it, > but before it attempted to acquire the lock. > This allows the the first thread to empty the queues, decrementing the > counter to zero and release the lock, after which the second thread acquires > the lock successfully. Now, the second thread processes any elements in > {{filterWriteEventQueue}}, and then processes any elements in > {{messageReceivedEventQueue}}. If in between these two checks yet another > thread adds a new element to {{filterWriteEventQueue}}, this element can go > unprocessed (as the second thread does not loop, since the counter is zero or > negative, and the third thread can fail to acquire the lock). > It's a seemingly unlikely scenario, but we are observing the behavior when > our systems are under high load. > We've applied a code change after which this problem is no longer observed. > We've removed the counter, and check on the size of the queues instead: > {code:java} > void flushScheduledEvents() { > if (sslLock.tryLock()) { > try { > do { > while ((event = filterWriteEventQueue.poll()) != null) { > // ... > } > > while ((event = messageReceivedEventQueue.poll()) != null){ > // ... > } > } while (!filterWriteEventQueue.isEmpty() || > !messageReceivedEventQueue.isEmpty()); > } finally { > sslLock.unlock(); > } > } > }{code} > This code change, as illustrated above, does introduce a new potential > problem. Theoretically, an event could be added to the queues and > {{flushScheduledEvents}} be called returning {{false}} for > {{sslLock.tryLock()}}, exactly after another thread just finished the > {{while}} loop, but before releasing the lock. This again would cause events > to go unprocessed. > We've not observed this problem in the wild yet, but we're uncomfortable > applying this change as-is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841096#comment-16841096 ] Emmanuel Lecharny commented on DIRMINA-1107: Don't worry, we hijacked the thread devising about some future changes. The fix will be applied in the existing branches, without breaking the APIs. > SslHandler flushScheduledEvents race condition, redux > - > > Key: DIRMINA-1107 > URL: https://issues.apache.org/jira/browse/DIRMINA-1107 > Project: MINA > Issue Type: Bug >Affects Versions: 2.1.2 >Reporter: Guus der Kinderen >Priority: Major > Fix For: 2.1.3 > > > DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally > replaces it with another multithreading issue. > The fix for DIRMINA-1019 introduces a counter that contains the number of > events to be processed. A simplified version of the code is included below. > {code:java} > private final AtomicInteger scheduledEvents = new AtomicInteger(0); > void flushScheduledEvents() { > scheduledEvents.incrementAndGet(); > if (sslLock.tryLock()) { > try { > do { > while ((event = filterWriteEventQueue.poll()) != null) { > // ... > } > > while ((event = messageReceivedEventQueue.poll()) != null){ > // ... > } > } while (scheduledEvents.decrementAndGet() > 0); > } finally { > sslLock.unlock(); > } > } > }{code} > We have observed occasions where the value of {{scheduledEvents}} becomes a > negative value, while at the same time {{filterWriteEventQueue}} go > unprocessed. > We suspect that this issue is triggered by a concurrency issue caused by the > first thread decrementing the counter after a second thread incremented it, > but before it attempted to acquire the lock. > This allows the the first thread to empty the queues, decrementing the > counter to zero and release the lock, after which the second thread acquires > the lock successfully. Now, the second thread processes any elements in > {{filterWriteEventQueue}}, and then processes any elements in > {{messageReceivedEventQueue}}. If in between these two checks yet another > thread adds a new element to {{filterWriteEventQueue}}, this element can go > unprocessed (as the second thread does not loop, since the counter is zero or > negative, and the third thread can fail to acquire the lock). > It's a seemingly unlikely scenario, but we are observing the behavior when > our systems are under high load. > We've applied a code change after which this problem is no longer observed. > We've removed the counter, and check on the size of the queues instead: > {code:java} > void flushScheduledEvents() { > if (sslLock.tryLock()) { > try { > do { > while ((event = filterWriteEventQueue.poll()) != null) { > // ... > } > > while ((event = messageReceivedEventQueue.poll()) != null){ > // ... > } > } while (!filterWriteEventQueue.isEmpty() || > !messageReceivedEventQueue.isEmpty()); > } finally { > sslLock.unlock(); > } > } > }{code} > This code change, as illustrated above, does introduce a new potential > problem. Theoretically, an event could be added to the queues and > {{flushScheduledEvents}} be called returning {{false}} for > {{sslLock.tryLock()}}, exactly after another thread just finished the > {{while}} loop, but before releasing the lock. This again would cause events > to go unprocessed. > We've not observed this problem in the wild yet, but we're uncomfortable > applying this change as-is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841084#comment-16841084 ] Guus der Kinderen commented on DIRMINA-1107: I'd love to see a solution for the 2.1 branch, even if that's a patch of the existing, sub-par, implementation. We're currently suffering from issues (which prompted me to create this issue), which we'd like to fix before/without upgrading to a new major release of MINA. > SslHandler flushScheduledEvents race condition, redux > - > > Key: DIRMINA-1107 > URL: https://issues.apache.org/jira/browse/DIRMINA-1107 > Project: MINA > Issue Type: Bug >Affects Versions: 2.1.2 >Reporter: Guus der Kinderen >Priority: Major > Fix For: 2.1.3 > > > DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally > replaces it with another multithreading issue. > The fix for DIRMINA-1019 introduces a counter that contains the number of > events to be processed. A simplified version of the code is included below. > {code:java} > private final AtomicInteger scheduledEvents = new AtomicInteger(0); > void flushScheduledEvents() { > scheduledEvents.incrementAndGet(); > if (sslLock.tryLock()) { > try { > do { > while ((event = filterWriteEventQueue.poll()) != null) { > // ... > } > > while ((event = messageReceivedEventQueue.poll()) != null){ > // ... > } > } while (scheduledEvents.decrementAndGet() > 0); > } finally { > sslLock.unlock(); > } > } > }{code} > We have observed occasions where the value of {{scheduledEvents}} becomes a > negative value, while at the same time {{filterWriteEventQueue}} go > unprocessed. > We suspect that this issue is triggered by a concurrency issue caused by the > first thread decrementing the counter after a second thread incremented it, > but before it attempted to acquire the lock. > This allows the the first thread to empty the queues, decrementing the > counter to zero and release the lock, after which the second thread acquires > the lock successfully. Now, the second thread processes any elements in > {{filterWriteEventQueue}}, and then processes any elements in > {{messageReceivedEventQueue}}. If in between these two checks yet another > thread adds a new element to {{filterWriteEventQueue}}, this element can go > unprocessed (as the second thread does not loop, since the counter is zero or > negative, and the third thread can fail to acquire the lock). > It's a seemingly unlikely scenario, but we are observing the behavior when > our systems are under high load. > We've applied a code change after which this problem is no longer observed. > We've removed the counter, and check on the size of the queues instead: > {code:java} > void flushScheduledEvents() { > if (sslLock.tryLock()) { > try { > do { > while ((event = filterWriteEventQueue.poll()) != null) { > // ... > } > > while ((event = messageReceivedEventQueue.poll()) != null){ > // ... > } > } while (!filterWriteEventQueue.isEmpty() || > !messageReceivedEventQueue.isEmpty()); > } finally { > sslLock.unlock(); > } > } > }{code} > This code change, as illustrated above, does introduce a new potential > problem. Theoretically, an event could be added to the queues and > {{flushScheduledEvents}} be called returning {{false}} for > {{sslLock.tryLock()}}, exactly after another thread just finished the > {{while}} loop, but before releasing the lock. This again would cause events > to go unprocessed. > We've not observed this problem in the wild yet, but we're uncomfortable > applying this change as-is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841071#comment-16841071 ] Emmanuel Lecharny commented on DIRMINA-1107: Agreed. In any case, that would be quite a disruptive change, and can't be injected in 2.0 or 2.1. > SslHandler flushScheduledEvents race condition, redux > - > > Key: DIRMINA-1107 > URL: https://issues.apache.org/jira/browse/DIRMINA-1107 > Project: MINA > Issue Type: Bug >Affects Versions: 2.1.2 >Reporter: Guus der Kinderen >Priority: Major > Fix For: 2.1.3 > > > DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally > replaces it with another multithreading issue. > The fix for DIRMINA-1019 introduces a counter that contains the number of > events to be processed. A simplified version of the code is included below. > {code:java} > private final AtomicInteger scheduledEvents = new AtomicInteger(0); > void flushScheduledEvents() { > scheduledEvents.incrementAndGet(); > if (sslLock.tryLock()) { > try { > do { > while ((event = filterWriteEventQueue.poll()) != null) { > // ... > } > > while ((event = messageReceivedEventQueue.poll()) != null){ > // ... > } > } while (scheduledEvents.decrementAndGet() > 0); > } finally { > sslLock.unlock(); > } > } > }{code} > We have observed occasions where the value of {{scheduledEvents}} becomes a > negative value, while at the same time {{filterWriteEventQueue}} go > unprocessed. > We suspect that this issue is triggered by a concurrency issue caused by the > first thread decrementing the counter after a second thread incremented it, > but before it attempted to acquire the lock. > This allows the the first thread to empty the queues, decrementing the > counter to zero and release the lock, after which the second thread acquires > the lock successfully. Now, the second thread processes any elements in > {{filterWriteEventQueue}}, and then processes any elements in > {{messageReceivedEventQueue}}. If in between these two checks yet another > thread adds a new element to {{filterWriteEventQueue}}, this element can go > unprocessed (as the second thread does not loop, since the counter is zero or > negative, and the third thread can fail to acquire the lock). > It's a seemingly unlikely scenario, but we are observing the behavior when > our systems are under high load. > We've applied a code change after which this problem is no longer observed. > We've removed the counter, and check on the size of the queues instead: > {code:java} > void flushScheduledEvents() { > if (sslLock.tryLock()) { > try { > do { > while ((event = filterWriteEventQueue.poll()) != null) { > // ... > } > > while ((event = messageReceivedEventQueue.poll()) != null){ > // ... > } > } while (!filterWriteEventQueue.isEmpty() || > !messageReceivedEventQueue.isEmpty()); > } finally { > sslLock.unlock(); > } > } > }{code} > This code change, as illustrated above, does introduce a new potential > problem. Theoretically, an event could be added to the queues and > {{flushScheduledEvents}} be called returning {{false}} for > {{sslLock.tryLock()}}, exactly after another thread just finished the > {{while}} loop, but before releasing the lock. This again would cause events > to go unprocessed. > We've not observed this problem in the wild yet, but we're uncomfortable > applying this change as-is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)