[jira] [Updated] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux

2022-02-07 Thread Jira


 [ 
https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmanuel Lécharny updated DIRMINA-1107:
---
Fix Version/s: 2.0.23

> SslHandler flushScheduledEvents race condition, redux
> -
>
> Key: DIRMINA-1107
> URL: https://issues.apache.org/jira/browse/DIRMINA-1107
> Project: MINA
> Issue Type: Bug
> Affects Versions: 2.1.2
> Reporter: Guus der Kinderen
> Assignee: Jonathan Valliere
> Priority: Major
> Fix For: 2.1.3, 2.0.23
>
>
> DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally 
> replaces it with another multithreading issue.
> The fix for DIRMINA-1019 introduces a counter that contains the number of 
> events to be processed. A simplified version of the code is included below.
> {code:java}
> private final AtomicInteger scheduledEvents = new AtomicInteger(0);
>
> void flushScheduledEvents() {
>     scheduledEvents.incrementAndGet();
>     if (sslLock.tryLock()) {
>         try {
>             do {
>                 while ((event = filterWriteEventQueue.poll()) != null) {
>                     // ...
>                 }
>
>                 while ((event = messageReceivedEventQueue.poll()) != null) {
>                     // ...
>                 }
>             } while (scheduledEvents.decrementAndGet() > 0);
>         } finally {
>             sslLock.unlock();
>         }
>     }
> }
> {code}
> We have observed occasions where the value of {{scheduledEvents}} becomes
> negative while, at the same time, events in {{filterWriteEventQueue}} go
> unprocessed.
> We suspect that this is caused by the first thread decrementing the counter
> after a second thread has incremented it, but before that second thread has
> attempted to acquire the lock.
> This allows the first thread to empty the queues, decrement the counter to
> zero, and release the lock, after which the second thread acquires the lock
> successfully. The second thread then processes any elements in
> {{filterWriteEventQueue}}, followed by any elements in
> {{messageReceivedEventQueue}}. If, in between these two steps, yet another
> thread adds a new element to {{filterWriteEventQueue}}, that element can go
> unprocessed: the second thread does not loop, since the counter is zero or
> negative, and the third thread can fail to acquire the lock.
> It's a seemingly unlikely scenario, but we are observing the behavior when 
> our systems are under high load.
> We've applied a code change after which this problem is no longer observed.
> We've removed the counter and check whether the queues are empty instead:
> {code:java}
> void flushScheduledEvents() {
>     if (sslLock.tryLock()) {
>         try {
>             do {
>                 while ((event = filterWriteEventQueue.poll()) != null) {
>                     // ...
>                 }
>
>                 while ((event = messageReceivedEventQueue.poll()) != null) {
>                     // ...
>                 }
>             } while (!filterWriteEventQueue.isEmpty()
>                     || !messageReceivedEventQueue.isEmpty());
>         } finally {
>             sslLock.unlock();
>         }
>     }
> }
> {code}
> This code change, as illustrated above, does introduce a new potential
> problem. Theoretically, a thread could add an event to one of the queues and
> call {{flushScheduledEvents}}, only for {{sslLock.tryLock()}} to return
> {{false}} because another thread has just finished the {{while}} loop but has
> not yet released the lock. This, again, would cause events to go
> unprocessed.
> We've not observed this problem in the wild yet, but we're uncomfortable 
> applying this change as-is.
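
The interleaving suspected in the description can be replayed step by step on
a single thread. The sketch below is only a model: the class
CounterRaceReplay, the AtomicBoolean standing in for sslLock, and the helper
names are assumptions made for illustration and are not part of MINA. It
shows the counter ending up negative without a late writer, and an event left
stranded in the write queue with one.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of the counter-based flush; not MINA's SslHandler. */
class CounterRaceReplay {
    final AtomicInteger scheduledEvents = new AtomicInteger(0);
    // Models sslLock: true while some thread holds it.
    final AtomicBoolean lockHeld = new AtomicBoolean(false);
    final Queue<String> filterWriteEventQueue = new ArrayDeque<>();

    boolean tryLock() { return lockHeld.compareAndSet(false, true); }

    void unlock() { lockHeld.set(false); }

    /** One pass of the do/while body; returns true if the loop would repeat. */
    boolean drainAndDecrement() {
        filterWriteEventQueue.clear();
        return scheduledEvents.decrementAndGet() > 0;
    }

    /**
     * Replays the interleaving step by step.
     * Returns {final counter value, events left in the write queue}.
     */
    int[] replay(boolean lateWriter) {
        // T1: increments (counter = 1) and takes the lock.
        scheduledEvents.incrementAndGet();
        tryLock();
        // T2: increments (counter = 2) but is paused before tryLock().
        scheduledEvents.incrementAndGet();
        // T1: makes two passes (counter 2 -> 1 -> 0), then releases the lock.
        while (drainAndDecrement()) { }
        unlock();
        // T2: resumes, takes the lock and drains the write queue.
        tryLock();
        filterWriteEventQueue.clear();
        if (lateWriter) {
            // T3: enqueues while T2 is busy with the other queue,
            // increments (counter = 1) and fails to get the lock.
            filterWriteEventQueue.add("late write");
            scheduledEvents.incrementAndGet();
            tryLock(); // returns false: T2 still holds the lock
        }
        // T2: decrements; the result is not > 0, so it does not loop again.
        scheduledEvents.decrementAndGet();
        unlock();
        return new int[] { scheduledEvents.get(), filterWriteEventQueue.size() };
    }

    public static void main(String[] args) {
        int[] r = new CounterRaceReplay().replay(true);
        System.out.println("counter=" + r[0] + ", stranded=" + r[1]);
    }
}
```

Calling replay(false) leaves the counter at -1 with nothing stranded, while
replay(true) leaves the counter at 0 with one event still queued, which
matches the "zero or negative" case described in the report.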

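One common way to close the remaining window described at the end of the
report is to re-check the queues after the lock has been released and retry
if anything is left. The sketch below illustrates that pattern under stated
assumptions; it is not the change that was applied to MINA. The class
RecheckingFlusher, the String event type, and the processed list are
hypothetical stand-ins, and the queues are assumed to be thread-safe
(ConcurrentLinkedQueue here).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for the handler; not MINA's SslHandler. */
class RecheckingFlusher {
    private final ReentrantLock sslLock = new ReentrantLock();
    final Queue<String> filterWriteEventQueue = new ConcurrentLinkedQueue<>();
    final Queue<String> messageReceivedEventQueue = new ConcurrentLinkedQueue<>();
    // Records handled events; only modified while sslLock is held.
    final List<String> processed = new ArrayList<>();

    void flushScheduledEvents() {
        do {
            if (!sslLock.tryLock()) {
                // Another thread holds the lock. It re-checks the queues
                // after unlocking, so our event will not be stranded.
                return;
            }
            try {
                String event;
                while ((event = filterWriteEventQueue.poll()) != null) {
                    processed.add("write:" + event);
                }
                while ((event = messageReceivedEventQueue.poll()) != null) {
                    processed.add("received:" + event);
                }
            } finally {
                sslLock.unlock();
            }
            // Re-check AFTER releasing the lock: an event enqueued by a
            // thread whose tryLock() failed is visible here, so we retry.
        } while (!filterWriteEventQueue.isEmpty()
                || !messageReceivedEventQueue.isEmpty());
    }

    public static void main(String[] args) {
        RecheckingFlusher flusher = new RecheckingFlusher();
        flusher.filterWriteEventQueue.add("a");
        flusher.messageReceivedEventQueue.add("b");
        flusher.flushScheduledEvents();
        System.out.println(flusher.processed);
    }
}
```

The reasoning is that a producer always enqueues before calling
flushScheduledEvents(), so a failed tryLock() implies the lock holder has not
yet reached its post-unlock check. The trade-off is that, under sustained
load, one thread may keep looping on behalf of others.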


--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org
For additional commands, e-mail: dev-h...@mina.apache.org



[jira] [Updated] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux

2019-05-16 Thread Emmanuel Lecharny (JIRA)


 [ 
https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmanuel Lecharny updated DIRMINA-1107:
---
Attachment: (was: ssl.diff)

> SslHandler flushScheduledEvents race condition, redux
> -
>
> Key: DIRMINA-1107
> URL: https://issues.apache.org/jira/browse/DIRMINA-1107
> Project: MINA
> Issue Type: Bug
> Affects Versions: 2.1.2
> Reporter: Guus der Kinderen
> Priority: Major
> Fix For: 2.1.3
>
>


[jira] [Updated] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux

2019-05-16 Thread Emmanuel Lecharny (JIRA)


 [ 
https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmanuel Lecharny updated DIRMINA-1107:
---
Attachment: ssl.diff

> SslHandler flushScheduledEvents race condition, redux
> -
>
> Key: DIRMINA-1107
> URL: https://issues.apache.org/jira/browse/DIRMINA-1107
> Project: MINA
> Issue Type: Bug
> Affects Versions: 2.1.2
> Reporter: Guus der Kinderen
> Priority: Major
> Fix For: 2.1.3
>
> Attachments: ssl.diff
>
>


[jira] [Updated] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux

2019-05-13 Thread Jonathan Valliere (JIRA)


 [ 
https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Valliere updated DIRMINA-1107:
---
Fix Version/s: (was: 2.0.23)

> SslHandler flushScheduledEvents race condition, redux
> -
>
> Key: DIRMINA-1107
> URL: https://issues.apache.org/jira/browse/DIRMINA-1107
> Project: MINA
> Issue Type: Bug
> Affects Versions: 2.1.2
> Reporter: Guus der Kinderen
> Priority: Major
> Fix For: 2.1.3
>
>


[jira] [Updated] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux

2019-05-13 Thread Jonathan Valliere (JIRA)


 [ 
https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Valliere updated DIRMINA-1107:
---
Affects Version/s: 2.1.2

> SslHandler flushScheduledEvents race condition, redux
> -
>
> Key: DIRMINA-1107
> URL: https://issues.apache.org/jira/browse/DIRMINA-1107
> Project: MINA
> Issue Type: Bug
> Affects Versions: 2.1.2
> Reporter: Guus der Kinderen
> Priority: Major
> Fix For: 2.1.3, 2.0.23
>
>


[jira] [Updated] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux

2019-05-13 Thread Jonathan Valliere (JIRA)


 [ 
https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Valliere updated DIRMINA-1107:
---
Fix Version/s: 2.0.23
   2.1.3

> SslHandler flushScheduledEvents race condition, redux
> -
>
> Key: DIRMINA-1107
> URL: https://issues.apache.org/jira/browse/DIRMINA-1107
> Project: MINA
> Issue Type: Bug
> Reporter: Guus der Kinderen
> Priority: Major
> Fix For: 2.1.3, 2.0.23
>
>