[jira] [Commented] (FTPSERVER-491) SSLConfigurationFactory.setSslProtocol never actually work
[ https://issues.apache.org/jira/browse/FTPSERVER-491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845438#comment-16845438 ] Roy Lu commented on FTPSERVER-491: -- Hi [~johnnyv], how may I verify this? I was using Apache FTP through Maven. Or could you provide me the jar file built on your side?
> SSLConfigurationFactory.setSslProtocol never actually work
> --
>
> Key: FTPSERVER-491
> URL: https://issues.apache.org/jira/browse/FTPSERVER-491
> Project: FtpServer
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.1.1
> Reporter: Roy Lu
> Assignee: Jonathan Valliere
> Priority: Critical
> Labels: easyfix
> Fix For: 1.1.2
>
> The documentation says: Set the SSL protocol used for this channel. Supported values are "SSL" and "TLS". Defaults to "TLS".
> In fact, the available values are TLSv1, TLSv1.1, TLSv1.2, and SSLv3. This is mentioned at the bottom of
> [https://mina.apache.org/mina-project/userguide/ch11-ssl-filter/ch11-ssl-filter.html].
> But the thing is, the +setSslProtocol+ method actually doesn't work, because the SSL protocol set in the +SSLConfiguration+ is never used. Check +NioListener+ and you will see this: the cipher suites are copied into +sslFilter+, but no protocols. The protocols are simply missing.
> {code:java}
> if (ssl.getEnabledCipherSuites() != null) {
>     sslFilter.setEnabledCipherSuites(ssl.getEnabledCipherSuites());
> }
> {code}
> This leads to a problem: in +SSLHandler+, the protocols are copied from the filter into +sslEngine+. Because the protocols were lost when the filter was built, the protocol setting never takes effect.
> {code:java}
> if (this.sslFilter.getEnabledCipherSuites() != null) {
>     this.sslEngine.setEnabledCipherSuites(this.sslFilter.getEnabledCipherSuites());
> }
> if (this.sslFilter.getEnabledProtocols() != null) {
>     this.sslEngine.setEnabledProtocols(this.sslFilter.getEnabledProtocols());
> }
> {code}
> I found this because I scanned FTP with Nmap.
> I set it to critical because it's a security issue and hope it can be fixed soon.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
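The missing propagation described in the report can be sketched as follows. `SslConfig`, `Filter`, `buildFilterBuggy` and `buildFilterFixed` are hypothetical stand-ins for `SSLConfiguration`, `SslFilter` and the `NioListener` wiring, not the real FtpServer API; only the copying logic mirrors the issue.

```java
// Sketch of the propagation bug reported above: the listener copies the
// cipher suites from the SSL configuration into the filter, but silently
// drops the protocols, so setSslProtocol never reaches the engine.
public class ProtocolPropagation {
    static class SslConfig {
        String[] enabledCipherSuites;
        String[] enabledProtocols; // what setSslProtocol would populate
    }

    static class Filter {
        String[] enabledCipherSuites;
        String[] enabledProtocols;
    }

    // Buggy variant, as quoted from NioListener: only cipher suites copied.
    static Filter buildFilterBuggy(SslConfig ssl) {
        Filter f = new Filter();
        if (ssl.enabledCipherSuites != null) {
            f.enabledCipherSuites = ssl.enabledCipherSuites;
        }
        return f; // protocols silently dropped
    }

    // Fixed variant: propagate the protocols as well.
    static Filter buildFilterFixed(SslConfig ssl) {
        Filter f = buildFilterBuggy(ssl);
        if (ssl.enabledProtocols != null) {
            f.enabledProtocols = ssl.enabledProtocols;
        }
        return f;
    }

    public static void main(String[] args) {
        SslConfig ssl = new SslConfig();
        ssl.enabledProtocols = new String[] { "TLSv1.2" };
        System.out.println(buildFilterBuggy(ssl).enabledProtocols == null); // true
        System.out.println(buildFilterFixed(ssl).enabledProtocols[0]);      // TLSv1.2
    }
}
```

Because `SSLHandler` only forwards what the filter holds, a `null` protocol array in the filter means the engine keeps its JVM defaults, which is exactly what an Nmap scan would reveal.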
[jira] [Assigned] (DIRMINA-1112) Backport DIRMINA-1107 to 2.0.X
[ https://issues.apache.org/jira/browse/DIRMINA-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Valliere reassigned DIRMINA-1112: -- Assignee: Jonathan Valliere > Backport DIRMINA-1107 to 2.0.X > -- > > Key: DIRMINA-1112 > URL: https://issues.apache.org/jira/browse/DIRMINA-1112 > Project: MINA > Issue Type: Bug > Affects Versions: 2.0.22 > Reporter: Jonathan Valliere > Assignee: Jonathan Valliere > Priority: Major > > Backport DIRMINA-1107 to 2.0.X branch
[jira] [Resolved] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Valliere resolved DIRMINA-1107. Resolution: Fixed
> SslHandler flushScheduledEvents race condition, redux
> -
>
> Key: DIRMINA-1107
> URL: https://issues.apache.org/jira/browse/DIRMINA-1107
> Project: MINA
> Issue Type: Bug
> Affects Versions: 2.1.2
> Reporter: Guus der Kinderen
> Priority: Major
> Fix For: 2.1.3
>
> DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally replaces it with another multithreading issue.
> The fix for DIRMINA-1019 introduces a counter that contains the number of events to be processed. A simplified version of the code is included below.
> {code:java}
> private final AtomicInteger scheduledEvents = new AtomicInteger(0);
>
> void flushScheduledEvents() {
>     scheduledEvents.incrementAndGet();
>     if (sslLock.tryLock()) {
>         try {
>             do {
>                 while ((event = filterWriteEventQueue.poll()) != null) {
>                     // ...
>                 }
>                 while ((event = messageReceivedEventQueue.poll()) != null) {
>                     // ...
>                 }
>             } while (scheduledEvents.decrementAndGet() > 0);
>         } finally {
>             sslLock.unlock();
>         }
>     }
> }{code}
> We have observed occasions where the value of {{scheduledEvents}} becomes negative, while at the same time {{filterWriteEventQueue}} goes unprocessed.
> We suspect that this issue is triggered by a concurrency issue caused by the first thread decrementing the counter after a second thread incremented it, but before it attempted to acquire the lock.
> This allows the first thread to empty the queues, decrement the counter to zero and release the lock, after which the second thread acquires the lock successfully. Now, the second thread processes any elements in {{filterWriteEventQueue}}, and then processes any elements in {{messageReceivedEventQueue}}. If in between these two checks yet another thread adds a new element to {{filterWriteEventQueue}}, this element can go unprocessed (as the second thread does not loop, since the counter is zero or negative, and the third thread can fail to acquire the lock).
> It's a seemingly unlikely scenario, but we are observing the behavior when our systems are under high load.
> We've applied a code change after which this problem is no longer observed. We've removed the counter, and check the size of the queues instead:
> {code:java}
> void flushScheduledEvents() {
>     if (sslLock.tryLock()) {
>         try {
>             do {
>                 while ((event = filterWriteEventQueue.poll()) != null) {
>                     // ...
>                 }
>                 while ((event = messageReceivedEventQueue.poll()) != null) {
>                     // ...
>                 }
>             } while (!filterWriteEventQueue.isEmpty() || !messageReceivedEventQueue.isEmpty());
>         } finally {
>             sslLock.unlock();
>         }
>     }
> }{code}
> This code change, as illustrated above, does introduce a new potential problem. Theoretically, an event could be added to the queues and {{flushScheduledEvents}} be called, with {{sslLock.tryLock()}} returning {{false}}, exactly after another thread just finished the {{while}} loop, but before releasing the lock. This again would cause events to go unprocessed.
> We've not observed this problem in the wild yet, but we're uncomfortable applying this change as-is.
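The queue-size variant described in the report can be reduced to a runnable fragment. The names follow the quoted snippet; the event-handling bodies are stand-ins (a counter), and this illustrates only the locking and looping structure, not the real SslHandler.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the queue-size flush loop proposed in DIRMINA-1107.
public class FlushSketch {
    final ReentrantLock sslLock = new ReentrantLock();
    final Queue<Object> filterWriteEventQueue = new ConcurrentLinkedQueue<>();
    final Queue<Object> messageReceivedEventQueue = new ConcurrentLinkedQueue<>();
    int processed; // counts handled events, for illustration only

    void flushScheduledEvents() {
        // tryLock: a thread that loses the race relies on the lock holder
        // re-checking the queue sizes to drain its events as well.
        if (sslLock.tryLock()) {
            try {
                Object event;
                do {
                    while ((event = filterWriteEventQueue.poll()) != null) {
                        processed++; // ... handle filter-write event
                    }
                    while ((event = messageReceivedEventQueue.poll()) != null) {
                        processed++; // ... handle message-received event
                    }
                    // Re-check sizes: events enqueued between the two inner
                    // loops trigger another pass instead of being stranded.
                } while (!filterWriteEventQueue.isEmpty()
                        || !messageReceivedEventQueue.isEmpty());
            } finally {
                sslLock.unlock();
            }
        }
    }
}
```

The residual race the reporter worries about sits in the gap between the final `isEmpty()` checks and `unlock()`: an enqueue plus a failed `tryLock()` in that window leaves an event behind until the next flush.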
[jira] [Updated] (DIRMINA-1112) Backport DIRMINA-1107 to 2.0.X
[ https://issues.apache.org/jira/browse/DIRMINA-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Valliere updated DIRMINA-1112: --- Description: Backport DIRMINA-1107 to 2.0.X branch > Backport DIRMINA-1107 to 2.0.X > -- > > Key: DIRMINA-1112 > URL: https://issues.apache.org/jira/browse/DIRMINA-1112 > Project: MINA > Issue Type: Bug > Affects Versions: 2.0.22 > Reporter: Jonathan Valliere > Priority: Major > > Backport DIRMINA-1107 to 2.0.X branch
[jira] [Created] (DIRMINA-1112) Backport DIRMINA-1107 to 2.0.X
Jonathan Valliere created DIRMINA-1112: -- Summary: Backport DIRMINA-1107 to 2.0.X Key: DIRMINA-1112 URL: https://issues.apache.org/jira/browse/DIRMINA-1112 Project: MINA Issue Type: Bug Affects Versions: 2.0.22 Reporter: Jonathan Valliere
[jira] [Commented] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845261#comment-16845261 ] Emmanuel Lecharny commented on DIRMINA-1107: Can we backport the fix to 2.0?
> SslHandler flushScheduledEvents race condition, redux
[jira] [Resolved] (SSHD-920) Scp Subsystem does not support java.nio.file.FileSystem
[ https://issues.apache.org/jira/browse/SSHD-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Goldstein Lyor resolved SSHD-920. - Resolution: Implemented Assignee: Goldstein Lyor Fix Version/s: 2.3.0 Please try the latest (snapshot) version - fixed by SSHD-893
> Scp Subsystem does not support java.nio.file.FileSystem
> ---
>
> Key: SSHD-920
> URL: https://issues.apache.org/jira/browse/SSHD-920
> Project: MINA SSHD
> Issue Type: Bug
> Affects Versions: 2.2.0
> Reporter: Fulvio Cavarretta
> Assignee: Goldstein Lyor
> Priority: Blocker
> Fix For: 2.3.0
>
> If you plug a custom java.nio.file.FileSystem into Apache SSHD, some scp use cases are not supported. Specifically, a download from apache-sshd using a wildcard, such as:
> scp user1@apache-sshd:/home/user1/* /tmp
> does not succeed. This is because in *ScpFileOpener.getMatchingFilesToSend* the class *DirectoryScanner*, used to retrieve the list of files to be transferred, does not rely on the java.nio.file.FileSystem plugged into SSHD, but leverages the local file system.
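The fix direction implied by the report is to resolve wildcards through whatever `java.nio.file.FileSystem` the session's paths belong to, rather than through `java.io` and the local disk. `GlobScanner` below is a hypothetical illustration, not the `DirectoryScanner` API: the key point is that the `PathMatcher` comes from `fs.getPathMatcher()`, so a plugged-in FileSystem is honoured automatically.

```java
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.List;

// Sketch: match a scp-style wildcard against file names using the
// PathMatcher of the FileSystem the paths belong to.
public class GlobScanner {
    static List<Path> matching(FileSystem fs, List<Path> candidates, String glob) {
        // "glob:" syntax is defined by FileSystem.getPathMatcher; a custom
        // FileSystem supplies its own matcher here.
        PathMatcher matcher = fs.getPathMatcher("glob:" + glob);
        List<Path> result = new ArrayList<>();
        for (Path p : candidates) {
            if (p.getFileName() != null && matcher.matches(p.getFileName())) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        FileSystem fs = FileSystems.getDefault();
        List<Path> files = List.of(fs.getPath("/home/user1/a.txt"),
                                   fs.getPath("/home/user1/b.log"));
        System.out.println(matching(fs, files, "*.txt").size()); // 1
    }
}
```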
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844936#comment-16844936 ] Jonathan Valliere commented on DIRMINA-1111: {{increaseWrittenBytes}} automatically decrements {{scheduledWriteBytes}} (which is a horrible idea, but that is how it works). We could just force-set {{scheduledWriteBytes}} to 0 when {{clearWriteRequestQueue()}} is called.
> 100% CPU (epoll bug) on 2.1.x, Linux only
> -
>
> Key: DIRMINA-1111
> URL: https://issues.apache.org/jira/browse/DIRMINA-1111
> Project: MINA
> Issue Type: Bug
> Affects Versions: 2.1.2
> Reporter: Guus der Kinderen
> Priority: Major
> Attachments: image-2019-05-21-11-37-41-931.png
>
> We're getting [reports|https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13] of a bug that causes 100% CPU usage on Linux (the problem does not occur on Windows).
> This problem occurs in 2.1.2, but does _not_ occur in 2.0.21.
> Is this a regression of the epoll selector bug in DIRMINA-678?
> A stack trace of one of the spinning threads:
> {code}
> "NioProcessor-3" #55 prio=5 os_prio=0 tid=0x7f0408002000 nid=0x2a6a runnable [0x7f0476dee000]
>    java.lang.Thread.State: RUNNABLE
>     at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>     at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>     - locked <0x0004c486b748> (a sun.nio.ch.Util$3)
>     - locked <0x0004c486b738> (a java.util.Collections$UnmodifiableSet)
>     - locked <0x0004c420ccb0> (a sun.nio.ch.EPollSelectorImpl)
>     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
>     at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:112)
>     at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:616)
>     at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>     - <0x0004c4d03530> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {code}
> More info is available at https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13
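The counter behaviour discussed in this thread can be sketched in isolation. The field and method names echo the MINA ones, but this is an illustration, not the real `AbstractIoSession`: because `increaseWrittenBytes` also decrements `scheduledWriteBytes`, a session disposed with unwritten data leaves a non-zero residue unless the clear path zeroes the counter explicitly, as suggested in the comment above.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the scheduled-write-bytes accounting and the proposed
// force-set-to-zero on queue clearing. Hypothetical stand-in class.
public class WriteStatsSketch {
    final AtomicLong scheduledWriteBytes = new AtomicLong();
    final AtomicLong writtenBytes = new AtomicLong();

    void scheduleWrite(long bytes) {
        scheduledWriteBytes.addAndGet(bytes);
    }

    void increaseWrittenBytes(long bytes) {
        writtenBytes.addAndGet(bytes);
        scheduledWriteBytes.addAndGet(-bytes); // the coupled decrement
    }

    // Suggested fix: drop whatever is still scheduled when the
    // write-request queue is cleared on session disposal.
    void clearWriteRequestQueue() {
        scheduledWriteBytes.set(0);
    }
}
```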
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844890#comment-16844890 ] Emmanuel Lecharny commented on DIRMINA-1111: We might update the written bytes counter directly in the {{clearWriteRequestQueue()}} method, assuming we know how many bytes have already been written. In any case, if the stats are wrong but the code works (i.e. does not eat 100% CPU), I guess it's acceptable for the moment, until we come to a rational view on how to correct the stats.
> 100% CPU (epoll bug) on 2.1.x, Linux only
[jira] [Comment Edited] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844887#comment-16844887 ] Jonathan Valliere edited comment on DIRMINA-1111 at 5/21/19 2:27 PM: - Did some more checking: the scheduledWriteBytes is always going to get messed up because it is being decremented as bytes are written. My vote is to just remove the {{buf.reset()}} line entirely.
was (Author: johnnyv): Did some more checking, the scheduledWriteBytes is always going to get messed up because it is being decrement as bytes are written. My vote it to just remove {{buf.reset()}} line entirely.
> 100% CPU (epoll bug) on 2.1.x, Linux only
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844887#comment-16844887 ] Jonathan Valliere commented on DIRMINA-1111: Did some more checking: the scheduledWriteBytes is always going to get messed up because it is being decremented as bytes are written. My vote is to just remove the {{buf.reset()}} line entirely.
> 100% CPU (epoll bug) on 2.1.x, Linux only
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844880#comment-16844880 ] Guus der Kinderen commented on DIRMINA-1111: Maybe bike-shedding, but wouldn't {{clear}} potentially cause some bytes to be 'uncounted', even if they were processed? Even if you're not using {{mark}} now, at some point in the future, that might be introduced again. Checking for a mark, and repositioning to the mark if present, otherwise to 0, seems like a desirable defensive construct.
> 100% CPU (epoll bug) on 2.1.x, Linux only
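The defensive construct suggested in the comment (reposition to the mark if present, otherwise to 0) might look like the sketch below. `ByteBuffer` exposes no query for whether a mark is set, so catching `InvalidMarkException` is one way to express it; the helper name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

// Sketch of the defensive rewind: go back to the mark when one was set,
// otherwise to position 0, without disturbing the limit the way clear() does.
public class DefensiveRewind {
    static void rewindToMarkOrStart(ByteBuffer buf) {
        try {
            buf.reset(); // succeeds only when a mark exists
        } catch (InvalidMarkException e) {
            buf.position(0); // no mark: fall back to the start
        }
    }
}
```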
[jira] [Comment Edited] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844880#comment-16844880 ] Guus der Kinderen edited comment on DIRMINA-1111 at 5/21/19 2:17 PM: - Maybe bike-shedding, but wouldn't {{clear}} potentially cause some bytes to be 'uncounted', even if they were processed? Even if you're not using {{mark}} now, at some point in the future, that might be introduced again. Checking for a mark, and repositioning to the mark if present, otherwise to 0, seems like a desirable defensive construct.
was (Author: guus.der.kinderen): Maybe bike-shidding, but wouldn't {{clear}} potentially cause some bytes to be 'uncounted', even if they were processed? Even if you're not using {{mark}} now, at some point in the future, that might be introduced again. Checking for a mark, and repositioning to the mark if present, otherwise to 0, seems like a desirable defensive construct.
> 100% CPU (epoll bug) on 2.1.x, Linux only
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844847#comment-16844847 ] Emmanuel Lecharny commented on DIRMINA-1111: Ok, after some history digging, I found out why {{buff.reset()}} was called. This is an 11-year-old addition: [http://svn.apache.org/viewvc?view=revision=592218]. The rationale, AFAICT, was to be sure the stats about the number of written bytes are correct: we should not count the non-written bytes still present in a buffer when the session is disposed. Calling {{clear()}} instead of {{reset()}} - which only works when a {{mark()}} has been set - seems the right thing to do. Note that with {{2.0.X}} we don't have this issue because we call {{mark}}, something we don't do anymore with {{2.1.X}}.
> 100% CPU (epoll bug) on 2.1.x, Linux only
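The distinction drawn in the comment above can be demonstrated directly: `reset()` throws unless a `mark()` was set earlier (which 2.1.X no longer does), while `clear()` always rewinds the position to 0 and the limit to the capacity, regardless of any mark.

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

// Demonstrates why buf.reset() blows up on 2.1.X buffers while
// buf.clear() does not.
public class ClearVsReset {
    static boolean resetThrows(ByteBuffer buf) {
        try {
            buf.reset();
            return false;
        } catch (InvalidMarkException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.position(8); // simulate partially consumed data, no mark set
        System.out.println(resetThrows(buf)); // true: no mark to reset to
        buf.clear();
        System.out.println(buf.position() + " " + buf.limit()); // 0 16
    }
}
```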
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844838#comment-16844838 ] Jonathan Valliere commented on DIRMINA-: Slice ensures that the bounds of the ByteBuffer View are correct. The {{scheduledWriteBytes}} counter is incremented and decremented atomically based on {{remaining()}}. The only purpose of {{clear()}} is to reset the position so when the {{scheduledWriteBytes}} is decremented as the {{IoSession}} closes that the counter correctly goes to zero. It should not adversely affect any functionality. > 100% CPU (epoll bug) on 2.1.x, Linux only > - > > Key: DIRMINA- > URL: https://issues.apache.org/jira/browse/DIRMINA- > Project: MINA > Issue Type: Bug >Affects Versions: 2.1.2 >Reporter: Guus der Kinderen >Priority: Major > Attachments: image-2019-05-21-11-37-41-931.png > > > We're getting reports > [reports|https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13] > of a bug that causes 100% CPU usage on Linux (the problem does not occur on > Windows). > This problem occurs in 2.1.2. but does _not_ occur in 2.0.21. > Is this a regression of the epoll selector bug in DIRMINA-678 ? 
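The {{slice()}}-versus-{{duplicate()}} distinction behind this fix can be demonstrated with plain {{java.nio}} buffers (a standalone sketch; class and method names here are illustrative, not MINA API). A slice's capacity equals the parent's remaining bytes at slice time, so a later {{clear()}} restores exactly the unwritten region; a duplicate's capacity is the whole backing buffer, so {{clear()}} exposes more than was scheduled.

```java
import java.nio.ByteBuffer;

public class SliceVsDuplicate {
    static int remainingAfterClear(boolean useSlice) {
        ByteBuffer original = ByteBuffer.allocate(16);
        original.put(new byte[] {1, 2, 3, 4});   // position = 4
        original.flip();                         // position = 0, limit = 4
        original.get();
        original.get();                          // position = 2, two bytes remain
        ByteBuffer view = useSlice ? original.slice() : original.duplicate();
        view.clear();                            // position = 0, limit = capacity
        return view.remaining();
    }

    public static void main(String[] args) {
        // slice(): capacity == parent's remaining at slice time
        System.out.println(remainingAfterClear(true));
        // duplicate(): capacity == parent's full capacity
        System.out.println(remainingAfterClear(false));
    }
}
```

With {{slice()}}, clearing the view yields 2 remaining bytes (the scheduled-but-unwritten data); with {{duplicate()}}, it yields all 16, which is why the counter bookkeeping only balances with a slice.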
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844834#comment-16844834 ] Guus der Kinderen commented on DIRMINA-1111: Thanks for both your efforts on this! I still don't quite understand the fix, but am happy to accept it. {{clear()}}, unlike {{reset()}}, will prevent the {{InvalidMarkException}}. I'm not confident that using {{slice()}} instead of {{duplicate()}} sufficiently prevents the buffer from being in an unpredictable state in {{clearWriteRequestQueue}}. Isn't it possible that, after construction (slicing) of the WriteRequest and during the processing of that instance, an exception occurs that causes it to be handled by {{clearWriteRequestQueue}}? We'd still be in potential trouble there.
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844815#comment-16844815 ] Jonathan Valliere commented on DIRMINA-1111: Emmanuel is trying to figure out why {{reset()}} was in there at all. We might end up just removing that call. For the time being, this fixes the problem:
{code:java}
diff --git a/mina-core/src/main/java/org/apache/mina/core/polling/AbstractPollingIoProcessor.java b/mina-core/src/main/java/org/apache/mina/core/polling/AbstractPollingIoProcessor.java
index 1553a99..9501074 100644
--- a/mina-core/src/main/java/org/apache/mina/core/polling/AbstractPollingIoProcessor.java
+++ b/mina-core/src/main/java/org/apache/mina/core/polling/AbstractPollingIoProcessor.java
@@ -1170,7 +1170,7 @@
                 // The first unwritten empty buffer must be
                 // forwarded to the filter chain.
                 if (buf.hasRemaining()) {
-                    buf.reset();
+                    buf.clear();
                     failedRequests.add(req);
                 } else {
                     IoFilterChain filterChain = session.getFilterChain();
diff --git a/mina-core/src/main/java/org/apache/mina/core/write/DefaultWriteRequest.java b/mina-core/src/main/java/org/apache/mina/core/write/DefaultWriteRequest.java
index 8324c6a..23604fa 100644
--- a/mina-core/src/main/java/org/apache/mina/core/write/DefaultWriteRequest.java
+++ b/mina-core/src/main/java/org/apache/mina/core/write/DefaultWriteRequest.java
@@ -226,7 +226,7 @@
         if (message instanceof IoBuffer) {
             // duplicate it, so that any modification made on it
             // won't change the original message
-            this.message = ((IoBuffer)message).duplicate();
+            this.message = ((IoBuffer)message).slice();
         }
{code}
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844812#comment-16844812 ] Guus der Kinderen commented on DIRMINA-1111: I don't quite understand your suggestion. Could you illustrate it with a code snippet, please?
[jira] [Commented] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844809#comment-16844809 ] Jonathan Valliere commented on DIRMINA-1107: Merged into 2.1.X [99fbf4fe64d72409b19f8835c0a568022916d840|https://gitbox.apache.org/repos/asf?p=mina.git;a=commit;h=99fbf4fe64d72409b19f8835c0a568022916d840]
> SslHandler flushScheduledEvents race condition, redux
> -----------------------------------------------------
>
> Key: DIRMINA-1107
> URL: https://issues.apache.org/jira/browse/DIRMINA-1107
> Project: MINA
> Issue Type: Bug
> Affects Versions: 2.1.2
> Reporter: Guus der Kinderen
> Priority: Major
> Fix For: 2.1.3
>
> DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally replaces it with another multithreading issue.
> The fix for DIRMINA-1019 introduces a counter that contains the number of events to be processed. A simplified version of the code is included below.
> {code:java}
> private final AtomicInteger scheduledEvents = new AtomicInteger(0);
>
> void flushScheduledEvents() {
>     scheduledEvents.incrementAndGet();
>
>     if (sslLock.tryLock()) {
>         try {
>             do {
>                 while ((event = filterWriteEventQueue.poll()) != null) {
>                     // ...
>                 }
>
>                 while ((event = messageReceivedEventQueue.poll()) != null) {
>                     // ...
>                 }
>             } while (scheduledEvents.decrementAndGet() > 0);
>         } finally {
>             sslLock.unlock();
>         }
>     }
> }{code}
> We have observed occasions where the value of {{scheduledEvents}} becomes negative, while at the same time {{filterWriteEventQueue}} goes unprocessed.
> We suspect that this is triggered by the first thread decrementing the counter after a second thread incremented it, but before that second thread attempted to acquire the lock.
> This allows the first thread to empty the queues, decrement the counter to zero, and release the lock, after which the second thread acquires the lock successfully.
Now, the second thread processes any elements in > {{filterWriteEventQueue}}, and then processes any elements in > {{messageReceivedEventQueue}}. If in between these two checks yet another > thread adds a new element to {{filterWriteEventQueue}}, this element can go > unprocessed (as the second thread does not loop, since the counter is zero or > negative, and the third thread can fail to acquire the lock). > It's a seemingly unlikely scenario, but we are observing the behavior when > our systems are under high load. > We've applied a code change after which this problem is no longer observed. > We've removed the counter, and check on the size of the queues instead: > {code:java} > void flushScheduledEvents() { > if (sslLock.tryLock()) { > try { > do { > while ((event = filterWriteEventQueue.poll()) != null) { > // ... > } > > while ((event = messageReceivedEventQueue.poll()) != null){ > // ... > } > } while (!filterWriteEventQueue.isEmpty() || > !messageReceivedEventQueue.isEmpty()); > } finally { > sslLock.unlock(); > } > } > }{code} > This code change, as illustrated above, does introduce a new potential > problem. Theoretically, an event could be added to the queues and > {{flushScheduledEvents}} be called returning {{false}} for > {{sslLock.tryLock()}}, exactly after another thread just finished the > {{while}} loop, but before releasing the lock. This again would cause events > to go unprocessed. > We've not observed this problem in the wild yet, but we're uncomfortable > applying this change as-is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
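The queue-emptiness variant described above can be exercised as a self-contained sketch. The class name {{EventDrainer}}, the {{schedule}} method, and the {{processed}} counter are illustrative stand-ins, not MINA's actual API; real event handling replaces the counter increments. Note that the residual race the reporter describes (a producer enqueueing just as the draining thread releases the lock, after its tryLock already failed) still exists in this sketch.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class EventDrainer {
    private final ReentrantLock sslLock = new ReentrantLock();
    private final ConcurrentLinkedQueue<String> filterWriteEventQueue = new ConcurrentLinkedQueue<>();
    private final ConcurrentLinkedQueue<String> messageReceivedEventQueue = new ConcurrentLinkedQueue<>();
    final AtomicInteger processed = new AtomicInteger();  // stand-in for real event processing

    void schedule(String event, boolean isWrite) {
        (isWrite ? filterWriteEventQueue : messageReceivedEventQueue).offer(event);
        flushScheduledEvents();
    }

    void flushScheduledEvents() {
        if (sslLock.tryLock()) {
            try {
                String event;
                do {
                    while ((event = filterWriteEventQueue.poll()) != null) {
                        processed.incrementAndGet();
                    }
                    while ((event = messageReceivedEventQueue.poll()) != null) {
                        processed.incrementAndGet();
                    }
                    // Loop again if another producer enqueued while we were draining;
                    // this replaces the counter from the original code.
                } while (!filterWriteEventQueue.isEmpty() || !messageReceivedEventQueue.isEmpty());
            } finally {
                sslLock.unlock();
            }
        }
    }

    public static void main(String[] args) {
        EventDrainer d = new EventDrainer();
        d.schedule("write-1", true);
        d.schedule("read-1", false);
        System.out.println(d.processed.get());
    }
}
```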
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844795#comment-16844795 ] Jonathan Valliere commented on DIRMINA-1111: Okay, I checked it. {{DefaultWriteRequest}} should use {{slice()}} to duplicate the buffer. That ensures that index 0 is the current position of the input buffer. That way, we can call {{clear()}} and fully reset the message to the correct start and end offsets.
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844793#comment-16844793 ] Guus der Kinderen commented on DIRMINA-1111: {{rewind()}} looks like a simpler solution than resetting-if-marked, but there's a small difference: the position would always revert to 0, instead of to the last marked position. I'm not sure whether that makes any difference to code that depends on this. In any case, preventing the {{InvalidMarkException}} would be achieved by both, and that is what seems to prevent the CPU churning. I'd be happy with either solution.
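The difference between {{rewind()}} and {{reset()}} discussed in this thread is easy to demonstrate with {{java.nio}} directly (standalone sketch; names are illustrative): {{reset()}} requires a prior {{mark()}} and throws {{InvalidMarkException}} on a fresh buffer, while {{rewind()}} is always legal and simply sets the position to 0.

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

public class ResetVsRewind {
    static boolean resetThrowsWithoutMark() {
        ByteBuffer buf = ByteBuffer.allocate(8);   // fresh buffer: no mark set
        try {
            buf.reset();                           // requires a prior mark()
            return false;
        } catch (InvalidMarkException e) {
            return true;
        }
    }

    static int positionAfterRewind() {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.put((byte) 1).put((byte) 2);           // position = 2, still no mark
        buf.rewind();                              // always legal: position -> 0
        return buf.position();
    }

    public static void main(String[] args) {
        System.out.println(resetThrowsWithoutMark());
        System.out.println(positionAfterRewind());
    }
}
```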
[jira] [Comment Edited] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844786#comment-16844786 ] Jonathan Valliere edited comment on DIRMINA-1111 at 5/21/19 12:25 PM: Sorry, {{buf.rewind()}}. Honestly, it's not necessary at all, because the duplicated buffer is the one being modified; the original one is left intact. I'm not sure why {{failedRequests}} is a thing, because any failed writes should mark the {{IoSession}} as invalid and close everything.
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844783#comment-16844783 ] Guus der Kinderen commented on DIRMINA-1111: That would change the behavior - instead of rewinding the buffer to be re-read, it would effectively discard information. I'm not sure why we're resetting in the first place here, but I'm guessing it serves a purpose (to later process the remainder of the buffer?). With a reset, the data will likely be lost (although the data is not actually wiped from the buffer, recovering it would be harder).
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844777#comment-16844777 ] Jonathan Valliere commented on DIRMINA-: Replace {{buf.reset()}} with {{buf.clear()}}.

> 100% CPU (epoll bug) on 2.1.x, Linux only
> -----------------------------------------
>
> Key: DIRMINA-
> URL: https://issues.apache.org/jira/browse/DIRMINA-
> Project: MINA
> Issue Type: Bug
> Affects Versions: 2.1.2
> Reporter: Guus der Kinderen
> Priority: Major
> Attachments: image-2019-05-21-11-37-41-931.png
>
> We're getting [reports|https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13] of a bug that causes 100% CPU usage on Linux (the problem does not occur on Windows).
> This problem occurs in 2.1.2 but does _not_ occur in 2.0.21.
> Is this a regression of the epoll selector bug in DIRMINA-678?
> A stack trace of one of the spinning threads:
> {code}
> "NioProcessor-3" #55 prio=5 os_prio=0 tid=0x7f0408002000 nid=0x2a6a runnable [0x7f0476dee000]
>    java.lang.Thread.State: RUNNABLE
>       at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>       at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>       at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>       at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>       - locked <0x0004c486b748> (a sun.nio.ch.Util$3)
>       - locked <0x0004c486b738> (a java.util.Collections$UnmodifiableSet)
>       - locked <0x0004c420ccb0> (a sun.nio.ch.EPollSelectorImpl)
>       at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
>       at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:112)
>       at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:616)
>       at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>       - <0x0004c4d03530> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {code}
> More info is available at https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
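The suggested swap works because {{clear()}} never depends on the mark, whereas {{reset()}} throws when no mark was ever set. A minimal sketch of the difference using plain {{java.nio.ByteBuffer}} (MINA's {{IoBuffer}} follows the same mark semantics; treat that as an assumption to verify against the {{IoBuffer}} javadoc):

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

public class MarkDemo {
    public static void main(String[] args) {
        ByteBuffer fresh = ByteBuffer.allocate(16);

        boolean threw = false;
        try {
            fresh.reset();          // no mark has ever been set on this buffer
        } catch (InvalidMarkException e) {
            threw = true;           // reset() requires a prior mark()
        }

        fresh.clear();              // always legal: position=0, limit=capacity, mark discarded

        System.out.println("reset threw: " + threw);
        System.out.println("position after clear: " + fresh.position());
    }
}
```

Note that {{clear()}} also rewinds the position, so it is not a drop-in semantic replacement for {{reset()}} in every caller, only in places where the buffer is being discarded anyway.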
[jira] [Comment Edited] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844762#comment-16844762 ] Guus der Kinderen edited comment on DIRMINA- at 5/21/19 12:05 PM: -- Although we're not exactly sure how, we've "resolved" the issue by modifying the snippet below:

{code:java|title=Original code for org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}
// The first unwritten empty buffer must be
// forwarded to the filter chain.
if (buf.hasRemaining()) {
    buf.reset();
    failedRequests.add(req);
} else {
    IoFilterChain filterChain = session.getFilterChain();
    filterChain.fireMessageSent(req);
}
{code}

As said, this triggered {{InvalidMarkException}}s. After we added a simple {{try/catch}} around the reset, the CPU spikes went away. We have not investigated why this resolves the CPU spikes, but we assume that the thrown exception prevents keys from being consumed. We're still suffering from a different issue (all clients unexpectedly reconnect periodically), but we're now working on the assumption that the CPU spike is a result, not the cause, of this.

Looking further into the {{clearWriteRequestQueue}} snippet, we noticed that it is often called from exception handlers. As a consequence, the state of the buffer it operates on is likely unpredictable. The call to {{reset()}} assumes that a mark is set, but there are various scenarios where this is untrue:
* the buffer could be completely unused (a buffer fresh from the constructor will throw an {{InvalidMarkException}} here)
* the buffer might have been flipped, but not yet read.

We're uncertain whether the reset is needed at all, but if it is, we suggest explicitly checking whether the mark has been set; if it has not, we don't believe a reset is needed.

{code:java|title=Proposed fix for org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}
// The first unwritten empty buffer must be
// forwarded to the filter chain.
if (buf.hasRemaining()) {
    if (buf.markValue() > -1) {
        buf.reset();
    }
    failedRequests.add(req);
} else {
    IoFilterChain filterChain = session.getFilterChain();
    filterChain.fireMessageSent(req);
}
{code}
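For illustration, the guard can also be approximated without MINA's {{markValue()}} accessor, which plain {{java.nio.ByteBuffer}} does not expose; this sketch falls back to the try/catch workaround described above (the {{safeReset}} helper is illustrative, not MINA code):

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

public class GuardedReset {
    // Plain java.nio.ByteBuffer has no markValue() accessor, so the guard
    // from the proposed fix can only be approximated with a try/catch,
    // which is exactly the workaround the reporter applied.
    static void safeReset(ByteBuffer buf) {
        try {
            buf.reset();            // only valid if mark() was called earlier
        } catch (InvalidMarkException e) {
            // No mark set (e.g. a fresh or freshly-flipped buffer): nothing to rewind.
        }
    }

    public static void main(String[] args) {
        ByteBuffer fresh = ByteBuffer.allocate(8);
        safeReset(fresh);           // an unguarded reset() would have thrown here

        ByteBuffer marked = ByteBuffer.allocate(8);
        marked.position(4).mark().position(6);
        safeReset(marked);          // rewinds to the marked position, 4
        System.out.println("position: " + marked.position());
    }
}
```

Note that {{markValue() >= -1}} (as in earlier revisions of the proposal) would be a no-op guard, since the mark value is {{-1}} exactly when no mark is set; the comparison has to be strict.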
[jira] [Comment Edited] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844687#comment-16844687 ] Guus der Kinderen edited comment on DIRMINA- at 5/21/19 10:01 AM: -- The CPU cycles that get burned are reported to be "system", not "user", cycles. We've obtained a Call Tree graph from JProfiler, from which I included a screenshot (the entire thing is 18MB). Just under the section from which I created this screenshot, there's another 10.2% native CPU usage for {{java.lang.System.currentTimeMillis}}, called from within {{org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.process}}.

From what I can deduce, it looks like this snippet is iterated over a lot. We don't see {{"Broken connection"}} or {{"Create a new selector. Selected is 0, delta ="}} being logged, which suggests that the workaround for the EPOLL / spinning selector bug is not even being considered.

{code:java|title=org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#run}
/* (...snip...) */
// This select has a timeout so that we can manage
// idle session when we get out of the select every
// second. (note : this is a hack to avoid creating
// a dedicated thread).
long t0 = System.currentTimeMillis();
int selected = select(SELECT_TIMEOUT);
long t1 = System.currentTimeMillis();
long delta = t1 - t0;

if (!wakeupCalled.getAndSet(false) && (selected == 0) && (delta < 100)) {
    // Last chance : the select() may have been
    // interrupted because we have had an closed channel.
    if (isBrokenConnection()) {
        LOG.warn("Broken connection");
    } else {
        // Ok, we are hit by the nasty epoll spinning.
        // Basically, there is a race condition which causes a closing
        // file descriptor not to be considered as available as a
        // selected channel, but it stopped the select. The next time
        // we will call select(), it will exit immediately for the
        // same reason, and do so forever, consuming 100% CPU.
        // We have to destroy the selector, and register all the
        // sockets on a new one.
        if (nbTries == 0) {
            LOG.warn("Create a new selector. Selected is 0, delta = " + delta);
            registerNewSelector();
            nbTries = 10;
        } else {
            nbTries--;
        }
    }
} else {
    nbTries = 10;
}
/* (...snip...) */
{code}

!image-2019-05-21-11-37-41-931.png!
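To see how the countdown in that snippet behaves once the spin condition is permanently true, here is a stand-alone simulation (plain Java, no MINA dependency; the {{rebuilds}} counter stands in for {{registerNewSelector()}}):

```java
public class SpinCountdown {
    public static void main(String[] args) {
        // Simulate the nbTries countdown from the snippet above: each
        // instant zero-result select() decrements nbTries; after ten in a
        // row the selector is rebuilt and the counter restarts at 10.
        int nbTries = 10;
        int rebuilds = 0;

        for (int i = 0; i < 25; i++) {
            int selected = 0;   // simulated epoll spin: select() returns 0 ...
            long delta = 0;     // ... in well under 100 ms
            if ((selected == 0) && (delta < 100)) {
                if (nbTries == 0) {
                    rebuilds++;     // stands in for registerNewSelector()
                    nbTries = 10;
                } else {
                    nbTries--;
                }
            } else {
                nbTries = 10;
            }
        }
        System.out.println("rebuilds: " + rebuilds);
    }
}
```

So under a sustained spin the detector should fire roughly once per eleven loop passes; the reporter's point is that in their traces it apparently never fires at all, which suggests one of the guarding conditions ({{wakeupCalled}}, {{selected}}, or {{delta}}) is not matching.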
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844690#comment-16844690 ] Guus der Kinderen commented on DIRMINA-: {quote}Do you have a link to the OpenFire issue?{quote} No, this is being reported in a non-public project that uses Openfire.
[jira] [Comment Edited] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844687#comment-16844687 ] Guus der Kinderen edited comment on DIRMINA-1111 at 5/21/19 9:58 AM:

The CPU cycles that get burned are reported as "system", not "user", cycles. We've obtained a Call Tree graph from JProfiler, from which I included a screenshot (the entire graph is 18 MB). Just under the section from which I created this screenshot, there's another 10.2% native CPU usage for {{java.lang.System.currentTimeMillis}}, called from within {{org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.process}}. From what I can deduce, execution iterates over the snippet below a lot. We don't see {{Broken connection}} or {{Create a new selector. Selected is 0, delta =}} being logged, which suggests that the workaround for the epoll spinning-selector bug is never even reached.

{code:title=org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#run|java}
/* (...snip...) */
// This select has a timeout so that we can manage
// idle sessions when we get out of the select every
// second. (Note: this is a hack to avoid creating
// a dedicated thread.)
long t0 = System.currentTimeMillis();
int selected = select(SELECT_TIMEOUT);
long t1 = System.currentTimeMillis();
long delta = t1 - t0;

if (!wakeupCalled.getAndSet(false) && (selected == 0) && (delta < 100)) {
    // Last chance: the select() may have been
    // interrupted because we have had a closed channel.
    if (isBrokenConnection()) {
        LOG.warn("Broken connection");
    } else {
        // Ok, we are hit by the nasty epoll spinning.
        // Basically, there is a race condition which causes a
        // closing file descriptor not to be considered as an
        // available selected channel, but it stopped the select.
        // The next time we call select(), it will exit immediately
        // for the same reason, and do so forever, consuming 100% CPU.
        // We have to destroy the selector and register all the
        // sockets on a new one.
        if (nbTries == 0) {
            LOG.warn("Create a new selector. Selected is 0, delta = " + delta);
            registerNewSelector();
            nbTries = 10;
        } else {
            nbTries--;
        }
    }
} else {
    nbTries = 10;
}
/* (...snip...) */
{code}

!image-2019-05-21-11-37-41-931.png!
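The {{registerNewSelector()}} workaround referenced in the snippet above boils down to re-homing every registered channel onto a freshly opened selector. Below is a minimal, self-contained sketch of that idea; it is not MINA's actual implementation, and the class and variable names are illustrative:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

public class SelectorRebuildSketch {
    public static void main(String[] args) throws IOException {
        Selector oldSelector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.configureBlocking(false);
        server.bind(new InetSocketAddress(0)); // ephemeral port
        server.register(oldSelector, SelectionKey.OP_ACCEPT);

        // Workaround for a spinning selector: move every key onto a new selector.
        Selector newSelector = Selector.open();
        for (SelectionKey key : oldSelector.keys()) {
            int ops = key.interestOps();     // must be read BEFORE cancel()
            key.cancel();                    // deregister from the broken selector
            key.channel().register(newSelector, ops, key.attachment());
        }
        oldSelector.close();

        System.out.println("keys on new selector: " + newSelector.keys().size());
        newSelector.close();
        server.close();
    }
}
```

Reading {{interestOps()}} before {{cancel()}} matters: most accessors on a cancelled key throw {{CancelledKeyException}}.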
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844683#comment-16844683 ] Emmanuel Lecharny commented on DIRMINA-1111:

Do you have a link to the OpenFire issue?

> 100% CPU (epoll bug) on 2.1.x, Linux only
> -----
>
> Key: DIRMINA-1111
> URL: https://issues.apache.org/jira/browse/DIRMINA-1111
> Project: MINA
> Issue Type: Bug
> Affects Versions: 2.1.2
> Reporter: Guus der Kinderen
> Priority: Major
> Attachments: image-2019-05-21-11-37-41-931.png
>
> We're getting [reports|https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13] of a bug that causes 100% CPU usage on Linux (the problem does not occur on Windows).
> This problem occurs in 2.1.2 but does _not_ occur in 2.0.21.
> Is this a regression of the epoll selector bug in DIRMINA-678?
> A stack trace of one of the spinning threads:
> {code}
> "NioProcessor-3" #55 prio=5 os_prio=0 tid=0x7f0408002000 nid=0x2a6a runnable [0x7f0476dee000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>         at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>         at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>         at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>         - locked <0x0004c486b748> (a sun.nio.ch.Util$3)
>         - locked <0x0004c486b738> (a java.util.Collections$UnmodifiableSet)
>         - locked <0x0004c420ccb0> (a sun.nio.ch.EPollSelectorImpl)
>         at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
>         at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:112)
>         at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:616)
>         at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - <0x0004c4d03530> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {code}
> More info is available at https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guus der Kinderen updated DIRMINA-1111:
Attachment: image-2019-05-21-11-37-41-931.png
[jira] [Comment Edited] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844639#comment-16844639 ] Guus der Kinderen edited comment on DIRMINA-1111 at 5/21/19 8:50 AM:

The report claims that the CPU increase coincides with {{InvalidMarkException}} instances being thrown by MINA 2.1.2's implementation of {{org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}}. Specifically, the {{InvalidMarkException}} is thrown when {{reset}} is called in this snippet:

{code:java}
// The first unwritten empty buffer must be
// forwarded to the filter chain.
if (buf.hasRemaining()) {
    buf.reset();
    failedRequests.add(req);
} else {
    IoFilterChain filterChain = session.getFilterChain();
    filterChain.fireMessageSent(req);
}
{code}
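For context on the exception itself: {{java.nio.Buffer#reset}} throws {{InvalidMarkException}} whenever no mark has been set, or the mark was discarded (for example by {{rewind()}} or {{clear()}}). A tiny stand-alone JDK demonstration, unrelated to MINA's own buffer handling:

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

public class InvalidMarkDemo {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);

        // A freshly allocated buffer has no mark, so reset() must fail.
        try {
            buf.reset();
            System.out.println("no exception");
        } catch (InvalidMarkException e) {
            System.out.println("InvalidMarkException as expected");
        }

        // With mark() called first, reset() rewinds the position to the mark.
        buf.put((byte) 1);
        buf.mark();          // mark at position 1
        buf.put((byte) 2);   // position 2
        buf.reset();         // back to position 1
        System.out.println("position after reset = " + buf.position()); // prints 1
    }
}
```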
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844614#comment-16844614 ] Guus der Kinderen commented on DIRMINA-1111:

For what it's worth, I've received another report (unrelated to the first one) that appears to be related to this issue. This new environment also recently upgraded to MINA 2.1.2. I'm working on obtaining the profiling information.
[jira] [Commented] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844611#comment-16844611 ] Emmanuel Lecharny commented on DIRMINA-1107:

Great! Jonathan, I'll let you merge the branch. We probably want to backport the change to the 2.0.X branch too. Let me know and I'll kick off a release when done. Thanks a lot!

> SslHandler flushScheduledEvents race condition, redux
> -----
>
> Key: DIRMINA-1107
> URL: https://issues.apache.org/jira/browse/DIRMINA-1107
> Project: MINA
> Issue Type: Bug
> Affects Versions: 2.1.2
> Reporter: Guus der Kinderen
> Priority: Major
> Fix For: 2.1.3
>
> DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally replaces it with another multithreading issue.
> The fix for DIRMINA-1019 introduces a counter that contains the number of events to be processed. A simplified version of the code is included below.
> {code:java}
> private final AtomicInteger scheduledEvents = new AtomicInteger(0);
>
> void flushScheduledEvents() {
>     scheduledEvents.incrementAndGet();
>
>     if (sslLock.tryLock()) {
>         try {
>             do {
>                 while ((event = filterWriteEventQueue.poll()) != null) {
>                     // ...
>                 }
>
>                 while ((event = messageReceivedEventQueue.poll()) != null) {
>                     // ...
>                 }
>             } while (scheduledEvents.decrementAndGet() > 0);
>         } finally {
>             sslLock.unlock();
>         }
>     }
> }
> {code}
> We have observed occasions where the value of {{scheduledEvents}} becomes negative, while at the same time {{filterWriteEventQueue}} goes unprocessed.
> We suspect that this issue is triggered by the first thread decrementing the counter after a second thread incremented it, but before that second thread attempted to acquire the lock. This allows the first thread to empty the queues, decrement the counter to zero, and release the lock, after which the second thread acquires the lock successfully. Now, the second thread processes any elements in {{filterWriteEventQueue}}, and then processes any elements in {{messageReceivedEventQueue}}. If, in between these two checks, yet another thread adds a new element to {{filterWriteEventQueue}}, this element can go unprocessed (as the second thread does not loop, since the counter is zero or negative, and the third thread can fail to acquire the lock).
> It's a seemingly unlikely scenario, but we are observing the behavior when our systems are under high load.
> We've applied a code change after which this problem is no longer observed. We've removed the counter, and check the size of the queues instead:
> {code:java}
> void flushScheduledEvents() {
>     if (sslLock.tryLock()) {
>         try {
>             do {
>                 while ((event = filterWriteEventQueue.poll()) != null) {
>                     // ...
>                 }
>
>                 while ((event = messageReceivedEventQueue.poll()) != null) {
>                     // ...
>                 }
>             } while (!filterWriteEventQueue.isEmpty() || !messageReceivedEventQueue.isEmpty());
>         } finally {
>             sslLock.unlock();
>         }
>     }
> }
> {code}
> This code change, as illustrated above, does introduce a new potential problem. Theoretically, an event could be added to the queues, and {{flushScheduledEvents}} could see {{sslLock.tryLock()}} return {{false}}, exactly after another thread has just finished the {{while}} loop but before it releases the lock. This again would cause events to go unprocessed.
> We've not observed this problem in the wild yet, but we're uncomfortable applying this change as-is.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
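One way to close the remaining window described in the issue, sketched below as a hypothetical alternative (this is not the fix that was eventually merged): drain under the lock, release it, then re-check the queues and loop if a producer enqueued while the lock was held. The field names mirror the simplified snippets above but are otherwise illustrative:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class FlushSketch {
    private final ConcurrentLinkedQueue<String> filterWriteEventQueue = new ConcurrentLinkedQueue<>();
    private final ConcurrentLinkedQueue<String> messageReceivedEventQueue = new ConcurrentLinkedQueue<>();
    private final ReentrantLock sslLock = new ReentrantLock();
    final AtomicInteger processed = new AtomicInteger();

    void schedule(String event) {
        filterWriteEventQueue.add(event);
        flushScheduledEvents();
    }

    void flushScheduledEvents() {
        for (;;) {
            if (!sslLock.tryLock()) {
                return; // the lock holder re-checks the queues after unlocking
            }
            try {
                String event;
                while ((event = filterWriteEventQueue.poll()) != null) {
                    processed.incrementAndGet(); // stand-in for real event handling
                }
                while ((event = messageReceivedEventQueue.poll()) != null) {
                    processed.incrementAndGet();
                }
            } finally {
                sslLock.unlock();
            }
            // Re-check AFTER releasing the lock: a producer that failed
            // tryLock() while we held it may have enqueued meanwhile.
            if (filterWriteEventQueue.isEmpty() && messageReceivedEventQueue.isEmpty()) {
                return;
            }
        }
    }

    public static void main(String[] args) {
        FlushSketch s = new FlushSketch();
        for (int i = 0; i < 5; i++) {
            s.schedule("evt-" + i);
        }
        System.out.println("processed = " + s.processed.get()); // prints "processed = 5"
    }
}
```

An event enqueued by a producer that loses {{tryLock()}} is always still visible when the winner re-checks after unlocking, so no enqueue can be stranded; the cost is a possible extra lock round-trip per contended flush.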
[jira] [Commented] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844606#comment-16844606 ] Guus der Kinderen commented on DIRMINA-1107:

We've replaced our initial fix with what Jonathan provided. His fix appears to prevent the original issue from occurring. We're happy to close this issue with that fix. Thanks!