[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844687#comment-16844687 ]
Guus der Kinderen edited comment on DIRMINA-1111 at 5/21/19 9:58 AM:
---------------------------------------------------------------------

The CPU cycles that get burned are reported to be "system", not "user" cycles. We've obtained a Call Tree graph from JProfiler, from which I included a screenshot (the entire thing is 18MB). Just under the section from which I created this screenshot, there is another 10.2% native CPU usage for {{java.lang.System.currentTimeMillis}}, which is called from within {{org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.process}}.

From what I can deduce, the processor iterates over the snippet below a lot. We don't see {{Broken connection}} or {{Create a new selector. Selected is 0, delta =}} being logged, which suggests that the workaround for the epoll / spinning-selector bug is not even being considered.

{code:title=org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#run|java}
/* (...snip...) */

// This select has a timeout so that we can manage
// idle session when we get out of the select every
// second. (note : this is a hack to avoid creating
// a dedicated thread).
long t0 = System.currentTimeMillis();
int selected = select(SELECT_TIMEOUT);
long t1 = System.currentTimeMillis();
long delta = t1 - t0;

if (!wakeupCalled.getAndSet(false) && (selected == 0) && (delta < 100)) {
    // Last chance : the select() may have been
    // interrupted because we have had an closed channel.
    if (isBrokenConnection()) {
        LOG.warn("Broken connection");
    } else {
        // Ok, we are hit by the nasty epoll spinning.
        // Basically, there is a race condition which causes a
        // closing file descriptor not to be considered as
        // available as a selected channel, but it stopped the
        // select. The next time we will call select(), it will
        // exit immediately for the same reason, and do so
        // forever, consuming 100% CPU.
        // We have to destroy the selector, and register all
        // the socket on a new one.
        if (nbTries == 0) {
            LOG.warn("Create a new selector. Selected is 0, delta = " + delta);
            registerNewSelector();
            nbTries = 10;
        } else {
            nbTries--;
        }
    }
} else {
    nbTries = 10;
}

/* (...snip...) */
{code}

!image-2019-05-21-11-37-41-931.png|thumbnail!
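For anyone who wants to double-check the "system, not user" observation without JProfiler, the split can be approximated from inside the affected JVM with {{java.lang.management.ThreadMXBean}}. The sketch below is a hypothetical diagnostic, not part of MINA: the class and method names are made up, the {{NioProcessor}} name filter simply matches the thread naming visible in the stack trace further down, and it would have to run from a monitoring thread inside the same JVM.

{code:title=Hypothetical per-thread CPU probe (not part of MINA)|java}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public final class NioProcessorCpuProbe {

    /** Prints total/user/system CPU time for every NioProcessor thread. */
    public static void dump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isThreadCpuTimeSupported()) {
            System.err.println("Per-thread CPU timing not supported on this JVM");
            return;
        }
        bean.setThreadCpuTimeEnabled(true);

        for (long id : bean.getAllThreadIds()) {
            ThreadInfo info = bean.getThreadInfo(id);
            if (info == null || !info.getThreadName().startsWith("NioProcessor")) {
                continue;
            }
            long totalNs = bean.getThreadCpuTime(id); // user + system, nanoseconds
            long userNs = bean.getThreadUserTime(id); // user-mode only
            if (totalNs < 0 || userNs < 0) {
                continue; // timing unavailable for this thread
            }
            System.out.printf("%s: total=%dms user=%dms system=%dms%n",
                    info.getThreadName(),
                    totalNs / 1_000_000,
                    userNs / 1_000_000,
                    (totalNs - userNs) / 1_000_000);
        }
    }
}
{code}

Sampling this twice a few seconds apart and diffing the numbers should show the system component growing much faster than the user component for a thread spinning inside the {{epoll_wait}} syscall, consistent with the JProfiler report above.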
> 100% CPU (epoll bug) on 2.1.x, Linux only
> -----------------------------------------
>
>                 Key: DIRMINA-1111
>                 URL: https://issues.apache.org/jira/browse/DIRMINA-1111
>             Project: MINA
>          Issue Type: Bug
>    Affects Versions: 2.1.2
>            Reporter: Guus der Kinderen
>            Priority: Major
>         Attachments: image-2019-05-21-11-37-41-931.png
>
> We're getting [reports|https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13] of a bug that causes 100% CPU usage on Linux (the problem does not occur on Windows).
> This problem occurs in 2.1.2, but does _not_ occur in 2.0.21.
> Is this a regression of the epoll selector bug in DIRMINA-678?
> A stack trace of one of the spinning threads:
> {code}
> "NioProcessor-3" #55 prio=5 os_prio=0 tid=0x00007f0408002000 nid=0x2a6a runnable [0x00007f0476dee000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>         at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>         at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>         at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>         - locked <0x00000004c486b748> (a sun.nio.ch.Util$3)
>         - locked <0x00000004c486b738> (a java.util.Collections$UnmodifiableSet)
>         - locked <0x00000004c420ccb0> (a sun.nio.ch.EPollSelectorImpl)
>         at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
>         at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:112)
>         at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:616)
>         at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - <0x00000004c4d03530> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {code}
> More info is available at https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13
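For reference, the essence of the DIRMINA-678 workaround that {{registerNewSelector()}} triggers is to abandon the broken selector and re-register every channel on a freshly opened one. The following is a minimal sketch of that idea in plain NIO, not MINA's actual implementation; the class and method names are made up for illustration.

{code:title=Hypothetical sketch of a selector rebuild (not MINA's actual code)|java}
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public final class SelectorRebuild {

    /**
     * Moves every valid registration from the (presumed broken) old
     * selector to a freshly opened one, then closes the old selector.
     */
    static Selector rebuildSelector(Selector oldSelector) throws IOException {
        Selector newSelector = Selector.open();
        for (SelectionKey key : oldSelector.keys()) {
            if (!key.isValid()) {
                continue; // channel already closed; nothing to migrate
            }
            SelectableChannel channel = key.channel();
            int interestOps = key.interestOps(); // must be read before cancel()
            Object attachment = key.attachment();
            key.cancel();
            channel.register(newSelector, interestOps, attachment);
        }
        oldSelector.close(); // also discards any remaining cancelled keys
        return newSelector;
    }
}
{code}

If the spin reported here were the same one DIRMINA-678 guards against, the {{delta < 100}} check in the snippet above would eventually fire, a rebuild along these lines would run, and the "Create a new selector" warning would appear in the logs. Since it never does, the 2.1.2 regression apparently burns CPU on a path that the existing guard does not detect.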