[ https://issues.apache.org/jira/browse/AMQ-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849248#comment-17849248 ]
Tom Tichy commented on AMQ-9482:
--------------------------------

Hi,

I did add the `soTimeout` and `soWriteTimeout`, but alas, no joy.

I have a better thread dump with about 11.5k identical threads that are in the WAITING state. The Fastthread.io analysis says:
{quote}
h1. 11440 threads with same stack trace

11440 threads are WAITING on *_park()_* method in *_jdk.internal.misc.Unsafe_* file and they all have same stack trace. If multiple threads exhibit same stack trace, you might want to examine their stack trace. (Note: If your application is unresponsive or poorly responding, it might be caused because these threads).
{panel}
h2. ActiveMQ BrokerService[localhost] Task-68826

PRIORITY : 5
THREAD ID : 0X00007F1E3010FB90
NATIVE ID : 0XE3ABE
NATIVE ID (DECIMAL) : 932542
STATE : TIMED_WAITING

stackTrace:
java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@17.0.9/Native Method)
- parking to wait for <0x0000000700232d10> (a java.util.concurrent.SynchronousQueue$TransferStack)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base@17.0.9/LockSupport.java:252)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.base@17.0.9/SynchronousQueue.java:401)
at java.util.concurrent.SynchronousQueue.poll(java.base@17.0.9/SynchronousQueue.java:903)
at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@17.0.9/ThreadPoolExecutor.java:1061)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.9/ThreadPoolExecutor.java:1122)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.9/ThreadPoolExecutor.java:635)
at java.lang.Thread.run(java.base@17.0.9/Thread.java:840)

Locked ownable synchronizers:
- None
{panel}
{quote}

Here is another piece of information that may be relevant. The devices that connect to our brokers do something silly and open a new MQTT connection every time they want to send something. The MQTT spec allows this, and ActiveMQ dutifully logs
{code:java}
WARN | Stealing link for clientId xxxx{code}

[^thread_dump.txt]

> Broker crashes after runaway threads spawn
> ------------------------------------------
>
> Key: AMQ-9482
> URL: https://issues.apache.org/jira/browse/AMQ-9482
> Project: ActiveMQ Classic
> Issue Type: Bug
> Components: Broker
> Affects Versions: 5.17.6, 6.0.1
> Environment: Bitnami created AMI in AWS
> Reporter: Tom Tichy
> Priority: Major
> Attachments: activemq.tdump, brokerInfo-after-crash-redacted.json, thread_dump.txt
>
>
> Running on Bitnami created AMI in AWS. The broker has about 7000 devices connected via MQTT. Each device has its own topic name.
> Broker stays up for about 4-5 days before being hobbled and unable to create any new tasks or accept any new connections.
> (There is an identical setup for the staging environment with about 100 devices connected. It runs without any issues.)
> I have traced the cause to the systemd task limit. The current `TasksMax` is 18100. When running normally, the number of tasks is about 300. Then (every 4-5 days) there is a quick spike to the max of 18100 tasks, and it stays there, never coming back down. The result is that the broker just sits there, does nothing useful, and keeps logging the following message
>
> {code:java}
> [659914.788s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
> [659914.788s][warning][os,thread] Failed to start the native thread for java.lang.Thread "ActiveMQ BrokerService[localhost] Task-281805"
> ERROR | Scheduled task error
> java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
> at java.lang.Thread.start0(Native Method) ~[?:?]
> at java.lang.Thread.start(Thread.java:809) ~[?:?]
> at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:945) ~[?:?]
> at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364) ~[?:?]
> at org.apache.activemq.thread.TaskRunnerFactory.execute(TaskRunnerFactory.java:173) ~[activemq-client-6.0.1.jar:6.0.1]
> at org.apache.activemq.thread.TaskRunnerFactory.execute(TaskRunnerFactory.java:165) ~[activemq-client-6.0.1.jar:6.0.1]
> at org.apache.activemq.broker.region.Topic$7.run(Topic.java:820) ~[activemq-broker-6.0.1.jar:6.0.1]
> at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:39) ~[activemq-client-6.0.1.jar:6.0.1]
> at java.util.TimerThread.mainLoop(Timer.java:566) ~[?:?]
> at java.util.TimerThread.run(Timer.java:516) ~[?:?]
> Exception in thread "ActiveMQ Broker[localhost] Scheduler" java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
> at java.base/java.lang.Thread.start0(Native Method)
> at java.base/java.lang.Thread.start(Thread.java:809)
> at java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:945)
> at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364)
> at org.apache.activemq.thread.TaskRunnerFactory.execute(TaskRunnerFactory.java:173)
> at org.apache.activemq.thread.TaskRunnerFactory.execute(TaskRunnerFactory.java:165)
> at org.apache.activemq.broker.region.Topic$7.run(Topic.java:820)
> at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:39)
> at java.base/java.util.TimerThread.mainLoop(Timer.java:566)
> at java.base/java.util.TimerThread.run(Timer.java:516)
> {code}
>
> The start command is
> {code:java}
> /opt/bitnami/java/bin/java -Xms2G -Xmx4G
> -Djava.util.logging.config.file=logging.properties
> -Djava.security.auth.login.config=/opt/bitnami/activemq/conf/login.config
> -Dorg.apache.activemq.UseDedicatedTaskRunner=false
> -Dcom.sun.management.jmxremote -Djava.awt.headless=true
> -Djava.io.tmpdir=/opt/bitnami/activemq/tmp
> --add-reads=java.xml=java.logging
> --add-opens java.base/java.security=ALL-UNNAMED
> --add-opens java.base/java.net=ALL-UNNAMED
> --add-opens java.base/java.lang=ALL-UNNAMED
> --add-opens java.base/java.util=ALL-UNNAMED
> --add-opens java.naming/javax.naming.spi=ALL-UNNAMED
> --add-opens java.rmi/sun.rmi.transport.tcp=ALL-UNNAMED
> --add-opens java.base/java.util.concurrent=ALL-UNNAMED
> --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED
> --add-exports=java.base/sun.net.www.protocol.http=ALL-UNNAMED
> --add-exports=java.base/sun.net.www.protocol.https=ALL-UNNAMED
> --add-exports=java.base/sun.net.www.protocol.jar=ALL-UNNAMED
> --add-exports=jdk.xml.dom/org.w3c.dom.html=ALL-UNNAMED
> --add-exports=jdk.naming.rmi/com.sun.jndi.url.rmi=ALL-UNNAMED
> -Dactivemq.classpath=/opt/bitnami/activemq/conf:/opt/bitnami/activemq/../lib/:
> -Dactivemq.home=/opt/bitnami/activemq -Dactivemq.base=/opt/bitnami/activemq
> -Dactivemq.conf=/opt/bitnami/activemq/conf
> -Dactivemq.data=/opt/bitnami/activemq/data
> -Djolokia.conf=file:/opt/bitnami/activemq/conf/jolokia-access.xml
> -jar /opt/bitnami/activemq/bin/activemq.jar start {code}
> During the error condition, I am able to collect broker information via jolokia: [^brokerInfo-after-crash-redacted.json]
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
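
A note on reading the thread dump in the comment above: the frames `ThreadPoolExecutor.getTask` -> `SynchronousQueue.poll` -> `LockSupport.parkNanos` are what an idle `ThreadPoolExecutor` worker looks like while it waits out its keep-alive time for a new task. The following is a minimal, standalone sketch of how a burst of overlapping tasks can leave that many parked workers behind. It is not ActiveMQ code; the class name `IdleWorkerDemo`, the pool parameters (no core threads, unbounded maximum, 30 s keep-alive) and the burst size are assumptions chosen for illustration only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class IdleWorkerDemo {

    /** Runs `burst` overlapping tasks and returns the pool size after all of them have finished. */
    static int poolSizeAfterBurst(int burst) {
        // Assumed pool shape (same as Executors.newCachedThreadPool, shorter keep-alive):
        // a SynchronousQueue holds no tasks, so every task that finds no idle worker gets a new thread.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                0, Integer.MAX_VALUE, 30L, TimeUnit.SECONDS, new SynchronousQueue<>());
        CountDownLatch allStarted = new CountDownLatch(burst);
        CountDownLatch allDone = new CountDownLatch(burst);
        try {
            for (int i = 0; i < burst; i++) {
                pool.execute(() -> {
                    allStarted.countDown();
                    try {
                        // Hold each worker until the whole burst is running, so no thread is reused.
                        allStarted.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        allDone.countDown();
                    }
                });
            }
            allDone.await();
            // The work is done, but the workers live on until the keep-alive expires.
            // A thread dump taken now would show them parked in SynchronousQueue.poll.
            return pool.getPoolSize();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the burst", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("workers alive after burst: " + poolSizeAfterBurst(50));
    }
}
```

In this sketch the parked workers are harmless and disappear after the keep-alive. If the broker's pool behaves similarly, the open question for this issue would be what submits roughly 11k overlapping tasks at once and why the thread count does not fall again afterwards.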