[jira] [Commented] (ARTEMIS-2811) Component org.apache.activemq.artemis.core.io.buffer.TimedBuffer is expired on path 0

2020-06-23 Thread daves (Jira)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142854#comment-17142854
 ] 

daves commented on ARTEMIS-2811:


[~jbertram] Thanks for your comment. I will look at the documentation.

I think the problem with the Windows service is that the service does not 
run the Java process directly but uses a service wrapper called 
"artemis-service.exe". I've used such wrappers before. My favorite is nssm 
[https://nssm.cc/], which, as far as I know, detects when the "hosted" process 
has exited and is able to signal this exit to the service manager.

Sadly, the broker is hosted in an environment not controlled by us. I don't have 
any option to change the monitoring or to check whether everything is OK with 
the filesystem… I know you can't do anything about that either, but maybe it 
would be an option to take a look at "artemis-service.exe" and check whether 
there is a way to detect a stopped Java process?
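
To make the request concrete, here is a minimal sketch of the behaviour being asked for: a supervisor that starts a child process, blocks until the child exits, and then exits with the same code so that whatever manages the supervisor can see the failure. The class name {{ChildWatcher}} and its method are hypothetical, and nothing here describes how "artemis-service.exe" or nssm actually work.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

/** Hypothetical supervisor sketch; not part of Artemis or its service wrapper. */
final class ChildWatcher {

    /** Starts the command, waits for it to terminate, and returns its exit code. */
    static int runAndWait(List<String> command) {
        try {
            // inheritIO() lets the child write to the supervisor's stdout/stderr.
            Process child = new ProcessBuilder(command).inheritIO().start();
            return child.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for child", e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: ChildWatcher <command> [args...]");
            System.exit(2);
        }
        // Exit with the child's exit code so the supervisor never outlives it.
        System.exit(runAndWait(List.of(args)));
    }
}
```

The point of the design is only that the supervisor never outlives the child, so a "service is running" check also covers the hosted process.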

 

> Component org.apache.activemq.artemis.core.io.buffer.TimedBuffer is expired 
> on path 0
> -
>
> Key: ARTEMIS-2811
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2811
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Affects Versions: 2.11.0
> Environment: * Windows Server 2016
>  * Artemis running as Windows Service
> Reporter: daves
> Assignee: Justin Bertram
> Priority: Major
> Attachments: broker.xml
>
>
> We run Artemis 2.11.0 on Windows Server 2016 as Windows Service. Suddenly 
> Artemis stopped working. The Artemis process stopped but the Windows 
> Service/Service wrapper was still running. We monitor all Services if they 
> are running, but since the Artemis-Service was still running our monitoring 
> did not detect that Artemis was not running anymore.
>  
>  # Is it possible to kill the Windows-Service together with the Artemis 
> process? (would be very nice for monitoring etc.)
>  # Is there a fix for this issue maybe in a newer version?
> Please see below stacktrace for more details. Please let me know if you need 
> any additional information.
>   
> {code:java}
> 2020-06-18 11:06:20,865 WARN  [org.apache.activemq.artemis.utils.critical.CriticalMeasure] Component org.apache.activemq.artemis.core.io.buffer.TimedBuffer is expired on path 0
> 2020-06-18 11:06:20,865 WARN  [org.apache.activemq.artemis.utils.critical.CriticalMeasure] Component org.apache.activemq.artemis.core.io.buffer.TimedBuffer is expired on path 0
> 2020-06-18 11:06:20,865 ERROR [org.apache.activemq.artemis.core.server] AMQ224079: The process for the virtual machine will be killed, as component org.apache.activemq.artemis.core.io.buffer.TimedBuffer@37d4349f is not responsive
> 2020-06-18 11:06:21,146 WARN  [org.apache.activemq.artemis.core.server] AMQ222199: Thread dump:
> ***
> Complete Thread dump
> "qtp140224-69441" Id=69441 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5b811449
>   at sun.misc.Unsafe.park(Native Method)
>   -  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5b811449
>   at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
>   at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:392)
>   at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:564)
>   at org.eclipse.jetty.util.thread.QueuedThreadPool.access$800(QueuedThreadPool.java:49)
>   at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:627)
>   at java.lang.Thread.run(Unknown Source)
>
> "Thread-11562 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@3cc1435c)" Id=69440 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1e83815d
>   at sun.misc.Unsafe.park(Native Method)
>   -  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1e83815d
>   at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
>   at java.util.concurrent.LinkedBlockingQueue.poll(Unknown Source)
>   at org.apache.activemq.artemis.utils.ActiveMQThreadPoolExecutor$ThreadPoolQueue.poll(ActiveMQThreadPoolExecutor.java:112)
>   at org.apache.activemq.artemis.utils.ActiveMQThreadPoolExecutor$ThreadPoolQueue.poll(ActiveMQThreadPoolExecutor.java:45)
>   at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>   at
> 

[jira] [Commented] (ARTEMIS-2811) Component org.apache.activemq.artemis.core.io.buffer.TimedBuffer is expired on path 0

2020-06-20 Thread Justin Bertram (Jira)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141156#comment-17141156
 ] 

Justin Bertram commented on ARTEMIS-2811:
-

Without more details about your configuration it's hard to say for sure, but 
based on the logging it appears there was an issue with your environment. 
Specifically, it looks like the broker was not able to complete a disk IO 
operation in the allotted time. This caused the "critical analyzer" to halt the 
JVM process. You can [read more about the critical analyzer in the 
documentation|http://activemq.apache.org/components/artemis/documentation/latest/critical-analysis.html].
 By default this is the configuration for the critical analyzer in 
{{broker.xml}}:

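{code:xml}
<critical-analyzer>true</critical-analyzer>
<critical-analyzer-timeout>120000</critical-analyzer-timeout>
<critical-analyzer-check-period>60000</critical-analyzer-check-period>
<critical-analyzer-policy>HALT</critical-analyzer-policy>
{code}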
I assume this is what you're using. This configuration means that every 60 
seconds the critical analyzer will run and check if any "critical" operations 
have exceeded 120 seconds.
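
To illustrate the timing only, here is a minimal sketch of a watchdog that records when an operation starts and reports it as expired once it has been in flight longer than a timeout. The class {{CriticalWatchdog}} and its methods are hypothetical and are not the broker's implementation; the 120-second timeout and 60-second check interval are taken from the description above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a critical-operation watchdog; not Artemis code. */
final class CriticalWatchdog {
    private final long timeoutNanos;
    // Operation name -> time (in nanos) at which it was entered and not yet left.
    private final Map<String, Long> inFlight = new ConcurrentHashMap<>();

    CriticalWatchdog(long timeoutMillis) {
        this.timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    }

    void enter(String operation, long nowNanos) {
        inFlight.put(operation, nowNanos);
    }

    void leave(String operation) {
        inFlight.remove(operation);
    }

    /** True if any in-flight operation has been running longer than the timeout. */
    boolean isExpired(long nowNanos) {
        return inFlight.values().stream()
                .anyMatch(start -> nowNanos - start > timeoutNanos);
    }

    public static void main(String[] args) {
        CriticalWatchdog watchdog = new CriticalWatchdog(120_000);
        watchdog.enter("flush", 0L); // a flush that never completes
        // Simulate periodic checks, one every 60 seconds.
        for (long seconds : new long[] {60, 120, 180}) {
            boolean expired = watchdog.isExpired(TimeUnit.SECONDS.toNanos(seconds));
            System.out.println("check at " + seconds + " s: expired=" + expired);
        }
    }
}
```

In this sketch a flush that starts at time zero and never completes is reported by the check at 180 seconds, because the comparison is strictly "longer than" the timeout. How the broker treats the exact boundary is not something this sketch claims to reproduce.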

In your case the {{org.apache.activemq.artemis.core.io.buffer.TimedBuffer}} 
took too long on "path 0." Path {{0}} for the {{TimedBuffer}} is a flush 
operation to store data on the disk. In the thread dump we can see:

{noformat}
"activemq-buffer-timeout" Id=15 RUNNABLE (in native)
  at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
  at sun.nio.ch.FileDispatcherImpl.force(Unknown Source)
  at sun.nio.ch.FileChannelImpl.force(Unknown Source)
  at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.sync(NIOSequentialFile.java:262)
  at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.doInternalWrite(NIOSequentialFile.java:391)
  at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.internalWrite(NIOSequentialFile.java:359)
  at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.access$100(NIOSequentialFile.java:43)
  at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile$SyncLocalBufferObserver.flushBuffer(NIOSequentialFile.java:434)
  at org.apache.activemq.artemis.core.io.buffer.TimedBuffer.flushBatch(TimedBuffer.java:361)
  -  locked org.apache.activemq.artemis.core.io.buffer.TimedBuffer@37d4349f
  at org.apache.activemq.artemis.core.io.buffer.TimedBuffer.flush(TimedBuffer.java:338)
  at org.apache.activemq.artemis.core.io.buffer.TimedBuffer$CheckTimer.run(TimedBuffer.java:473)
  at java.lang.Thread.run(Unknown Source)
{noformat}

Spending 120 seconds trying to flush a disk write indicates a problem with your 
storage. At this point the broker will invoke 
[{{Runtime.getRuntime().halt()}}|https://docs.oracle.com/javase/8/docs/api/java/lang/Runtime.html#halt-int-].
As the JavaDoc states, this method "forcibly terminates the currently running 
Java virtual machine."

It's not clear what else could be done to help Windows recognize that the 
broker is dead. I would expect the Windows Service to terminate when the JVM is 
halted. Perhaps you could monitor something that's more directly related to 
broker operation (e.g. if you can connect to the broker's port(s)).
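
As a rough sketch of such a check, the following probe attempts a TCP connection and exits non-zero when it fails, so a monitoring system can alert on the exit code. The class name {{PortProbe}} is hypothetical, and the fallback port 61616 is an assumption based on the acceptor in a stock {{broker.xml}}; substitute whatever host and port your broker actually listens on.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical monitoring probe; host and port defaults are assumptions. */
final class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, unreachable, or timed out: treat all of these as "down".
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 61616;
        boolean up = canConnect(host, port, 2000);
        System.out.println(host + ":" + port + (up ? " is reachable" : " is NOT reachable"));
        // A non-zero exit code lets a monitoring system raise an alert.
        System.exit(up ? 0 : 1);
    }
}
```

A successful connect only shows that something is listening on the port. It is still a more direct signal than the state of the Windows Service.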

In conclusion, I don't see anything wrong with the broker at this point. I 
recommend you investigate the performance of your storage as well as 
alternative monitoring strategies.
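
As a starting point for the storage investigation, here is a minimal sketch that writes a small block to a file and times {{FileChannel.force}}, the same JDK call that appears in the stack trace above. The class name {{SyncLatencyProbe}}, the 4 KiB block size and the ten iterations are arbitrary assumptions, and this is not a substitute for a proper storage benchmark.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Hypothetical probe that times a single forced write to disk. */
final class SyncLatencyProbe {

    /** Writes blockSize zero bytes to the file and returns how long force() took, in nanos. */
    static long timeSync(Path file, int blockSize) {
        try (FileChannel channel = FileChannel.open(
                file, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ByteBuffer buffer = ByteBuffer.allocate(blockSize);
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            long start = System.nanoTime();
            channel.force(false); // flush file content (not metadata) to the device
            return System.nanoTime() - start;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the directory to test as the first argument, e.g. the journal's drive.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        Path file = Files.createTempFile(dir, "sync-probe", ".dat");
        try {
            for (int i = 0; i < 10; i++) {
                long nanos = timeSync(file, 4096);
                System.out.printf("force() #%d took %.3f ms%n", i + 1, nanos / 1_000_000.0);
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Run it against a directory on the same volume as the broker's journal. Individual calls that take seconds rather than milliseconds would point toward the storage.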
