Re: Ignite 2.7 Errors
Hello, Andrey - for your 2nd question: in Ignite 2.5 we also had JVM pauses of more than 15 minutes, but no IgniteException, and everything kept working fine.

2019-03-15 19:08:46,088 WARNING [ (jvm-pause-detector-worker)] Possible too long JVM pause: 1001113 milliseconds.
2019-03-15 19:08:46,280 INFO [IgniteKernal%XXXGrid (grid-timeout-worker-#71%XXXGrid%)] Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=1dc0de55, name=EnfusionGrid, uptime=01:49:39.992]
    ^-- H/N/C [hosts=1, nodes=1, CPUs=32]
    ^-- CPU [cur=100%, avg=39.77%, GC=1042.83%]
    ^-- PageMemory [pages=2300496]
    ^-- Heap [used=295123MB, free=14.48%, comm=345088MB]
    ^-- Non heap [used=442MB, free=-1%, comm=463MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=1, qSize=0]
2019-03-15 19:08:46,280 INFO [IgniteKernal%XXXGrid (grid-timeout-worker-#71%XXXGrid%)] FreeList [name=XXXGrid, buckets=256, dataPages=1, reusePages=0]

-- Sent from: http://apache-ignite-users.70518.x6.nabble.com/
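Pause warnings like the jvm-pause-detector-worker line above usually come from a sleep-and-measure check: a thread repeatedly sleeps for a short interval and reports any extra time it took to wake up, since that time was lost to a stop-the-world event such as a long GC. The sketch below is a hypothetical, simplified version of that general technique; the class name, the 50 ms interval and the 500 ms threshold are assumptions, not Ignite's implementation or defaults. It also converts the 1001113 ms value from the 2.5 log, which confirms the "more than 15 minutes" figure.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a sleep-and-measure JVM pause detector (not Ignite's code). */
public class PauseDetectorSketch {
    /** Assumed sleep interval between measurements, in milliseconds. */
    static final long SLEEP_MS = 50;
    /** Assumed threshold above which an extra delay is reported, in milliseconds. */
    static final long THRESHOLD_MS = 500;

    /** Returns the time lost beyond the planned sleep, or 0 if it is below the threshold. */
    static long excessPause(long plannedSleepMs, long measuredMs) {
        long pause = measuredMs - plannedSleepMs;
        return pause > THRESHOLD_MS ? pause : 0;
    }

    /** Formats a millisecond duration as whole minutes and seconds. */
    static String describe(long pauseMs) {
        long totalSec = TimeUnit.MILLISECONDS.toSeconds(pauseMs);
        return (totalSec / 60) + " min " + (totalSec % 60) + " s";
    }

    public static void main(String[] args) throws InterruptedException {
        // One measurement round: sleep, then check how long the sleep really took.
        long start = System.nanoTime();
        Thread.sleep(SLEEP_MS);
        long measuredMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        long pause = excessPause(SLEEP_MS, measuredMs);
        if (pause > 0) {
            System.out.println("Possible too long JVM pause: " + pause + " milliseconds.");
        }
        // The value reported in the Ignite 2.5 log above.
        System.out.println(describe(1001113)); // prints "16 min 41 s"
    }
}
```

A real detector would run the measurement round in a loop on a daemon thread; a single round is shown to keep the sketch short.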
Re: Ignite 2.7 Errors
Hello, Andrey - for your 1st question: I do have a stack trace in another FailureProcessor log entry:

Thread [name="tcp-disco-msg-worker-#2%XXXGrid%", id=445, state=RUNNABLE, blockCnt=0, waitCnt=320611]
    at sun.management.ThreadImpl.dumpThreads0(Native Method)
    at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:454)
    at o.a.i.i.util.IgniteUtils.dumpThreads(IgniteUtils.java:1364)
    at o.a.i.i.processors.failure.FailureProcessor.process(FailureProcessor.java:128)
    - locked o.a.i.i.processors.failure.FailureProcessor@6d7b017b
    at o.a.i.i.processors.failure.FailureProcessor.process(FailureProcessor.java:104)
    at o.a.i.i.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1829)
    at o.a.i.i.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
    at o.a.i.i.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
    at o.a.i.i.util.worker.GridWorker.onIdle(GridWorker.java:297)
    at o.a.i.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2663)
    at o.a.i.spi.discovery.tcp.ServerImpl$RingMessageWorker$$Lambda$211/210453.run(Unknown Source)
    at o.a.i.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7181)
    at o.a.i.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2700)
    at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:120)
    at o.a.i.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7119)
    at o.a.i.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
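The trace above shows the thread dump being taken through the JDK's thread-management API (sun.management.ThreadImpl.dumpAllThreads). A minimal sketch, using only java.lang.management, of how a similar dump can be collected; the header format imitates the "Thread [name=..., id=..., state=..., blockCnt=..., waitCnt=...]" lines in this thread and is an assumption, not Ignite's own formatter. The blockCnt and waitCnt fields correspond to ThreadInfo.getBlockedCount() and getWaitedCount().

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpSketch {
    /** Formats one ThreadInfo in the style of the "Thread [name=..., ...]" log lines. */
    static String header(ThreadInfo ti) {
        return "Thread [name=\"" + ti.getThreadName() + "\", id=" + ti.getThreadId()
            + ", state=" + ti.getThreadState()
            + ", blockCnt=" + ti.getBlockedCount()
            + ", waitCnt=" + ti.getWaitedCount() + "]";
    }

    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // false, false: skip locked-monitor and locked-synchronizer details to keep output short.
        for (ThreadInfo ti : mx.dumpAllThreads(false, false)) {
            System.out.println(header(ti));
            for (StackTraceElement frame : ti.getStackTrace()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```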
Re: Ignite 2.7 Errors
Before that, there was:

2019-03-20 22:28:45,028 WARNING [G (tcp-disco-msg-worker-#2%XXXGrid%)] Thread [name="grid-nio-worker-tcp-comm-1-#73%XXXGrid%", id=415, state=RUNNABLE, blockCnt=0, waitCnt=0]
Re: Ignite 2.7 Errors
Hello, Andrey - actually, this is the sequence of events in time order:

2019-03-20 22:28:44,999 WARNING [IgniteKernal%XXXGrid (jvm-pause-detector-worker)] Possible too long JVM pause: 928937 milliseconds
2019-03-20 22:28:45,014 SEVERE [G (tcp-disco-msg-worker-#2%XXXGrid%)] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=grid-nio-worker-tcp-comm-1, blockedFor=928s]
2019-03-20 22:28:45,021 WARN [FailoverTransport (ActiveMQ Transport: Transport ) failed , attempting to automatically reconnect: java.io.EOFException
2019-03-20 22:28:45,021 ERROR [ActiveMQDelegate=>MasterServiceGlobalConnection (ActiveMQ Transport: transport Interrupted
2019-03-20 22:28:45,023 WARN [FailoverTransport (ActiveMQ Transport: Transport () failed , attempting to automatically reconnect: java.io.EOFException
2019-03-20 22:28:45,028 WARNING [G (tcp-disco-msg-worker-#2%EnfusionGrid%)] Thread [name="grid-nio-worker-tcp-comm-1-#73%EnfusionGrid%", id=415, state=RUNNABLE, blockCnt=0, waitCnt=0]
2019-03-20 22:28:45,028 WARN [FailoverTransport (ActiveMQ Transport: )] Transport failed , attempting to automatically reconnect: java.io.EOFException
2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)] Exception in thread "ActiveMQ InactivityMonitor WriteCheckTimer" java.lang.NullPointerException
2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)] at org.apache.activemq.transport.AbstractInactivityMonitor.writeCheck(AbstractInactivityMonitor.java:219)
2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)] at org.apache.activemq.transport.AbstractInactivityMonitor$3.run(AbstractInactivityMonitor.java:153)
2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)] at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33)
2019-03-20 22:28:45,030 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)] at java.util.TimerThread.mainLoop(Timer.java:555)
2019-03-20 22:28:45,030 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)] at java.util.TimerThread.run(Timer.java:505)
2019-03-20 22:28:45,044 ERROR [ActiveMQDelegate=>MasterServiceGlobalConnection (ActiveMQ Transport: ] transport Interrupted
2019-03-20 22:28:45,044 SEVERE [ (tcp-disco-msg-worker-#2%XXXGrid%)] Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=XXXGrid, finished=false, heartbeatTs=1553137996031]]]: class org.apache.ignite.IgniteException: GridWorker [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=XXXGrid, finished=false, heartbeatTs=1553137996031]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
    at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
    at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2663)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7181)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2700)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7119)
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
2019-03-20 22:28:45,052 WARNING [FailureProcessor (tcp-disco-msg-worker-#2%XXXGrid%)] No deadlocked threads detected.
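The heartbeatTs field in the GridWorker message is a Unix timestamp in milliseconds, so it can be compared with the log time to check the reported blockedFor value. A small java.time sketch, assuming the log timestamps are written at a UTC-5 offset (an assumption; the thread does not state the server's time zone, and the class name is hypothetical). Under that assumption the worker's last heartbeat falls at 22:13:16.031, and the gap to the 22:28:45,014 detection is 928,983 ms, which lines up with blockedFor=928s and with the 928937 ms JVM pause logged just before it.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class HeartbeatGap {
    /** Log timestamp layout used in this thread, e.g. "2019-03-20 22:28:45,014". */
    static final DateTimeFormatter LOG_TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Assumed offset of the log timestamps (not stated in the original thread). */
    static final ZoneOffset LOG_OFFSET = ZoneOffset.ofHours(-5);

    /** Time between the worker's last heartbeat and the moment the log line was written. */
    static Duration blockedFor(long heartbeatTs, String logTimestamp) {
        Instant heartbeat = Instant.ofEpochMilli(heartbeatTs);
        Instant logged = LocalDateTime.parse(logTimestamp, LOG_TS).toInstant(LOG_OFFSET);
        return Duration.between(heartbeat, logged);
    }

    public static void main(String[] args) {
        Duration d = blockedFor(1553137996031L, "2019-03-20 22:28:45,014");
        System.out.println(d.toMillis() + " ms"); // prints "928983 ms"
    }
}
```

If the server actually logs in a different zone, only LOG_OFFSET needs to change.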
Re: Ignite 2.7 Errors
Hello, Andrey - I see this:

2019-03-20 22:28:45,052 WARNING [FailureProcessor (tcp-disco-msg-worker-#2%XXXGrid%)] No deadlocked threads detected.

Is that what you mean?

---

Also, prior to the crash, I see the following; I am not sure whether it is related:

2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)] Exception in thread "ActiveMQ InactivityMonitor WriteCheckTimer" java.lang.NullPointerException
2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)] at org.apache.activemq.transport.AbstractInactivityMonitor.writeCheck(AbstractInactivityMonitor.java:219)
2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)] at org.apache.activemq.transport.AbstractInactivityMonitor$3.run(AbstractInactivityMonitor.java:153)
2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)] at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33)
2019-03-20 22:28:45,030 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)] at java.util.TimerThread.mainLoop(Timer.java:555)
2019-03-20 22:28:45,030 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)] at java.util.TimerThread.run(Timer.java:505)
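The "No deadlocked threads detected." line reports the outcome of a deadlock check over the JVM's threads. A minimal sketch of such a check with the JDK's ThreadMXBean.findDeadlockedThreads(), which returns null when no threads are deadlocked on monitors or ownable synchronizers; the class name and report strings are assumptions that imitate the log, not Ignite's exact code. A clean result fits the dumps in this thread, where the blocked grid-nio-worker-tcp-comm-1 thread was RUNNABLE rather than waiting on a lock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockCheck {
    /** Returns a short report; findDeadlockedThreads() yields null when there is no deadlock. */
    static String report() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads();
        if (ids == null) {
            return "No deadlocked threads detected.";
        }
        StringBuilder sb = new StringBuilder("Deadlocked threads detected:");
        for (ThreadInfo ti : mx.getThreadInfo(ids)) {
            if (ti != null) { // a thread may have exited between the two calls
                sb.append(' ').append(ti.getThreadName());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```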
Re: Ignite 2.7 Errors
Thanks, Ilya! Actually, it happened in the PROD system again last night, even with NoOpFailureHandler. I am rolling back to Ignite 2.5 or 2.6 for now. Thanks!

2019-03-20 22:28:45,044 SEVERE [ (tcp-disco-msg-worker-#2%XXXGrid%)] Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=XXXGrid, finished=false, heartbeatTs=1553137996031]]]: class org.apache.ignite.IgniteException: GridWorker [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=XXXGrid, finished=false, heartbeatTs=1553137996031]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
    at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
    at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2663)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7181)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2700)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7119)
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
Re: Ignite 2.7 Errors
Thank you, Ilya! We ended up using cfg.setFailureHandler(new NoOpFailureHandler()); it silenced the errors and stopped the stack dumps, and the grid seems to work as it did in 2.5 and 2.6, with no other changes. I am still curious whether I can take that line out in the future, once 2.7 (or 2.8) is more stable.
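The workaround described above could look roughly like this in Java configuration. This is a minimal sketch, assuming ignite-core 2.7 on the classpath; the class name is hypothetical and the instance name is the redacted placeholder from the logs. As the SEVERE lines in this thread show, NoOpFailureHandler still logs the critical error; it only refrains from acting on the node. Removing the setFailureHandler line later brings back Ignite's default failure handler, so that change is worth re-testing on the newer version first.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.NoOpFailureHandler;

public class NoOpHandlerConfig {
    /** Builds a node configuration whose failure handler takes no action on critical failures. */
    static IgniteConfiguration config() {
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setIgniteInstanceName("XXXGrid"); // placeholder name from the redacted logs
        // The failure is still logged, but the node is neither stopped nor halted.
        cfg.setFailureHandler(new NoOpFailureHandler());
        return cfg;
    }

    public static void main(String[] args) {
        // Ignite implements AutoCloseable, so the node stops when the block exits.
        try (Ignite ignite = Ignition.start(config())) {
            System.out.println("Started node " + ignite.name());
        }
    }
}
```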
Ignite 2.7 Errors
Hi, we recently upgraded Ignite from 2.5 to 2.7 and got the following error. Is this a configuration issue, or a known bug in 2.7?

2019-03-18 15:44:23,383 SEVERE [ (tcp-disco-msg-worker-#2%XXXGrid%)] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=grid-nio-worker-tcp-comm-9, igniteInstanceName=XXXGrid, finished=false, heartbeatTs=1552941767243]]]: class org.apache.ignite.IgniteException: GridWorker [name=grid-nio-worker-tcp-comm-9, igniteInstanceName=XXXGrid, finished=false, heartbeatTs=1552941767243]
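On the configuration question: the logged handler settings in this thread show ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED] for both StopNodeOrHaltFailureHandler and NoOpFailureHandler. A simplified model of how such an ignore list gates a failure handler, assuming the handler skips any failure type listed there; the class and enum are hypothetical stand-ins that borrow names from the logs (CRITICAL_ERROR is included only for contrast), not Ignite's classes. In this model the "Critical system error detected" line is written before the handler decides, so it appears even when the failure type is ignored.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of failure handling with an ignore list (names borrowed from the logs). */
public class FailureHandlingModel {
    enum FailureType { SYSTEM_WORKER_BLOCKED, CRITICAL_ERROR }

    /** Collected log lines, so the model's behaviour is easy to inspect. */
    final List<String> log = new ArrayList<>();
    private final Set<FailureType> ignoredFailureTypes;

    FailureHandlingModel(Set<FailureType> ignoredFailureTypes) {
        this.ignoredFailureTypes = ignoredFailureTypes;
    }

    /** Logs the failure, then returns true only if the handler would act (e.g. stop the node). */
    boolean process(FailureType type) {
        // The SEVERE line is written regardless of what the handler decides.
        log.add("Critical system error detected [type=" + type + "]");
        // The handler acts only on failure types that are not in its ignore list.
        return !ignoredFailureTypes.contains(type);
    }

    public static void main(String[] args) {
        FailureHandlingModel m = new FailureHandlingModel(EnumSet.of(FailureType.SYSTEM_WORKER_BLOCKED));
        System.out.println(m.process(FailureType.SYSTEM_WORKER_BLOCKED)); // false: logged but ignored
        System.out.println(m.process(FailureType.CRITICAL_ERROR));        // true: handler acts
        m.log.forEach(System.out::println);
    }
}
```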