Re: Ignite 2.7 Errors

2019-03-21 Thread Philip Wu
Hello, Andrey -

For your 2nd question: in Ignite 2.5 we also had JVM pauses of more than 15
minutes, but there was no IgniteException and everything was working fine.

2019-03-15 19:08:46,088 WARNING [ (jvm-pause-detector-worker)] Possible too
long JVM pause: 1001113 milliseconds.

2019-03-15 19:08:46,280 INFO  [IgniteKernal%XXXGrid
(grid-timeout-worker-#71%XXXGrid%)]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=1dc0de55, name=EnfusionGrid, uptime=01:49:39.992]
^-- H/N/C [hosts=1, nodes=1, CPUs=32]
^-- CPU [cur=100%, avg=39.77%, GC=1042.83%]
^-- PageMemory [pages=2300496]
^-- Heap [used=295123MB, free=14.48%, comm=345088MB]
^-- Non heap [used=442MB, free=-1%, comm=463MB]
^-- Outbound messages queue [size=0]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=1, qSize=0]
2019-03-15 19:08:46,280 INFO  [IgniteKernal%XXXGrid
(grid-timeout-worker-#71%XXXGrid%)] FreeList [name=XXXGrid, buckets=256,
dataPages=1, reusePages=0]
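
For reference, the pause figure in the warning above is easier to compare with the "15 mins +" estimate once it is converted to minutes. A minimal sketch in plain Java follows; the names `JvmPauseLog`, `pauseMillis` and `asMinutes` are hypothetical helpers, not part of Ignite, and the regular expression assumes the message wording shown in this log.

```java
import java.time.Duration;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JvmPauseLog {
    // Matches the pause-detector warning text quoted in this thread (assumed wording).
    private static final Pattern PAUSE =
        Pattern.compile("Possible too long JVM pause: (\\d+) milliseconds");

    /** Returns the reported pause in milliseconds, or empty if the line is not a pause warning. */
    static OptionalLong pauseMillis(String line) {
        Matcher m = PAUSE.matcher(line);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }

    /** Formats a millisecond count as whole minutes and seconds, dropping the fraction. */
    static String asMinutes(long millis) {
        Duration d = Duration.ofMillis(millis);
        return d.toMinutes() + "m " + (d.getSeconds() % 60) + "s";
    }

    public static void main(String[] args) {
        String line = "2019-03-15 19:08:46,088 WARNING [ (jvm-pause-detector-worker)] "
            + "Possible too long JVM pause: 1001113 milliseconds.";
        pauseMillis(line).ifPresent(ms -> System.out.println(ms + " ms = " + asMinutes(ms)));
    }
}
```

For the two pauses reported in this thread, 1001113 ms and 928937 ms, this gives 16m 41s and 15m 28s respectively.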





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Ignite 2.7 Errors

2019-03-21 Thread Philip Wu
Hello, Andrey -

For your 1st question, I do have a stack trace in another log file,
*FailureProcessor*.log:

Thread [name="tcp-disco-msg-worker-#2%XXXGrid%", id=445, state=RUNNABLE,
blockCnt=0, waitCnt=320611]
at sun.management.ThreadImpl.dumpThreads0(Native Method)
at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:454)
at o.a.i.i.util.IgniteUtils.dumpThreads(IgniteUtils.java:1364)
at
o.a.i.i.processors.failure.FailureProcessor.process(FailureProcessor.java:128)
- locked o.a.i.i.processors.failure.FailureProcessor@6d7b017b
at
o.a.i.i.processors.failure.FailureProcessor.process(FailureProcessor.java:104)
at
o.a.i.i.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1829)
at
o.a.i.i.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
at o.a.i.i.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
at o.a.i.i.util.worker.GridWorker.onIdle(GridWorker.java:297)
at
o.a.i.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2663)
at
o.a.i.spi.discovery.tcp.ServerImpl$RingMessageWorker$$Lambda$211/210453.run(Unknown
Source)
at
o.a.i.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7181)
at
o.a.i.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2700)
at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:120)
at
o.a.i.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7119)
at o.a.i.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)






Re: Ignite 2.7 Errors

2019-03-21 Thread Philip Wu
Before that, there was:

2019-03-20 22:28:45,028 WARNING [G (tcp-disco-msg-worker-#2%XXXGrid%)]
Thread [name="grid-nio-worker-tcp-comm-1-#73%XXXGrid%", id=415,
state=RUNNABLE, blockCnt=0, waitCnt=0]






Re: Ignite 2.7 Errors

2019-03-21 Thread Philip Wu
Hello, Andrey - actually, this is the sequence of events in chronological order:

2019-03-20 22:28:44,999 WARNING [IgniteKernal%XXXGrid
(jvm-pause-detector-worker)] Possible too long JVM pause: 928937
milliseconds

2019-03-20 22:28:45,014 SEVERE [G (tcp-disco-msg-worker-#2%XXXGrid%)]
Blocked system-critical thread has been detected. This can lead to
cluster-wide undefined behaviour [threadName=grid-nio-worker-tcp-comm-1,
blockedFor=928s]

2019-03-20 22:28:45,021 WARN  [FailoverTransport (ActiveMQ Transport:
Transport ) failed , attempting to automatically reconnect:
java.io.EOFException

2019-03-20 22:28:45,021 ERROR
[ActiveMQDelegate=>MasterServiceGlobalConnection (ActiveMQ Transport: 
transport Interrupted
2019-03-20 22:28:45,023 WARN  [FailoverTransport (ActiveMQ Transport: 
Transport () failed , attempting to automatically reconnect:
java.io.EOFException

2019-03-20 22:28:45,028 WARNING [G (tcp-disco-msg-worker-#2%EnfusionGrid%)]
Thread [name="grid-nio-worker-tcp-comm-1-#73%EnfusionGrid%", id=415,
state=RUNNABLE, blockCnt=0, waitCnt=0]

2019-03-20 22:28:45,028 WARN  [FailoverTransport (ActiveMQ Transport: )]
Transport  failed , attempting to automatically reconnect:
java.io.EOFException
2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor
WriteCheckTimer)] Exception in thread "ActiveMQ InactivityMonitor
WriteCheckTimer" java.lang.NullPointerException
2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor
WriteCheckTimer)] at
org.apache.activemq.transport.AbstractInactivityMonitor.writeCheck(AbstractInactivityMonitor.java:219)
2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor
WriteCheckTimer)] at
org.apache.activemq.transport.AbstractInactivityMonitor$3.run(AbstractInactivityMonitor.java:153)
2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor
WriteCheckTimer)] at
org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33)
2019-03-20 22:28:45,030 ERROR [stderr (ActiveMQ InactivityMonitor
WriteCheckTimer)] at java.util.TimerThread.mainLoop(Timer.java:555)
2019-03-20 22:28:45,030 ERROR [stderr (ActiveMQ InactivityMonitor
WriteCheckTimer)] at java.util.TimerThread.run(Timer.java:505)



2019-03-20 22:28:45,044 ERROR
[ActiveMQDelegate=>MasterServiceGlobalConnection (ActiveMQ Transport: ]
transport Interrupted

2019-03-20 22:28:45,044 SEVERE [ (tcp-disco-msg-worker-#2%XXXGrid%)]
Critical system error detected. Will be handled accordingly to configured
handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler
[ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker
[name=grid-nio-worker-tcp-comm-1, igniteInstanceName=XXXGrid,
finished=false, heartbeatTs=1553137996031]]]: class
org.apache.ignite.IgniteException: GridWorker
[name=grid-nio-worker-tcp-comm-1, igniteInstanceName=XXXGrid,
finished=false, heartbeatTs=1553137996031]
at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831)
at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
at
org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
at
org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297)
at
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2663)
at
org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7181)
at
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2700)
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at
org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7119)
at
org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)



2019-03-20 22:28:45,052 WARNING [FailureProcessor
(tcp-disco-msg-worker-#2%XXXGrid%)] No deadlocked threads detected.
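
The `heartbeatTs` field in the failure context appears to be an epoch timestamp in milliseconds (an assumption based on its magnitude), so it can be checked against the `blockedFor=928s` figure. The log timestamps carry no time zone; the sketch below assumes the server logs at UTC-5, which is a guess chosen only because it makes the numbers line up. `HeartbeatGap` and `blockedSeconds` are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class HeartbeatGap {
    // Timestamp layout used by the log lines in this thread.
    private static final DateTimeFormatter LOG_TS =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");

    /** Whole seconds between a worker's last heartbeat (epoch millis) and a log timestamp. */
    static long blockedSeconds(long heartbeatTs, String logTimestamp, ZoneOffset logOffset) {
        Instant heartbeat = Instant.ofEpochMilli(heartbeatTs);
        Instant logged = LocalDateTime.parse(logTimestamp, LOG_TS).toInstant(logOffset);
        return Duration.between(heartbeat, logged).getSeconds();
    }

    public static void main(String[] args) {
        ZoneOffset assumedOffset = ZoneOffset.ofHours(-5); // assumption, not stated in the logs
        long heartbeatTs = 1553137996031L;                 // from the failure context
        System.out.println("Last heartbeat (UTC): " + Instant.ofEpochMilli(heartbeatTs));
        System.out.println("Blocked for about "
            + blockedSeconds(heartbeatTs, "2019-03-20 22:28:45,014", assumedOffset) + " s");
    }
}
```

Under that assumption the last heartbeat falls at 22:13:16 local time, about 928 seconds before the "Blocked system-critical thread" message, which is roughly the length of the JVM pause reported just before it.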






Re: Ignite 2.7 Errors

2019-03-21 Thread Philip Wu
Hello, Andrey -

I see this:

2019-03-20 22:28:45,052 WARNING [FailureProcessor
(tcp-disco-msg-worker-#2%XXXGrid%)] No deadlocked threads detected.

Is that what you mean?

--- 

Also, prior to the crash, I see this (not sure if it is related):

2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor
WriteCheckTimer)] Exception in thread "ActiveMQ InactivityMonitor
WriteCheckTimer" java.lang.NullPointerException
2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor
WriteCheckTimer)] at
org.apache.activemq.transport.AbstractInactivityMonitor.writeCheck(AbstractInactivityMonitor.java:219)
2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor
WriteCheckTimer)] at
org.apache.activemq.transport.AbstractInactivityMonitor$3.run(AbstractInactivityMonitor.java:153)
2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor
WriteCheckTimer)] at
org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33)
2019-03-20 22:28:45,030 ERROR [stderr (ActiveMQ InactivityMonitor
WriteCheckTimer)] at java.util.TimerThread.mainLoop(Timer.java:555)
2019-03-20 22:28:45,030 ERROR [stderr (ActiveMQ InactivityMonitor
WriteCheckTimer)] at java.util.TimerThread.run(Timer.java:505)







Re: Ignite 2.7 Errors

2019-03-21 Thread Philip Wu
Thanks, Ilya!

Actually, it happened on the PROD system again last night, even with
NoOpFailureHandler.

I am rolling back to Ignite 2.5 or 2.6 for now. Thanks!

2019-03-20 22:28:45,044 SEVERE [ (tcp-disco-msg-worker-#2%XXXGrid%)]
Critical system error detected. Will be handled accordingly to configured
handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler
[ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker
[name=grid-nio-worker-tcp-comm-1, igniteInstanceName=XXXGrid,
finished=false, heartbeatTs=1553137996031]]]: class
org.apache.ignite.IgniteException: GridWorker
[name=grid-nio-worker-tcp-comm-1, igniteInstanceName=XXXGrid,
finished=false, heartbeatTs=1553137996031]
at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831)
at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
at
org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
at
org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297)
at
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2663)
at
org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7181)
at
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2700)
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at
org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7119)
at
org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)






Re: Ignite 2.7 Errors

2019-03-20 Thread Philip Wu
Thank you, Ilya!

We ended up using 

cfg.setFailureHandler(new NoOpFailureHandler());

It silenced the errors, there are no more stack dumps, and it seems to work
as it did in 2.5 and 2.6, with no other changes.

I am still curious whether I can take that line out in the future, once 2.7
(or 2.8) is more stable.
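
For context, a minimal sketch of where that line sits in a programmatic node configuration. It needs the Ignite 2.7 libraries on the classpath. The `setSystemWorkerBlockedTimeout` call and its 20-minute value are assumptions added for illustration only (the value is hypothetical, picked because it exceeds the pauses logged in this thread); neither is something this thread confirms as a fix, and the setter should be checked against the Javadoc of the version in use. The class name `FailureHandlerConfig` is likewise hypothetical.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.NoOpFailureHandler;

class FailureHandlerConfig {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // As in the message above: critical failures are still logged,
        // but the handler takes no action on them.
        cfg.setFailureHandler(new NoOpFailureHandler());

        // Assumption: let a system worker go this long without a heartbeat
        // before SYSTEM_WORKER_BLOCKED is raised. Hypothetical value.
        cfg.setSystemWorkerBlockedTimeout(20 * 60 * 1000L);

        // Start a node with this configuration and stop it on exit.
        try (Ignite ignite = Ignition.start(cfg)) {
            System.out.println("Node started: " + ignite.name());
        }
    }
}
```

Raising the timeout only hides the symptom; pauses of 15 minutes or more point to garbage-collection or host-level stalls that are worth investigating separately.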





Ignite 2.7 Errors

2019-03-19 Thread Philip Wu
Hi, we recently upgraded Ignite from 2.5 to 2.7 and got the following error.

Is this a configuration issue, or a known bug in 2.7?


2019-03-18 15:44:23,383 SEVERE [ (tcp-disco-msg-worker-#2%XXXGrid%)]
Critical system error detected. Will be handled accordingly to configured
handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]],
failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
o.a.i.IgniteException: GridWorker [name=grid-nio-worker-tcp-comm-9,
igniteInstanceName=XXXGrid, finished=false, heartbeatTs=1552941767243]]]:
class org.apache.ignite.IgniteException: GridWorker
[name=grid-nio-worker-tcp-comm-9, igniteInstanceName=XXXGrid,
finished=false, heartbeatTs=1552941767243]


