Re: Error while adding the node the baseline topology
This message actually looks worrisome:

[2019-10-22 10:31:42,441][WARN ][data-streamer-stripe-3-#52][PageMemoryImpl] Parking thread=data-streamer-stripe-3-#52 for timeout (ms)=771038

It means that Ignite's throttling algorithm has decided to put the thread to sleep for 771 seconds.

Can you share your persistence configuration (DataStorageConfiguration or PersistentStoreConfiguration)?

Thanks,
Stan

On Thu, Oct 31, 2019 at 2:39 AM Denis Magda wrote:

> Have you tried to turn off the failure handling following the previously
> shared documentation page? It looks like some timeouts need to be tuned.
>
> Denis
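To put the parking timeout from the warning into perspective, here is a minimal Java sketch that pulls the value out of the log line and converts it to seconds and minutes. The class name ParkingTimeout and the regular expression are illustrative assumptions, not part of Ignite; the pattern only mirrors the "timeout (ms)=" fragment as it appears in the quoted warning.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an Ignite API): reads the throttling park time from a log line.
final class ParkingTimeout {
    // Matches the "timeout (ms)=<digits>" fragment as it appears in the quoted warning.
    private static final Pattern TIMEOUT = Pattern.compile("timeout \\(ms\\)=(\\d+)");

    static Duration parse(String logLine) {
        Matcher m = TIMEOUT.matcher(logLine);
        if (!m.find())
            throw new IllegalArgumentException("No parking timeout in: " + logLine);
        return Duration.ofMillis(Long.parseLong(m.group(1)));
    }

    public static void main(String[] args) {
        String line = "[2019-10-22 10:31:42,441][WARN ][data-streamer-stripe-3-#52][PageMemoryImpl] "
            + "Parking thread=data-streamer-stripe-3-#52 for timeout (ms)=771038";
        Duration d = parse(line);
        // Prints: 771 s, i.e. 12 min 51 s
        System.out.println(d.getSeconds() + " s, i.e. " + d.toMinutes() + " min " + (d.getSeconds() % 60) + " s");
    }
}
```

A single park of almost 13 minutes is far longer than the 95 s after which the stripe was reported as blocked in the same log, which is why the persistence settings are the first thing to look at.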
Re: Error while adding the node the baseline topology
Have you tried to turn off the failure handling following the previously shared documentation page? It looks like some timeouts need to be tuned.

Denis

On Friday, October 25, 2019, krkumar24061...@gmail.com <krkumar24061...@gmail.com> wrote:

> Hi - The application is doing two things: one thread writes 2 KB events to the Ignite cache as key-value pairs, and another thread executes Ignite SQL queries through Ignite JDBC connections. The throughput is anywhere between 25K and 40K events per second on the cache side. We are using a data streamer to write to the key-value cache. The cluster has 4 nodes with 198 GB RAM and 48 cores.
>
> We got a similar error again, and here is the error description:
>
> [2019-10-25 10:16:45,399][ERROR][disco-event-worker-#142][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=data-streamer-stripe-0, blockedFor=2032s]
> [2019-10-25 10:16:45,399][WARN ][disco-event-worker-#142][G] Thread [name="data-streamer-stripe-0-#49", id=80, state=WAITING, blockCnt=7, waitCnt=5352642]
>
> [2019-10-25 10:16:45,399][ERROR][disco-event-worker-#142][root] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-0, igniteInstanceName=null, finished=false, heartbeatTs=1572010973019]]]
>
> Thanks and regards,
> KR Kumar
Re: Error while adding the node the baseline topology
Hi - The application is doing two things: one thread writes 2 KB events to the Ignite cache as key-value pairs, and another thread executes Ignite SQL queries through Ignite JDBC connections. The throughput is anywhere between 25K and 40K events per second on the cache side. We are using a data streamer to write to the key-value cache. The cluster has 4 nodes with 198 GB RAM and 48 cores.

We got a similar error again, and here is the error description:

[2019-10-25 10:16:45,399][ERROR][disco-event-worker-#142][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=data-streamer-stripe-0, blockedFor=2032s]
[2019-10-25 10:16:45,399][WARN ][disco-event-worker-#142][G] Thread [name="data-streamer-stripe-0-#49", id=80, state=WAITING, blockCnt=7, waitCnt=5352642]

[2019-10-25 10:16:45,399][ERROR][disco-event-worker-#142][root] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-0, igniteInstanceName=null, finished=false, heartbeatTs=1572010973019]]]

Thanks and regards,
KR Kumar

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
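As a sanity check on the numbers in this log, the blockedFor value appears to be the gap between the worker's last heartbeat (heartbeatTs, in epoch milliseconds) and the moment the error was logged. The Java sketch below recomputes it. It assumes the log timestamps are local time at UTC-04:00, which is an inference that makes the figures line up and is not stated anywhere in the thread; the class BlockedFor is hypothetical and not an Ignite API.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not an Ignite API): compares a log timestamp with heartbeatTs.
final class BlockedFor {
    // Timestamp layout used in the quoted log lines, e.g. "2019-10-25 10:16:45,399".
    private static final DateTimeFormatter LOG_TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Whole seconds elapsed between the last heartbeat and the log entry.
    static long blockedSeconds(String logTimestamp, ZoneOffset logZone, long heartbeatTs) {
        Instant logged = LocalDateTime.parse(logTimestamp, LOG_TS).toInstant(logZone);
        return Duration.between(Instant.ofEpochMilli(heartbeatTs), logged).getSeconds();
    }

    public static void main(String[] args) {
        // ASSUMPTION: the node logs in local time at UTC-04:00.
        ZoneOffset assumedZone = ZoneOffset.ofHours(-4);
        long blocked = blockedSeconds("2019-10-25 10:16:45,399", assumedZone, 1572010973019L);
        // Prints: blockedFor=2032s
        System.out.println("blockedFor=" + blocked + "s");
    }
}
```

Under that assumption the stripe's last heartbeat was almost 34 minutes before the error, so the thread was stalled for the whole period rather than for a short spike.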
Re: Error while adding the node the baseline topology
Hi,

What is the application doing while you are changing the topology? Is the cluster under load?

Generally, we've added critical failure handlers in the latest version of Ignite, and the reported message is printed by them: https://apacheignite.readme.io/docs/critical-failures-handling

-
Denis

On Tue, Oct 22, 2019 at 7:57 AM KR Kumar wrote:

> Hi guys - I am running into the following issue when trying to add a node to the baseline topology. It's happening only after we upgraded from 2.3 to 2.7.5. Any pointers would be appreciated.
>
> [2019-10-22 10:31:42,441][WARN ][data-streamer-stripe-3-#52][PageMemoryImpl] Parking thread=data-streamer-stripe-3-#52 for timeout (ms)=771038
> [2019-10-22 10:31:45,635][ERROR][tcp-disco-msg-worker-#2][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=data-streamer-stripe-30, blockedFor=95s]
> [2019-10-22 10:31:45,635][WARN ][tcp-disco-msg-worker-#2][G] Thread [name="data-streamer-stripe-30-#79", id=110, state=TIMED_WAITING, blockCnt=0, waitCnt=36470]
>
> [2019-10-22 10:31:45,637][ERROR][tcp-disco-msg-worker-#2][root] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-30, igniteInstanceName=null, finished=false, heartbeatTs=1571754609956]]]
> class org.apache.ignite.IgniteException: GridWorker [name=data-streamer-stripe-30, igniteInstanceName=null, finished=false, heartbeatTs=1571754609956]
>
> Thanks and regards,
> KR Kumar
Error while adding the node the baseline topology
Hi guys - I am running into the following issue when trying to add a node to the baseline topology. It's happening only after we upgraded from 2.3 to 2.7.5. Any pointers would be appreciated.

[2019-10-22 10:31:42,441][WARN ][data-streamer-stripe-3-#52][PageMemoryImpl] Parking thread=data-streamer-stripe-3-#52 for timeout (ms)=771038
[2019-10-22 10:31:45,635][ERROR][tcp-disco-msg-worker-#2][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=data-streamer-stripe-30, blockedFor=95s]
[2019-10-22 10:31:45,635][WARN ][tcp-disco-msg-worker-#2][G] Thread [name="data-streamer-stripe-30-#79", id=110, state=TIMED_WAITING, blockCnt=0, waitCnt=36470]

[2019-10-22 10:31:45,637][ERROR][tcp-disco-msg-worker-#2][root] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-30, igniteInstanceName=null, finished=false, heartbeatTs=1571754609956]]]
class org.apache.ignite.IgniteException: GridWorker [name=data-streamer-stripe-30, igniteInstanceName=null, finished=false, heartbeatTs=1571754609956]

Thanks and regards,
KR Kumar