[ https://issues.apache.org/jira/browse/IGNITE-12714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anton Kalashnikov updated IGNITE-12714:
---------------------------------------
    Description: 
Scenario:
1. Start 3 data nodes.
2. Start loading data with a data streamer on 6 clients.
3. Start the data nodes restarter.

Result:
Keys were not loaded into all (1000) caches.
In the server node log I see:
{noformat}
[2019-07-17 16:52:36,881][ERROR][tcp-disco-msg-worker-#2] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=data-streamer-stripe-7, blockedFor=16s]
[2019-07-17 16:52:36,883][WARN ][tcp-disco-msg-worker-#2] Thread [name="data-streamer-stripe-7-#24", id=43, state=WAITING, blockCnt=111, waitCnt=169964]
[2019-07-17 16:52:36,885][ERROR][tcp-disco-msg-worker-#2] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-7, igniteInstanceName=null, finished=false, heartbeatTs=1563371540069]]]
org.apache.ignite.IgniteException: GridWorker [name=data-streamer-stripe-7, igniteInstanceName=null, finished=false, heartbeatTs=1563371540069]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1838) ~[ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1833) ~[ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:230) ~[ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) ~[ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2804) ~[ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7568) [ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2866) [ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7506) [ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) [ignite-core-2.5.9.jar:2.5.9]
{noformat}


*Solution:*
Increase the timeout to 2 min: org.apache.ignite.IgniteSystemProperties#IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT

  was:
Scenario:
1. Start 3 data nodes.
2. Start loading data with a data streamer on 6 clients.
3. Start the data nodes restarter.

Result:
Keys were not loaded into all (1000) caches.
In the server node log I see:
{noformat}
[2019-07-17 16:52:36,881][ERROR][tcp-disco-msg-worker-#2] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=data-streamer-stripe-7, blockedFor=16s]
[2019-07-17 16:52:36,883][WARN ][tcp-disco-msg-worker-#2] Thread [name="data-streamer-stripe-7-#24", id=43, state=WAITING, blockCnt=111, waitCnt=169964]
[2019-07-17 16:52:36,885][ERROR][tcp-disco-msg-worker-#2] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-7, igniteInstanceName=null, finished=false, heartbeatTs=1563371540069]]]
org.apache.ignite.IgniteException: GridWorker [name=data-streamer-stripe-7, igniteInstanceName=null, finished=false, heartbeatTs=1563371540069]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1838) ~[ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1833) ~[ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:230) ~[ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) ~[ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2804) ~[ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7568) [ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2866) [ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7506) [ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) [ignite-core-2.5.9.jar:2.5.9]
{noformat}

Logs: ftp://gg@172.25.2.50/poc-tester-logs/1723/log-2019-07-17-17-33-23
Log with dumps: ftp://gg@172.25.2.50/poc-tester-logs/1723/log-2019-07-17-17-33-23/servers/172.25.1.12/poc-tester-server-172.25.1.12-id-0-2019-07-17-16-46-58.log-1-2019-07-17.log.gz


*Solution:*
Increase the timeout to 2 min: org.apache.ignite.IgniteSystemProperties#IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT


> Absence of default value of IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT
> ----------------------------------------------------------------
>
>                 Key: IGNITE-12714
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12714
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Anton Kalashnikov
>            Assignee: Anton Kalashnikov
>            Priority: Major
>
> Scenario:
> 1. Start 3 data nodes.
> 2. Start loading data with a data streamer on 6 clients.
> 3. Start the data nodes restarter.
> Result:
> Keys were not loaded into all (1000) caches.
> In the server node log I see:
> {noformat}
> [2019-07-17 16:52:36,881][ERROR][tcp-disco-msg-worker-#2] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=data-streamer-stripe-7, blockedFor=16s]
> [2019-07-17 16:52:36,883][WARN ][tcp-disco-msg-worker-#2] Thread [name="data-streamer-stripe-7-#24", id=43, state=WAITING, blockCnt=111, waitCnt=169964]
> [2019-07-17 16:52:36,885][ERROR][tcp-disco-msg-worker-#2] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-7, igniteInstanceName=null, finished=false, heartbeatTs=1563371540069]]]
> org.apache.ignite.IgniteException: GridWorker [name=data-streamer-stripe-7, igniteInstanceName=null, finished=false, heartbeatTs=1563371540069]
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1838) ~[ignite-core-2.5.9.jar:2.5.9]
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1833) ~[ignite-core-2.5.9.jar:2.5.9]
>     at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:230) ~[ignite-core-2.5.9.jar:2.5.9]
>     at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) ~[ignite-core-2.5.9.jar:2.5.9]
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2804) ~[ignite-core-2.5.9.jar:2.5.9]
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7568) [ignite-core-2.5.9.jar:2.5.9]
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2866) [ignite-core-2.5.9.jar:2.5.9]
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.5.9.jar:2.5.9]
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7506) [ignite-core-2.5.9.jar:2.5.9]
>     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) [ignite-core-2.5.9.jar:2.5.9]
> {noformat}
> *Solution:*
> Increase the timeout to 2 min: org.apache.ignite.IgniteSystemProperties#IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT
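
Until a fix lands, the 2 min timeout proposed above could probably be tried as a workaround by setting the property by hand. The sketch below is not the actual patch. It assumes the value is read in milliseconds and that the property is picked up when the node starts. The class and method names (WorkerBlockedTimeoutExample, applyTwoMinuteTimeout) are made up for illustration.

```java
import java.util.concurrent.TimeUnit;

public class WorkerBlockedTimeoutExample {
    /** Name of the property referenced by IgniteSystemProperties#IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT. */
    static final String PROP = "IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT";

    /**
     * Sets the timeout to 2 minutes, expressed in milliseconds
     * (assumption: the property value is interpreted as milliseconds),
     * and returns the value that was set.
     */
    static long applyTwoMinuteTimeout() {
        long timeoutMs = TimeUnit.MINUTES.toMillis(2);
        System.setProperty(PROP, Long.toString(timeoutMs));
        return timeoutMs;
    }

    public static void main(String[] args) {
        applyTwoMinuteTimeout();
        // Assumption: the server node is started only after this point,
        // so that it sees the property when it initializes its workers.
        System.out.println(PROP + "=" + System.getProperty(PROP));
    }
}
```

Passing -DIGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT=120000 on the JVM command line of each server node should have the same effect without code changes.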



