Re: Clients got disconnected during the endurance testing

2021-06-09 Thread Ilya Kasnacheev
Hello!

This may happen if your cluster has a long PME (partition map exchange) and the
connection pool is exhausted. Check the server nodes' logs for suspicious
messages.
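
As a starting point for that log check, here is a minimal Python sketch that
counts three kinds of messages seen in the log excerpt from this thread:
blocked system workers, client handshake timeouts, and threads in the BLOCKED
state. The function name `scan_log`, the pattern labels, and the sample text
are assumptions made for illustration; none of them are part of Ignite.

```python
import re

# Substrings copied from the log excerpt in this thread.
# The dictionary keys are illustrative labels, not Ignite terminology.
PATTERNS = {
    "worker_blocked": re.compile(r"type=SYSTEM_WORKER_BLOCKED"),
    # \s+ tolerates the line wrap that mail clients insert mid-message.
    "handshake_timeout": re.compile(r"Unable to perform\s+handshake within timeout"),
    "blocked_thread": re.compile(r"state=BLOCKED"),
}


def scan_log(text):
    """Return how many times each pattern occurs in the given log text."""
    return {name: len(rx.findall(text)) for name, rx in PATTERNS.items()}


# Shortened sample assembled from lines quoted in this thread.
sample = """\
failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
[2021-06-03 00:08:08,172][WARN ][grid-timeout-worker-#23][ClientListenerNioListener] Unable to perform
handshake within timeout [timeout=24, remoteAddr=/10.129.4.13:45160]
Thread=[name=client-connector-#72, id=123], state=BLOCKED
Thread=[name=client-connector-#73, id=124], state=BLOCKED
"""

if __name__ == "__main__":
    for name, count in scan_log(sample).items():
        print(f"{name}: {count}")
```

To run it against a real server log, read the file into a string first, for
example `scan_log(open(path).read())`, where `path` is wherever your node
writes its log.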

Regards,
-- 
Ilya Kasnacheev


Thu, 3 Jun 2021 at 10:35, Naveen wrote:


Clients got disconnected during the endurance testing

2021-06-03 Thread Naveen
Hi all,

We are using Ignite 2.8.1 and running an endurance test that lasts 7 to
12 hours.
The test ran for almost 6 hours, then all of a sudden the clients got
disconnected and we started seeing the logs below.
What could be the reason for this behavior? We had enough resources (RAM,
CPU) during that time.


[2021-06-03 00:08:08,172][WARN ][tcp-disco-msg-worker-[8761dfbe
10.119.10.63:47500]-#2][root] Possible failure suppressed accordingly to a
configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false,
timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
o.a.i.IgniteException: GridWorker [name=grid-timeout-worker,
igniteInstanceName=null, finished=false, heartbeatTs=1622664488170]]]
class org.apache.ignite.IgniteException: GridWorker
[name=grid-timeout-worker, igniteInstanceName=null, finished=false,
heartbeatTs=1622664488170]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1810)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1805)
    at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:234)
    at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2858)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7759)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2946)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7697)
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:61)


[2021-06-03 00:08:08,172][WARN ][grid-timeout-worker-#23][ClientListenerNioListener] Unable to perform handshake within timeout [timeout=24, remoteAddr=/10.129.4.13:45160]
[2021-06-03 00:08:08,172][WARN ][grid-timeout-worker-#23][ClientListenerNioListener] Unable to perform handshake within timeout [timeout=24, remoteAddr=/10.129.4.13:35382]
[2021-06-03 00:08:08,174][WARN ][grid-timeout-worker-#23][ClientListenerNioListener] Unable to perform handshake within timeout [timeout=24, remoteAddr=/10.129.4.13:60720]
[2021-06-03 00:08:08,174][WARN ][grid-timeout-worker-#23][ClientListenerNioListener] Unable to perform handshake within timeout [timeout=24, remoteAddr=/10.129.4.13:54156]
[2021-06-03 00:08:08,174][WARN ][grid-timeout-worker-#23][ClientListenerNioListener] Unable to perform handshake within timeout [timeout=24, remoteAddr=/10.129.4.13:55260]
[2021-06-03 00:08:08,174][WARN ][grid-timeout-worker-#23][ClientListenerNioListener] Unable to perform handshake within timeout [timeout=24, remoteAddr=/10.129.4.13:32804]
[2021-06-03 00:08:08,174][WARN ][grid-timeout-worker-#23][ClientListenerNioListener] Unable to perform handshake within timeout [timeout=24, remoteAddr=/10.129.4.13:54822]
[2021-06-03 00:08:08,175][WARN ][grid-timeout-worker-#23][ClientListenerNioListener] Unable to perform handshake within timeout [timeout=24, remoteAddr=/10.129.4.13:60692]
[2021-06-03 00:08:08,175][WARN ][grid-timeout-worker-#23][ClientListenerNioListener] Unable to perform handshake within timeout [timeout=24, remoteAddr=/10.129.4.13:45316]


Thread=[name=auth-#47, id=92], state=WAITING
Locked pages = []
Locked pages log: name=auth-#47 time=(1622664488172, 2021-06-03 00:08:08.172)


Thread=[name=checkpoint-runner-#65, id=112], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#65 time=(1622664488172, 2021-06-03 00:08:08.172)


Thread=[name=checkpoint-runner-#66, id=113], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#66 time=(1622664488172, 2021-06-03 00:08:08.172)


Thread=[name=checkpoint-runner-#67, id=114], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#67 time=(1622664488172, 2021-06-03 00:08:08.172)


Thread=[name=checkpoint-runner-#68, id=115], state=WAITING
Locked pages = []
Thread=[name=checkpoint-runner-#68, id=115], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#68 time=(1622664488172, 2021-06-03 00:08:08.172)


Thread=[name=client-connector-#72, id=123], state=BLOCKED
Locked pages = []
Locked pages log: name=client-connector-#72 time=(1622664488172, 2021-06-03 00:08:08.172)


Thread=[name=client-connector-#73, id=124], state=BLOCKED
Locked pages = []
Locked pages log: name=client-connector-#73 time=(1622664488172, 2021-06-03 00:08:08.172)


Thread=[name=client-connector-#74, id=125], state=BLOCKED
Locked pages = []
Locked pages log: name=client-connector-#74 time=(1622664488172, 2021-06-03 00:08:08.172)


Thread=[name=client-connector-#75, id=126],