Re: Clients got disconnected during the endurance testing
Hello!

This may happen if your cluster has a long PME (partition map exchange) and the connection pool is exhausted. Check the server nodes' logs for suspicious messages.

Regards,
--
Ilya Kasnacheev

On Thu, 3 Jun 2021 at 10:35, Naveen wrote:

> Hi all,
>
> We are using Ignite 2.8.1 and running an endurance test that lasts 7 to
> 12 hours. The test ran for almost 6 hours, then all of a sudden the clients
> got disconnected and we started seeing the logs below. What could be the
> reason for this behavior? We had enough resources (RAM, CPU) during that
> time.
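As a starting point for the log check suggested above, here is a minimal sketch that summarizes a server log: handshake timeouts per client host, thread states per pool from the thread dump, and any workers reported as SYSTEM_WORKER_BLOCKED. The function name `summarize` and the regular expressions are assumptions derived only from the log lines posted in this thread, and the sketch assumes each log record sits on a single line (unlike the wrapped email text). Other Ignite versions may format these messages differently.

```python
import re
from collections import Counter

# Patterns are assumptions based on the log excerpts in this thread.
HANDSHAKE_RE = re.compile(
    r"Unable to perform handshake within timeout "
    r"\[timeout=(\d+), remoteAddr=/([\d.]+):(\d+)\]"
)
THREAD_RE = re.compile(r"Thread=\[name=([\w.-]+?)-#(\d+), id=(\d+)\], state=(\w+)")
BLOCKED_WORKER_RE = re.compile(
    r"type=SYSTEM_WORKER_BLOCKED, err=.*?GridWorker \[name=([\w-]+)"
)


def summarize(log_text):
    """Return (handshake timeouts per host, thread states per pool, blocked workers)."""
    handshakes_by_host = Counter(
        m.group(2) for m in HANDSHAKE_RE.finditer(log_text)
    )

    # Key by thread id so an entry repeated in the dump is counted once.
    threads = {}
    for m in THREAD_RE.finditer(log_text):
        pool, _num, thread_id, state = m.groups()
        threads[int(thread_id)] = (pool, state)
    states_by_pool = Counter(threads.values())

    blocked_workers = sorted(set(BLOCKED_WORKER_RE.findall(log_text)))
    return handshakes_by_host, states_by_pool, blocked_workers


# Sample records copied from the logs in this thread, one record per line.
SAMPLE = """\
[2021-06-03 00:08:08,172][WARN ][tcp-disco-msg-worker-[8761dfbe 10.119.10.63:47500]-#2][root] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=grid-timeout-worker, igniteInstanceName=null, finished=false, heartbeatTs=1622664488170]]]
[2021-06-03 00:08:08,172][WARN ][grid-timeout-worker-#23][ClientListenerNioListener] Unable to perform handshake within timeout [timeout=24, remoteAddr=/10.129.4.13:45160]
[2021-06-03 00:08:08,172][WARN ][grid-timeout-worker-#23][ClientListenerNioListener] Unable to perform handshake within timeout [timeout=24, remoteAddr=/10.129.4.13:35382]
Thread=[name=checkpoint-runner-#68, id=115], state=WAITING
Thread=[name=checkpoint-runner-#68, id=115], state=WAITING
Thread=[name=client-connector-#72, id=123], state=BLOCKED
Thread=[name=client-connector-#73, id=124], state=BLOCKED
"""

if __name__ == "__main__":
    hosts, pools, blocked = summarize(SAMPLE)
    print("Handshake timeouts per client host:", dict(hosts))
    print("Thread states per pool:", dict(pools))
    print("Workers reported as blocked:", blocked)
```

If most `client-connector` threads show up as BLOCKED while handshake timeouts pile up from the same client host, that would be consistent with the exhausted connection pool described above. The sketch only counts what the log says and does not by itself confirm the cause.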
Clients got disconnected during the endurance testing
Hi all,

We are using Ignite 2.8.1 and running an endurance test that lasts 7 to 12 hours. The test ran for almost 6 hours, then all of a sudden the clients got disconnected and we started seeing the logs below. What could be the reason for this behavior? We had enough resources (RAM, CPU) during that time.

[2021-06-03 00:08:08,172][WARN ][tcp-disco-msg-worker-[8761dfbe 10.119.10.63:47500]-#2][root] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=grid-timeout-worker, igniteInstanceName=null, finished=false, heartbeatTs=1622664488170]]]
class org.apache.ignite.IgniteException: GridWorker [name=grid-timeout-worker, igniteInstanceName=null, finished=false, heartbeatTs=1622664488170]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1810)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1805)
    at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:234)
    at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2858)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7759)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2946)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7697)
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:61)

[2021-06-03 00:08:08,172][WARN ][grid-timeout-worker-#23][ClientListenerNioListener] Unable to perform handshake within timeout [timeout=24, remoteAddr=/10.129.4.13:45160]
[2021-06-03 00:08:08,172][WARN ][grid-timeout-worker-#23][ClientListenerNioListener] Unable to perform handshake within timeout [timeout=24, remoteAddr=/10.129.4.13:35382]
[2021-06-03 00:08:08,174][WARN ][grid-timeout-worker-#23][ClientListenerNioListener] Unable to perform handshake within timeout [timeout=24, remoteAddr=/10.129.4.13:60720]
[2021-06-03 00:08:08,174][WARN ][grid-timeout-worker-#23][ClientListenerNioListener] Unable to perform handshake within timeout [timeout=24, remoteAddr=/10.129.4.13:54156]
[2021-06-03 00:08:08,174][WARN ][grid-timeout-worker-#23][ClientListenerNioListener] Unable to perform handshake within timeout [timeout=24, remoteAddr=/10.129.4.13:55260]
[2021-06-03 00:08:08,174][WARN ][grid-timeout-worker-#23][ClientListenerNioListener] Unable to perform handshake within timeout [timeout=24, remoteAddr=/10.129.4.13:32804]
[2021-06-03 00:08:08,174][WARN ][grid-timeout-worker-#23][ClientListenerNioListener] Unable to perform handshake within timeout [timeout=24, remoteAddr=/10.129.4.13:54822]
[2021-06-03 00:08:08,175][WARN ][grid-timeout-worker-#23][ClientListenerNioListener] Unable to perform handshake within timeout [timeout=24, remoteAddr=/10.129.4.13:60692]
[2021-06-03 00:08:08,175][WARN ][grid-timeout-worker-#23][ClientListenerNioListener] Unable to perform handshake within timeout [timeout=24, remoteAddr=/10.129.4.13:45316]

Thread=[name=auth-#47, id=92], state=WAITING
Locked pages = []
Locked pages log: name=auth-#47 time=(1622664488172, 2021-06-03 00:08:08.172)

Thread=[name=checkpoint-runner-#65, id=112], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#65 time=(1622664488172, 2021-06-03 00:08:08.172)

Thread=[name=checkpoint-runner-#66, id=113], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#66 time=(1622664488172, 2021-06-03 00:08:08.172)

Thread=[name=checkpoint-runner-#67, id=114], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#67 time=(1622664488172, 2021-06-03 00:08:08.172)

Thread=[name=checkpoint-runner-#68, id=115], state=WAITING
Locked pages = []
Thread=[name=checkpoint-runner-#68, id=115], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#68 time=(1622664488172, 2021-06-03 00:08:08.172)

Thread=[name=client-connector-#72, id=123], state=BLOCKED
Locked pages = []
Locked pages log: name=client-connector-#72 time=(1622664488172, 2021-06-03 00:08:08.172)

Thread=[name=client-connector-#73, id=124], state=BLOCKED
Locked pages = []
Locked pages log: name=client-connector-#73 time=(1622664488172, 2021-06-03 00:08:08.172)

Thread=[name=client-connector-#74, id=125], state=BLOCKED
Locked pages = []
Locked pages log: name=client-connector-#74 time=(1622664488172, 2021-06-03 00:08:08.172)

Thread=[name=client-connector-#75, id=126],