[ https://issues.apache.org/jira/browse/IGNITE-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Goncharuk resolved IGNITE-7212.
--------------------------------------
Resolution: Fixed

Fixed in master.

> Load stoped after server node kill
> ----------------------------------
>
> Key: IGNITE-7212
> URL: https://issues.apache.org/jira/browse/IGNITE-7212
> Project: Ignite
> Issue Type: Bug
> Components: general
> Affects Versions: 2.4
> Reporter: Ilya Suntsov
> Assignee: Alexey Goncharuk
> Priority: Critical
> Attachments: cfg_log_master_1.zip
>
>
> Scenario:
> * Start 4 servers
> * Start load tasks on 5 clients
> * Kill 1 server
> * Wait for rebalancing
> * Kill 1 server
> Result:
> After the second server node was killed, the load stopped.
> In the server logs I see messages like this:
> {noformat}
> [2017-12-15 11:23:50][DEBUG][grid-nio-worker-tcp-comm-0-#41] Remote client
> closed connection: GridSelectorNioSessionImpl [worker=DirectNioClientWorker
> [super=AbstractNioClientWorker [idx=0, bytesRcvd=130952565,
> bytesSent=131203245, bytesRcvd0=3069200, bytesSent0=3068083, select=true,
> super=GridWorker [name=grid-nio-worker-tcp-comm-0, igniteInstanceName=null,
> finished=false, hashCode=1748650517, interrupted=false,
> runner=grid-nio-worker-tcp-comm-0-#41]]],
> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
> readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
> inRecovery=GridNioRecoveryDescriptor [acked=1024, resendCnt=0, rcvCnt=1026,
> sentCnt=1029, reserved=true, lastAck=1024, nodeLeft=false,
> node=TcpDiscoveryNode [id=b7cfaa4e-b3b7-4485-a421-c731d9ed869d,
> addrs=[127.0.0.1, 172.31.20.3],
> sockAddrs=[ip-172-31-20-3.us-east-2.compute.internal/172.31.20.3:47500,
> /127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
> lastExchangeTime=1513335739604, loc=false, ver=2.4.0#20171214-sha1:da782958,
> isClient=false], connected=false, connectCnt=1, queueLimit=4096,
> reserveCnt=1, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor
> [acked=1024, resendCnt=0, rcvCnt=1026, sentCnt=1029, reserved=true,
> lastAck=1024, nodeLeft=false, node=TcpDiscoveryNode
> [id=b7cfaa4e-b3b7-4485-a421-c731d9ed869d, addrs=[127.0.0.1, 172.31.20.3],
> sockAddrs=[ip-172-31-20-3.us-east-2.compute.internal/172.31.20.3:47500,
> /127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
> lastExchangeTime=1513335739604, loc=false, ver=2.4.0#20171214-sha1:da782958,
> isClient=false], connected=false, connectCnt=1, queueLimit=4096,
> reserveCnt=1, pairedConnections=false], super=GridNioSessionImpl
> [locAddr=/172.31.23.220:41732,
> rmtAddr=ip-172-31-20-3.us-east-2.compute.internal/172.31.20.3:47100,
> createTime=1513335774008, closeTime=0, bytesSent=131203245,
> bytesRcvd=130952565, bytesSent0=3068083, bytesRcvd0=3069200,
> sndSchedTime=1513335774008, lastSndTime=1513337029027,
> lastRcvTime=1513337029027, readsPaused=false,
> filterChain=FilterChain[filters=[GridNioCodecFilter
> [parser=org.apache.ignite.internal.util.nio.GridDirectParser@11ae7d3b,
> directMode=true], GridConnectionBytesVerifyFilter], accepted=false]]
> [2017-12-15 11:23:50][WARN ][tcp-disco-msg-worker-#2] Failed to send message
> to next node [msg=TcpDiscoveryConnectionCheckMessage
> [super=TcpDiscoveryAbstractMessage [sndNodeId=null,
> id=6c7f6d95061-c3cf9fe4-ab13-4d95-ace3-84a54cd73e08, verifierNodeId=null,
> topVer=0, pendingIdx=0, failedNodes=null, isClient=false]],
> next=TcpDiscoveryNode [id=b7cfaa4e-b3b7-4485-a421-c731d9ed869d,
> addrs=[127.0.0.1, 172.31.20.3],
> sockAddrs=[ip-172-31-20-3.us-east-2.compute.internal/172.31.20.3:47500,
> /127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
> lastExchangeTime=1513335739604, loc=false, ver=2.4.0#20171214-sha1:da782958,
> isClient=false], errMsg=Failed to send message to next node
> [msg=TcpDiscoveryConnectionCheckMessage [super=TcpDiscoveryAbstractMessage
> [sndNodeId=null, id=6c7f6d95061-c3cf9fe4-ab13-4d95-ace3-84a54cd73e08,
> verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null,
> isClient=false]], next=ClusterNode [id=b7cfaa4e-b3b7-4485-a421-c731d9ed869d,
> order=1, addr=[127.0.0.1, 172.31.20.3], daemon=false]]]
> [2017-12-15 11:23:50][DEBUG][grid-nio-worker-tcp-comm-0-#41] Session was
> closed but there are unacknowledged messages, will try to reconnect
> [rmtNode=b7cfaa4e-b3b7-4485-a421-c731d9ed869d]
> [2017-12-15 11:23:50][DEBUG][tcp-comm-worker-#1] Recovery reconnect
> [rmtNode=b7cfaa4e-b3b7-4485-a421-c731d9ed869d]
> [2017-12-15 11:23:50][DEBUG][tcp-comm-worker-#1] Creating NIO client to node:
> TcpDiscoveryNode [id=b7cfaa4e-b3b7-4485-a421-c731d9ed869d, addrs=[127.0.0.1,
> 172.31.20.3],
> sockAddrs=[ip-172-31-20-3.us-east-2.compute.internal/172.31.20.3:47500,
> /127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
> lastExchangeTime=1513335739604, loc=false, ver=2.4.0#20171214-sha1:da782958,
> isClient=false]
> [2017-12-15 11:23:50][DEBUG][tcp-comm-worker-#1] Addresses resolved from
> attributes [rmtNode=b7cfaa4e-b3b7-4485-a421-c731d9ed869d,
> addrs=[ip-172-31-20-3.us-east-2.compute.internal/172.31.20.3:47100,
> /127.0.0.1:47100], isRmtAddrsExist=true]
> [2017-12-15 11:23:50][DEBUG][tcp-comm-worker-#1] Client creation failed
> [addr=ip-172-31-20-3.us-east-2.compute.internal/172.31.20.3:47100,
> err=java.net.ConnectException: Connection refused]
> [2017-12-15 11:23:50][WARN ][tcp-comm-worker-#1] Connect timed out (consider
> increasing 'failureDetectionTimeout' configuration property)
> [addr=ip-172-31-20-3.us-east-2.compute.internal/172.31.20.3:47100,
> failureDetectionTimeout=10000]
> [2017-12-15 11:23:50][WARN ][disco-event-worker-#61] Node FAILED:
> TcpDiscoveryNode [id=b7cfaa4e-b3b7-4485-a421-c731d9ed869d, addrs=[127.0.0.1,
> 172.31.20.3],
> sockAddrs=[ip-172-31-20-3.us-east-2.compute.internal/172.31.20.3:47500,
> /127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
> lastExchangeTime=1513335739604, loc=false, ver=2.4.0#20171214-sha1:da782958,
> isClient=false]
> [2017-12-15 11:23:50][DEBUG][tcp-comm-worker-#1] Skipping local address
> [addr=/127.0.0.1:47100, locAddrs=[172.31.20.3, 127.0.0.1],
> node=TcpDiscoveryNode [id=b7cfaa4e-b3b7-4485-a421-c731d9ed869d,
> addrs=[127.0.0.1, 172.31.20.3],
> sockAddrs=[ip-172-31-20-3.us-east-2.compute.internal/172.31.20.3:47500,
> /127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
> lastExchangeTime=1513335739604, loc=false, ver=2.4.0#20171214-sha1:da782958,
> isClient=false]]
> [2017-12-15 11:23:50][DEBUG][tcp-comm-worker-#1] Skipping local address
> [addr=/127.0.0.1:47100, locAddrs=[172.31.20.3, 127.0.0.1],
> node=TcpDiscoveryNode [id=b7cfaa4e-b3b7-4485-a421-c731d9ed869d,
> addrs=[127.0.0.1, 172.31.20.3],
> sockAddrs=[ip-172-31-20-3.us-east-2.compute.internal/172.31.20.3:47500,
> /127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
> lastExchangeTime=1513335739604, loc=false, ver=2.4.0#20171214-sha1:da782958,
> isClient=false]]
> {noformat}
> Logs and configs are attached to this ticket.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)