[ https://issues.apache.org/jira/browse/IGNITE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541697#comment-16541697 ]

Anton Vinogradov commented on IGNITE-8783:
------------------------------------------

Hang reason found.

At {{org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager#createClientLatch}} you can see the following code:
{noformat}
// There is final ack for created latch.
if (pendingAcks.containsKey(latchId)) {
    latch.complete();
    pendingAcks.remove(latchId); // this causes the loss of pending acks when the coordinator failure has not been handled yet (e.g. we are handling another node's failure)
}
else
    clientLatches.put(latchId, latch);
{noformat}
So I propose replacing this code with simply
{noformat}
clientLatches.put(latchId, latch);
{noformat}
[~Jokser], could you please explain the idea behind handling the final message from old_coordinator?
As far as I can see, latches will be recreated on each topology change and acks will be resent.

> Failover tests periodically cause hanging of the whole Data Structures suite on TC
> ----------------------------------------------------------------------------------
>
>                 Key: IGNITE-8783
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8783
>             Project: Ignite
>          Issue Type: Bug
>          Components: data structures
>            Reporter: Ivan Rakov
>            Assignee: Anton Vinogradov
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain
>
> History of suite runs:
> https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_DataStructures&tab=buildTypeHistoryList&branch_IgniteTests24Java8=%3Cdefault%3E
> Chance of suite hang is 18% in master (based on previous 50 runs).
> Hang is always caused by one of the following failover tests:
> {noformat}
> GridCacheReplicatedDataStructuresFailoverSelfTest#testAtomicSequenceConstantTopologyChange
> GridCachePartitionedDataStructuresFailoverSelfTest#testFairReentrantLockConstantTopologyChangeNonFailoverSafe
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)