[ 
https://issues.apache.org/jira/browse/IGNITE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541697#comment-16541697
 ] 

Anton Vinogradov commented on IGNITE-8783:
------------------------------------------

The reason for the hang has been found. In
{{org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager#createClientLatch}}
there is the following code:
{noformat}
// There is final ack for created latch.
if (pendingAcks.containsKey(latchId)) {
        latch.complete();
        // This causes loss of pending acks when the coordinator failure has not
        // been handled yet (e.g. while we are handling another node failure).
        pendingAcks.remove(latchId);
}
else
        clientLatches.put(latchId, latch);
{noformat}

So I propose replacing this code with simply:

{noformat}
clientLatches.put(latchId, latch);
{noformat}
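
To make the difference between the two variants concrete, here is a toy Java model. It is only a sketch under assumptions: {{LatchSketch}}, the {{String}} latch ids, the {{Boolean}} ack marker and the use of {{CompletableFuture}} are hypothetical stand-ins, not the real {{ExchangeLatchManager}} types. It only shows what each variant leaves in the two maps when a final ack arrived before the latch was created.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model only: names mirror the snippet above but these are NOT the Ignite classes. */
final class LatchSketch {
    /** Hypothetical: latch ids whose final ack arrived before the latch was created. */
    final Map<String, Boolean> pendingAcks = new ConcurrentHashMap<>();

    /** Hypothetical: latches that are still waiting for a final ack. */
    final Map<String, CompletableFuture<Void>> clientLatches = new ConcurrentHashMap<>();

    /** Shape of the current code: consume the early ack and complete immediately. */
    CompletableFuture<Void> createCurrent(String latchId) {
        CompletableFuture<Void> latch = new CompletableFuture<>();

        if (pendingAcks.containsKey(latchId)) {
            latch.complete(null);
            pendingAcks.remove(latchId); // the ack record is gone after this call
        }
        else
            clientLatches.put(latchId, latch);

        return latch;
    }

    /** Shape of the proposed code: always register the latch, never touch pendingAcks. */
    CompletableFuture<Void> createProposed(String latchId) {
        CompletableFuture<Void> latch = new CompletableFuture<>();

        clientLatches.put(latchId, latch);

        return latch;
    }

    public static void main(String[] args) {
        LatchSketch cur = new LatchSketch();
        cur.pendingAcks.put("exchange-1", Boolean.TRUE); // early final ack
        CompletableFuture<Void> a = cur.createCurrent("exchange-1");
        System.out.println("current:  done=" + a.isDone()
            + " acksKept=" + !cur.pendingAcks.isEmpty()
            + " registered=" + !cur.clientLatches.isEmpty());

        LatchSketch prop = new LatchSketch();
        prop.pendingAcks.put("exchange-1", Boolean.TRUE); // same early final ack
        CompletableFuture<Void> b = prop.createProposed("exchange-1");
        System.out.println("proposed: done=" + b.isDone()
            + " acksKept=" + !prop.pendingAcks.isEmpty()
            + " registered=" + !prop.clientLatches.isEmpty());
    }
}
```

In this model the current shape completes the latch and drops the ack record, while the proposed shape keeps both the ack record and the registered latch, so the latch completes only when an ack is processed later.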

[~Jokser],
Could you please explain the idea behind handling the final message from old_coordinator?
As far as I can see, latches will be recreated on each topology change and acks
will be resent.

> Failover tests periodically cause hanging of the whole Data Structures suite 
> on TC
> ----------------------------------------------------------------------------------
>
>                 Key: IGNITE-8783
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8783
>             Project: Ignite
>          Issue Type: Bug
>          Components: data structures
>            Reporter: Ivan Rakov
>            Assignee: Anton Vinogradov
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain
>
> History of suite runs: 
> https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_DataStructures&tab=buildTypeHistoryList&branch_IgniteTests24Java8=%3Cdefault%3E
> Chance of suite hang is 18% in master (based on previous 50 runs).
> Hang is always caused by one of the following failover tests:
> {noformat}
> GridCacheReplicatedDataStructuresFailoverSelfTest#testAtomicSequenceConstantTopologyChange
> GridCachePartitionedDataStructuresFailoverSelfTest#testFairReentrantLockConstantTopologyChangeNonFailoverSafe
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
