Igor Seliverstov created IGNITE-11238:
-----------------------------------------

             Summary: Possible hang on exchange
                 Key: IGNITE-11238
                 URL: https://issues.apache.org/jira/browse/IGNITE-11238
             Project: Ignite
          Issue Type: Bug
          Components: general
            Reporter: Igor Seliverstov


Currently an exchange may hang for a while (two network timeouts) waiting for a 
latch to be released (see 
{{GridDhtPartitionsExchangeFuture#waitPartitionRelease releaseLatch}}) when the 
topology version being processed has not yet been added to discovery history 
but the coordinator has already received the client's acknowledgement:

{code:java}
[2019-02-06 17:43:17,009][ERROR][sys-#43%mvcc.CacheMvccPartitionedSqlCoordinatorFailoverTest0%][ExchangeLatchManager] Topology AffinityTopologyVersion [topVer=24, minorTopVer=0] not found in discovery history ; consider increasing IGNITE_DISCOVERY_HISTORY_SIZE property. Current value is -1
class org.apache.ignite.IgniteException: Topology AffinityTopologyVersion [topVer=24, minorTopVer=0] not found in discovery history ; consider increasing IGNITE_DISCOVERY_HISTORY_SIZE property. Current value is -1
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.aliveNodesForTopologyVer(ExchangeLatchManager.java:260)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.getLatchCoordinator(ExchangeLatchManager.java:302)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.processAck(ExchangeLatchManager.java:351)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.lambda$new$0(ExchangeLatchManager.java:121)
        at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1561)
        at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1189)
        at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
        at org.apache.ignite.internal.managers.communication.GridIoManager$8.run(GridIoManager.java:1086)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
{code}

As a result, the received ack is never processed, and we have to wait for the 
retry:
{code:java}
                    // Try to resend ack.
                    releaseLatch.countDown();
{code}

To solve the issue, we need to check whether the version is present in 
discovery history and, if it is not, put the ack into a pending map (see 
{{ExchangeLatchManager#pendingAcks}}).
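
The idea of deferring an ack until its topology version is known can be 
sketched roughly as follows. This is a simplified, standalone model and not the 
actual {{ExchangeLatchManager}} code: the class {{PendingAckSketch}}, its 
methods, and the use of plain {{long}} versions and {{String}} node ids are all 
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of ack deferral; names and types are illustrative only. */
class PendingAckSketch {
    /** Stand-in for discovery history: topology versions known so far. */
    private final Set<Long> discoHistory = new HashSet<>();

    /** Acks received before their topology version was known: version -> sender ids. */
    private final Map<Long, List<String>> pendingAcks = new HashMap<>();

    /** Acks that were actually processed, recorded so the behavior can be observed. */
    private final List<String> processed = new ArrayList<>();

    /** Called when an ack arrives; defers it instead of failing if the version is unknown. */
    public synchronized void onAck(long topVer, String nodeId) {
        if (!discoHistory.contains(topVer)) {
            pendingAcks.computeIfAbsent(topVer, v -> new ArrayList<>()).add(nodeId);
            return;
        }
        process(topVer, nodeId);
    }

    /** Called when a version appears in discovery history; replays any deferred acks. */
    public synchronized void onTopologyVersionAdded(long topVer) {
        discoHistory.add(topVer);
        List<String> deferred = pendingAcks.remove(topVer);
        if (deferred != null) {
            for (String nodeId : deferred)
                process(topVer, nodeId);
        }
    }

    /** Placeholder for the real ack handling. */
    private void process(long topVer, String nodeId) {
        processed.add(nodeId + "@" + topVer);
    }

    public synchronized List<String> processed() {
        return new ArrayList<>(processed);
    }

    public synchronized int pendingCount() {
        int cnt = 0;
        for (List<String> acks : pendingAcks.values())
            cnt += acks.size();
        return cnt;
    }
}
```

With this shape, an ack for topVer=24 that arrives early is kept in the pending 
map and handled as soon as version 24 is registered, instead of being dropped 
and recovered only by the retry.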



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
