Igor Seliverstov created IGNITE-11238:
-----------------------------------------

             Summary: Possible hang on exchange
                 Key: IGNITE-11238
                 URL: https://issues.apache.org/jira/browse/IGNITE-11238
             Project: Ignite
          Issue Type: Bug
          Components: general
            Reporter: Igor Seliverstov


Currently we may hang on exchange for a while (two network timeouts) waiting for a latch to be released (see {{GridDhtPartitionsExchangeFuture#waitPartitionRelease releaseLatch}}) when the topology version being processed has not yet been added to discovery history but the client acknowledgement has already been received by the coordinator:

{code:java}
[2019-02-06 17:43:17,009][ERROR][sys-#43%mvcc.CacheMvccPartitionedSqlCoordinatorFailoverTest0%][ExchangeLatchManager] Topology AffinityTopologyVersion [topVer=24, minorTopVer=0] not found in discovery history ; consider increasing IGNITE_DISCOVERY_HISTORY_SIZE property. Current value is -1
class org.apache.ignite.IgniteException: Topology AffinityTopologyVersion [topVer=24, minorTopVer=0] not found in discovery history ; consider increasing IGNITE_DISCOVERY_HISTORY_SIZE property. Current value is -1
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.aliveNodesForTopologyVer(ExchangeLatchManager.java:260)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.getLatchCoordinator(ExchangeLatchManager.java:302)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.processAck(ExchangeLatchManager.java:351)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.lambda$new$0(ExchangeLatchManager.java:121)
	at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1561)
	at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1189)
	at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
	at org.apache.ignite.internal.managers.communication.GridIoManager$8.run(GridIoManager.java:1086)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
{code}

As a result, the received ack is not processed, so we have to wait for the retry:

{code:java}
// Try to resend ack.
releaseLatch.countDown();
{code}

To solve the issue, we need to check whether the version is present in discovery history and, if it is not, put the ack into a pending map (see {{ExchangeLatchManager#pendingAcks}}).
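
The proposed fix could look roughly like the sketch below. This is only an illustration of the idea, not the actual Ignite code: {{PendingAckSketch}}, {{onAck}}, {{onTopologyVersionAdded}} and the plain {{long}} topology version are all hypothetical stand-ins, and the real {{ExchangeLatchManager#pendingAcks}} may be structured differently. An ack that arrives for a version not yet in history is parked instead of failing, and is replayed once that version shows up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the ack handling; not the real ExchangeLatchManager. */
class PendingAckSketch {
    /** Topology versions already present in the (simulated) discovery history. */
    private final Set<Long> discoHist = new HashSet<>();

    /** Acks received before their topology version became known, keyed by version. */
    private final Map<Long, List<String>> pendingAcks = new HashMap<>();

    /** Acks that were actually processed, in processing order. */
    private final List<String> processed = new ArrayList<>();

    /** Returns true if the ack was processed right away, false if it was parked. */
    synchronized boolean onAck(long topVer, String nodeId) {
        if (!discoHist.contains(topVer)) {
            // Version is not in history yet: park the ack instead of throwing.
            pendingAcks.computeIfAbsent(topVer, v -> new ArrayList<>()).add(nodeId);
            return false;
        }

        processed.add(nodeId + "@" + topVer);
        return true;
    }

    /** Called when a topology version lands in history; replays any parked acks. */
    synchronized void onTopologyVersionAdded(long topVer) {
        discoHist.add(topVer);

        List<String> parked = pendingAcks.remove(topVer);

        if (parked != null) {
            for (String nodeId : parked)
                processed.add(nodeId + "@" + topVer);
        }
    }

    /** Snapshot of processed acks, formatted as "nodeId@topVer". */
    synchronized List<String> processed() {
        return new ArrayList<>(processed);
    }

    /** Number of acks still parked for the given version. */
    synchronized int pendingCount(long topVer) {
        List<String> parked = pendingAcks.get(topVer);
        return parked == null ? 0 : parked.size();
    }
}
```

With this shape, the coordinator would not depend on the sender's retry (and the two network timeouts that come with it) to get the ack processed; the ack is handled as soon as the version is recorded.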