[ https://issues.apache.org/jira/browse/IGNITE-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Rakov updated IGNITE-12709:
--------------------------------
    Release Note: Fixed potential partition map exchange hanging on Zookeeper discovery clusters

> Server latch initialized after client latch in Zookeeper discovery
> ------------------------------------------------------------------
>
>                 Key: IGNITE-12709
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12709
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Anton Kalashnikov
>            Assignee: Anton Kalashnikov
>            Priority: Major
>             Fix For: 2.9
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The coordinator node misses the latch message from the client because it has
> not yet received the message that triggers the exchange. As a result, the
> client waits indefinitely for an answer from the coordinator.
> {noformat}
> [2019-10-23 12:49:42,110]\[ERROR]\[sys-#39470%continuous.GridEventConsumeSelfTest0%]\[GridIoManager] An error occurred processing the message \[msg=GridIoMessage \[plc=2, topic=TOPIC_EXCHANGE, topicOrd=31, ordered=false, timeout=0, skipOnTimeout=false, msg=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.LatchAckMessage@7699f4f2], nodeId=857a40a8-f384-4740-816c-dd54d3a00001].
> class org.apache.ignite.IgniteException: Topology AffinityTopologyVersion \[topVer=54, minorTopVer=0] not found in discovery history ; consider increasing IGNITE_DISCOVERY_HISTORY_SIZE property. Current value is -1
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.aliveNodesForTopologyVer(ExchangeLatchManager.java:292)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.getLatchCoordinator(ExchangeLatchManager.java:334)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.processAck(ExchangeLatchManager.java:379)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.lambda$new$0(ExchangeLatchManager.java:119)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1632)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1252)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.access$4300(GridIoManager.java:143)
>     at org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1143)
>     at org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:50)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> [2019-10-23 12:50:02,106]\[WARN ]\[exchange-worker-#39517%continuous.GridEventConsumeSelfTest1%]\[GridDhtPartitionsExchangeFuture] Unable to await partitions release latch within timeout: ClientLatch \[coordinator=ZookeeperClusterNode \[id=760ca6b5-f30b-4c40-81b1-5b602c200000, addrs=\[127.0.0.1], order=1, loc=false, client=false], ackSent=true, super=CompletableLatch \[id=CompletableLatchUid \[id=exchange, topVer=AffinityTopologyVersion \[topVer=54, minorTopVer=0]]]]
> [2019-10-23 12:50:02,192]\[WARN ]\[exchange-worker-#39469%continuous.GridEventConsumeSelfTest0%]\[GridDhtPartitionsExchangeFuture] Unable to await partitions release latch within timeout: ServerLatch \[permits=1, pendingAcks=HashSet \[06c3094b-c1f3-4fe8-81e8-22cb66000002], super=CompletableLatch \[id=CompletableLatchUid \[id=exchange, topVer=AffinityTopologyVersion \[topVer=54, minorTopVer=0]]]]
> {noformat}
> Reproduced by
> org.apache.ignite.internal.processors.continuous.GridEventConsumeSelfTest#testMultithreadedWithNodeRestart
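
The ordering problem described in this issue (a client's latch ack reaching the coordinator before the coordinator has created its server latch for that topology version) can be illustrated with a small standalone sketch. This is not Ignite's ExchangeLatchManager and not the fix that was committed: the class LatchRaceSketch, the Coordinator type and its methods are hypothetical names invented for illustration, and buffering early acks is only one assumed way of making the outcome independent of arrival order.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LatchRaceSketch {
    /** Hypothetical coordinator-side latch registry (illustration only, not Ignite code). */
    static final class Coordinator {
        private final boolean bufferEarlyAcks;
        private final Map<Long, CountDownLatch> serverLatches = new HashMap<>();
        private final Map<Long, Set<UUID>> pendingAcks = new HashMap<>();

        Coordinator(boolean bufferEarlyAcks) {
            this.bufferEarlyAcks = bufferEarlyAcks;
        }

        /** Called when a latch ack from another node arrives. */
        synchronized void onAck(long topVer, UUID nodeId) {
            CountDownLatch latch = serverLatches.get(topVer);
            if (latch != null) {
                latch.countDown();
                return;
            }
            // No server latch yet: the ack arrived before the exchange was triggered locally.
            if (bufferEarlyAcks)
                pendingAcks.computeIfAbsent(topVer, v -> new HashSet<>()).add(nodeId);
            // Otherwise the ack is dropped, which models the hang reported in this issue.
        }

        /** Called when the coordinator starts the exchange for the given topology version. */
        synchronized CountDownLatch createServerLatch(long topVer, int participants) {
            CountDownLatch latch = new CountDownLatch(participants);
            Set<UUID> early = pendingAcks.remove(topVer);
            if (early != null) {
                // Replay acks that arrived before the latch existed.
                for (int i = 0; i < early.size(); i++)
                    latch.countDown();
            }
            serverLatches.put(topVer, latch);
            return latch;
        }
    }

    /** Returns true if the latch completes when the client ack arrives first. */
    static boolean exchangeCompletes(boolean bufferEarlyAcks) {
        Coordinator crd = new Coordinator(bufferEarlyAcks);
        long topVer = 54L;
        crd.onAck(topVer, UUID.randomUUID());                    // client ack arrives first
        CountDownLatch latch = crd.createServerLatch(topVer, 1); // server latch is created later
        try {
            // A short timeout stands in for the real partitions release timeout.
            return latch.await(100, TimeUnit.MILLISECONDS);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("without buffering: " + exchangeCompletes(false));
        System.out.println("with buffering:    " + exchangeCompletes(true));
    }
}
```

In this sketch the unbuffered run times out because the only ack was discarded before the latch existed, while the buffered run completes as soon as the latch is created.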