[ https://issues.apache.org/jira/browse/IGNITE-23618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirill Tkalenko updated IGNITE-23618:
-------------------------------------
    Epic Link: IGNITE-20166

> Fix dead lock when restoring metastorage
> ----------------------------------------
>
>                 Key: IGNITE-23618
>                 URL: https://issues.apache.org/jira/browse/IGNITE-23618
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Kirill Tkalenko
>            Assignee: Kirill Tkalenko
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> A deadlock was found during metastorage recovery. Stack trace of the problem:
> {noformat}
> [2024-11-05T11:33:20,693][WARN ][%iicrt_ccbp_1%common-scheduler-0][FailureManager] Possible failure suppressed according to a configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=SYSTEM_WORKER_BLOCKED]
> org.apache.ignite.lang.IgniteException: A critical thread is blocked for 804 ms that is more than the allowed 500 ms, it is "%iicrt_ccbp_1%MessagingService-inbound-0-0" prio=10 Id=4987 WAITING on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@558a94f4 owned by "%iicrt_ccbp_1%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0" Id=5020
>     at java.base@11.0.17/jdk.internal.misc.Unsafe.park(Native Method)
>     - waiting on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@558a94f4
>     at java.base@11.0.17/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
>     at java.base@11.0.17/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
>     at java.base@11.0.17/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:917)
>     at java.base@11.0.17/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1240)
>     at java.base@11.0.17/java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:959)
>     at app//org.apache.ignite.internal.metastorage.server.AbstractKeyValueStorage.setRecoveryRevisionsListener(AbstractKeyValueStorage.java:318)
>     at app//org.apache.ignite.internal.metastorage.impl.RecoveryRevisionsListenerImpl.completeRecoveryFinishFutureIfPossible(RecoveryRevisionsListenerImpl.java:92)
>     at app//org.apache.ignite.internal.metastorage.impl.RecoveryRevisionsListenerImpl.setTargetRevisions(RecoveryRevisionsListenerImpl.java:73)
>     at app//org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.lambda$recover$1(MetaStorageManagerImpl.java:327)
>     at app//org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$$Lambda$1899/0x0000000800bf4c40.accept(Unknown Source)
>     at java.base@11.0.17/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:714)
>     at java.base@11.0.17/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>     at java.base@11.0.17/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
>     at app//org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$sendWithRetry$39(RaftGroupServiceImpl.java:592)
>     at app//org.apache.ignite.internal.raft.RaftGroupServiceImpl$$Lambda$1798/0x0000000800bb6c40.accept(Unknown Source)
>     at java.base@11.0.17/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
>     at java.base@11.0.17/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>     at java.base@11.0.17/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>     at java.base@11.0.17/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
>     at app//org.apache.ignite.internal.network.DefaultMessagingService.onInvokeResponse(DefaultMessagingService.java:587)
>     at app//org.apache.ignite.internal.network.DefaultMessagingService.handleInvokeResponse(DefaultMessagingService.java:478)
>     at app//org.apache.ignite.internal.network.DefaultMessagingService.lambda$handleMessageFromNetwork$4(DefaultMessagingService.java:412)
>     at app//org.apache.ignite.internal.network.DefaultMessagingService$$Lambda$1866/0x0000000800bde040.run(Unknown Source)
>     at java.base@11.0.17/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base@11.0.17/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base@11.0.17/java.lang.Thread.run(Thread.java:834)
>
>     Number of locked synchronizers = 2
>     - java.util.concurrent.locks.ReentrantLock$NonfairSync@c55b7e
>     - java.util.concurrent.ThreadPoolExecutor$Worker@526479eb
> {noformat}
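
In the trace above, the inbound messaging thread runs the `recover` callback directly inside `CompletableFuture.complete(...)` and then parks on a write lock owned by another thread. The sketch below illustrates that general hazard in isolation and one common mitigation, which is to run the continuation on a separate executor. It is not the fix applied for this issue, and every class, method, thread, and lock name in it is hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

class CallbackThreadDemo {
    // Hypothetical stand-in for a storage-level read/write lock.
    private static final ReentrantReadWriteLock STORAGE_LOCK = new ReentrantReadWriteLock();

    /** Returns the name of the thread that ended up running the lock-taking callback. */
    static String runCallback(boolean offload) {
        // Hypothetical dedicated pool for continuations that may block on locks.
        ExecutorService recoveryPool =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "recovery-0"));
        try {
            CompletableFuture<Long> response = new CompletableFuture<>();
            AtomicReference<String> callbackThread = new AtomicReference<>();

            // The callback takes a write lock. The lock is free in this demo;
            // if another thread held it, the thread running this callback would park here.
            Consumer<Long> listener = revision -> {
                STORAGE_LOCK.writeLock().lock();
                try {
                    callbackThread.set(Thread.currentThread().getName());
                } finally {
                    STORAGE_LOCK.writeLock().unlock();
                }
            };

            // thenAccept on a not-yet-completed future: the thread calling complete() runs it.
            // thenAcceptAsync with an executor: the callback is handed to that executor instead.
            CompletableFuture<Void> done = offload
                    ? response.thenAcceptAsync(listener, recoveryPool)
                    : response.thenAccept(listener);

            // Simulated network thread that completes the response future.
            Thread inbound = new Thread(() -> response.complete(42L), "inbound-0");
            inbound.start();
            inbound.join();

            done.get(5, TimeUnit.SECONDS);
            return callbackThread.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            recoveryPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("synchronous callback ran on: " + runCallback(false));
        System.out.println("offloaded callback ran on:   " + runCallback(true));
    }
}
```

With `thenAccept`, the callback runs on `inbound-0`, so any lock it waits for stalls message handling. With `thenAcceptAsync` and a dedicated executor, it runs on `recovery-0` and the inbound thread returns immediately. Whether offloading is appropriate here depends on ordering guarantees that the trace alone does not show.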