[
https://issues.apache.org/jira/browse/IGNITE-28543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kirill Sizov updated IGNITE-28543:
----------------------------------
Labels: ignite-3 (was: )
> VacuumTxStateReplicaRequest times out
> -------------------------------------
>
> Key: IGNITE-28543
> URL: https://issues.apache.org/jira/browse/IGNITE-28543
> Project: Ignite
> Issue Type: Bug
> Reporter: Kirill Sizov
> Priority: Major
> Labels: ignite-3
>
> We see many failed vacuum attempts with the following stack trace:
> {noformat}
> 2026-01-20 05:25:27:479 +0000
> [ERROR][%cac-dpd-cde-gg-aks-dev-1%partition-operations-9][FailureManager]
> Critical system error detected. Will be handled accordingly to configured
> handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler
> [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED,
> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=CRITICAL_ERROR,
> failureCtxId=476be4a6-0f5c-451d-b53f-294d309a2369]
> org.apache.ignite.internal.failure.StackTraceCapturingException:
> IGN-CMN-65535 Failed to vacuum tx states from the persistent storage.
> TraceId:f5b8d83b
> at
> org.apache.ignite.internal.failure.FailureManager.process(FailureManager.java:191)
> at
> org.apache.ignite.internal.failure.FailureManager.process(FailureManager.java:168)
> at
> org.apache.ignite.internal.tx.impl.PersistentTxStateVacuumizer.lambda$vacuumPersistentTxStates$0(PersistentTxStateVacuumizer.java:147)
> at
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown
> Source)
> at
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown
> Source)
> at
> java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
> at
> java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown
> Source)
> at
> org.apache.ignite.internal.replicator.ReplicaService.lambda$sendToReplicaRaw$1(ReplicaService.java:148)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: java.util.concurrent.CompletionException:
> org.apache.ignite.internal.replicator.exception.ReplicationTimeoutException:
> IGN-REP-3 Replication is timed out [replicaGrpId=2297_part_21]
> TraceId:f5b8d83b
> at
> java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown
> Source)
> at
> java.base/java.util.concurrent.CompletableFuture.completeThrowable(Unknown
> Source)
> at
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown
> Source)
> ... 6 more
> {noformat}
> And this thread dump, which shows a partition-operations thread busy serializing a VacuumTxStatesCommand:
> {noformat}
> Thread [name="%cac-dpd-cde-gg-aks-dev-1%partition-operations-10", id=290,
> state=RUNNABLE, blockCnt=2610322, waitCnt=1208642]
> at
> app//org.apache.ignite.internal.network.direct.stream.DirectByteBufferStreamImplV1.write(DirectByteBufferStreamImplV1.java:2156)
> at
> app//org.apache.ignite.internal.network.direct.stream.DirectByteBufferStreamImplV1.writeCollection(DirectByteBufferStreamImplV1.java:932)
> at
> app//org.apache.ignite.internal.network.direct.stream.DirectByteBufferStreamImplV1.writeSet(DirectByteBufferStreamImplV1.java:950)
> at
> app//org.apache.ignite.internal.network.direct.DirectMessageWriter.writeSet(DirectMessageWriter.java:453)
> at
> app//org.apache.ignite.internal.tx.message.VacuumTxStatesCommandSerializer.writeMessage(VacuumTxStatesCommandSerializer.java:35)
> at
> app//org.apache.ignite.internal.tx.message.VacuumTxStatesCommandSerializer.writeMessage(VacuumTxStatesCommandSerializer.java:9)
> at
> app//org.apache.ignite.internal.network.direct.stream.DirectByteBufferStreamImplV1.writeMessage(DirectByteBufferStreamImplV1.java:856)
> at
> app//org.apache.ignite.internal.raft.util.OptimizedMarshaller.marshall(OptimizedMarshaller.java:120)
> at
> app//org.apache.ignite.internal.table.distributed.schema.ThreadLocalPartitionCommandsMarshaller.marshall(ThreadLocalPartitionCommandsMarshaller.java:44)
> at
> app//org.apache.ignite.internal.raft.client.RaftGroupServiceImpl.lambda$run$44(RaftGroupServiceImpl.java:561)
> at
> app//org.apache.ignite.internal.raft.client.RaftGroupServiceImpl$$Lambda/0x0000000800cc3ed8.apply(Unknown
> Source)
> at
> app//org.apache.ignite.internal.raft.client.RetryContext.<init>(RetryContext.java:116)
> at
> app//org.apache.ignite.internal.raft.client.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:680)
> at
> app//org.apache.ignite.internal.raft.client.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:635)
> at
> app//org.apache.ignite.internal.raft.client.RaftGroupServiceImpl.run(RaftGroupServiceImpl.java:574)
> at
> app//org.apache.ignite.internal.raft.client.RaftGroupServiceImpl.run(RaftGroupServiceImpl.java:545)
> at
> app//org.apache.ignite.internal.raft.client.TopologyAwareRaftGroupService.run(TopologyAwareRaftGroupService.java:487)
> at
> app//org.apache.ignite.internal.raft.ExecutorInclinedRaftCommandRunner.run(ExecutorInclinedRaftCommandRunner.java:36)
> at
> app//org.apache.ignite.internal.partition.replicator.ReplicationRaftCommandApplicator.applyCommand(ReplicationRaftCommandApplicator.java:89)
> at
> app//org.apache.ignite.internal.partition.replicator.handlers.VacuumTxStateReplicaRequestHandler.handle(VacuumTxStateReplicaRequestHandler.java:48)
> at
> app//org.apache.ignite.internal.partition.replicator.ZonePartitionReplicaListener.processZoneReplicaRequest(ZonePartitionReplicaListener.java:305)
> at
> app//org.apache.ignite.internal.partition.replicator.ZonePartitionReplicaListener.processRequest(ZonePartitionReplicaListener.java:241)
> at
> app//org.apache.ignite.internal.partition.replicator.ZonePartitionReplicaListener.lambda$invoke$0(ZonePartitionReplicaListener.java:208)
> at
> app//org.apache.ignite.internal.partition.replicator.ZonePartitionReplicaListener$$Lambda/0x0000000800f73b58.apply(Unknown
> Source)
> at
> java.base/java.util.concurrent.CompletableFuture.uniComposeStage(Unknown
> Source)
> at
> java.base/java.util.concurrent.CompletableFuture.thenCompose(Unknown
> Source)
> at
> app//org.apache.ignite.internal.partition.replicator.ZonePartitionReplicaListener.invoke(ZonePartitionReplicaListener.java:208)
> at
> app//org.apache.ignite.internal.replicator.ZonePartitionReplicaImpl.processRequest(ZonePartitionReplicaImpl.java:67)
> at
> app//org.apache.ignite.internal.replicator.ReplicaManager.handleReplicaRequest(ReplicaManager.java:382)
> at
> app//org.apache.ignite.internal.replicator.ReplicaManager.lambda$onReplicaMessageReceived$0(ReplicaManager.java:313)
> at
> app//org.apache.ignite.internal.replicator.ReplicaManager$$Lambda/0x0000000800f774b8.run(Unknown
> Source)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> Source)
> at java.base/java.lang.Thread.runWith(Unknown Source)
> at java.base/java.lang.Thread.run(Unknown Source)
> Locked synchronizers:
> java.util.concurrent.ThreadPoolExecutor$Worker@297a2b4b
> {noformat}
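
A note on reading the first trace: the failure reaches the whenComplete callback in PersistentTxStateVacuumizer as a CompletionException whose cause is the ReplicationTimeoutException, because the callback hangs off a dependent stage (UniApply in the trace). The sketch below reproduces only that wrapping behaviour with plain JDK classes. It is an illustration under assumptions, not Ignite code: java.util.concurrent.TimeoutException stands in for ReplicationTimeoutException, and the class and method names (VacuumFailureSketch, unwrap, observedFailure) are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration class; nothing here is part of Apache Ignite.
class VacuumFailureSketch {
    /** Strips CompletionException wrappers to reach the original failure. */
    static Throwable unwrap(Throwable t) {
        while (t instanceof CompletionException && t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    /** Returns the throwable that a whenComplete callback on a dependent stage observes. */
    static Throwable observedFailure(Throwable original) {
        // Stand-in for the future that carries the replica response.
        CompletableFuture<String> replicaResponse = new CompletableFuture<>();
        AtomicReference<Throwable> seen = new AtomicReference<>();

        replicaResponse
                .thenApply(String::length)                // dependent stage (UniApply in the trace)
                .whenComplete((res, ex) -> seen.set(ex)); // callback (uniWhenComplete in the trace)

        // The source future fails; the dependent stage wraps the failure.
        replicaResponse.completeExceptionally(original);
        return seen.get();
    }

    public static void main(String[] args) {
        // TimeoutException is an assumed stand-in for ReplicationTimeoutException.
        Throwable seen = observedFailure(new TimeoutException("Replication is timed out"));
        System.out.println(seen.getClass().getSimpleName());         // CompletionException
        System.out.println(unwrap(seen).getClass().getSimpleName()); // TimeoutException
    }
}
```

When triaging these reports, it is therefore the unwrapped cause (here IGN-REP-3 for replicaGrpId=2297_part_21), not the outer CompletionException, that identifies the timed-out replication group.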
--
This message was sent by Atlassian Jira
(v8.20.10#820010)