Kirill Sizov created IGNITE-28543:
-------------------------------------

             Summary: VacuumTxStateReplicaRequest times out
                 Key: IGNITE-28543
                 URL: https://issues.apache.org/jira/browse/IGNITE-28543
             Project: Ignite
          Issue Type: Bug
            Reporter: Kirill Sizov


We see many failed vacuum attempts with the following stack trace:

{noformat}
2026-01-20 05:25:27:479 +0000 [ERROR][%cac-dpd-cde-gg-aks-dev-1%partition-operations-9][FailureManager] Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=CRITICAL_ERROR, failureCtxId=476be4a6-0f5c-451d-b53f-294d309a2369]
org.apache.ignite.internal.failure.StackTraceCapturingException: IGN-CMN-65535 Failed to vacuum tx states from the persistent storage. TraceId:f5b8d83b
        at org.apache.ignite.internal.failure.FailureManager.process(FailureManager.java:191)
        at org.apache.ignite.internal.failure.FailureManager.process(FailureManager.java:168)
        at org.apache.ignite.internal.tx.impl.PersistentTxStateVacuumizer.lambda$vacuumPersistentTxStates$0(PersistentTxStateVacuumizer.java:147)
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
        at org.apache.ignite.internal.replicator.ReplicaService.lambda$sendToReplicaRaw$1(ReplicaService.java:148)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.util.concurrent.CompletionException: org.apache.ignite.internal.replicator.exception.ReplicationTimeoutException: IGN-REP-3 Replication is timed out [replicaGrpId=2297_part_21] TraceId:f5b8d83b
        at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source)
        ... 6 more

{noformat}


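Thread dumps also show a partition-operations thread in the RUNNABLE state while serializing a VacuumTxStatesCommand: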

{noformat}
Thread [name="%cac-dpd-cde-gg-aks-dev-1%partition-operations-10", id=290, state=RUNNABLE, blockCnt=2610322, waitCnt=1208642]
        at app//org.apache.ignite.internal.network.direct.stream.DirectByteBufferStreamImplV1.write(DirectByteBufferStreamImplV1.java:2156)
        at app//org.apache.ignite.internal.network.direct.stream.DirectByteBufferStreamImplV1.writeCollection(DirectByteBufferStreamImplV1.java:932)
        at app//org.apache.ignite.internal.network.direct.stream.DirectByteBufferStreamImplV1.writeSet(DirectByteBufferStreamImplV1.java:950)
        at app//org.apache.ignite.internal.network.direct.DirectMessageWriter.writeSet(DirectMessageWriter.java:453)
        at app//org.apache.ignite.internal.tx.message.VacuumTxStatesCommandSerializer.writeMessage(VacuumTxStatesCommandSerializer.java:35)
        at app//org.apache.ignite.internal.tx.message.VacuumTxStatesCommandSerializer.writeMessage(VacuumTxStatesCommandSerializer.java:9)
        at app//org.apache.ignite.internal.network.direct.stream.DirectByteBufferStreamImplV1.writeMessage(DirectByteBufferStreamImplV1.java:856)
        at app//org.apache.ignite.internal.raft.util.OptimizedMarshaller.marshall(OptimizedMarshaller.java:120)
        at app//org.apache.ignite.internal.table.distributed.schema.ThreadLocalPartitionCommandsMarshaller.marshall(ThreadLocalPartitionCommandsMarshaller.java:44)
        at app//org.apache.ignite.internal.raft.client.RaftGroupServiceImpl.lambda$run$44(RaftGroupServiceImpl.java:561)
        at app//org.apache.ignite.internal.raft.client.RaftGroupServiceImpl$$Lambda/0x0000000800cc3ed8.apply(Unknown Source)
        at app//org.apache.ignite.internal.raft.client.RetryContext.<init>(RetryContext.java:116)
        at app//org.apache.ignite.internal.raft.client.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:680)
        at app//org.apache.ignite.internal.raft.client.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:635)
        at app//org.apache.ignite.internal.raft.client.RaftGroupServiceImpl.run(RaftGroupServiceImpl.java:574)
        at app//org.apache.ignite.internal.raft.client.RaftGroupServiceImpl.run(RaftGroupServiceImpl.java:545)
        at app//org.apache.ignite.internal.raft.client.TopologyAwareRaftGroupService.run(TopologyAwareRaftGroupService.java:487)
        at app//org.apache.ignite.internal.raft.ExecutorInclinedRaftCommandRunner.run(ExecutorInclinedRaftCommandRunner.java:36)
        at app//org.apache.ignite.internal.partition.replicator.ReplicationRaftCommandApplicator.applyCommand(ReplicationRaftCommandApplicator.java:89)
        at app//org.apache.ignite.internal.partition.replicator.handlers.VacuumTxStateReplicaRequestHandler.handle(VacuumTxStateReplicaRequestHandler.java:48)
        at app//org.apache.ignite.internal.partition.replicator.ZonePartitionReplicaListener.processZoneReplicaRequest(ZonePartitionReplicaListener.java:305)
        at app//org.apache.ignite.internal.partition.replicator.ZonePartitionReplicaListener.processRequest(ZonePartitionReplicaListener.java:241)
        at app//org.apache.ignite.internal.partition.replicator.ZonePartitionReplicaListener.lambda$invoke$0(ZonePartitionReplicaListener.java:208)
        at app//org.apache.ignite.internal.partition.replicator.ZonePartitionReplicaListener$$Lambda/0x0000000800f73b58.apply(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.thenCompose(Unknown Source)
        at app//org.apache.ignite.internal.partition.replicator.ZonePartitionReplicaListener.invoke(ZonePartitionReplicaListener.java:208)
        at app//org.apache.ignite.internal.replicator.ZonePartitionReplicaImpl.processRequest(ZonePartitionReplicaImpl.java:67)
        at app//org.apache.ignite.internal.replicator.ReplicaManager.handleReplicaRequest(ReplicaManager.java:382)
        at app//org.apache.ignite.internal.replicator.ReplicaManager.lambda$onReplicaMessageReceived$0(ReplicaManager.java:313)
        at app//org.apache.ignite.internal.replicator.ReplicaManager$$Lambda/0x0000000800f774b8.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.runWith(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)

    Locked synchronizers:
        java.util.concurrent.ThreadPoolExecutor$Worker@297a2b4b

{noformat}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
