[ 
https://issues.apache.org/jira/browse/IGNITE-25808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Chudov updated IGNITE-25808:
----------------------------------
    Description: 
If there are hundreds of tables and there is a delay in the placement driver 
TopologyTracker for any reason, the placement driver knows nothing about the 
topology change and keeps choosing the offline node as a candidate. It is not a 
critical issue, but may fill the logs with warnings in just a few seconds:
{code:java}
 2025-05-06 10:06:44:672 +0000 
[WARNING][%gridgain-2.novalocal%JRaft-FSMCaller-Disruptor-metastorage_group_stripe_0-0][LeaseNegotiator]
 Lease was not negotiated due to exception [lease=Lease 
[leaseholder=gridgain-5.novalocal, 
leaseholderId=3844f6cb-ee97-49bd-8d40-5b96d0a6bc82, accepted=false, 
startTime=HybridTimestamp [physical=2025-05-06 10:06:44:602 +0000, logical=114, 
composite=114460328237596786], expirationTime=HybridTimestamp 
[physical=2025-05-06 10:08:44:602 +0000, logical=0, 
composite=114460336101916672], prolongable=true, proposedCandidate=null, 
replicationGroupId=590_part_13]]
org.apache.ignite.internal.network.UnresolvableConsistentIdException: 
IGN-NETWORK-1 TraceId:57c4ef42-4786-431b-a6ff-a7d79edcfd6a Recipient consistent 
ID cannot be resolved: gridgain-5.novalocal
    at 
org.apache.ignite.internal.network.DefaultMessagingService.invoke(DefaultMessagingService.java:231)
    at 
org.apache.ignite.internal.network.MessagingService.invoke(MessagingService.java:190)
    at 
org.apache.ignite.internal.placementdriver.negotiation.LeaseNegotiator.negotiate(LeaseNegotiator.java:63)
    at 
org.apache.ignite.internal.placementdriver.LeaseUpdater$Updater.lambda$updateLeaseBatchInternal$0(LeaseUpdater.java:590)
    at 
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
    at 
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
    at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
    at 
java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
    at 
org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$sendWithRetry$49(RaftGroupServiceImpl.java:603)
    at 
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
    at 
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
    at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
    at 
java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
    at 
org.apache.ignite.internal.network.DefaultMessagingService.onInvokeResponse(DefaultMessagingService.java:575)
    at 
org.apache.ignite.internal.network.DefaultMessagingService.send0(DefaultMessagingService.java:261)
    at 
org.apache.ignite.internal.network.DefaultMessagingService.respond(DefaultMessagingService.java:204)
    at 
org.apache.ignite.internal.network.MessagingService.respond(MessagingService.java:107)
    at 
org.apache.ignite.raft.jraft.rpc.impl.IgniteRpcServer$NetworkRpcContext.sendResponse(IgniteRpcServer.java:240)
    at 
org.apache.ignite.raft.jraft.rpc.impl.ActionRequestProcessor.sendResponse(ActionRequestProcessor.java:268)
    at 
org.apache.ignite.raft.jraft.rpc.impl.ActionRequestProcessor$1.result(ActionRequestProcessor.java:177)
    at 
org.apache.ignite.internal.raft.server.impl.JraftServerImpl$WriteCommandIterator$1.result(JraftServerImpl.java:995)
    at 
org.apache.ignite.internal.metastorage.server.raft.MetaStorageWriteHandler$ResultCachingClosure.result(MetaStorageWriteHandler.java:453)
    at 
org.apache.ignite.internal.metastorage.server.raft.MetaStorageWriteHandler.handleWriteWithTime(MetaStorageWriteHandler.java:217)
    at 
org.apache.ignite.internal.metastorage.server.raft.MetaStorageWriteHandler.handleNonCachedWriteCommand(MetaStorageWriteHandler.java:153)
    at 
org.apache.ignite.internal.metastorage.server.raft.MetaStorageWriteHandler.handleWriteCommand(MetaStorageWriteHandler.java:123)
    at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
    at 
org.apache.ignite.internal.metastorage.server.raft.MetaStorageListener.onWrite(MetaStorageListener.java:193)
    at 
org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:825)
    at 
org.apache.ignite.raft.jraft.core.FSMCallerImpl.doApplyTasks(FSMCallerImpl.java:570)
    at 
org.apache.ignite.raft.jraft.core.FSMCallerImpl.doCommitted(FSMCallerImpl.java:536)
    at 
org.apache.ignite.raft.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:454)
    at 
org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:123)
    at 
org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:117)
    at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:322)
    at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:279)
    at 
com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:167)
    at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:122)
    at java.base/java.lang.Thread.run(Thread.java:840){code}
As a possible solution, we may throttle this log.

*Scenario:*
 * big number of replication groups
 * lease updater chooses lease candidates
 * one node leaves the cluster
 * lease negotiator fails to start the lease agreements for leases that have 
the left node as a candidate
 * on each lease, it writes an error to log and there are too many of them (as 
the number of replication groups is big). 

We can print the error just once per negotiation round.

  was:
If there are hundreds of tables and there is a delay in the placement driver 
TopologyTracker for any reason, the placement driver knows nothing about the 
topology change and keeps choosing the offline node as a candidate. It is not a 
critical issue, but may fill the logs with warnings in just a few seconds:
{code:java}
 2025-05-06 10:06:44:672 +0000 
[WARNING][%gridgain-2.novalocal%JRaft-FSMCaller-Disruptor-metastorage_group_stripe_0-0][LeaseNegotiator]
 Lease was not negotiated due to exception [lease=Lease 
[leaseholder=gridgain-5.novalocal, 
leaseholderId=3844f6cb-ee97-49bd-8d40-5b96d0a6bc82, accepted=false, 
startTime=HybridTimestamp [physical=2025-05-06 10:06:44:602 +0000, logical=114, 
composite=114460328237596786], expirationTime=HybridTimestamp 
[physical=2025-05-06 10:08:44:602 +0000, logical=0, 
composite=114460336101916672], prolongable=true, proposedCandidate=null, 
replicationGroupId=590_part_13]]
org.apache.ignite.internal.network.UnresolvableConsistentIdException: 
IGN-NETWORK-1 TraceId:57c4ef42-4786-431b-a6ff-a7d79edcfd6a Recipient consistent 
ID cannot be resolved: gridgain-5.novalocal
    at 
org.apache.ignite.internal.network.DefaultMessagingService.invoke(DefaultMessagingService.java:231)
    at 
org.apache.ignite.internal.network.MessagingService.invoke(MessagingService.java:190)
    at 
org.apache.ignite.internal.placementdriver.negotiation.LeaseNegotiator.negotiate(LeaseNegotiator.java:63)
    at 
org.apache.ignite.internal.placementdriver.LeaseUpdater$Updater.lambda$updateLeaseBatchInternal$0(LeaseUpdater.java:590)
    at 
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
    at 
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
    at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
    at 
java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
    at 
org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$sendWithRetry$49(RaftGroupServiceImpl.java:603)
    at 
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
    at 
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
    at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
    at 
java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
    at 
org.apache.ignite.internal.network.DefaultMessagingService.onInvokeResponse(DefaultMessagingService.java:575)
    at 
org.apache.ignite.internal.network.DefaultMessagingService.send0(DefaultMessagingService.java:261)
    at 
org.apache.ignite.internal.network.DefaultMessagingService.respond(DefaultMessagingService.java:204)
    at 
org.apache.ignite.internal.network.MessagingService.respond(MessagingService.java:107)
    at 
org.apache.ignite.raft.jraft.rpc.impl.IgniteRpcServer$NetworkRpcContext.sendResponse(IgniteRpcServer.java:240)
    at 
org.apache.ignite.raft.jraft.rpc.impl.ActionRequestProcessor.sendResponse(ActionRequestProcessor.java:268)
    at 
org.apache.ignite.raft.jraft.rpc.impl.ActionRequestProcessor$1.result(ActionRequestProcessor.java:177)
    at 
org.apache.ignite.internal.raft.server.impl.JraftServerImpl$WriteCommandIterator$1.result(JraftServerImpl.java:995)
    at 
org.apache.ignite.internal.metastorage.server.raft.MetaStorageWriteHandler$ResultCachingClosure.result(MetaStorageWriteHandler.java:453)
    at 
org.apache.ignite.internal.metastorage.server.raft.MetaStorageWriteHandler.handleWriteWithTime(MetaStorageWriteHandler.java:217)
    at 
org.apache.ignite.internal.metastorage.server.raft.MetaStorageWriteHandler.handleNonCachedWriteCommand(MetaStorageWriteHandler.java:153)
    at 
org.apache.ignite.internal.metastorage.server.raft.MetaStorageWriteHandler.handleWriteCommand(MetaStorageWriteHandler.java:123)
    at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
    at 
org.apache.ignite.internal.metastorage.server.raft.MetaStorageListener.onWrite(MetaStorageListener.java:193)
    at 
org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:825)
    at 
org.apache.ignite.raft.jraft.core.FSMCallerImpl.doApplyTasks(FSMCallerImpl.java:570)
    at 
org.apache.ignite.raft.jraft.core.FSMCallerImpl.doCommitted(FSMCallerImpl.java:536)
    at 
org.apache.ignite.raft.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:454)
    at 
org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:123)
    at 
org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:117)
    at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:322)
    at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:279)
    at 
com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:167)
    at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:122)
    at java.base/java.lang.Thread.run(Thread.java:840){code}
As a possible solution, we may throttle this log.
 


> Lease negotiator may flood the log if a candidate left and topology tracker 
> not updated
> ---------------------------------------------------------------------------------------
>
>                 Key: IGNITE-25808
>                 URL: https://issues.apache.org/jira/browse/IGNITE-25808
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Denis Chudov
>            Priority: Major
>              Labels: ignite-3
>
> If there are hundreds of tables and there is a delay in the placement driver 
> TopologyTracker for any reason, the placement driver knows nothing about the 
> topology change and keeps choosing the offline node as a candidate. It is not 
> a critical issue, but may fill the logs with warnings in just a few seconds:
> {code:java}
>  2025-05-06 10:06:44:672 +0000 
> [WARNING][%gridgain-2.novalocal%JRaft-FSMCaller-Disruptor-metastorage_group_stripe_0-0][LeaseNegotiator]
>  Lease was not negotiated due to exception [lease=Lease 
> [leaseholder=gridgain-5.novalocal, 
> leaseholderId=3844f6cb-ee97-49bd-8d40-5b96d0a6bc82, accepted=false, 
> startTime=HybridTimestamp [physical=2025-05-06 10:06:44:602 +0000, 
> logical=114, composite=114460328237596786], expirationTime=HybridTimestamp 
> [physical=2025-05-06 10:08:44:602 +0000, logical=0, 
> composite=114460336101916672], prolongable=true, proposedCandidate=null, 
> replicationGroupId=590_part_13]]
> org.apache.ignite.internal.network.UnresolvableConsistentIdException: 
> IGN-NETWORK-1 TraceId:57c4ef42-4786-431b-a6ff-a7d79edcfd6a Recipient 
> consistent ID cannot be resolved: gridgain-5.novalocal
>     at 
> org.apache.ignite.internal.network.DefaultMessagingService.invoke(DefaultMessagingService.java:231)
>     at 
> org.apache.ignite.internal.network.MessagingService.invoke(MessagingService.java:190)
>     at 
> org.apache.ignite.internal.placementdriver.negotiation.LeaseNegotiator.negotiate(LeaseNegotiator.java:63)
>     at 
> org.apache.ignite.internal.placementdriver.LeaseUpdater$Updater.lambda$updateLeaseBatchInternal$0(LeaseUpdater.java:590)
>     at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
>     at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
>     at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
>     at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
>     at 
> org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$sendWithRetry$49(RaftGroupServiceImpl.java:603)
>     at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
>     at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
>     at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
>     at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
>     at 
> org.apache.ignite.internal.network.DefaultMessagingService.onInvokeResponse(DefaultMessagingService.java:575)
>     at 
> org.apache.ignite.internal.network.DefaultMessagingService.send0(DefaultMessagingService.java:261)
>     at 
> org.apache.ignite.internal.network.DefaultMessagingService.respond(DefaultMessagingService.java:204)
>     at 
> org.apache.ignite.internal.network.MessagingService.respond(MessagingService.java:107)
>     at 
> org.apache.ignite.raft.jraft.rpc.impl.IgniteRpcServer$NetworkRpcContext.sendResponse(IgniteRpcServer.java:240)
>     at 
> org.apache.ignite.raft.jraft.rpc.impl.ActionRequestProcessor.sendResponse(ActionRequestProcessor.java:268)
>     at 
> org.apache.ignite.raft.jraft.rpc.impl.ActionRequestProcessor$1.result(ActionRequestProcessor.java:177)
>     at 
> org.apache.ignite.internal.raft.server.impl.JraftServerImpl$WriteCommandIterator$1.result(JraftServerImpl.java:995)
>     at 
> org.apache.ignite.internal.metastorage.server.raft.MetaStorageWriteHandler$ResultCachingClosure.result(MetaStorageWriteHandler.java:453)
>     at 
> org.apache.ignite.internal.metastorage.server.raft.MetaStorageWriteHandler.handleWriteWithTime(MetaStorageWriteHandler.java:217)
>     at 
> org.apache.ignite.internal.metastorage.server.raft.MetaStorageWriteHandler.handleNonCachedWriteCommand(MetaStorageWriteHandler.java:153)
>     at 
> org.apache.ignite.internal.metastorage.server.raft.MetaStorageWriteHandler.handleWriteCommand(MetaStorageWriteHandler.java:123)
>     at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
>     at 
> org.apache.ignite.internal.metastorage.server.raft.MetaStorageListener.onWrite(MetaStorageListener.java:193)
>     at 
> org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:825)
>     at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.doApplyTasks(FSMCallerImpl.java:570)
>     at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.doCommitted(FSMCallerImpl.java:536)
>     at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:454)
>     at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:123)
>     at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:117)
>     at 
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:322)
>     at 
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:279)
>     at 
> com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:167)
>     at 
> com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:122)
>     at java.base/java.lang.Thread.run(Thread.java:840){code}
> As a possible solution, we may throttle this log.
> *Scenario:*
>  * big number of replication groups
>  * lease updater chooses lease candidates
>  * one node leaves the cluster
>  * lease negotiator fails to start the lease agreements for leases that have 
> the left node as a candidate
>  * on each lease, it writes an error to log and there are too many of them 
> (as the number of replication groups is big). 
> We can print the error just once per negotiation round.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to