Alexey Scherbakov created IGNITE-28509:
------------------------------------------

             Summary: Avoid internal error exception if raft leader is not 
available
                 Key: IGNITE-28509
                 URL: https://issues.apache.org/jira/browse/IGNITE-28509
             Project: Ignite
          Issue Type: Improvement
            Reporter: Alexey Scherbakov


In a scenario where an implicit transaction is attempted over a partition 
that has lost its majority, I observed the following exception:
{noformat}
cause = {IgniteException@21643} "org.apache.ignite.lang.IgniteException: 
IGN-CMN-65535 Send with retry timed out [retryCount = 50, groupId = 19_part_0, 
traceId = null, request = 
org.apache.ignite.raft.jraft.rpc.GetLeaderRequestImpl, originCommand = 
org.apache.ignite.internal.partition.replicator.network.command.UpdateCommandV2Impl,
 retryReasons = [[time=1775133944141, msg=Peer idrrt_tmrimilsp_3:0 returned 
code UNKNOWN: No leader at term 0.; attemptWaitDuration=203, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,141], [time=1775133944343, msg=Peer 
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,343], [time=1775133944547, msg=Peer 
idrrt_tmrimilsp_3:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,547], [time=1775133944749, msg=Peer 
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1"
 errorPrefix = "IGN"
 groupName = "CMN"
 code = 131071
 traceId = {UUID@25463} "53b49c5d-a27f-4d0c-b3b0-8b539e792813"
 backtrace = {Object[5]@25464} 
 detailMessage = "Send with retry timed out [retryCount = 50, groupId = 
19_part_0, traceId = null, request = 
org.apache.ignite.raft.jraft.rpc.GetLeaderRequestImpl, originCommand = 
org.apache.ignite.internal.partition.replicator.network.command.UpdateCommandV2Impl,
 retryReasons = [[time=1775133944141, msg=Peer idrrt_tmrimilsp_3:0 returned 
code UNKNOWN: No leader at term 0.; attemptWaitDuration=203, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,141], [time=1775133944343, msg=Peer 
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,343], [time=1775133944547, msg=Peer 
idrrt_tmrimilsp_3:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,547], [time=1775133944749, msg=Peer 
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,749], [time=177"
 cause = {TimeoutException@25466} "java.util.concurrent.TimeoutException: Send 
with retry timed out [retryCount = 50, groupId = 19_part_0, traceId = null, 
request = org.apache.ignite.raft.jraft.rpc.GetLeaderRequestImpl, originCommand 
= 
org.apache.ignite.internal.partition.replicator.network.command.UpdateCommandV2Impl,
 retryReasons = [[time=1775133944141, msg=Peer idrrt_tmrimilsp_3:0 returned 
code UNKNOWN: No leader at term 0.; attemptWaitDuration=203, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,141], [time=1775133944343, msg=Peer 
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,343], [time=1775133944547, msg=Peer 
idrrt_tmrimilsp_3:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,547], [time=1775133944749, msg=Peer 
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, attemptStartT"
 stackTrace = {StackTraceElement[19]@25470} 
 depth = 19
 suppressedExceptions = {Collections$EmptyList@25468}  size = 0{noformat}
The Raft leader cannot be reached, so the send-with-retry attempt times out. 
Because no mapping exists for TimeoutException, the failure is reported to the 
caller as an internal error (INTERNAL_ERROR).

We need to provide a meaningful exception to the user in this scenario.
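
To illustrate the kind of mapping that is missing, here is a minimal sketch. It assumes that a TimeoutException anywhere in the cause chain can be treated as "leader unavailable", which may be too broad for the real fix. The names LeaderUnavailableException, ExceptionMapperSketch, findTimeout and mapForUser are hypothetical and are not existing Ignite APIs; only JDK classes are used.

```java
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeoutException;

/** Hypothetical user-facing exception; not an existing Ignite class. */
class LeaderUnavailableException extends RuntimeException {
    LeaderUnavailableException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Hypothetical mapper that illustrates the idea only. */
final class ExceptionMapperSketch {
    private ExceptionMapperSketch() {
    }

    /** Returns the first TimeoutException in the cause chain, or null if there is none. */
    static TimeoutException findTimeout(Throwable t) {
        // Bound the walk so that a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (t instanceof TimeoutException) {
                return (TimeoutException) t;
            }
        }
        return null;
    }

    /** Maps a failure to what the caller should see. */
    static RuntimeException mapForUser(Throwable t, String groupId) {
        TimeoutException timeout = findTimeout(t);
        if (timeout != null) {
            // Assumption: a retry timeout here means the group has no reachable leader.
            return new LeaderUnavailableException(
                    "Replication group leader is not available [groupId=" + groupId + "]",
                    timeout);
        }
        // Anything else is passed through unchanged (wrapped if it is checked).
        return t instanceof RuntimeException ? (RuntimeException) t : new RuntimeException(t);
    }

    public static void main(String[] args) {
        Throwable failure = new CompletionException(
                new TimeoutException("Send with retry timed out"));
        System.out.println(mapForUser(failure, "19_part_0").getMessage());
    }
}
```

With such a mapping in place, the caller would see a dedicated exception naming the affected group instead of IGN-CMN-65535, while the original TimeoutException stays available as the cause for diagnostics.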


