[ 
https://issues.apache.org/jira/browse/IGNITE-28509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Scherbakov updated IGNITE-28509:
---------------------------------------
    Summary: Avoid internal error when raft leader is not available  (was: 
Avoid internal error exception if raft leader is not available)

> Avoid internal error when raft leader is not available
> ------------------------------------------------------
>
>                 Key: IGNITE-28509
>                 URL: https://issues.apache.org/jira/browse/IGNITE-28509
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexey Scherbakov
>            Priority: Major
>              Labels: ignite-3
>
> I've observed the following exception in a scenario where an implicit 
> transaction is attempted over a partition that has lost its majority:
> {noformat}
> cause = {IgniteException@21643} "org.apache.ignite.lang.IgniteException: 
> IGN-CMN-65535 Send with retry timed out [retryCount = 50, groupId = 
> 19_part_0, traceId = null, request = 
> org.apache.ignite.raft.jraft.rpc.GetLeaderRequestImpl, originCommand = 
> org.apache.ignite.internal.partition.replicator.network.command.UpdateCommandV2Impl,
>  retryReasons = [[time=1775133944141, msg=Peer idrrt_tmrimilsp_3:0 returned 
> code UNKNOWN: No leader at term 0.; attemptWaitDuration=203, 
> attemptDuration=1, attemptStartTime=2026-04-02T15:45:44,141], 
> [time=1775133944343, msg=Peer idrrt_tmrimilsp_0:0 returned code UNKNOWN: No 
> leader at term 0.; attemptWaitDuration=202, attemptDuration=1, 
> attemptStartTime=2026-04-02T15:45:44,343], [time=1775133944547, msg=Peer 
> idrrt_tmrimilsp_3:0 returned code UNKNOWN: No leader at term 0.; 
> attemptWaitDuration=202, attemptDuration=1, 
> attemptStartTime=2026-04-02T15:45:44,547], [time=1775133944749, msg=Peer 
> idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.; 
> attemptWaitDuration=202, attemptDuration=1"
>  errorPrefix = "IGN"
>  groupName = "CMN"
>  code = 131071
>  traceId = {UUID@25463} "53b49c5d-a27f-4d0c-b3b0-8b539e792813"
>  backtrace = {Object[5]@25464} 
>  detailMessage = "Send with retry timed out [retryCount = 50, groupId = 
> 19_part_0, traceId = null, request = 
> org.apache.ignite.raft.jraft.rpc.GetLeaderRequestImpl, originCommand = 
> org.apache.ignite.internal.partition.replicator.network.command.UpdateCommandV2Impl,
>  retryReasons = [[time=1775133944141, msg=Peer idrrt_tmrimilsp_3:0 returned 
> code UNKNOWN: No leader at term 0.; attemptWaitDuration=203, 
> attemptDuration=1, attemptStartTime=2026-04-02T15:45:44,141], 
> [time=1775133944343, msg=Peer idrrt_tmrimilsp_0:0 returned code UNKNOWN: No 
> leader at term 0.; attemptWaitDuration=202, attemptDuration=1, 
> attemptStartTime=2026-04-02T15:45:44,343], [time=1775133944547, msg=Peer 
> idrrt_tmrimilsp_3:0 returned code UNKNOWN: No leader at term 0.; 
> attemptWaitDuration=202, attemptDuration=1, 
> attemptStartTime=2026-04-02T15:45:44,547], [time=1775133944749, msg=Peer 
> idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.; 
> attemptWaitDuration=202, attemptDuration=1, 
> attemptStartTime=2026-04-02T15:45:44,749], [time=177"
>  cause = {TimeoutException@25466} "java.util.concurrent.TimeoutException: 
> Send with retry timed out [retryCount = 50, groupId = 19_part_0, traceId = 
> null, request = org.apache.ignite.raft.jraft.rpc.GetLeaderRequestImpl, 
> originCommand = 
> org.apache.ignite.internal.partition.replicator.network.command.UpdateCommandV2Impl,
>  retryReasons = [[time=1775133944141, msg=Peer idrrt_tmrimilsp_3:0 returned 
> code UNKNOWN: No leader at term 0.; attemptWaitDuration=203, 
> attemptDuration=1, attemptStartTime=2026-04-02T15:45:44,141], 
> [time=1775133944343, msg=Peer idrrt_tmrimilsp_0:0 returned code UNKNOWN: No 
> leader at term 0.; attemptWaitDuration=202, attemptDuration=1, 
> attemptStartTime=2026-04-02T15:45:44,343], [time=1775133944547, msg=Peer 
> idrrt_tmrimilsp_3:0 returned code UNKNOWN: No leader at term 0.; 
> attemptWaitDuration=202, attemptDuration=1, 
> attemptStartTime=2026-04-02T15:45:44,547], [time=1775133944749, msg=Peer 
> idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.; 
> attemptWaitDuration=202, attemptDuration=1, attemptStartT"
>  stackTrace = {StackTraceElement[19]@25470} 
>  depth = 19
>  suppressedExceptions = {Collections$EmptyList@25468}  size = 0{noformat}
> The leader cannot be reached, and the internal exception is reported to the 
> caller as INTERNAL_ERROR, because no mapping for TimeoutException exists.
> We need to provide a meaningful exception to the user in this scenario.
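
A minimal sketch of the kind of mapping that is missing, for illustration only. The names LeaderTimeoutMapping, ReplicationGroupUnavailableException and mapToPublic are hypothetical and are not existing Ignite APIs; the actual fix may choose a different exception type and error code. The sketch unwraps CompletionException/ExecutionException and translates a TimeoutException into an exception that names the affected group, instead of letting it fall through to a generic internal error:

```java
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch; none of these types exist in Ignite. */
final class LeaderTimeoutMapping {
    /** Hypothetical user-facing exception for a group whose leader cannot be reached. */
    static final class ReplicationGroupUnavailableException extends RuntimeException {
        private final String groupId;

        ReplicationGroupUnavailableException(String groupId, Throwable cause) {
            super("Replication group is unavailable, leader could not be reached [groupId="
                    + groupId + "]", cause);
            this.groupId = groupId;
        }

        String groupId() {
            return groupId;
        }
    }

    private LeaderTimeoutMapping() {
    }

    /** Strips the wrappers that CompletableFuture and Future add around the real failure. */
    static Throwable unwrap(Throwable t) {
        while ((t instanceof CompletionException || t instanceof ExecutionException)
                && t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    /** Maps an internal failure to the exception that should be reported to the caller. */
    static RuntimeException mapToPublic(String groupId, Throwable error) {
        Throwable cause = unwrap(error);

        if (cause instanceof TimeoutException) {
            // The case from this ticket: retries were exhausted without finding a leader.
            return new ReplicationGroupUnavailableException(groupId, cause);
        }
        if (cause instanceof RuntimeException) {
            return (RuntimeException) cause;
        }
        // Stand-in for the generic internal error path (an assumption of this sketch).
        return new IllegalStateException("Internal error", cause);
    }
}
```

With such a mapping, the TimeoutException shown in the dump above would reach the caller as an exception that identifies group 19_part_0, while unrelated runtime exceptions pass through unchanged.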



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
