[
https://issues.apache.org/jira/browse/IGNITE-28509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Scherbakov updated IGNITE-28509:
---------------------------------------
Summary: Avoid internal error then raft leader is not available (was:
Avoid internal error exception if raft leader is not available)
> Avoid internal error then raft leader is not available
> ------------------------------------------------------
>
> Key: IGNITE-28509
> URL: https://issues.apache.org/jira/browse/IGNITE-28509
> Project: Ignite
> Issue Type: Improvement
> Reporter: Alexey Scherbakov
> Priority: Major
> Labels: ignite-3
>
> I've observed the following exception in a scenario where an implicit
> transaction is attempted over a partition that has lost its majority:
> {noformat}
> cause = {IgniteException@21643} "org.apache.ignite.lang.IgniteException:
> IGN-CMN-65535 Send with retry timed out [retryCount = 50, groupId =
> 19_part_0, traceId = null, request =
> org.apache.ignite.raft.jraft.rpc.GetLeaderRequestImpl, originCommand =
> org.apache.ignite.internal.partition.replicator.network.command.UpdateCommandV2Impl,
> retryReasons = [[time=1775133944141, msg=Peer idrrt_tmrimilsp_3:0 returned
> code UNKNOWN: No leader at term 0.; attemptWaitDuration=203,
> attemptDuration=1, attemptStartTime=2026-04-02T15:45:44,141],
> [time=1775133944343, msg=Peer idrrt_tmrimilsp_0:0 returned code UNKNOWN: No
> leader at term 0.; attemptWaitDuration=202, attemptDuration=1,
> attemptStartTime=2026-04-02T15:45:44,343], [time=1775133944547, msg=Peer
> idrrt_tmrimilsp_3:0 returned code UNKNOWN: No leader at term 0.;
> attemptWaitDuration=202, attemptDuration=1,
> attemptStartTime=2026-04-02T15:45:44,547], [time=1775133944749, msg=Peer
> idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.;
> attemptWaitDuration=202, attemptDuration=1"
> errorPrefix = "IGN"
> groupName = "CMN"
> code = 131071
> traceId = {UUID@25463} "53b49c5d-a27f-4d0c-b3b0-8b539e792813"
> backtrace = {Object[5]@25464}
> detailMessage = "Send with retry timed out [retryCount = 50, groupId =
> 19_part_0, traceId = null, request =
> org.apache.ignite.raft.jraft.rpc.GetLeaderRequestImpl, originCommand =
> org.apache.ignite.internal.partition.replicator.network.command.UpdateCommandV2Impl,
> retryReasons = [[time=1775133944141, msg=Peer idrrt_tmrimilsp_3:0 returned
> code UNKNOWN: No leader at term 0.; attemptWaitDuration=203,
> attemptDuration=1, attemptStartTime=2026-04-02T15:45:44,141],
> [time=1775133944343, msg=Peer idrrt_tmrimilsp_0:0 returned code UNKNOWN: No
> leader at term 0.; attemptWaitDuration=202, attemptDuration=1,
> attemptStartTime=2026-04-02T15:45:44,343], [time=1775133944547, msg=Peer
> idrrt_tmrimilsp_3:0 returned code UNKNOWN: No leader at term 0.;
> attemptWaitDuration=202, attemptDuration=1,
> attemptStartTime=2026-04-02T15:45:44,547], [time=1775133944749, msg=Peer
> idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.;
> attemptWaitDuration=202, attemptDuration=1,
> attemptStartTime=2026-04-02T15:45:44,749], [time=177"
> cause = {TimeoutException@25466} "java.util.concurrent.TimeoutException:
> Send with retry timed out [retryCount = 50, groupId = 19_part_0, traceId =
> null, request = org.apache.ignite.raft.jraft.rpc.GetLeaderRequestImpl,
> originCommand =
> org.apache.ignite.internal.partition.replicator.network.command.UpdateCommandV2Impl,
> retryReasons = [[time=1775133944141, msg=Peer idrrt_tmrimilsp_3:0 returned
> code UNKNOWN: No leader at term 0.; attemptWaitDuration=203,
> attemptDuration=1, attemptStartTime=2026-04-02T15:45:44,141],
> [time=1775133944343, msg=Peer idrrt_tmrimilsp_0:0 returned code UNKNOWN: No
> leader at term 0.; attemptWaitDuration=202, attemptDuration=1,
> attemptStartTime=2026-04-02T15:45:44,343], [time=1775133944547, msg=Peer
> idrrt_tmrimilsp_3:0 returned code UNKNOWN: No leader at term 0.;
> attemptWaitDuration=202, attemptDuration=1,
> attemptStartTime=2026-04-02T15:45:44,547], [time=1775133944749, msg=Peer
> idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.;
> attemptWaitDuration=202, attemptDuration=1, attemptStartT"
> stackTrace = {StackTraceElement[19]@25470}
> depth = 19
> suppressedExceptions = {Collections$EmptyList@25468} size = 0{noformat}
> A leader is not available, and the internal exception is reported to the
> caller because no mapping for TimeoutException exists.
> We need to provide a meaningful exception to the user in this scenario.
> In other words, the leader cannot be reached here, and INTERNAL_ERROR is
> thrown to the caller because there is no mapping for TimeoutException.
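
To illustrate the kind of mapping the description asks for, here is a minimal sketch in plain Java. It is not Ignite's actual exception-mapping code: ReplicationGroupUnavailableException, RaftErrorMapper, unwrap and toPublic are hypothetical names introduced only for this example, and the real fix may use a different exception type and error code. The sketch assumes the TimeoutException may arrive wrapped in a CompletionException.

```java
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeoutException;

/** Hypothetical user-facing exception; not an existing Ignite class. */
class ReplicationGroupUnavailableException extends RuntimeException {
    ReplicationGroupUnavailableException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Hypothetical mapper, shown only to illustrate the idea. */
final class RaftErrorMapper {
    private RaftErrorMapper() {
    }

    /** Strips CompletionException wrappers added by CompletableFuture chains. */
    static Throwable unwrap(Throwable t) {
        while (t instanceof CompletionException && t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    /**
     * Maps a retry timeout to a descriptive exception instead of letting it
     * fall through to a generic internal error. Other failures pass through.
     */
    static RuntimeException toPublic(Throwable t, String groupId) {
        Throwable cause = unwrap(t);

        if (cause instanceof TimeoutException) {
            return new ReplicationGroupUnavailableException(
                    "Replication group is unavailable, no leader could be reached [groupId="
                            + groupId + ']',
                    cause);
        }

        if (cause instanceof RuntimeException) {
            return (RuntimeException) cause;
        }

        return new RuntimeException(cause);
    }
}
```

The original TimeoutException is kept as the cause, so the retry details from the trace above would remain available for diagnostics while the caller sees a message that names the affected group.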