Alexey Scherbakov created IGNITE-28509:
------------------------------------------
Summary: Avoid internal error exception if raft leader is not available
Key: IGNITE-28509
URL: https://issues.apache.org/jira/browse/IGNITE-28509
Project: Ignite
Issue Type: Improvement
Reporter: Alexey Scherbakov
I've observed the following exception in a scenario where an implicit
transaction is attempted over a partition that has lost its majority:
{noformat}
cause = {IgniteException@21643} "org.apache.ignite.lang.IgniteException:
IGN-CMN-65535 Send with retry timed out [retryCount = 50, groupId = 19_part_0,
traceId = null, request =
org.apache.ignite.raft.jraft.rpc.GetLeaderRequestImpl, originCommand =
org.apache.ignite.internal.partition.replicator.network.command.UpdateCommandV2Impl,
retryReasons = [[time=1775133944141, msg=Peer idrrt_tmrimilsp_3:0 returned
code UNKNOWN: No leader at term 0.; attemptWaitDuration=203, attemptDuration=1,
attemptStartTime=2026-04-02T15:45:44,141], [time=1775133944343, msg=Peer
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.;
attemptWaitDuration=202, attemptDuration=1,
attemptStartTime=2026-04-02T15:45:44,343], [time=1775133944547, msg=Peer
idrrt_tmrimilsp_3:0 returned code UNKNOWN: No leader at term 0.;
attemptWaitDuration=202, attemptDuration=1,
attemptStartTime=2026-04-02T15:45:44,547], [time=1775133944749, msg=Peer
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.;
attemptWaitDuration=202, attemptDuration=1"
errorPrefix = "IGN"
groupName = "CMN"
code = 131071
traceId = {UUID@25463} "53b49c5d-a27f-4d0c-b3b0-8b539e792813"
backtrace = {Object[5]@25464}
detailMessage = "Send with retry timed out [retryCount = 50, groupId =
19_part_0, traceId = null, request =
org.apache.ignite.raft.jraft.rpc.GetLeaderRequestImpl, originCommand =
org.apache.ignite.internal.partition.replicator.network.command.UpdateCommandV2Impl,
retryReasons = [[time=1775133944141, msg=Peer idrrt_tmrimilsp_3:0 returned
code UNKNOWN: No leader at term 0.; attemptWaitDuration=203, attemptDuration=1,
attemptStartTime=2026-04-02T15:45:44,141], [time=1775133944343, msg=Peer
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.;
attemptWaitDuration=202, attemptDuration=1,
attemptStartTime=2026-04-02T15:45:44,343], [time=1775133944547, msg=Peer
idrrt_tmrimilsp_3:0 returned code UNKNOWN: No leader at term 0.;
attemptWaitDuration=202, attemptDuration=1,
attemptStartTime=2026-04-02T15:45:44,547], [time=1775133944749, msg=Peer
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.;
attemptWaitDuration=202, attemptDuration=1,
attemptStartTime=2026-04-02T15:45:44,749], [time=177"
cause = {TimeoutException@25466} "java.util.concurrent.TimeoutException: Send
with retry timed out [retryCount = 50, groupId = 19_part_0, traceId = null,
request = org.apache.ignite.raft.jraft.rpc.GetLeaderRequestImpl, originCommand
=
org.apache.ignite.internal.partition.replicator.network.command.UpdateCommandV2Impl,
retryReasons = [[time=1775133944141, msg=Peer idrrt_tmrimilsp_3:0 returned
code UNKNOWN: No leader at term 0.; attemptWaitDuration=203, attemptDuration=1,
attemptStartTime=2026-04-02T15:45:44,141], [time=1775133944343, msg=Peer
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.;
attemptWaitDuration=202, attemptDuration=1,
attemptStartTime=2026-04-02T15:45:44,343], [time=1775133944547, msg=Peer
idrrt_tmrimilsp_3:0 returned code UNKNOWN: No leader at term 0.;
attemptWaitDuration=202, attemptDuration=1,
attemptStartTime=2026-04-02T15:45:44,547], [time=1775133944749, msg=Peer
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.;
attemptWaitDuration=202, attemptDuration=1, attemptStartT"
stackTrace = {StackTraceElement[19]@25470}
depth = 19
suppressedExceptions = {Collections$EmptyList@25468} size = 0{noformat}
The leader is not available, and the internal exception is reported to the
caller because no mapping for TimeoutException exists.
We need to provide a meaningful exception to the user in this scenario.
In other words, we cannot reach the leader here and propagate INTERNAL_ERROR
to the caller, because there is no mapping for TimeoutException.
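One possible shape of the missing mapping is sketched below. This is only an illustration under stated assumptions: GroupUnavailableException, InternalErrorException, and mapToPublic are hypothetical names invented for this example and are not existing Ignite classes or methods. The sketch unwraps CompletionException wrappers, maps a TimeoutException to a dedicated "group unavailable" exception, and falls back to an internal error for everything else.

```java
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeoutException;

class LeaderUnavailableMapping {
    /** Hypothetical user-facing exception; NOT an existing Ignite class. */
    static class GroupUnavailableException extends RuntimeException {
        GroupUnavailableException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical stand-in for the current INTERNAL_ERROR fallback. */
    static class InternalErrorException extends RuntimeException {
        InternalErrorException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Strips CompletionException wrappers added by CompletableFuture chains. */
    static Throwable unwrap(Throwable t) {
        while (t instanceof CompletionException && t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    /** Maps an internal failure to the exception that is shown to the caller. */
    static RuntimeException mapToPublic(Throwable t, String groupId) {
        Throwable cause = unwrap(t);

        if (cause instanceof TimeoutException) {
            // Assumption: a retry timeout here means no leader could be reached.
            return new GroupUnavailableException(
                    "Replication group is unavailable, no leader could be reached [groupId="
                            + groupId + ']',
                    cause);
        }

        // Anything without an explicit mapping still ends up as an internal error.
        return new InternalErrorException("Internal error", cause);
    }

    public static void main(String[] args) {
        Throwable failure = new CompletionException(
                new TimeoutException("Send with retry timed out"));

        RuntimeException mapped = mapToPublic(failure, "19_part_0");

        System.out.println(mapped.getClass().getSimpleName() + ": " + mapped.getMessage());
    }
}
```

Keeping the original TimeoutException as the cause preserves the retry details from the dump above for diagnostics, while the caller sees a message that names the affected group.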