[ 
https://issues.apache.org/jira/browse/IGNITE-28509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Scherbakov updated IGNITE-28509:
---------------------------------------
    Description: 
I've observed the following exception when an implicit transaction attempts to 
map onto a partition that has lost its majority:
{noformat}
cause = {IgniteException@21643} "org.apache.ignite.lang.IgniteException: 
IGN-CMN-65535 Send with retry timed out [retryCount = 50, groupId = 19_part_0, 
traceId = null, request = 
org.apache.ignite.raft.jraft.rpc.GetLeaderRequestImpl, originCommand = 
org.apache.ignite.internal.partition.replicator.network.command.UpdateCommandV2Impl,
 retryReasons = [[time=1775133944141, msg=Peer idrrt_tmrimilsp_3:0 returned 
code UNKNOWN: No leader at term 0.; attemptWaitDuration=203, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,141], [time=1775133944343, msg=Peer 
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,343], [time=1775133944547, msg=Peer 
idrrt_tmrimilsp_3:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,547], [time=1775133944749, msg=Peer 
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1"
 errorPrefix = "IGN"
 groupName = "CMN"
 code = 131071
 traceId = {UUID@25463} "53b49c5d-a27f-4d0c-b3b0-8b539e792813"
 backtrace = {Object[5]@25464} 
 detailMessage = "Send with retry timed out [retryCount = 50, groupId = 
19_part_0, traceId = null, request = 
org.apache.ignite.raft.jraft.rpc.GetLeaderRequestImpl, originCommand = 
org.apache.ignite.internal.partition.replicator.network.command.UpdateCommandV2Impl,
 retryReasons = [[time=1775133944141, msg=Peer idrrt_tmrimilsp_3:0 returned 
code UNKNOWN: No leader at term 0.; attemptWaitDuration=203, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,141], [time=1775133944343, msg=Peer 
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,343], [time=1775133944547, msg=Peer 
idrrt_tmrimilsp_3:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,547], [time=1775133944749, msg=Peer 
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,749], [time=177"
 cause = {TimeoutException@25466} "java.util.concurrent.TimeoutException: Send 
with retry timed out [retryCount = 50, groupId = 19_part_0, traceId = null, 
request = org.apache.ignite.raft.jraft.rpc.GetLeaderRequestImpl, originCommand 
= 
org.apache.ignite.internal.partition.replicator.network.command.UpdateCommandV2Impl,
 retryReasons = [[time=1775133944141, msg=Peer idrrt_tmrimilsp_3:0 returned 
code UNKNOWN: No leader at term 0.; attemptWaitDuration=203, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,141], [time=1775133944343, msg=Peer 
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,343], [time=1775133944547, msg=Peer 
idrrt_tmrimilsp_3:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,547], [time=1775133944749, msg=Peer 
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, attemptStartT"
 stackTrace = {StackTraceElement[19]@25470} 
 depth = 19
 suppressedExceptions = {Collections$EmptyList@25468}  size = 0{noformat}
Because no Raft leader is available, the GetLeaderRequest is retried until the 
send times out (retryCount = 50). No mapping exists for TimeoutException, so 
the caller receives an internal error (IGN-CMN-65535).

We need to provide a meaningful exception to the user in this scenario.
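A minimal sketch of the kind of mapping this asks for, assuming the failure 
reaches the caller as a TimeoutException wrapped in a CompletionException. 
`PartitionUnavailableException` and `mapToPublic` are hypothetical names, not 
existing Ignite classes or methods; a real fix would hook into Ignite's own 
public-exception mapping and error codes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

public class LeaderTimeoutMappingSketch {

    /** Hypothetical user-facing exception; not an existing Ignite class. */
    static class PartitionUnavailableException extends RuntimeException {
        PartitionUnavailableException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical mapper: unwraps async wrappers, then translates a raft-level timeout. */
    static RuntimeException mapToPublic(Throwable t, String groupId) {
        Throwable cause = t;

        // Futures wrap the real failure; peel those layers off first.
        while ((cause instanceof CompletionException || cause instanceof ExecutionException)
                && cause.getCause() != null) {
            cause = cause.getCause();
        }

        // The case from this issue: no leader, so the send-with-retry timed out.
        if (cause instanceof TimeoutException) {
            return new PartitionUnavailableException(
                    "Partition " + groupId + " has no Raft leader (majority may be lost); retry later",
                    cause);
        }

        // Other runtime failures pass through unchanged.
        if (cause instanceof RuntimeException) {
            return (RuntimeException) cause;
        }

        // Anything else is still reported as an internal error.
        return new IllegalStateException("Internal error", cause);
    }

    public static void main(String[] args) {
        CompletableFuture<Void> send = new CompletableFuture<>();
        send.completeExceptionally(
                new TimeoutException("Send with retry timed out [retryCount = 50, groupId = 19_part_0]"));

        try {
            send.join(); // join() wraps the TimeoutException in a CompletionException
        } catch (CompletionException e) {
            RuntimeException mapped = mapToPublic(e, "19_part_0");
            // prints: PartitionUnavailableException: Partition 19_part_0 has no Raft leader (majority may be lost); retry later
            System.out.println(mapped.getClass().getSimpleName() + ": " + mapped.getMessage());
        }
    }
}
```

Only the timeout case is translated, so unrelated internal errors are not masked 
by a misleading "leader unavailable" message.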

  was:
I've observed the following exception when an implicit transaction attempts to 
map onto a partition that has lost its majority:
{noformat}
cause = {IgniteException@21643} "org.apache.ignite.lang.IgniteException: 
IGN-CMN-65535 Send with retry timed out [retryCount = 50, groupId = 19_part_0, 
traceId = null, request = 
org.apache.ignite.raft.jraft.rpc.GetLeaderRequestImpl, originCommand = 
org.apache.ignite.internal.partition.replicator.network.command.UpdateCommandV2Impl,
 retryReasons = [[time=1775133944141, msg=Peer idrrt_tmrimilsp_3:0 returned 
code UNKNOWN: No leader at term 0.; attemptWaitDuration=203, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,141], [time=1775133944343, msg=Peer 
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,343], [time=1775133944547, msg=Peer 
idrrt_tmrimilsp_3:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,547], [time=1775133944749, msg=Peer 
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1"
 errorPrefix = "IGN"
 groupName = "CMN"
 code = 131071
 traceId = {UUID@25463} "53b49c5d-a27f-4d0c-b3b0-8b539e792813"
 backtrace = {Object[5]@25464} 
 detailMessage = "Send with retry timed out [retryCount = 50, groupId = 
19_part_0, traceId = null, request = 
org.apache.ignite.raft.jraft.rpc.GetLeaderRequestImpl, originCommand = 
org.apache.ignite.internal.partition.replicator.network.command.UpdateCommandV2Impl,
 retryReasons = [[time=1775133944141, msg=Peer idrrt_tmrimilsp_3:0 returned 
code UNKNOWN: No leader at term 0.; attemptWaitDuration=203, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,141], [time=1775133944343, msg=Peer 
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,343], [time=1775133944547, msg=Peer 
idrrt_tmrimilsp_3:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,547], [time=1775133944749, msg=Peer 
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,749], [time=177"
 cause = {TimeoutException@25466} "java.util.concurrent.TimeoutException: Send 
with retry timed out [retryCount = 50, groupId = 19_part_0, traceId = null, 
request = org.apache.ignite.raft.jraft.rpc.GetLeaderRequestImpl, originCommand 
= 
org.apache.ignite.internal.partition.replicator.network.command.UpdateCommandV2Impl,
 retryReasons = [[time=1775133944141, msg=Peer idrrt_tmrimilsp_3:0 returned 
code UNKNOWN: No leader at term 0.; attemptWaitDuration=203, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,141], [time=1775133944343, msg=Peer 
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,343], [time=1775133944547, msg=Peer 
idrrt_tmrimilsp_3:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, 
attemptStartTime=2026-04-02T15:45:44,547], [time=1775133944749, msg=Peer 
idrrt_tmrimilsp_0:0 returned code UNKNOWN: No leader at term 0.; 
attemptWaitDuration=202, attemptDuration=1, attemptStartT"
 stackTrace = {StackTraceElement[19]@25470} 
 depth = 19
 suppressedExceptions = {Collections$EmptyList@25468}  size = 0{noformat}
A leader is not available, and the internal exception is reported to the 
caller because no mapping exists for TimeoutException.

We need to provide a meaningful exception to the user in this scenario.

Here we cannot reach the leader and propagate INTERNAL_ERROR to the caller, 
because there is no mapping for TimeoutException.


> Avoid internal error when Raft leader is not available
> ------------------------------------------------------
>
>                 Key: IGNITE-28509
>                 URL: https://issues.apache.org/jira/browse/IGNITE-28509
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexey Scherbakov
>            Priority: Major
>              Labels: ignite-3
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
