[jira] [Commented] (IGNITE-24942) StackOverflowError in PartitionMover

Roman Puchkovskiy (Jira) Wed, 02 Jul 2025 00:39:05 -0700


    [ 
https://issues.apache.org/jira/browse/IGNITE-24942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987353#comment-17987353
 ]


Roman Puchkovskiy commented on IGNITE-24942:
--------------------------------------------

Retries are now scheduled instead of being executed right away.

Exception handling improvements have been split out into IGNITE-25815.

The race between starting a Raft node and registering a replica (found while 
working on this issue) is described in IGNITE-25814.
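
To illustrate the difference between retrying right away and scheduling a 
retry, here is a minimal sketch. It is not the actual PartitionMover code: the 
ScheduledRetry class, its run() method and all parameters are hypothetical 
names made up for this example. The point is only that each new attempt is 
submitted to a ScheduledExecutorService, so it starts on a fresh stack instead 
of being called from the failure callback of the previous attempt.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper for illustration only; not the actual PartitionMover code. */
final class ScheduledRetry {
    private ScheduledRetry() {
    }

    /** Runs the action, retrying failed attempts up to maxAttempts attempts in total. */
    static <T> CompletableFuture<T> run(
            Supplier<CompletableFuture<T>> action,
            ScheduledExecutorService scheduler,
            long delayMillis,
            int maxAttempts
    ) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(action, scheduler, delayMillis, maxAttempts, result);
        return result;
    }

    private static <T> void attempt(
            Supplier<CompletableFuture<T>> action,
            ScheduledExecutorService scheduler,
            long delayMillis,
            int attemptsLeft,
            CompletableFuture<T> result
    ) {
        CompletableFuture<T> future;
        try {
            future = action.get();
        } catch (Throwable t) {
            // Treat a synchronous throw the same way as a failed future.
            future = CompletableFuture.failedFuture(t);
        }

        future.whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else if (attemptsLeft <= 1) {
                // Attempts exhausted: report the last error to the caller.
                result.completeExceptionally(error);
            } else {
                try {
                    // The next attempt is scheduled, not invoked from this callback,
                    // so the call stack does not grow with the number of retries.
                    scheduler.schedule(
                            () -> attempt(action, scheduler, delayMillis, attemptsLeft - 1, result),
                            delayMillis,
                            TimeUnit.MILLISECONDS
                    );
                } catch (RejectedExecutionException e) {
                    // The scheduler has been shut down; no more retries are possible.
                    result.completeExceptionally(e);
                }
            }
        });
    }
}
```

With this shape, an action that keeps failing synchronously costs one 
scheduled task per attempt rather than one extra set of stack frames per 
attempt. The delay and the attempt limit are placeholders; the values used by 
the actual fix are not stated in this message.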

> StackOverflowError in PartitionMover
> ------------------------------------
>
>                 Key: IGNITE-24942
>                 URL: https://issues.apache.org/jira/browse/IGNITE-24942
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Roman Puchkovskiy
>            Assignee: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.1
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> PartitionMover retries when an exception occurs. It retries on every 
> exception (including non-retryable ones), there is no retry limit, and the 
> retries may run in the same thread. If something is broken, this sometimes 
> leads to an infinite retry loop that ends in a StackOverflowError.
>  # We need to distinguish retryable exceptions from non-retryable ones
>  # For non-retryable ones, we should call FailureManager right away and stop 
> retrying
>  # For retryable ones, we should add a retry counter and stop treating an 
> exception as retryable once the counter reaches some limit (that is, stop 
> retrying and notify FailureManager)
>  # Maybe we should initiate retries in a separate thread pool to avoid a 
> stack overflow when there are many retries (or simply pick a max retry count 
> that is too small to trigger a stack overflow)
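
The first three points of the list above can be sketched as a small decision 
helper. This is only an illustration under stated assumptions: RetryDecider, 
Decision and isRetryable() are hypothetical names, the set of exceptions 
treated as retryable (TimeoutException, IOException) is a placeholder rather 
than the classification used in Ignite, and the Consumer<Throwable> merely 
stands in for a call to FailureManager, whose real API is not shown here.

```java
import java.io.IOException;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

/** Hypothetical helper for illustration only; not part of Ignite. */
final class RetryDecider {
    enum Decision { RETRY, GIVE_UP }

    private final int maxRetries;
    /** Stand-in for notifying FailureManager (assumption, not the real API). */
    private final Consumer<Throwable> failureHandler;
    private int retriesSoFar;

    RetryDecider(int maxRetries, Consumer<Throwable> failureHandler) {
        this.maxRetries = maxRetries;
        this.failureHandler = failureHandler;
    }

    /** Placeholder classification: which exceptions are retryable is an assumption. */
    static boolean isRetryable(Throwable error) {
        Throwable cause = unwrap(error);
        return cause instanceof TimeoutException || cause instanceof IOException;
    }

    /** Decides what to do after a failed attempt and reports terminal failures. */
    synchronized Decision onFailure(Throwable error) {
        if (!isRetryable(error) || retriesSoFar >= maxRetries) {
            // Non-retryable error or retry budget exhausted: stop and report.
            failureHandler.accept(error);
            return Decision.GIVE_UP;
        }
        retriesSoFar++;
        return Decision.RETRY;
    }

    /** Strips the wrappers that CompletableFuture and Future APIs add around the real cause. */
    private static Throwable unwrap(Throwable error) {
        Throwable current = error;
        while ((current instanceof CompletionException || current instanceof ExecutionException)
                && current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }
}
```

A caller would ask onFailure() after each failed attempt and schedule another 
attempt only when it returns RETRY. With maxRetries set to 2, the first two 
retryable failures return RETRY and the third returns GIVE_UP after invoking 
the failure handler once.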



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
