[
https://issues.apache.org/jira/browse/IGNITE-25815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Roman Puchkovskiy updated IGNITE-25815:
---------------------------------------
Description:
PartitionMover retries on every exception, including non-retryable ones, which
sometimes leads to an infinite retry loop if something is broken.
# We need to distinguish retryable exceptions from non-retryable ones.
# For non-retryable ones, we could call FailureManager right away and stop
retrying; alternatively, we could switch the replica to a special error state,
which avoids crashing the node while still indicating that something is wrong.
# For retryable ones, we should add a retry counter and stop treating an
exception as retryable once the counter reaches some limit (that is, stop
retrying and notify FailureManager or switch the replica state to error).
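
A minimal sketch of the three points above, not the actual PartitionMover code.
All names here (BoundedRetrySketch, Outcome, the retryable predicate, and the
failureHandler callback standing in for "notify FailureManager or switch the
replica state to error") are hypothetical, and which exception types count as
retryable is left to the caller.

```java
import java.util.concurrent.Callable;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical illustration of a bounded, classified retry loop. */
final class BoundedRetrySketch {
    /** How the loop ended. */
    enum Outcome { SUCCEEDED, FAILED_NON_RETRYABLE, FAILED_RETRIES_EXHAUSTED }

    private final Predicate<Throwable> retryable;     // assumption: caller decides what is retryable
    private final int maxRetries;                     // assumption: limit on retries after the first attempt
    private final Consumer<Throwable> failureHandler; // stand-in for FailureManager / error state switch

    BoundedRetrySketch(Predicate<Throwable> retryable, int maxRetries, Consumer<Throwable> failureHandler) {
        this.retryable = retryable;
        this.maxRetries = maxRetries;
        this.failureHandler = failureHandler;
    }

    /** Runs the action, retrying only retryable failures and only up to maxRetries times. */
    Outcome run(Callable<?> action) {
        int retries = 0;
        while (true) {
            try {
                action.call();
                return Outcome.SUCCEEDED;
            } catch (Exception e) {
                if (!retryable.test(e)) {
                    // Non-retryable: report right away and stop.
                    failureHandler.accept(e);
                    return Outcome.FAILED_NON_RETRYABLE;
                }
                if (retries >= maxRetries) {
                    // Retryable, but the limit is reached: stop treating it as retryable.
                    failureHandler.accept(e);
                    return Outcome.FAILED_RETRIES_EXHAUSTED;
                }
                retries++;
            }
        }
    }
}
```

With a limit of 3, an action that always fails with a retryable exception is
attempted 4 times (one initial attempt plus three retries) before the failure
handler is invoked once. A real implementation would likely also add a delay
between attempts and handle thread interruption explicitly; both are omitted
here for brevity.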
> Improve exception handling in PartitionMover
> --------------------------------------------
>
> Key: IGNITE-25815
> URL: https://issues.apache.org/jira/browse/IGNITE-25815
> Project: Ignite
> Issue Type: Improvement
> Reporter: Roman Puchkovskiy
> Priority: Major
> Labels: ignite-3
>