[ 
https://issues.apache.org/jira/browse/IGNITE-26421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Lapin updated IGNITE-26421:
-------------------------------------
    Description: 
Currently we may abort a component's stop if it does not fit within some time 
bound, which makes no sense. E.g. in 
PartitionReplicaLifecycleManager#cleanUpPartitionsResources we give up on 
stopping partitions if it takes longer than 30 seconds:
{code:java}
allOf(stopPartitionsFuture).get(30, TimeUnit.SECONDS);{code}
Sometimes, especially on slow machines, 30 seconds might not be enough. In 
that case some replicas are left running, which leads to an assertion error on 
Loza stop because some raft groups are unexpectedly still alive.
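
To illustrate the failure mode, here is a minimal sketch in plain Java. The class and method names are hypothetical and are not Ignite code. It relies only on the documented behaviour of CompletableFuture: a timed get() that throws TimeoutException abandons the wait but does not complete or cancel the future, so the work behind it may still be running.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical demo class; names are illustrative, not Ignite code. */
public final class BoundedStopDemo {
    private BoundedStopDemo() {
    }

    /**
     * Waits for the stop future with a time bound, as in the snippet above.
     * Returns true if the wait gave up while the stop was still incomplete.
     */
    public static boolean timedOutWhileStillRunning(CompletableFuture<?> stopFuture, long timeoutMillis) {
        try {
            CompletableFuture.allOf(stopFuture).get(timeoutMillis, TimeUnit.MILLISECONDS);
            return false; // The stop finished within the bound.
        } catch (TimeoutException e) {
            // The wait is abandoned here, but the future itself is untouched:
            // whatever it represents (e.g. a replica stop) may still be in progress.
            return !stopFuture.isDone();
        } catch (ExecutionException e) {
            return false; // The stop failed rather than timed out.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A caller that simply moves on after the timeout proceeds to stop the next component while the previous one is still alive, which matches the situation described above.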

Proper behaviour should be the following:
 * Stop each component for however long it takes.
 * If the stop timeout is exceeded, log in greater detail what the node is 
doing: e.g. which partition is stopping, etc.
 * In case of exceptions, trigger the failure handler (FH).
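
The three points above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the intended implementation: the class UnboundedStop, the method awaitAll, the map of named stop futures, and the two Consumer callbacks (stand-ins for the node logger and the failure handler) are all hypothetical. The wait itself is unbounded; the interval only controls how often progress is logged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

/** Hypothetical helper; not part of the Ignite code base. */
public final class UnboundedStop {
    private UnboundedStop() {
    }

    /**
     * Waits until all stop futures complete, however long that takes.
     * Every time warnInterval elapses, reports which named futures are still pending.
     * If any future fails, passes the cause to failureHandler and returns.
     *
     * @return Number of "still stopping" warnings that were logged.
     */
    public static int awaitAll(
            Map<String, CompletableFuture<?>> stopFutures,
            long warnInterval,
            TimeUnit unit,
            Consumer<String> log,
            Consumer<Throwable> failureHandler
    ) {
        CompletableFuture<Void> all =
                CompletableFuture.allOf(stopFutures.values().toArray(new CompletableFuture<?>[0]));

        int warnings = 0;

        while (true) {
            try {
                all.get(warnInterval, unit);

                return warnings;
            } catch (TimeoutException e) {
                // Do not give up: log what is still stopping and keep waiting.
                warnings++;

                List<String> pending = new ArrayList<>();
                stopFutures.forEach((name, fut) -> {
                    if (!fut.isDone()) {
                        pending.add(name);
                    }
                });

                log.accept("Stop is still in progress, pending: " + pending);
            } catch (ExecutionException e) {
                // Stand-in for triggering the failure handler.
                failureHandler.accept(e.getCause());

                return warnings;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();

                throw new IllegalStateException("Interrupted while waiting for stop", e);
            }
        }
    }
}
```

For example, calling UnboundedStop.awaitAll(futuresByPartitionName, 30, TimeUnit.SECONDS, System.err::println, Throwable::printStackTrace) would keep the 30 second figure only as a logging interval rather than as a deadline.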

> Make node stop time unbounded
> -----------------------------
>
>                 Key: IGNITE-26421
>                 URL: https://issues.apache.org/jira/browse/IGNITE-26421
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexander Lapin
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
