[ 
https://issues.apache.org/jira/browse/IGNITE-25281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mirza Aliev updated IGNITE-25281:
---------------------------------
    Description: 
h3. Motivation 

We need to add a test that ensures a node can be stopped cleanly even when 
MetaStorage is unavailable. 

This issue was observed in a scenario where {{doStableKeySwitch}} holds the 
TableManager's busyLock, preventing {{beforeNodeStop}} from completing. 
Additionally, a {{TimeoutException}} is thrown inside {{doStableKeySwitch}} 
when calling {{metaStorageMgr.invoke(...).get()}}, because the only MetaStorage 
node is already stopped. This exception currently leads to a call to the 
FailureHandler. However, we have agreed not to call the FailureHandler on a 
{{TimeoutException}} (see https://issues.apache.org/jira/browse/IGNITE-25278) 
and to add a recovery mechanism for stable switch intents once MetaStorage 
becomes available again (see 
https://issues.apache.org/jira/browse/IGNITE-25276).


The test should reproduce the general situation in which MetaStorage becomes 
unavailable and confirm that the stop process completes gracefully, without 
hanging, while MetaStorage is down.
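
The blocking pattern described above can be illustrated with a minimal, self-contained sketch. It is not Ignite code: {{BusyLockStopSketch}}, {{operation}}, {{stop}} and {{stopsInTime}} are hypothetical names, a plain {{ReentrantReadWriteLock}} stands in for the busy lock, and a future that never completes stands in for an unavailable MetaStorage. It only shows that stop can finish once the lock holder gives up on its timeout, and that stop stays blocked while the holder keeps waiting.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a component guarded by a busy lock; not Ignite code. */
class BusyLockStopSketch {
    private final ReentrantReadWriteLock busyLock = new ReentrantReadWriteLock();
    private final CountDownLatch operationEntered = new CountDownLatch(1);

    /** Holds the busy lock while waiting for a remote call, bounded by a timeout. */
    boolean operation(CompletableFuture<Void> remoteCall, long timeoutMillis) throws InterruptedException {
        busyLock.readLock().lock();
        try {
            operationEntered.countDown(); // Signal that the lock is now held.
            remoteCall.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return false; // No answer (or a failure): give up so the lock is released.
        } finally {
            busyLock.readLock().unlock();
        }
    }

    /** Stop needs exclusive access; returns false if it could not get it in time. */
    boolean stop(long timeoutMillis) throws InterruptedException {
        return busyLock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Runs one operation against a call that never completes, then tries to stop. */
    static boolean stopsInTime(long operationTimeoutMillis, long stopTimeoutMillis) throws InterruptedException {
        BusyLockStopSketch component = new BusyLockStopSketch();
        CompletableFuture<Void> neverCompletes = new CompletableFuture<>();

        Thread worker = new Thread(() -> {
            try {
                component.operation(neverCompletes, operationTimeoutMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        component.operationEntered.await(); // Make sure the operation holds the lock first.

        boolean stopped = component.stop(stopTimeoutMillis);

        worker.interrupt(); // Clean up the worker if it is still waiting.
        worker.join();
        return stopped;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("short operation timeout: stopped = " + stopsInTime(100, 5_000));
        System.out.println("long operation timeout:  stopped = " + stopsInTime(10_000, 200));
    }
}
```

In this sketch the first call reports that stop succeeded, because the operation times out after 100 ms and releases the lock. The second reports that stop did not complete, because the operation is still holding the lock when the 200 ms stop budget runs out. The actual test would assert the first behaviour against a real node with MetaStorage stopped.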

h3. Definition of done
* A test covering the scenario above is implemented.

  was:
h3. Motivation 

We need to add a test that ensures a node can be stopped cleanly even when 
MetaStorage is unavailable. 

This issue was observed in a scenario where doStableKeySwitch holds the 
TableManager's busyLock, preventing {{beforeNodeStop}} from completing. 
Additionally, a {{TimeoutException}} is thrown inside {{doStableKeySwitch}} 
when calling {{metaStorageMgr.invoke(...).get()}}, because the only MetaStorage 
node is already stopped. This exception currently leads to a call to the 
FailureHandler. However we have agreed to not call FH in the case of 
{{TimeoutException}} in the ticket 
https://issues.apache.org/jira/browse/IGNITE-25278. Also we have agreed to add 
recovery mechanism for stable switch intents after MetaStorage becomes 
available in https://issues.apache.org/jira/browse/IGNITE-25276


The test should reproduce general situation when  MetaStorage become 
unavailable and confirm that the stop process completes gracefully without 
hanging, even if MetaStorage is down.


> Add test to verify node stop does not hang when MetaStorage is unavailable
> --------------------------------------------------------------------------
>
>                 Key: IGNITE-25281
>                 URL: https://issues.apache.org/jira/browse/IGNITE-25281
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Mirza Aliev
>            Priority: Major
>              Labels: ignite-3
>
> h3. Motivation 
> We need to add a test that ensures a node can be stopped cleanly even when 
> MetaStorage is unavailable. 
> This issue was observed in a scenario where {{doStableKeySwitch}} holds the 
> TableManager's busyLock, preventing {{beforeNodeStop}} from completing. 
> Additionally, a {{TimeoutException}} is thrown inside {{doStableKeySwitch}} 
> when calling {{metaStorageMgr.invoke(...).get()}}, because the only 
> MetaStorage node is already stopped. This exception currently leads to a call 
> to the FailureHandler. However, we have agreed not to call the FailureHandler 
> on a {{TimeoutException}} (see 
> https://issues.apache.org/jira/browse/IGNITE-25278) and to add a recovery 
> mechanism for stable switch intents once MetaStorage becomes available again 
> (see https://issues.apache.org/jira/browse/IGNITE-25276).
> The test should reproduce the general situation in which MetaStorage becomes 
> unavailable and confirm that the stop process completes gracefully, without 
> hanging, while MetaStorage is down.
> h3. Definition of done
> * The test must be implemented 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
