[
https://issues.apache.org/jira/browse/IGNITE-25281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mirza Aliev updated IGNITE-25281:
---------------------------------
Description:
h3. Motivation
We need to add a test that ensures a node can be stopped cleanly even when
MetaStorage is unavailable.
This issue was observed in a scenario where {{doStableKeySwitch}} holds the
TableManager's busyLock, preventing {{beforeNodeStop}} from completing.
Additionally, a {{TimeoutException}} is thrown inside {{doStableKeySwitch}}
when calling {{metaStorageMgr.invoke(...).get()}}, because the only MetaStorage
node is already stopped. This exception currently leads to a call to the
FailureHandler (FH). However, we have agreed not to call the FH on a
{{TimeoutException}} in https://issues.apache.org/jira/browse/IGNITE-25278,
and to add a recovery mechanism for stable switch intents once MetaStorage
becomes available again in https://issues.apache.org/jira/browse/IGNITE-25276.
The test should reproduce the general situation in which MetaStorage becomes
unavailable and confirm that the stop process completes gracefully without
hanging.
h3. Definition of done
* The test must be implemented
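The scenario can be illustrated with a minimal, self-contained sketch. It does not use Ignite classes: {{NodeStopSketch}}, {{invokeOnUnavailableMetaStorage}} and {{stopCompletesWithin}} are hypothetical names, a {{ReentrantReadWriteLock}} stands in for the busyLock, and a future that never completes stands in for an unavailable MetaStorage. It only shows the shape of the check (stop must finish within a deadline while a key switch is in flight), not the actual test.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a node component; not an Ignite class. */
public class NodeStopSketch {
    // Read lock = "operation in progress", write lock = "node is stopping".
    private final ReentrantReadWriteLock busyLock = new ReentrantReadWriteLock();
    private final CountDownLatch enteredBusy = new CountDownLatch(1);

    // Assumption: an unavailable MetaStorage is modelled as a future that never completes.
    private CompletableFuture<Boolean> invokeOnUnavailableMetaStorage() {
        return new CompletableFuture<>();
    }

    /** Holds the busy lock while waiting for the invoke result, but only for a bounded time. */
    boolean doStableKeySwitch(long timeoutMillis) throws InterruptedException {
        if (!busyLock.readLock().tryLock()) {
            return false; // the node is already stopping
        }
        try {
            enteredBusy.countDown();
            return invokeOnUnavailableMetaStorage().get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            return false; // treated as a recoverable failure, no failure handler involved
        } finally {
            busyLock.readLock().unlock();
        }
    }

    /** Waits for in-flight operations to leave the busy section, then blocks new ones. */
    void beforeNodeStop() {
        busyLock.writeLock().lock();
    }

    /** Returns true if stop finished before the deadline while a key switch was in flight. */
    public static boolean stopCompletesWithin(long invokeTimeoutMillis, long stopDeadlineMillis)
            throws Exception {
        NodeStopSketch node = new NodeStopSketch();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Callable<Boolean> switchTask = () -> node.doStableKeySwitch(invokeTimeoutMillis);
            Future<Boolean> switchResult = pool.submit(switchTask);
            node.enteredBusy.await(); // make sure the switch holds the busy lock first

            Runnable stopTask = node::beforeNodeStop;
            Future<?> stop = pool.submit(stopTask);
            try {
                stop.get(stopDeadlineMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                return false; // stop is still blocked behind the busy lock
            }
            return !switchResult.get(); // the switch itself must have given up
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(stopCompletesWithin(200, 5_000));
    }
}
```

In this sketch the stop completes only because the wait inside the busy section is bounded. If the invoke timeout exceeds the stop deadline, {{stopCompletesWithin}} returns false, which is the hang the real test is meant to catch.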
was:
h3. Motivation
We need to add a test that ensures a node can be stopped cleanly even when
MetaStorage is unavailable.
This issue was observed in a scenario where {{doStableKeySwitch}} holds the
TableManager's busyLock, preventing {{beforeNodeStop}} from completing.
Additionally, a {{TimeoutException}} is thrown inside {{doStableKeySwitch}}
when calling {{metaStorageMgr.invoke(...).get()}}, because the only MetaStorage
node is already stopped. This exception currently leads to a call to the
FailureHandler (FH). However, we have agreed not to call the FH on a
{{TimeoutException}} in https://issues.apache.org/jira/browse/IGNITE-25278,
and to add a recovery mechanism for stable switch intents once MetaStorage
becomes available again in https://issues.apache.org/jira/browse/IGNITE-25276.
The test should reproduce the general situation in which MetaStorage becomes
unavailable and confirm that the stop process completes gracefully without
hanging.
> Add test to verify node stop does not hang when MetaStorage is unavailable
> --------------------------------------------------------------------------
>
> Key: IGNITE-25281
> URL: https://issues.apache.org/jira/browse/IGNITE-25281
> Project: Ignite
> Issue Type: Bug
> Reporter: Mirza Aliev
> Priority: Major
> Labels: ignite-3