[ https://issues.apache.org/jira/browse/IGNITE-25363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Konstantin Orlov reassigned IGNITE-25363:
-----------------------------------------
Assignee: Konstantin Orlov
> Sql. Delayed NODE_LEFT event processing may cause query to hang
> ---------------------------------------------------------------
>
> Key: IGNITE-25363
> URL: https://issues.apache.org/jira/browse/IGNITE-25363
> Project: Ignite
> Issue Type: Bug
> Components: sql ai3
> Reporter: Konstantin Orlov
> Assignee: Konstantin Orlov
> Priority: Major
> Labels: ignite-3
>
> This problem is highlighted by the test
> {{org.apache.ignite.internal.runner.app.ItDataSchemaSyncTest#checkSchemasCorrectlyRestore}},
> which sometimes fails on TC with a timeout. The sequence of events is as follows:
> # Given: a cluster of 3 nodes; the distribution zone spans all of them.
> # Node 1 has been restarted.
> # Notification of the
> {{org.apache.ignite.internal.network.TopologyEventHandler#onDisappeared}}
> handlers is delayed on node 2 (due to metastorage lag or some other
> reason).
> # A query is started from node 1.
> # The root fragment is processed locally, and {{QueryBatchRequest}} arrives at
> node 2 before {{QueryStartRequest}}. This step is crucial since it puts an
> incomplete future into the mailbox registry
> ({{org.apache.ignite.internal.sql.engine.exec.MailboxRegistryImpl#locals}}).
> # The {{TopologyEventHandler}}s are notified on node 2. This step causes the
> {{onNodeLeft}} handler to be chained to the future from the previous step.
> # {{QueryStartRequest}} arrives at node 2. The query fragment is created and
> immediately closed by the {{onNodeLeft}} handler.
> The problem is that the {{onNodeLeft}} handler is applied to a query that was
> started on a topology which already takes the node restart into account. We
> have to ignore such outdated events.
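The sequence of events in the description can be reproduced in miniature. The sketch below is illustrative only and is not Ignite code: {{StaleNodeLeftSketch}}, {{Fragment}}, {{mailbox}}, the {{ignoreOutdated}} flag and the numeric topology versions are all hypothetical names. It assumes only that the handler is chained to a {{CompletableFuture}} that is created by whichever request arrives first, and it shows one possible way to ignore outdated events, namely comparing the event's topology version with the version the query was started on.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class StaleNodeLeftSketch {
    /** Hypothetical fragment handle that remembers the topology version its query was started on. */
    static final class Fragment {
        final long queryTopologyVersion;
        volatile boolean closed;

        Fragment(long queryTopologyVersion) {
            this.queryTopologyVersion = queryTopologyVersion;
        }
    }

    /** Stand-in for the mailbox registry: one future per fragment, created by the first request to arrive. */
    static final Map<String, CompletableFuture<Fragment>> locals = new ConcurrentHashMap<>();

    static CompletableFuture<Fragment> mailbox(String key) {
        return locals.computeIfAbsent(key, k -> new CompletableFuture<>());
    }

    /** Hypothetical node-left handler: chains a close action to every registered future. */
    static void onNodeLeft(long eventTopologyVersion, boolean ignoreOutdated) {
        for (CompletableFuture<Fragment> future : locals.values()) {
            future.thenAccept(fragment -> {
                // Assumed fix: skip events older than the topology the query was started on.
                if (ignoreOutdated && fragment.queryTopologyVersion > eventTopologyVersion) {
                    return;
                }
                fragment.closed = true;
            });
        }
    }

    /** Replays the sequence from the description; returns whether the fragment ended up closed. */
    public static boolean run(boolean ignoreOutdated) {
        locals.clear();
        String key = "query-1/fragment-1";

        mailbox(key);                          // QueryBatchRequest arrives first: incomplete future is registered
        onNodeLeft(1L, ignoreOutdated);        // delayed NODE_LEFT (topology version 1) is chained to that future
        Fragment fragment = new Fragment(2L);  // query was started on version 2, after the restart
        mailbox(key).complete(fragment);       // QueryStartRequest arrives: chained handler fires

        return fragment.closed;
    }

    public static void main(String[] args) {
        System.out.println("without check, fragment closed: " + run(false));
        System.out.println("with check, fragment closed: " + run(true));
    }
}
```

Without the version check the fragment is closed as soon as the future completes, which matches the last step of the description; with the check the outdated event is skipped and the fragment stays open.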
--
This message was sent by Atlassian Jira
(v8.20.10#820010)