[jira] [Assigned] (IGNITE-25363) Sql. Delayed NODE_LEFT event processing may cause query to hang

Konstantin Orlov (Jira) Tue, 13 May 2025 09:35:43 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-25363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Konstantin Orlov reassigned IGNITE-25363:
-----------------------------------------

    Assignee: Konstantin Orlov

> Sql. Delayed NODE_LEFT event processing may cause query to hang
> ---------------------------------------------------------------
>
>                 Key: IGNITE-25363
>                 URL: https://issues.apache.org/jira/browse/IGNITE-25363
>             Project: Ignite
>          Issue Type: Bug
>          Components: sql ai3
>            Reporter: Konstantin Orlov
>            Assignee: Konstantin Orlov
>            Priority: Major
>              Labels: ignite-3
>
> This problem is highlighted by the test 
> {{org.apache.ignite.internal.runner.app.ItDataSchemaSyncTest#checkSchemasCorrectlyRestore}},
>  which sometimes fails on TC with a timeout. The sequence of events is as follows:
>  # Given: cluster of 3 nodes, distribution zone spans all these nodes.
>  # Node 1 has been restarted.
>  # Notification of the 
> {{org.apache.ignite.internal.network.TopologyEventHandler#onDisappeared}} 
> handlers is delayed on node 2 (due to metastorage lag or some other 
> reason).
>  # A query is started from node 1.
>  # The root fragment is processed locally, and {{QueryBatchRequest}} arrives at 
> node 2 before {{QueryStartRequest}}. This step is crucial because it puts an 
> incomplete future into the mailbox registry 
> ({{org.apache.ignite.internal.sql.engine.exec.MailboxRegistryImpl#locals}}).
>  # The {{TopologyEventHandler}}s are notified on node 2. This step causes the 
> {{onNodeLeft}} handler to be chained to the future from the previous step.
>  # {{QueryStartRequest}} arrives at node 2. The query fragment is created and 
> immediately closed by the {{onNodeLeft}} handler.
> The problem is that the {{onNodeLeft}} handler is applied to a query that was 
> started on a topology that already takes the node restart into account. We 
> have to ignore such outdated events.
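
The sequence above can be replayed in a small, single-threaded model. This is only a sketch under stated assumptions, not Ignite's code: {{Fragment}}, {{Registry}}, the numeric topology version and the {{ignoreStaleEvents}} flag are all hypothetical names introduced for illustration, and comparing topology versions is just one possible way to recognise an outdated event. The only real API used is {{java.util.concurrent.CompletableFuture}}.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified model of the race; none of these types are Ignite classes. */
final class StaleNodeLeftSketch {
    /** A query fragment that remembers the (assumed) topology version its query started on. */
    static final class Fragment {
        final String originNode;
        final long topologyVersion;
        volatile boolean closed;

        Fragment(String originNode, long topologyVersion) {
            this.originNode = originNode;
            this.topologyVersion = topologyVersion;
        }
    }

    /** Stand-in for the mailbox registry: fragment futures keyed by query id. */
    static final class Registry {
        private final Map<Long, CompletableFuture<Fragment>> locals = new ConcurrentHashMap<>();
        private final boolean ignoreStaleEvents;

        Registry(boolean ignoreStaleEvents) {
            this.ignoreStaleEvents = ignoreStaleEvents;
        }

        /** Step 5: a batch request that arrives first parks an incomplete future. */
        CompletableFuture<Fragment> onBatchRequest(long queryId) {
            return locals.computeIfAbsent(queryId, id -> new CompletableFuture<>());
        }

        /** Step 6: a NODE_LEFT notification chains a closing handler to every parked future. */
        void onNodeLeft(String node, long eventTopologyVersion) {
            for (CompletableFuture<Fragment> future : locals.values()) {
                future.thenAccept(fragment -> {
                    boolean sameNode = fragment.originNode.equals(node);
                    // Assumed guard: the event is outdated if the query started on a newer topology.
                    boolean stale = ignoreStaleEvents && fragment.topologyVersion > eventTopologyVersion;
                    if (sameNode && !stale) {
                        fragment.closed = true;
                    }
                });
            }
        }

        /** Step 7: the start request completes the future, which runs the chained handlers. */
        Fragment onStartRequest(long queryId, String originNode, long topologyVersion) {
            Fragment fragment = new Fragment(originNode, topologyVersion);
            locals.computeIfAbsent(queryId, id -> new CompletableFuture<>()).complete(fragment);
            return fragment;
        }
    }

    /** Replays steps 5 to 7 and reports whether the fragment ended up closed. */
    static boolean fragmentClosedAfterReplay(boolean ignoreStaleEvents) {
        Registry registry = new Registry(ignoreStaleEvents);
        registry.onBatchRequest(1L);
        registry.onNodeLeft("node-1", 1L); // delayed event that belongs to the old topology
        Fragment fragment = registry.onStartRequest(1L, "node-1", 2L); // query started after the restart
        return fragment.closed;
    }

    public static void main(String[] args) {
        System.out.println("without guard, closed = " + fragmentClosedAfterReplay(false));
        System.out.println("with guard, closed = " + fragmentClosedAfterReplay(true));
    }
}
```

Without the guard, the handler chained in step 6 fires as soon as the future completes in step 7 and closes a fragment that belongs to a healthy query. With the guard, the delayed event is recognised as older than the query's topology and is skipped.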



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
