Roman Puchkovskiy created IGNITE-19655:
------------------------------------------
Summary: Distributed Sql keeps mapping query fragments to a node that has already left
Key: IGNITE-19655
URL: https://issues.apache.org/jira/browse/IGNITE-19655
Project: Ignite
Issue Type: Bug
Reporter: Roman Puchkovskiy
Assignee: Maksim Zhuravkov
Fix For: 3.0.0-beta2
There are two test failures:
[https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7271211?expandCode+Inspection=true&expandBuildProblemsSection=true&hideProblemsFromDependencies=false&expandBuildTestsSection=true&hideTestsFromDependencies=false]
and
[https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7272905?hideProblemsFromDependencies=false&hideTestsFromDependencies=false&expandCode+Inspection=true&expandBuildProblemsSection=true&expandBuildChangesSection=true&expandBuildTestsSection=true]
(org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.entriesKeepAppendedAfterSnapshotInstallation
and
org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.snapshotInstallTimeoutDoesNotBreakSubsequentInstallsWhenSecondAttemptIsIdenticalToFirst,
respectively).
In both cases, the test code creates a table with 3 replicas on a 3-node
cluster, then stops the last node and tries to make an insert through one of
the 2 remaining nodes. The RAFT majority (2 of 3) is still preserved, so the
insert should succeed. The insert might be issued before the remaining nodes
learn that the third node has left, so there is a retry mechanism in place: it
makes up to 5 attempts over almost 8 seconds in total.
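The issue does not show the tests' retry code. Below is a minimal sketch of a retry loop matching the description, assuming exponential backoff that starts at 500 ms (an assumption: the waits are then 500 + 1000 + 2000 + 4000 = 7500 ms across 5 attempts, i.e. "almost 8 seconds"). `RetrySketch`, `withRetries` and `Sleeper` are hypothetical names for illustration only; sleeping goes through `Sleeper` so the delays can be checked without actually waiting.

```java
import java.util.concurrent.Callable;

/** Hypothetical retry helper; not the actual code used by the Ignite tests. */
final class RetrySketch {
    /** Abstraction over Thread.sleep so callers can record delays instead of waiting. */
    interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    /**
     * Runs the action up to maxAttempts times. Before each retry it waits
     * for the current delay, then doubles it (exponential backoff).
     * With maxAttempts = 5 and initialDelayMs = 500 (assumed values), the
     * waits are 500, 1000, 2000 and 4000 ms, i.e. 7500 ms in total.
     */
    static <T> T withRetries(Callable<T> action, int maxAttempts, long initialDelayMs, Sleeper sleeper)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleeper.sleep(delay); // no wait after the final attempt
                    delay *= 2;
                }
            }
        }
        throw last; // every attempt failed: surface the last error
    }
}
```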
But in both failed runs, each of the 5 attempts failed because a fragment of
the INSERT query was mapped to the missing node. This may be bad luck (the
tests pass most of the time; the failure rate is about 2.5%), but either way,
the SQL engine does not seem to take into account that the node has already left.
The SQL engine should probably track Logical Topology events and avoid
mapping query fragments to nodes that have left.
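A minimal sketch of the proposed idea, assuming the engine receives a callback each time a node joins or leaves the logical topology. `TopologyAwareMapper`, `onNodeJoined`, `onNodeLeft` and `mapFragment` are hypothetical names, not Ignite's actual API. The sketch only shows filtering candidate nodes (in preference order) against the current topology before a fragment is mapped, and returning nothing when no candidate is still present.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; not Ignite's actual topology or mapping API. */
final class TopologyAwareMapper {
    /** Names of nodes currently in the logical topology (thread-safe set). */
    private final Set<String> liveNodes = ConcurrentHashMap.newKeySet();

    /** Hypothetical handler for a "node joined" logical topology event. */
    void onNodeJoined(String nodeName) {
        liveNodes.add(nodeName);
    }

    /** Hypothetical handler for a "node left" logical topology event. */
    void onNodeLeft(String nodeName) {
        liveNodes.remove(nodeName);
    }

    /**
     * Returns the first candidate (in preference order) that is still in the
     * logical topology, or empty if every candidate has left, so the caller
     * can fail fast instead of sending the fragment to a departed node.
     */
    Optional<String> mapFragment(List<String> candidates) {
        return candidates.stream().filter(liveNodes::contains).findFirst();
    }
}
```

Failing fast with an empty result lets the caller retry once the topology has settled, rather than spending every attempt on the same departed node.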