Ignite TC Bot created IGNITE-28684:
--------------------------------------
Summary: MultiDataCenterRingTest.testRing can fail when random
cluster has one surviving data center
Key: IGNITE-28684
URL: https://issues.apache.org/jira/browse/IGNITE-28684
Project: Ignite
Issue Type: Bug
Reporter: Ignite TC Bot
h3. Failure
TeamCity SPI (Discovery) failure on master-like code path:
* Test: {code}org.apache.ignite.spi.discovery.tcp.MultiDataCenterRingTest.testRing{code}
* Suite: {code}org.apache.ignite.testsuites.IgniteSpiDiscoverySelfTestSuite{code}
* Build: https://ci2.ignite.apache.org/viewLog.html?buildId=9060064
* Error: {code}java.lang.AssertionError: expected:<2> but was:<0>{code}
h3. Likely root cause
{code}MultiDataCenterRingTest.generateRandomDcOrderCluster(int cnt){code}
assigns every node to {code}DC0{code} or {code}DC1{code} at random using
{code}ThreadLocalRandom{code}. The test then stops node {code}cnt - 1{code} and
node {code}0{code}, and expects {code}checkHops(2){code} to still pass.
This is not guaranteed. If all server nodes that survive those stops are
assigned to the same data center, the ring traversed by
{code}TcpDiscoveryNodesRing.nextNode(){code} and ordered by
{code}MdcAwareNodesComparator{code} has no cross-DC boundary, so
{code}checkHops(2){code} counts {code}0{code} hops. That matches the
{code}expected:<2> but was:<0>{code} assertion in the TeamCity failure.
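
The hop count described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: {code}CrossDcHopModel{code} and {code}countCrossDcHops{code} are hypothetical names, no Ignite classes are used, and a plain sort by DC id stands in for the MDC-aware ordering. It shows that a ring grouped by DC has two cross-DC links when both DCs are present and none when only one DC survives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-alone model; it does not use any Ignite classes. */
final class CrossDcHopModel {
    /**
     * Counts DC changes between ring neighbours after grouping nodes by DC,
     * including the wrap-around link from the last node back to the first.
     */
    static int countCrossDcHops(List<String> dcIds) {
        List<String> ring = new ArrayList<>(dcIds);

        // Assumption: sorting by DC id approximates the MDC-aware ring order.
        Collections.sort(ring);

        int hops = 0;

        for (int i = 0; i < ring.size(); i++) {
            String cur = ring.get(i);
            String next = ring.get((i + 1) % ring.size());

            if (!cur.equals(next))
                hops++;
        }

        return hops;
    }

    public static void main(String[] args) {
        // Both DCs survive: DC0 -> DC1 and the wrap-around DC1 -> DC0.
        System.out.println(countCrossDcHops(Arrays.asList("DC0", "DC1", "DC0", "DC1"))); // prints 2

        // Only one DC survives: no boundary to cross.
        System.out.println(countCrossDcHops(Arrays.asList("DC1", "DC1", "DC1"))); // prints 0
    }
}
```

If each surviving node is assigned independently with equal probability, the chance that n survivors all share one DC is 2^(1-n), so the failure is rare for larger clusters but never impossible.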
h3. Minimal fix
Make the test topology deterministic enough to preserve both DCs among nodes
that survive the explicit stops. For example, force two surviving node indexes,
such as {code}1{code} and {code}2{code}, into different DCs, and keep the
existing random assignment for the remaining nodes:
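{code:java}
String dcId;

// Pin two nodes that survive the explicit stops to different DCs.
if (i == 1)
    dcId = DC_ID_0;
else if (i == 2)
    dcId = DC_ID_1;
else
    // Keep the random assignment for all other nodes.
    dcId = rnd.nextBoolean() ? DC_ID_0 : DC_ID_1;

System.setProperty(IgniteSystemProperties.IGNITE_DATA_CENTER_ID, dcId);
{code}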
This keeps the random join order coverage but removes the invalid all-one-DC
outcome for the second assertion.
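
The property the fix relies on can be checked in isolation. The sketch below is hypothetical: {code}DcAssignmentSketch{code}, {code}dcFor{code} and {code}survivorsSpanBothDcs{code} are illustrative names, the constant values are assumed from the DC names in this report, and a seeded {code}java.util.Random{code} replaces {code}ThreadLocalRandom{code} only to make runs reproducible. It also makes one precondition visible: indexes 1 and 2 both survive the stops of node 0 and node cnt - 1 only when cnt is at least 4.

```java
import java.util.Random;

/** Hypothetical check of the pinned DC assignment; no Ignite classes are used. */
final class DcAssignmentSketch {
    // Assumed values, taken from the DC names mentioned in this report.
    static final String DC_ID_0 = "DC0";
    static final String DC_ID_1 = "DC1";

    /** Mirrors the proposed assignment: indexes 1 and 2 are pinned, the rest are random. */
    static String dcFor(int i, Random rnd) {
        if (i == 1)
            return DC_ID_0;

        if (i == 2)
            return DC_ID_1;

        return rnd.nextBoolean() ? DC_ID_0 : DC_ID_1;
    }

    /** Returns true if the nodes left after stopping 0 and cnt - 1 cover both DCs. */
    static boolean survivorsSpanBothDcs(int cnt, long seed) {
        Random rnd = new Random(seed);

        boolean seenDc0 = false;
        boolean seenDc1 = false;

        for (int i = 0; i < cnt; i++) {
            String dc = dcFor(i, rnd);

            // Nodes 0 and cnt - 1 are stopped by the test, so they do not count.
            if (i == 0 || i == cnt - 1)
                continue;

            if (dc.equals(DC_ID_0))
                seenDc0 = true;
            else
                seenDc1 = true;
        }

        return seenDc0 && seenDc1;
    }

    public static void main(String[] args) {
        for (long seed = 0; seed < 1000; seed++) {
            if (!survivorsSpanBothDcs(4, seed))
                throw new AssertionError("seed " + seed);
        }

        System.out.println("both DCs survive for every seed when cnt = 4");
    }
}
```

With cnt = 3 the only survivor is node 1, so the pinned assignment cannot help there; the sketch assumes the test always starts at least four nodes.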
h3. Files to inspect
* {code}modules/core/src/test/java/org/apache/ignite/spi/discovery/tcp/MultiDataCenterRingTest.java{code}
* {code}modules/core/src/main/java/org/apache/ignite/spi/discovery/tcp/internal/TcpDiscoveryNodesRing.java{code}
* {code}modules/core/src/main/java/org/apache/ignite/spi/discovery/tcp/internal/MdcAwareNodesComparator.java{code}
h3. Retry
Retry is justified as a short-term mitigation because the failure depends on
random DC assignment. It does not fix the test defect.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)