anton-vinogradov commented on code in PR #13238:
URL: https://github.com/apache/ignite/pull/13238#discussion_r3476211814


##########
modules/core/src/main/java/org/apache/ignite/internal/processors/cache/ValidationOnNodeJoinUtils.java:
##########
@@ -631,6 +637,33 @@ private static void checkMemoryConfiguration(ClusterNode rmt, GridKernalContext
         return null;
     }
 
+    /**
+     * Analyzes affinity settings of a provided {@link CacheConfiguration} to inspect if it provides guarantees
+     * that partitions of the cache will be spread across all datacenters presented in cluster.
+     *
+     * @return {@code true} if affinity settings guarantee spreading partitions across all datacenters and {@code false} otherwise.
+     */
+    static boolean isAffinityConfigurationMdcSafe(CacheConfiguration cc) {
+        if (cc.getCacheMode() == REPLICATED)
+            return true;
+
+        AffinityFunction affFunc = cc.getAffinity();
+
+        if (affFunc instanceof RendezvousAffinityFunction) {
+            IgniteBiPredicate<ClusterNode, List<ClusterNode>> filter = ((RendezvousAffinityFunction)affFunc).getAffinityBackupFilter();
+
+            if (filter instanceof ClusterNodeAttributeAffinityBackupFilter attrFilter) {
+                if (!F.asList(attrFilter.getAttributeNames()).contains(ATTR_DATA_CENTER_ID))
+                    return false;
+            }
+
+            if (!(filter instanceof MdcAffinityBackupFilter) && !(filter instanceof ClusterNodeAttributeColocatedBackupFilter))
+                return false;

Review Comment:
   `ClusterNodeAttributeAffinityBackupFilter` places backups on nodes with a 
**different** value of the attribute, so with `DATA_CENTER_ID` it forces the 
backup into another DC — i.e. it *is* MDC-safe. But it's neither 
`MdcAffinityBackupFilter` nor `ClusterNodeAttributeColocatedBackupFilter`, so 
this `if` returns `false` for it. By the time we reach here such a filter is 
already guaranteed to contain `DATA_CENTER_ID` (otherwise we'd have returned at 
the check above), so it's safe to accept:
   
   ```suggestion
               if (!(filter instanceof MdcAffinityBackupFilter)
                   && !(filter instanceof ClusterNodeAttributeColocatedBackupFilter)
                   && !(filter instanceof ClusterNodeAttributeAffinityBackupFilter))
                   return false;
   ```
   
   This case isn't covered today: `CACHE_WITH_MDC_SAFE_ATTRIBUTE_FILTER` uses 
exactly this filter, but `MdcCacheMetricsTest` only checks its *distribution* 
metric, never `IsCacheAffinityConfigurationMdcSafe` — worth adding that 
assertion.
   
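   Minor follow-up for the multi-attribute case: 
`getAttributeNames().contains(ATTR_DATA_CENTER_ID)` is necessary but not 
sufficient. As far as I can tell, a candidate is rejected only when *all* listed 
attributes match an already selected node (AND-semantics), so with e.g. 
`[DATA_CENTER_ID, RACK]` a backup in the same DC but a different rack is still 
accepted.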
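   
   To make the multi-attribute concern concrete, here is a minimal standalone 
sketch. It is not Ignite code: `MultiAttrFilterSketch` and `accepts` are 
hypothetical names, nodes are modelled as plain attribute maps, and the rule is 
my reading of the AND-semantics (reject only when every listed attribute equals 
that of an already selected node).

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for the filter's accept/reject rule; not Ignite code. */
class MultiAttrFilterSketch {
    /**
     * Accepts a candidate unless ALL listed attributes equal those of some
     * already selected node (the assumed AND-semantics).
     */
    static boolean accepts(
        List<String> attrNames,
        Map<String, String> candidate,
        List<Map<String, String>> selected
    ) {
        for (Map<String, String> node : selected) {
            boolean allMatch = true;

            for (String attr : attrNames) {
                if (!Objects.equals(candidate.get(attr), node.get(attr))) {
                    allMatch = false;

                    break;
                }
            }

            if (allMatch)
                return false;
        }

        return true;
    }

    public static void main(String[] args) {
        Map<String, String> primary = Map.of("DATA_CENTER_ID", "DC_0", "RACK", "R1");
        Map<String, String> sameDcOtherRack = Map.of("DATA_CENTER_ID", "DC_0", "RACK", "R2");

        // Both attributes listed: the rack differs, so the same-DC backup is accepted.
        System.out.println(accepts(List.of("DATA_CENTER_ID", "RACK"), sameDcOtherRack, List.of(primary)));

        // Only the DC attribute listed: the same-DC backup is rejected.
        System.out.println(accepts(List.of("DATA_CENTER_ID"), sameDcOtherRack, List.of(primary)));
    }
}
```

   Under that assumption, only the single-attribute configuration keeps the 
backup out of the primary's DC, which is why the `contains` check alone does 
not prove MDC safety.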



##########
modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java:
##########
@@ -2594,6 +2601,35 @@ private void updateDurationHistogram(long duration) {
             cctx.exchange().blockingDurationHistogram().value(duration);
     }
 
+    /**
+     * Updates metric for a partition distribution across data centers for a given cache group.
+     *
+     * @param grp Cache group the metric should be recalculated for.
+     * @param assignment New assignment for the cache group.
+     */
+    private void updateMdcMetrics(CacheGroupContext grp, AffinityAssignment assignment) {
+        BaselineTopology top = cctx.discovery().discoCache().state().baselineTopology();
+        if (top != null) {
+            int numberOfDataCenters = top.numberOfDatacenters();

Review Comment:
   `numberOfDatacenters()` returns `-1` when DCs aren't configured, but that 
isn't handled here: for a non-MDC cluster we still iterate every partition of 
every non-replicated group on the exchange thread (allocating a stream per 
partition) and then discard the result, and we do so on every PME. An early 
return honours the `-1` contract and skips the work for non-MDC clusters; the 
`< 2` bound also covers a single-DC baseline, where there is nothing to spread 
across. It also avoids leaving the metric at its default `true` when the DC 
count is unknown:
   
   ```suggestion
               int numberOfDataCenters = top.numberOfDatacenters();
   
               if (numberOfDataCenters < 2)
                   return;
   ```



##########
modules/core/src/main/java/org/apache/ignite/internal/processors/cache/CacheMetricsImpl.java:
##########
@@ -246,6 +246,12 @@ public class CacheMetricsImpl implements CacheMetrics {
     /** Conflict resolver merged entries count. */
     private LongAdderMetric rslvrMergedCnt;
 
+    /** */
+    private Boolean affCfgMdcSafe;
+
+    /** */
+    private Boolean mdcSafePartDistrib;

Review Comment:
   These are written from the exchange thread 
(`setMdcSafePartitionDistribution`) and read by metric exporters / JMX from 
other threads. Without `volatile` there's no happens-before, so readers may 
observe a stale value — more likely to actually surface on weakly-ordered (ARM) 
CPUs. The other metric fields here are thread-safe (`LongAdderMetric`); these 
should be too:
   
   ```suggestion
       private volatile Boolean affCfgMdcSafe;
   
       /** */
       private volatile Boolean mdcSafePartDistrib;
   ```
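   
   For illustration only, a minimal standalone sketch of the visibility 
guarantee. `MdcMetricsHolderSketch` and its accessor are hypothetical names, not 
the real `CacheMetricsImpl`. The reader thread stands in for a metric exporter 
and the main thread for the exchange thread.

```java
/** Hypothetical holder mirroring one of the fields under review; not the real CacheMetricsImpl. */
class MdcMetricsHolderSketch {
    /** Written by one thread, read by others; volatile gives readers a happens-before edge. */
    private volatile Boolean mdcSafePartDistrib;

    /** Publishes a new value; called from the writer thread. */
    void setMdcSafePartitionDistribution(boolean safe) {
        mdcSafePartDistrib = safe;
    }

    /** Returns null until the first write becomes visible. */
    Boolean mdcSafePartitionDistribution() {
        return mdcSafePartDistrib;
    }

    public static void main(String[] args) throws InterruptedException {
        MdcMetricsHolderSketch holder = new MdcMetricsHolderSketch();

        // Reader stands in for a metric exporter polling from another thread.
        Thread reader = new Thread(() -> {
            while (holder.mdcSafePartitionDistribution() == null)
                Thread.onSpinWait();

            System.out.println("Reader observed: " + holder.mdcSafePartitionDistribution());
        });

        reader.start();

        // Writer stands in for the exchange thread.
        holder.setMdcSafePartitionDistribution(false);

        reader.join();
    }
}
```

   With `volatile` the polling loop is guaranteed to see the write and 
terminate. Without it the JVM is allowed to keep reading a stale `null`.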



##########
modules/core/src/main/java/org/apache/ignite/internal/processors/cluster/BaselineTopology.java:
##########
@@ -274,6 +276,20 @@ public Map<String, Object> attributes(Object consId) {
         return nodeMap.get(consId);
     }
 
+    /**
+     * Calculates number of datacenters presented in current baseline.
+     *
+     * @return Number of datacenters presented in the baseline or {@code -1} if unknown.
+     */
+    public int numberOfDatacenters() {
+        Collection<Map<String, Object>> allNodesAttrs = nodeMap.values();
+
+        if (!allNodesAttrs.isEmpty() && allNodesAttrs.iterator().next().get(ATTR_DATA_CENTER_ID) != null)
+            return (int)allNodesAttrs.stream().map(m -> m.get(ATTR_DATA_CENTER_ID)).distinct().count();
+
+        return -1;

Review Comment:
   Inspecting only the first node's attributes makes this return `-1` whenever 
the *first* baseline node happens to lack the DC attribute — even if every 
other node has it. That's exactly the partial-misconfiguration case this 
feature is meant to catch, and it would silently mark the whole cluster as 
safe. Counting across all nodes (ignoring nulls) is more robust:
   
   ```suggestion
           long dcs = nodeMap.values().stream()
               .map(m -> m.get(ATTR_DATA_CENTER_ID))
               .filter(dc -> dc != null)
               .distinct()
               .count();
   
           return dcs == 0 ? -1 : (int)dcs;
   ```
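   
   A standalone sketch of the same counting logic, runnable outside Ignite. 
`DcCountSketch` is a hypothetical class, and `DC_ATTR` is a placeholder key, 
since the actual value of `ATTR_DATA_CENTER_ID` isn't shown in this diff. The 
baseline deliberately starts with a node that lacks the attribute.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for the baseline's node attribute map; not Ignite code. */
class DcCountSketch {
    /** Placeholder key; the real value of ATTR_DATA_CENTER_ID is an assumption here. */
    static final String DC_ATTR = "DATA_CENTER_ID";

    /** Counts distinct non-null DC ids across all nodes, or returns -1 if none is set. */
    static int numberOfDatacenters(Collection<Map<String, Object>> allNodesAttrs) {
        long dcs = allNodesAttrs.stream()
            .map(m -> m.get(DC_ATTR))
            .filter(Objects::nonNull)
            .distinct()
            .count();

        return dcs == 0 ? -1 : (int)dcs;
    }

    public static void main(String[] args) {
        // First node has no DC attribute; the remaining nodes cover two DCs.
        Map<String, Object> noDc = new HashMap<>();
        Map<String, Object> dc0 = Map.of(DC_ATTR, "DC_0");
        Map<String, Object> dc1 = Map.of(DC_ATTR, "DC_1");

        List<Map<String, Object>> baseline = List.of(noDc, dc0, dc1, dc0);

        System.out.println(numberOfDatacenters(baseline));
    }
}
```

   For this baseline the count is 2, whereas a check that looks only at the 
first node would report `-1`.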



##########
modules/core/src/test/java/org/apache/ignite/internal/processors/cache/MdcCacheMetricsTest.java:
##########
@@ -0,0 +1,506 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache;
+
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import org.apache.ignite.IgniteSystemProperties;
+import 
org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeAffinityBackupFilter;
+import 
org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeColocatedBackupFilter;
+import org.apache.ignite.cache.affinity.rendezvous.MdcAffinityBackupFilter;
+import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.DataRegionConfiguration;
+import org.apache.ignite.configuration.DataStorageConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.lang.IgniteBiPredicate;
+import org.apache.ignite.spi.metric.BooleanMetric;
+import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;
+import org.junit.Test;
+
+import static org.apache.ignite.cache.CacheMode.PARTITIONED;
+import static 
org.apache.ignite.internal.IgniteNodeAttributes.ATTR_DATA_CENTER_ID;
+import static 
org.apache.ignite.internal.processors.metric.impl.MetricUtils.cacheMetricsRegistryName;
+
+/**
+ * Test for new cache metrics for highlighting two data safety issues in Multi 
DataCenter environments:
+ * 1. If cache configuration doesn't specify an affinity backup filter that 
could guarantee presence of data copy in each DC.
+ * 2. If cluster topology changed in such a way that partition copies are not 
spread across all available DCs.
+ */
+public class MdcCacheMetricsTest extends GridCommonAbstractTest {
+    /** */
+    private static final int NODES_NUMBER = 5;
+
+    /** */
+    private static final String CACHE_WITH_MDC_FILTER = "mdcSafeCache0";
+
+    /** */
+    private static final String CACHE_WITH_COLOCATED_FILTER = "mdcSafeCache1";
+
+    /** */
+    private static final String CACHE_WITH_MDC_SAFE_ATTRIBUTE_FILTER = 
"mdcSafeCache2";
+
+    /** */
+    private static final String MDC_UNSAFE_CACHE = "mdcUnsafeCache0";
+
+    /** */
+    private static final String CACHE_WITH_MDC_UNSAFE_ATTRIBUTE_FILTER = 
"mdcUnsafeCache1";
+
+    /** */
+    private static final String STRETCHED_CELL_ATTR_NAME = "DC_CELL_ATTR";
+
+    /** */
+    private static final String ATTR_FOR_UNSAFE_ATTR_FILTER = 
"MDC_UNAWARE_ATTR";
+
+    /** */
+    private static final String[] STRETCHED_CELL_IDS = {"CELL_0", "CELL_1"};
+
+    /** */
+    private static final String DC_ID_0 = "DC_0";
+
+    /** */
+    private static final String DC_ID_1 = "DC_1";
+
+    /** */
+    private static final String AFFINITY_CFG_MDC_SAFE_METRIC_NAME = 
"IsCacheAffinityConfigurationMdcSafe";
+
+    /** */
+    private static final String PARTITION_DISTRIBUTION_SAFE_METRIC_NAME = 
"IsCachePartitionDistributionSafe";
+
+    /** */
+    private String dcId;
+
+    /** */
+    private String cellId;
+
+    /** */
+    private boolean useStaticCaches;
+
+    /** */
+    private boolean persistenceEnabled;
+
+    /** */
+    private final Set<String> allCaches = new HashSet<>();
+
+    /** */
+    private final Set<String> mdcSafeCaches = new HashSet<>();
+
+    /** {@inheritDoc} */
+    @Override protected void beforeTest() throws Exception {
+        super.beforeTest();
+
+        stopAllGrids();
+
+        allCaches.clear();
+        mdcSafeCaches.clear();
+    }
+
+    /** {@inheritDoc} */
+    @Override protected void afterTest() throws Exception {
+        super.afterTest();
+
+        stopAllGrids();
+
+        cleanPersistenceDir();
+
+        allCaches.clear();
+        mdcSafeCaches.clear();
+    }
+
+    /** {@inheritDoc} */
+    @Override protected IgniteConfiguration getConfiguration(String 
igniteInstanceName) throws Exception {
+        IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName);
+
+        cfg.setDataStorageConfiguration(new DataStorageConfiguration()
+            .setDefaultDataRegionConfiguration(new DataRegionConfiguration()
+                .setPersistenceEnabled(persistenceEnabled)
+                .setMaxSize(32 * 1024 * 1024)
+            ));
+
+        if (useStaticCaches) {
+            CacheConfiguration mdcSafeCacheCfg0 = prepareCacheCfg(
+                CACHE_WITH_MDC_FILTER,
+                new MdcAffinityBackupFilter(2, 1),
+                true);
+
+            CacheConfiguration mdcSafeCacheCfg1 = prepareCacheCfg(
+                CACHE_WITH_COLOCATED_FILTER,
+                new 
ClusterNodeAttributeColocatedBackupFilter(STRETCHED_CELL_ATTR_NAME),
+                true);
+
+            CacheConfiguration mdcUnsafeCacheCfg0 = 
prepareCacheCfg(MDC_UNSAFE_CACHE, null, false);
+
+            CacheConfiguration mdcUnsafeCacheCfg1 = prepareCacheCfg(
+                CACHE_WITH_MDC_UNSAFE_ATTRIBUTE_FILTER,
+                new 
ClusterNodeAttributeAffinityBackupFilter(ATTR_FOR_UNSAFE_ATTR_FILTER),
+                false);
+
+            cfg.setCacheConfiguration(mdcSafeCacheCfg0, mdcSafeCacheCfg1, 
mdcUnsafeCacheCfg0, mdcUnsafeCacheCfg1);
+        }
+
+        if (!cfg.isClientMode())
+            cfg.setUserAttributes(F.asMap(
+                STRETCHED_CELL_ATTR_NAME,
+                cellId,
+                IgniteSystemProperties.IGNITE_DATA_CENTER_ID,
+                dcId));
+
+        return cfg;
+    }
+
+    /** */
+    private CacheConfiguration prepareCacheCfg(
+        String cacheName,
+        IgniteBiPredicate<ClusterNode, List<ClusterNode>> affBackupFilter,
+        boolean affCfgMdcSafe) {
+        return prepareCacheCfg(cacheName, affBackupFilter, affCfgMdcSafe, 
null);
+    }
+
+    /** */
+    private CacheConfiguration prepareCacheCfg(
+        String cacheName,
+        IgniteBiPredicate<ClusterNode, List<ClusterNode>> affBackupFilter,
+        boolean affCfgMdcSafe,
+        String cacheGroupName) {
+        CacheConfiguration cacheCfg = new CacheConfiguration(cacheName)
+            .setCacheMode(PARTITIONED)
+            .setBackups(1);
+
+        if (cacheGroupName != null)
+            cacheCfg.setGroupName(cacheGroupName);
+
+        cacheCfg.setAffinity(
+            new RendezvousAffinityFunction()
+                .setPartitions(32)
+                .setAffinityBackupFilter(affBackupFilter));
+
+        if (affCfgMdcSafe)
+            mdcSafeCaches.add(cacheName);
+
+        allCaches.add(cacheName);
+
+        return cacheCfg;
+    }
+
+    /**
+     * Test verifies correctness of metric for cache configuration related to 
data distribution across DCs
+     * if caches are organized into groups.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testAffinityCfgMdcSafeMetricForCacheGroup() throws Exception {
+        startClusterAcrossDataCenters(new String[] {DC_ID_0, DC_ID_1}, 2);
+
+        IgniteEx client = startClientGrid(NODES_NUMBER - 1);
+
+        client.getOrCreateCache(
+            prepareCacheCfg(
+                CACHE_WITH_MDC_FILTER + "_0",
+                new MdcAffinityBackupFilter(2, 1),
+                true,
+                "mdcSafeCachesGroup"));
+
+        client.getOrCreateCache(
+            prepareCacheCfg(
+                CACHE_WITH_MDC_FILTER + "_1",
+                new MdcAffinityBackupFilter(2, 1),
+                true,
+                "mdcSafeCachesGroup"));
+
+        client.getOrCreateCache(
+            prepareCacheCfg(MDC_UNSAFE_CACHE + "_0", null, false, 
"mdcUnsafeCachesGroup"));
+
+        client.getOrCreateCache(
+            prepareCacheCfg(MDC_UNSAFE_CACHE + "_1", null, false, 
"mdcUnsafeCachesGroup"));
+
+        checkMdcReadyMetric();
+    }
+
+    /**
+     * Test verifies correctness of metric for partition copies distribution 
across DCs
+     * if caches are organized into groups.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testPartitionDistributionMetricForCacheGroups() throws 
Exception {
+        persistenceEnabled = true;
+
+        startClusterAcrossDataCenters(new String[] {DC_ID_0, DC_ID_1}, 2);
+
+        IgniteEx client = startClientGrid(NODES_NUMBER - 1);
+
+        client.cluster().state(ClusterState.ACTIVE);
+
+        client.getOrCreateCache(prepareCacheCfg(
+            CACHE_WITH_MDC_FILTER + "_0",
+            new MdcAffinityBackupFilter(2, 1),
+            true,
+            "mdcFilterCacheGroup"));
+        client.getOrCreateCache(prepareCacheCfg(
+            CACHE_WITH_MDC_FILTER + "_1",
+            new MdcAffinityBackupFilter(2, 1),
+            true,
+            "mdcFilterCacheGroup"));
+
+        BooleanMetric cache0DistributionSafeMetric = findMetricForCache(
+            grid(1),
+            CACHE_WITH_MDC_FILTER + "_0",
+            PARTITION_DISTRIBUTION_SAFE_METRIC_NAME);
+        BooleanMetric cache1DistributionSafeMetric = findMetricForCache(
+            grid(1),
+            CACHE_WITH_MDC_FILTER + "_1",
+            PARTITION_DISTRIBUTION_SAFE_METRIC_NAME);
+
+        assertNotNull(cache0DistributionSafeMetric);
+        assertNotNull(cache1DistributionSafeMetric);
+        assertTrue(cache0DistributionSafeMetric.value());
+        assertTrue(cache1DistributionSafeMetric.value());
+
+        stopGrid(0);
+
+        assertFalse(cache0DistributionSafeMetric.value());
+        assertFalse(cache1DistributionSafeMetric.value());
+
+        
client.cluster().setBaselineTopology(client.cluster().topologyVersion());
+
+        assertTrue(cache0DistributionSafeMetric.value());
+        assertTrue(cache1DistributionSafeMetric.value());
+    }
+
+    /**
+     * Test verifies correctness of metric for cache configuration related to 
data distribution across DCs for dynamically started caches.
+     * Metric should take a {@code false} value if cache configuration doesn't 
guarantee presence of data copy in each DC
+     * and {@code true} otherwise.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testAffinityCfgMdcSafeMetricForDynamicCaches() throws 
Exception {
+        startClusterAcrossDataCenters(new String[] {DC_ID_0, DC_ID_1}, 2);
+
+        IgniteEx client = startClientGrid(NODES_NUMBER - 1);
+
+        client.cluster().state(ClusterState.ACTIVE);
+
+        client.getOrCreateCache(
+            prepareCacheCfg(CACHE_WITH_MDC_FILTER, new 
MdcAffinityBackupFilter(2, 1), true));
+
+        client.getOrCreateCache(
+            prepareCacheCfg(CACHE_WITH_COLOCATED_FILTER, new 
ClusterNodeAttributeColocatedBackupFilter(STRETCHED_CELL_ATTR_NAME), true));
+
+        client.getOrCreateCache(
+            prepareCacheCfg(MDC_UNSAFE_CACHE, null, false));
+
+        client.getOrCreateCache(
+            prepareCacheCfg(CACHE_WITH_MDC_UNSAFE_ATTRIBUTE_FILTER,
+                new 
ClusterNodeAttributeAffinityBackupFilter(ATTR_FOR_UNSAFE_ATTR_FILTER), false));
+
+        checkMdcReadyMetric();
+    }
+
+    /**
+     * Test verifies correctness of metric for cache configuration related to 
data distribution across DCs for statically configured caches.
+     * Metric should take a {@code false} value if cache configuration doesn't 
guarantee presence of data copy in each DC
+     * and {@code true} otherwise.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testAffinityCfgMdcSafeMetricForStaticCaches() throws Exception 
{
+        useStaticCaches = true;
+
+        startClusterAcrossDataCenters(new String[] {DC_ID_0, DC_ID_1}, 2);
+
+        IgniteEx client = startClientGrid(NODES_NUMBER - 1);
+
+        client.cluster().state(ClusterState.ACTIVE);
+
+        checkMdcReadyMetric();
+    }
+
+    /**
+     * Test verifies correctness of metric for partition copies distribution 
across DCs.
+     * Metric should take a {@code false} value if there is at least one 
partition which doesn't have copies in all DCs
+     * and {@code true} otherwise.
+     * <p/>
+     * This test considers in-memory caches only.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testPartitionDistributionMetricInMemoryCaches() throws 
Exception {
+        startClusterAcrossDataCenters(new String[] {DC_ID_0, DC_ID_1}, 2);
+
+        IgniteEx client = startClientGrid(NODES_NUMBER - 1);
+
+        client.getOrCreateCache(prepareCacheCfg(CACHE_WITH_MDC_FILTER, new MdcAffinityBackupFilter(2, 1), true));
+        client.getOrCreateCache(prepareCacheCfg(CACHE_WITH_COLOCATED_FILTER,
+            new ClusterNodeAttributeColocatedBackupFilter(STRETCHED_CELL_ATTR_NAME), true));
+        client.getOrCreateCache(prepareCacheCfg(CACHE_WITH_MDC_SAFE_ATTRIBUTE_FILTER,
+            new ClusterNodeAttributeAffinityBackupFilter(ATTR_DATA_CENTER_ID), true));
+
+        BooleanMetric cacheWithMdcFilterDistributionSafeMetric = findMetricForCache(
+            grid(1),
+            CACHE_WITH_MDC_FILTER,
+            PARTITION_DISTRIBUTION_SAFE_METRIC_NAME);
+        BooleanMetric cacheWithColocatedFilterDistributionSafeMetric = findMetricForCache(
+            grid(1),
+            CACHE_WITH_COLOCATED_FILTER,
+            PARTITION_DISTRIBUTION_SAFE_METRIC_NAME);
+        BooleanMetric cacheWithMdcSafeAttrFilterDistributionSafeMetric = findMetricForCache(
+            grid(1),
+            CACHE_WITH_MDC_SAFE_ATTRIBUTE_FILTER,
+            PARTITION_DISTRIBUTION_SAFE_METRIC_NAME
+        );
+
+        assertNotNull(cacheWithMdcFilterDistributionSafeMetric);
+        assertNotNull(cacheWithColocatedFilterDistributionSafeMetric);
+        assertNotNull(cacheWithColocatedFilterDistributionSafeMetric);

Review Comment:
   Copy-paste: this repeats the `colocated` assertion. It should null-check the 
third metric (currently `cacheWithMdcSafeAttrFilterDistributionSafeMetric` is 
never asserted non-null):
   
   ```suggestion
           assertNotNull(cacheWithMdcSafeAttrFilterDistributionSafeMetric);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
