ibessonov commented on code in PR #7250:
URL: https://github.com/apache/ignite-3/pull/7250#discussion_r2634409646


##########
modules/page-memory/src/main/java/org/apache/ignite/internal/pagememory/persistence/checkpoint/CheckpointReadWriteLock.java:
##########
@@ -182,14 +199,35 @@ public boolean hasQueuedWriters() {
     }
 
     private void onReadLock(long start, boolean taken) {
-        long elapsed = coarseCurrentTimeMillis() - start;
+        readLockWaitingThreadsGauge.decrementAndGet();
+
+        long coarseCurrentTimeMillis = coarseCurrentTimeMillis();
+        long elapsed = coarseCurrentTimeMillis - start;
 
         if (taken) {
+            if (checkpointReadLockAcquiredTime.get() == null) {

Review Comment:
   Please add a comment, it took me a while to figure out why we need this 
check. Or maybe we should base this code on the value of 
`checkpointReadLockHoldCount` that we read in the next statement instead, this 
will make code simpler



##########
modules/page-memory/src/main/java/org/apache/ignite/internal/pagememory/persistence/checkpoint/CheckpointReadWriteLock.java:
##########
@@ -182,14 +199,35 @@ public boolean hasQueuedWriters() {
     }
 
     private void onReadLock(long start, boolean taken) {
-        long elapsed = coarseCurrentTimeMillis() - start;
+        readLockWaitingThreadsGauge.decrementAndGet();
+
+        long coarseCurrentTimeMillis = coarseCurrentTimeMillis();

Review Comment:
   `coarseCurrentTimeMillis()` has an abysmal precision. I would recommend 
replacing it with `nanoTime()` as the best tool for measuring time. There's 
also `org.apache.ignite.internal.util.IgniteUtils#monotonicMs` which hides some 
complexity, but adds some extra CPU instructions



##########
modules/page-memory/src/main/java/org/apache/ignite/internal/pagememory/persistence/checkpoint/CheckpointReadWriteLockMetricSource.java:
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.pagememory.persistence.checkpoint;
+
+import java.util.List;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.IntSupplier;
+import org.apache.ignite.internal.metrics.AbstractMetricSource;
+import org.apache.ignite.internal.metrics.DistributionMetric;
+import org.apache.ignite.internal.metrics.IntGauge;
+import org.apache.ignite.internal.metrics.Metric;
+
+/**
+ * Metric source for checkpoint read lock operations.
+ *
+ * <p>This metric source tracks performance and contention characteristics of 
checkpoint read locks
+ * acquired by normal operations during database operation. These locks 
coordinate with the checkpoint
+ * write lock to ensure consistency.
+ */
+public class CheckpointReadWriteLockMetricSource extends 
AbstractMetricSource<CheckpointReadWriteLockMetricSource.Holder> {
+    private static final long[] LOCK_ACQUISITION_MILLIS = {
+            1,      // 1ms   - uncontended, fast path
+            10,     // 10ms  - minor contention
+            50,     // 100ms - moderate contention
+            100,    // 100ms - high contention
+            500,    // 500ms - checkpoint in progress?
+            1_000,  // 1s    - severe contention, reported as warning in logs
+            10_000  // 10s   - pathological case, shall be treated as an 
emergency error
+    };
+
+    private static final long[] LOCK_HOLD_MILLIS = {
+            1,      // 1ms   - very fast operation
+            10,     // 10ms  - fast single-page operation
+            50,     // 50ms  - multi-page operation
+            100,    // 100ms - complex operation
+            500,    // 500ms - batch operation
+            1_000,  // 1s    - too large batch or slow I/O
+            10_000  // 10s  - pathologically large operation
+    };
+
+    /** Counter for threads currently waiting for checkpoint read lock. */
+    private final AtomicInteger readLockWaitingThreadsGauge = new 
AtomicInteger(0);
+
+    public CheckpointReadWriteLockMetricSource() {
+        super("checkpoint.readwritelock", "Checkpoint read/wrtie lock 
metrics", "checkpoint");
+    }
+
+    /**
+     * Records the duration of a lock acquisition in nanoseconds.
+     */
+    public void recordReadLockAcquisitionTime(long acquisitionDurationMillis) {
+        Holder holder = holder();
+        if (holder != null) {
+            holder.readLockAcquisitionTime().add(acquisitionDurationMillis);
+        }
+    }
+
+    /**
+     * Records the duration of a lock hold in nanoseconds.
+     */
+    public void recordReadLockHoldDuration(long lockHoldDuration) {
+        Holder holder = holder();
+        if (holder != null) {
+            holder.readLockHoldTime().add(lockHoldDuration);
+        }
+    }
+
+    public AtomicInteger readLockWaitingThreadsGauge() {
+        return readLockWaitingThreadsGauge;
+    }
+
+    @Override
+    protected Holder createHolder() {
+        return new Holder(readLockWaitingThreadsGauge::get);
+    }
+
+    /** Metrics holder. */
+    protected static class Holder implements 
AbstractMetricSource.Holder<Holder> {
+        private final DistributionMetric readLockAcquisitionTime = new 
DistributionMetric(
+                "CheckpointReadLockAcquisitionTime",

Review Comment:
   Duplicating the word `Checkpoint` may not be necessary. We're already in a 
checkpoint metrics source



##########
modules/page-memory/src/main/java/org/apache/ignite/internal/pagememory/persistence/checkpoint/CheckpointReadWriteLockMetricSource.java:
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.pagememory.persistence.checkpoint;
+
+import java.util.List;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.IntSupplier;
+import org.apache.ignite.internal.metrics.AbstractMetricSource;
+import org.apache.ignite.internal.metrics.DistributionMetric;
+import org.apache.ignite.internal.metrics.IntGauge;
+import org.apache.ignite.internal.metrics.Metric;
+
+/**
+ * Metric source for checkpoint read lock operations.
+ *
+ * <p>This metric source tracks performance and contention characteristics of 
checkpoint read locks
+ * acquired by normal operations during database operation. These locks 
coordinate with the checkpoint
+ * write lock to ensure consistency.
+ */
+public class CheckpointReadWriteLockMetricSource extends 
AbstractMetricSource<CheckpointReadWriteLockMetricSource.Holder> {
+    private static final long[] LOCK_ACQUISITION_MILLIS = {
+            1,      // 1ms   - uncontended, fast path
+            10,     // 10ms  - minor contention
+            50,     // 100ms - moderate contention
+            100,    // 100ms - high contention
+            500,    // 500ms - checkpoint in progress?
+            1_000,  // 1s    - severe contention, reported as warning in logs
+            10_000  // 10s   - pathological case, shall be treated as an 
emergency error
+    };
+
+    private static final long[] LOCK_HOLD_MILLIS = {
+            1,      // 1ms   - very fast operation
+            10,     // 10ms  - fast single-page operation
+            50,     // 50ms  - multi-page operation
+            100,    // 100ms - complex operation
+            500,    // 500ms - batch operation
+            1_000,  // 1s    - too large batch or slow I/O
+            10_000  // 10s  - pathologically large operation
+    };
+
+    /** Counter for threads currently waiting for checkpoint read lock. */
+    private final AtomicInteger readLockWaitingThreadsGauge = new 
AtomicInteger(0);

Review Comment:
   Let's make it a `LongAdder` just in case. We can even make it 
`LongAdderMetric` right away



##########
modules/page-memory/src/main/java/org/apache/ignite/internal/pagememory/persistence/checkpoint/CheckpointReadWriteLock.java:
##########
@@ -56,10 +63,17 @@ public class CheckpointReadWriteLock {
      *
      * @param checkpointLock Checkpoint lock.
      * @param throttledLogExecutor Executor for the throttled logger.
+     * @param readLockMetricSource Metrics source.
      */
-    public CheckpointReadWriteLock(ReentrantReadWriteLockWithTracking 
checkpointLock, Executor throttledLogExecutor) {
+    public CheckpointReadWriteLock(
+            ReentrantReadWriteLockWithTracking checkpointLock,
+            Executor throttledLogExecutor,
+            CheckpointReadWriteLockMetricSource readLockMetricSource
+    ) {
         this.checkpointLock = checkpointLock;
         this.log = 
Loggers.toThrottledLogger(Loggers.forClass(CheckpointReadWriteLock.class), 
throttledLogExecutor);
+        this.metrics = readLockMetricSource;
+        this.readLockWaitingThreadsGauge = 
metrics.readLockWaitingThreadsGauge();

Review Comment:
   Not sure why you called it gauge, any hidden meaning? The name without such 
a suffix reads just as well



##########
modules/page-memory/src/main/java/org/apache/ignite/internal/pagememory/persistence/checkpoint/CheckpointReadWriteLock.java:
##########
@@ -182,14 +199,35 @@ public boolean hasQueuedWriters() {
     }
 
     private void onReadLock(long start, boolean taken) {
-        long elapsed = coarseCurrentTimeMillis() - start;
+        readLockWaitingThreadsGauge.decrementAndGet();
+
+        long coarseCurrentTimeMillis = coarseCurrentTimeMillis();
+        long elapsed = coarseCurrentTimeMillis - start;
 
         if (taken) {
+            if (checkpointReadLockAcquiredTime.get() == null) {
+                checkpointReadLockAcquiredTime.set(coarseCurrentTimeMillis);
+            }
             checkpointReadLockHoldCount.set(checkpointReadLockHoldCount.get() 
+ 1);
+            metrics.recordReadLockAcquisitionTime(elapsed);
         }
 
         if (elapsed > LONG_LOCK_THRESHOLD_MILLIS) {
             log.warn(LONG_LOCK_THROTTLE_KEY, "Checkpoint read lock took {} ms 
to acquire.", elapsed);
         }
     }
+
+    private void onReadUnlock() {
+        int previousLockCount = checkpointReadLockHoldCount.get();
+        checkpointReadLockHoldCount.set(previousLockCount - 1);
+        if (previousLockCount == 1) {
+            // fully unlocked
+            Long checkpointReadLockAcquiredTime = 
this.checkpointReadLockAcquiredTime.get();
+            if (checkpointReadLockAcquiredTime != null) {
+                this.checkpointReadLockAcquiredTime.remove();

Review Comment:
   Removing the thread local and then reinitializing it is not free, I'd 
suggest not doing it without strong reasons



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to