szetszwo commented on code in PR #10034:
URL: https://github.com/apache/ozone/pull/10034#discussion_r3046665768


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java:
##########
@@ -213,6 +215,10 @@ public void notifyNotLeader(Collection<TransactionContext> 
pendingEntries) {
       return;
     }
     LOG.info("current leader SCM steps down.");
+    SCMMetrics metrics = StorageContainerManager.getMetrics();
+    if (metrics != null) {

Review Comment:
   Seems that the returned metrics is never null.  Add a field and a helper 
method.
   ```java
     private final SCMMetrics metrics = StorageContainerManager.getMetrics();
   ```
   ```java
     private void addRatisEvent(String eventMessage) {
       LOG.info(eventMessage);
       metrics.addRatisEvent(eventMessage);
     }
   ```



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java:
##########
@@ -297,10 +310,16 @@ public void notifyLeaderChanged(RaftGroupMemberId 
groupMemberId,
 
     if (!groupMemberId.getPeerId().equals(newLeaderId)) {
       LOG.info("leader changed, yet current SCM is still follower.");
+      if (metrics != null) {
+        metrics.addRatisEvent("Leader changed to " + newLeaderId + ", yet 
current SCM is still follower.");

Review Comment:
   Let's include the id of this server.  The log message and the 
   ```java
         addRatisEvent("Leader changed to " + newLeaderId
             + ", this server " + groupMemberId.getPeerId() + " remains a 
Follower.");
   ```



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java:
##########
@@ -394,11 +413,35 @@ public void notifyLeaderReady() {
     scm.getScmContext().setLeaderReady();
     scm.getSCMServiceManager().notifyStatusChanged();
     scm.getFinalizationManager().onLeaderReady();
+    SCMMetrics metrics = StorageContainerManager.getMetrics();
+    if (metrics != null) {
+      metrics.addRatisEvent("Ready to serve requests as the leader");
+    }
   }
 
   @Override
   public void notifyConfigurationChanged(long term, long index,
       RaftProtos.RaftConfigurationProto newRaftConfiguration) {
+    SCMMetrics metrics = StorageContainerManager.getMetrics();
+    if (metrics != null) {
+      List<RaftProtos.RaftPeerProto> newPeers =
+          newRaftConfiguration.getPeersList();
+      List<RaftProtos.RaftPeerProto> newListeners =
+          newRaftConfiguration.getListenersList();
+      List<String> newPeerIds = new ArrayList<>();
+      List<String> newListenersIds = new ArrayList<>();
+      for (RaftProtos.RaftPeerProto raftPeerProto : newPeers) {
+        newPeerIds.add(RaftPeerId.valueOf(raftPeerProto.getId()).toString());
+      }
+      for (RaftProtos.RaftPeerProto raftListenerProto : newListeners) {
+        
newListenersIds.add(RaftPeerId.valueOf(raftListenerProto.getId()).toString());
+      }
+      metrics.addRatisEvent(
+          "New peers " + newPeerIds +
+              (newListenersIds.isEmpty() ? "" : ", new listeners " + 
newListenersIds) +
+              " added at term index (" +

Review Comment:
   getPeersList() returns the current peers in the conf.  It include both the 
newly added peers and the existing peers (unless they are removed).
   ```java
       addRatisEvent("notifyConfigurationChanged at " + TermIndex.valueOf(term, 
index)
           + ": " + TextFormat.shortDebugString(newRaftConfiguration));
   ```



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/metrics/SCMMetrics.java:
##########
@@ -33,6 +36,9 @@ public class SCMMetrics {
   public static final String SOURCE_NAME =
       SCMMetrics.class.getSimpleName();
 
+  private final List<String> ratisEvents = new ArrayList<>();
+  private static final int MAX_RATIS_EVENTS = 100;

Review Comment:
   Let's make it configurable?



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/metrics/SCMMetrics.java:
##########
@@ -155,6 +161,22 @@ public void decrContainerStat(ContainerStat deltaStat) {
     this.containerReportWriteCount.incr(-1 * deltaStat.getWriteCount().get());
   }
 
+  public void addRatisEvent(String event) {
+    synchronized (ratisEvents) {
+      if (ratisEvents.size() >= MAX_RATIS_EVENTS) {
+        ratisEvents.remove(0);
+      }
+      ratisEvents.add(Time.formatTime(Time.now()) + "|" + event);
+    }
+  }
+
+  @Metric("Ratis state machine events")
+  public String getRatisEvents() {
+    synchronized (ratisEvents) {
+      return String.join("\n", ratisEvents);

Review Comment:
   - Should the metrics publish only the new events?
   - How often this method is involved by the metrics system?  It could be 
inefficient if it keep creating the same, long string again and again.



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java:
##########
@@ -213,6 +215,10 @@ public void notifyNotLeader(Collection<TransactionContext> 
pendingEntries) {
       return;
     }
     LOG.info("current leader SCM steps down.");
+    SCMMetrics metrics = StorageContainerManager.getMetrics();
+    if (metrics != null) {
+      metrics.addRatisEvent("current leader SCM steps down.");

Review Comment:
   Let's include the id.
   ```java
       addRatisEvent("This server " + getId() + " has stepped down from the 
Leader.");
   ```



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/metrics/SCMMetrics.java:
##########
@@ -155,6 +161,22 @@ public void decrContainerStat(ContainerStat deltaStat) {
     this.containerReportWriteCount.incr(-1 * deltaStat.getWriteCount().get());
   }
 
+  public void addRatisEvent(String event) {
+    synchronized (ratisEvents) {
+      if (ratisEvents.size() >= MAX_RATIS_EVENTS) {
+        ratisEvents.remove(0);

Review Comment:
   Use LinkedList and removeFirst() to avoid array copying.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to