szetszwo commented on code in PR #10034:
URL: https://github.com/apache/ozone/pull/10034#discussion_r3046665768
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java:
##########
@@ -213,6 +215,10 @@ public void notifyNotLeader(Collection<TransactionContext>
pendingEntries) {
return;
}
LOG.info("current leader SCM steps down.");
+ SCMMetrics metrics = StorageContainerManager.getMetrics();
+ if (metrics != null) {
Review Comment:
Seems that the returned metrics is never null. Add a field and a helper
method.
```java
private final SCMMetrics metrics = StorageContainerManager.getMetrics();
```
```java
private void addRatisEvent(String eventMessage) {
LOG.info(eventMessage);
metrics.addRatisEvent(eventMessage);
}
```
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java:
##########
@@ -297,10 +310,16 @@ public void notifyLeaderChanged(RaftGroupMemberId
groupMemberId,
if (!groupMemberId.getPeerId().equals(newLeaderId)) {
LOG.info("leader changed, yet current SCM is still follower.");
+ if (metrics != null) {
+ metrics.addRatisEvent("Leader changed to " + newLeaderId + ", yet
current SCM is still follower.");
Review Comment:
Let's include the id of this server. The log message and the
```java
addRatisEvent("Leader changed to " + newLeaderId
+ ", this server " + groupMemberId.getPeerId() + " remains a
Follower.");
```
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java:
##########
@@ -394,11 +413,35 @@ public void notifyLeaderReady() {
scm.getScmContext().setLeaderReady();
scm.getSCMServiceManager().notifyStatusChanged();
scm.getFinalizationManager().onLeaderReady();
+ SCMMetrics metrics = StorageContainerManager.getMetrics();
+ if (metrics != null) {
+ metrics.addRatisEvent("Ready to serve requests as the leader");
+ }
}
@Override
public void notifyConfigurationChanged(long term, long index,
RaftProtos.RaftConfigurationProto newRaftConfiguration) {
+ SCMMetrics metrics = StorageContainerManager.getMetrics();
+ if (metrics != null) {
+ List<RaftProtos.RaftPeerProto> newPeers =
+ newRaftConfiguration.getPeersList();
+ List<RaftProtos.RaftPeerProto> newListeners =
+ newRaftConfiguration.getListenersList();
+ List<String> newPeerIds = new ArrayList<>();
+ List<String> newListenersIds = new ArrayList<>();
+ for (RaftProtos.RaftPeerProto raftPeerProto : newPeers) {
+ newPeerIds.add(RaftPeerId.valueOf(raftPeerProto.getId()).toString());
+ }
+ for (RaftProtos.RaftPeerProto raftListenerProto : newListeners) {
+
newListenersIds.add(RaftPeerId.valueOf(raftListenerProto.getId()).toString());
+ }
+ metrics.addRatisEvent(
+ "New peers " + newPeerIds +
+ (newListenersIds.isEmpty() ? "" : ", new listeners " +
newListenersIds) +
+ " added at term index (" +
Review Comment:
getPeersList() returns the current peers in the conf. It include both the
newly added peers and the existing peers (unless they are removed).
```java
addRatisEvent("notifyConfigurationChanged at " + TermIndex.valueOf(term,
index)
+ ": " + TextFormat.shortDebugString(newRaftConfiguration));
```
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/metrics/SCMMetrics.java:
##########
@@ -33,6 +36,9 @@ public class SCMMetrics {
public static final String SOURCE_NAME =
SCMMetrics.class.getSimpleName();
+ private final List<String> ratisEvents = new ArrayList<>();
+ private static final int MAX_RATIS_EVENTS = 100;
Review Comment:
Let's make it configurable?
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/metrics/SCMMetrics.java:
##########
@@ -155,6 +161,22 @@ public void decrContainerStat(ContainerStat deltaStat) {
this.containerReportWriteCount.incr(-1 * deltaStat.getWriteCount().get());
}
+ public void addRatisEvent(String event) {
+ synchronized (ratisEvents) {
+ if (ratisEvents.size() >= MAX_RATIS_EVENTS) {
+ ratisEvents.remove(0);
+ }
+ ratisEvents.add(Time.formatTime(Time.now()) + "|" + event);
+ }
+ }
+
+ @Metric("Ratis state machine events")
+ public String getRatisEvents() {
+ synchronized (ratisEvents) {
+ return String.join("\n", ratisEvents);
Review Comment:
- Should the metrics publish only the new events?
- How often this method is involved by the metrics system? It could be
inefficient if it keep creating the same, long string again and again.
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java:
##########
@@ -213,6 +215,10 @@ public void notifyNotLeader(Collection<TransactionContext>
pendingEntries) {
return;
}
LOG.info("current leader SCM steps down.");
+ SCMMetrics metrics = StorageContainerManager.getMetrics();
+ if (metrics != null) {
+ metrics.addRatisEvent("current leader SCM steps down.");
Review Comment:
Let's include the id.
```java
addRatisEvent("This server " + getId() + " has stepped down from the
Leader.");
```
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/metrics/SCMMetrics.java:
##########
@@ -155,6 +161,22 @@ public void decrContainerStat(ContainerStat deltaStat) {
this.containerReportWriteCount.incr(-1 * deltaStat.getWriteCount().get());
}
+ public void addRatisEvent(String event) {
+ synchronized (ratisEvents) {
+ if (ratisEvents.size() >= MAX_RATIS_EVENTS) {
+ ratisEvents.remove(0);
Review Comment:
Use LinkedList and removeFirst() to avoid array copying.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]