sumitagrawl commented on code in PR #9651:
URL: https://github.com/apache/ozone/pull/9651#discussion_r2737967493
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/HealthyPipelineSafeModeRule.java:
##########
@@ -123,81 +117,62 @@ protected synchronized boolean validate() {
LOG.info("All SCM pipelines are closed due to ongoing upgrade " +
"finalization. Bypassing healthy pipeline safemode rule.");
return true;
+ }
+ // Query PipelineManager directly for healthy pipeline count
Review Comment:
In current implementation, if Rule is satisfied once, its marked exit. Never
enter again. But with having dynamic rule check, one time may be success and
other time may be failure.
This may not satisfy the purpose.
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/HealthyPipelineSafeModeRule.java:
##########
@@ -123,81 +117,62 @@ protected synchronized boolean validate() {
LOG.info("All SCM pipelines are closed due to ongoing upgrade " +
"finalization. Bypassing healthy pipeline safemode rule.");
return true;
+ }
+ // Query PipelineManager directly for healthy pipeline count
+ List<Pipeline> openPipelines = pipelineManager.getPipelines(
+ RatisReplicationConfig.getInstance(HddsProtos.ReplicationFactor.THREE),
+ Pipeline.PipelineState.OPEN);
+
+ LOG.info("Found {} open RATIS/THREE pipelines", openPipelines.size());
+ currentHealthyPipelineCount = (int) openPipelines.stream()
+ .filter(this::isPipelineHealthy)
Review Comment:
Earlier, for every pipeline event, pipeline info is logged once. But now it
will be logged for all pipeline for every event, can result in a lot of INFO
logs for pipeline only.
Also will be triggered for EC pipeline type to be logged which is not logged
earlier.
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/HealthyPipelineSafeModeRule.java:
##########
@@ -123,81 +117,62 @@ protected synchronized boolean validate() {
LOG.info("All SCM pipelines are closed due to ongoing upgrade " +
"finalization. Bypassing healthy pipeline safemode rule.");
return true;
+ }
+ // Query PipelineManager directly for healthy pipeline count
+ List<Pipeline> openPipelines = pipelineManager.getPipelines(
+ RatisReplicationConfig.getInstance(HddsProtos.ReplicationFactor.THREE),
+ Pipeline.PipelineState.OPEN);
+
+ LOG.info("Found {} open RATIS/THREE pipelines", openPipelines.size());
+ currentHealthyPipelineCount = (int) openPipelines.stream()
+ .filter(this::isPipelineHealthy)
+ .count();
+
getSafeModeMetrics().setNumCurrentHealthyPipelines(currentHealthyPipelineCount);
+ boolean isValid = currentHealthyPipelineCount >=
healthyPipelineThresholdCount;
+ if (scmInSafeMode()) {
+ LOG.info("SCM in safe mode. Healthy pipelines: {}, threshold: {}, rule
satisfied: {}",
+ currentHealthyPipelineCount, healthyPipelineThresholdCount, isValid);
} else {
- return currentHealthyPipelineCount >= healthyPipelineThresholdCount;
+ LOG.debug("SCM not in safe mode. Healthy pipelines: {}, threshold: {}",
+ currentHealthyPipelineCount, healthyPipelineThresholdCount);
}
+ return isValid;
}
- @Override
- protected synchronized void process(Pipeline pipeline) {
- Objects.requireNonNull(pipeline, "pipeline == null");
-
- // When SCM is in safe mode for long time, already registered
- // datanode can send pipeline report again, or SCMPipelineManager will
- // create new pipelines.
-
- // Only handle RATIS + 3-replica pipelines.
- if (pipeline.getType() != HddsProtos.ReplicationType.RATIS ||
- ((RatisReplicationConfig)
pipeline.getReplicationConfig()).getReplicationFactor() !=
- HddsProtos.ReplicationFactor.THREE) {
- Logger safeModeManagerLog = SCMSafeModeManager.getLogger();
- if (safeModeManagerLog.isDebugEnabled()) {
- safeModeManagerLog.debug("Skipping pipeline safemode report processing
as Replication type isn't RATIS " +
- "or replication factor isn't 3.");
- }
- return;
- }
-
- // Skip already processed ones.
- if (processedPipelineIDs.contains(pipeline.getId())) {
- LOG.info("Skipping pipeline safemode report processing check as
pipeline: {} is already recorded.",
- pipeline.getId());
- return;
+ private boolean isPipelineHealthy(Pipeline pipeline) {
+ // Verify pipeline has all 3 nodes
+ List<DatanodeDetails> nodes = pipeline.getNodes();
+ if (nodes.size() != 3) {
+ LOG.info("Pipeline {} is not healthy: has {} nodes instead of 3",
Review Comment:
this logging will happen now **3 times** for every pipeline event, 2 times
in validate() as called pre-process() and post-process(), and one time in
getStatus(). And this will be done for every pipeline in the SCM for each event.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]