[
https://issues.apache.org/jira/browse/HDDS-15138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aryan Gupta updated HDDS-15138:
-------------------------------
Description:
This patch changes how SCM decides when to exit safemode in EC-default
clusters. When default replication is EC, SCM now uses a dedicated
{{ECMinDataNodeSafeModeRule}} to check that enough healthy DataNodes are
available ({{data + parity}}) before safemode exit. EC pipeline health checks
are no longer used for this.
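As a rough illustration of the {{data + parity}} check, here is a minimal sketch. It is not the actual {{ECMinDataNodeSafeModeRule}} code: the class name, method names, and the regex-based parsing are hypothetical, and it assumes an EC replication string of the form codec-data-parity-chunksize such as {{RS-3-2-1024k}}.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the real ECMinDataNodeSafeModeRule.
final class EcMinDataNodeSketch {
    // Assumed layout: codec-data-parity-chunkSize, e.g. RS-3-2-1024k.
    private static final Pattern EC_SPEC =
        Pattern.compile("([A-Za-z]+)-(\\d+)-(\\d+)-(\\d+)[kK]");

    // Minimum number of healthy DataNodes: data + parity.
    static int requiredDataNodes(String ecSpec) {
        Matcher m = EC_SPEC.matcher(ecSpec.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException(
                "Not an EC replication spec: " + ecSpec);
        }
        int data = Integer.parseInt(m.group(2));
        int parity = Integer.parseInt(m.group(3));
        return data + parity;
    }

    // True when the healthy DataNode count satisfies the rule.
    static boolean enoughDataNodes(String ecSpec, int healthyDataNodes) {
        return healthyDataNodes >= requiredDataNodes(ecSpec);
    }
}
```

Under these assumptions, {{RS-3-2-1024k}} needs 5 healthy DataNodes, so 4 would keep SCM in safemode and 5 would satisfy the rule.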
A new config flag controls whether {{BackgroundPipelineCreator}} should create
{{RATIS/THREE}} pipelines in EC-default clusters:
* {{ozone.scm.pipeline.creation.ec.ratis.three.enabled=true}} -> create
RATIS/THREE pipelines in background
* {{false}} -> do not create background pipelines for EC-default
For RATIS-default clusters, behavior stays the same as before.
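The flag semantics above can be sketched as a small decision function. This is only an illustration: the class and method names are hypothetical, the flag is passed in as a plain boolean rather than read from the Ozone configuration, and the RATIS-default branch assumes that the unchanged behavior includes background creation of RATIS/THREE pipelines.

```java
// Hypothetical sketch; not the real BackgroundPipelineCreator logic.
final class BackgroundPipelineSketch {
    /**
     * @param defaultReplicationType cluster default replication type, "EC" or "RATIS"
     * @param ecRatisThreeEnabled value of
     *        ozone.scm.pipeline.creation.ec.ratis.three.enabled
     * @return whether RATIS/THREE pipelines are created in the background
     */
    static boolean createsRatisThreeInBackground(
            String defaultReplicationType, boolean ecRatisThreeEnabled) {
        if ("EC".equalsIgnoreCase(defaultReplicationType)) {
            // EC-default: the flag alone decides.
            return ecRatisThreeEnabled;
        }
        // Assumption: RATIS-default keeps creating RATIS/THREE as before;
        // the flag is ignored on this path.
        return true;
    }
}
```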
Tests were updated to match this behavior:
* added tests for the new EC DN safemode rule
* added tests for EC-default + flag true/false behavior
* removed old tests tied to the previous EC pipeline-validation approach
was:
Today SCM safemode pipeline exit checks are effectively hardcoded to
{{RATIS/THREE}}, which is incorrect when cluster default replication is EC.
In EC-default deployments, safemode should validate pipelines for the
configured default replication instead of only {{RATIS/THREE}}.
This patch generalizes safemode pipeline validation to use
{{ReplicationConfig.getDefault(conf)}}:
* {{HealthyPipelineSafeModeRule}} now evaluates pipelines matching the
configured default replication config and uses required node count from that
config.
* {{OneReplicaPipelineSafeModeRule}} now tracks/report-validates pipelines
matching the configured default replication config.
To keep behavior consistent, {{BackgroundPipelineCreator}} is updated to
include EC pipeline creation when default replication type is EC, while
preserving existing RATIS behavior when EC is not configured.
h3. Expected outcome
* RATIS default: behavior remains unchanged.
* EC default: safemode pipeline checks validate EC pipelines, and background
creation can create EC pipelines accordingly.
h3. Validation
Added/updated SCM tests for EC-default and RATIS-default paths. Focused suite
passes:
* {{TestHealthyPipelineSafeModeRule}}
* {{TestOneReplicaPipelineSafeModeRule}}
* {{TestSCMSafeModeManager}}
* {{TestBackgroundPipelineCreator}}
h3. Note
For EC-default setups in SCM safemode paths, ensure both are set:
* {{ozone.replication.type=EC}}
* {{ozone.replication=RS-3-2-1024k}}
> SCM safemode pipeline rules should honor default EC replication config
> ----------------------------------------------------------------------
>
> Key: HDDS-15138
> URL: https://issues.apache.org/jira/browse/HDDS-15138
> Project: Apache Ozone
> Issue Type: Improvement
> Reporter: Aryan Gupta
> Assignee: Aryan Gupta
> Priority: Major
> Labels: pull-request-available