[PR] HDDS-14989. Delay follower SCM DN server start until Ratis log catch-up. [ozone]

via GitHub Fri, 26 Jun 2026 01:26:43 -0700


ArafatKhan2198 opened a new pull request, #10617:
URL: https://github.com/apache/ozone/pull/10617


   ## What changes were proposed in this pull request?
   When an SCM follower restarts in an HA cluster, it used to start its datanode 
server and accept datanode reports **right away**, even while it was still 
catching up on the Ratis log.
   
   That caused problems:
   
   - Datanodes report containers the follower doesn’t know about yet → 
**`CONTAINER_NOT_FOUND`**
   - Or the follower tries to update container state and fails → 
**`NotLeaderException`**
   - In both cases, **replica info gets dropped**
   - If that SCM later becomes leader, containers can show **missing or wrong 
replicas**
   
   **The fix:**
   
   1. **Don’t start the datanode server in HA mode** during normal SCM startup.
   2. **Wait until catch-up is done**, then start it from `SCMStateMachine`.
   3. **Don’t let followers write container state changes** during report 
handling — only the leader should.
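
The three steps above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the class `DeferredDatanodeServerStart`, its methods, and the catch-up condition `lastAppliedIndex >= leaderCommitIndex` are assumptions made for the example (the condition is modelled on the "Follower caught up" log line in the manual test).

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch; none of these names are taken from the Ozone code base. */
public class DeferredDatanodeServerStart {

  /** Stand-in for the SCM datanode RPC server. */
  static class DatanodeServer {
    private volatile boolean running;
    void start() { running = true; }
    boolean isRunning() { return running; }
  }

  private final DatanodeServer server = new DatanodeServer();
  private final AtomicBoolean started = new AtomicBoolean(false);
  private final boolean haEnabled;

  public DeferredDatanodeServerStart(boolean haEnabled) {
    this.haEnabled = haEnabled;
  }

  /** Step 1: normal startup starts the datanode server only when HA is off. */
  public void startServices() {
    if (!haEnabled) {
      startOnce();
    }
  }

  /** Step 2: the state machine calls this as entries are applied (assumed hook). */
  public void onLogApplied(long lastAppliedIndex, long leaderCommitIndex) {
    if (haEnabled && lastAppliedIndex >= leaderCommitIndex) {
      startOnce();
    }
  }

  /** Step 3: only the leader may write container state changes from reports. */
  public static boolean mayUpdateContainerState(boolean isLeader) {
    return isLeader;
  }

  /** compareAndSet guarantees the server is started at most once. */
  private void startOnce() {
    if (started.compareAndSet(false, true)) {
      server.start();
    }
  }

  public boolean isServerRunning() {
    return server.isRunning();
  }

  public static void main(String[] args) {
    DeferredDatanodeServerStart scm = new DeferredDatanodeServerStart(true);
    scm.startServices();
    System.out.println(scm.isServerRunning()); // false: HA startup skips the server
    scm.onLogApplied(40, 49);
    System.out.println(scm.isServerRunning()); // false: still behind the leader
    scm.onLogApplied(49, 49);
    System.out.println(scm.isServerRunning()); // true: caught up, server started
  }
}
```

The `AtomicBoolean` guard is there because the catch-up check can fire more than once; repeated calls after the first successful one are no-ops.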
   
   **Why:** Replica locations are rebuilt from datanode reports. Those reports 
must only be processed **after** the SCM has replayed all committed Ratis 
entries.
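
To make the ordering argument concrete, here is a small toy model of how a report that arrives before replay is lost. It is a sketch under simplifying assumptions: `ReplicaRebuildSketch`, `replayCreate`, and `processReport` are hypothetical names, and the in-memory map only stands in for SCM's container and replica bookkeeping.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy model; not the SCM container manager. */
public class ReplicaRebuildSketch {

  // containerId -> datanodes that have reported a replica of it
  private final Map<Long, Set<String>> replicas = new HashMap<>();

  /** Replaying a committed "create container" entry makes the container known. */
  public void replayCreate(long containerId) {
    replicas.putIfAbsent(containerId, new HashSet<>());
  }

  /** Returns false when the report is dropped because the container is unknown. */
  public boolean processReport(String datanode, long containerId) {
    Set<String> holders = replicas.get(containerId);
    if (holders == null) {
      return false; // analogous to CONTAINER_NOT_FOUND
    }
    holders.add(datanode);
    return true;
  }

  public int replicaCount(long containerId) {
    Set<String> holders = replicas.get(containerId);
    return holders == null ? 0 : holders.size();
  }

  public static void main(String[] args) {
    // Reports processed while the follower is still replaying the log.
    ReplicaRebuildSketch early = new ReplicaRebuildSketch();
    early.processReport("dn1", 4L); // dropped: container 4 not replayed yet
    early.processReport("dn2", 4L); // dropped
    early.replayCreate(4L);
    early.processReport("dn3", 4L); // recorded
    System.out.println(early.replicaCount(4L)); // 1

    // Reports processed only after replay has finished.
    ReplicaRebuildSketch late = new ReplicaRebuildSketch();
    late.replayCreate(4L);
    late.processReport("dn1", 4L);
    late.processReport("dn2", 4L);
    late.processReport("dn3", 4L);
    System.out.println(late.replicaCount(4L)); // 3
  }
}
```

In this model a dropped report is never retried, so the replica stays unknown until the datanode reports again; with a long report interval that gap can persist well past a leadership change.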
   
   ## What is the link to the Apache JIRA?
   
   https://issues.apache.org/jira/browse/HDDS-14989
   
   ## How was this patch tested?
   ### **Integration tests**
   
   `TestSCMFollowerCatchupWithContainerReport`:
   
   - `testFollowerCatchupAfterContainerClose` — close-while-down (HDDS-14989 
scenario)
   - `testFollowerCatchupAfterContainerCreate` — create-while-down 
(`CONTAINER_NOT_FOUND` scenario)
   - `testFollowerCatchupOnIdleCluster` — idle cluster edge case
   
   ### **Manual test (docker-compose `ozone-ha`)**
   
   Environment: `hadoop-ozone/dist/target/ozone-2.3.0-SNAPSHOT/compose/ozone-ha`
   
   Config: RF=3, 3 datanodes, `hdds.container.report.interval=1h`, 
`ozone.scm.container.size=1GB`
   
   **Procedure** (identical with and without the fix):
   
   1. Start cluster: `OZONE_REPLICATION_FACTOR=3 docker compose up -d --scale datanode=3`
   2. Write 50 × 1MB keys to `vol1/buck1` (containers 1–3)
   3. Stop follower **scm3**
   4. Close containers 1, 2, 3
   5. Write 50 × 1MB keys to `vol1/buck2` (creates containers 4, 5, 6 while 
scm3 is down)
   6. Restart **scm3**
   7. Transfer SCM leadership to scm3
   8. Inspect scm3 logs and `ozone admin container info` for containers 4–6
   
   **Without the fix:**
   ```
   06:57:58.622  ScmDatanodeProtocol RPC server ... listening at /0.0.0.0:9861
   06:57:58.837  CONTAINER_NOT_FOUND for Container #4
   06:57:58.837  CONTAINER_NOT_FOUND for Container #5
   06:57:58.837  CONTAINER_NOT_FOUND for Container #6
   (6 errors total — 2 datanodes × 3 containers)
   ```
   After leadership transfer, containers 4–6 had **1 replica each** (expected 
3).
   
   **With the fix:**
   ```
   07:24:28.377  Follower caught up with leader: lastAppliedIndex=49, leaderCommit=49
   07:24:28.378  ScmDatanodeProtocol RPC server ... listening at /0.0.0.0:9861
   ```
   - `CONTAINER_NOT_FOUND` on scm3: **0**
   - After leadership transfer, containers 4–6 each had **3 replicas** from all 
datanodes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

