[jira] [Created] (HDDS-13328) Add Case for Byteman fault causing BCSID mismatch and finally Force Delete

Soumitra Sulav (Jira) Wed, 25 Jun 2025 12:36:57 -0700

Soumitra Sulav created HDDS-13328:
-------------------------------------

             Summary: Add Case for Byteman fault causing BCSID mismatch and 
finally Force Delete
                 Key: HDDS-13328
                 URL: https://issues.apache.org/jira/browse/HDDS-13328
             Project: Apache Ozone
          Issue Type: Sub-task
          Components: Ozone Manager
            Reporter: Soumitra Sulav



Add a test case in which a Byteman fault causes a BCSID mismatch and the container is finally Force Deleted.


{code:java}
1. Modify Configuration for Faster State Transitions
   Apply the following configurations to accelerate container state
   transitions and enforce pipeline limits on the cluster:
     hdds.scm.replication.thread.interval: 5s
     ozone.scm.datanode.pipeline.limit: 1
     hdds.scm.wait.time.after.safemode.exit: 5s

2. Apply the Byteman Rule on Datanode 1
   Apply the Byteman rule that blocks PutBlock, to introduce an inconsistent
   container state, and notifyGroup remove, to disable the QuasiClosed
   container state.
   Do this on Datanode 1 only, and keep exactly 3 datanodes running in the
   cluster.

3. Initiate a New Pipeline and Container
   Close all active pipelines so that the next write operation triggers a
   new pipeline and container creation.

4. Monitor Container State Across All Datanodes
   Identify the new container and observe its state on all three datanodes.

5. Stop Datanode 1 & Transition the Container to a Closed State
   Shut down Datanode 1 and close the associated pipeline.
   Monitor the container state: it should move to CLOSING and eventually
   transition to CLOSED.
   Verify the transition by fetching dn-container.log.

6. Restart Datanode 1 & Validate Container State
   Bring Datanode 1 back online and check its reported container states.
   Wait for the container to transition: it should move to CLOSED, but with
   a BCSID of 0.

7. Bring Down the Remaining Datanodes & Restart SCM
   Shut down any two datanodes from the remaining set.
   Select two random datanodes to allow SCM to restart without entering
   safe mode.
   Restart SCM and Datanode 1, so that the container is marked EMPTY sooner
   because of a fresh volume container reload.

8. Confirm Container Deletion
   Since the container has a BCSID of 0, Datanode 1's container should be
   marked EMPTY and queued for deletion.
   Verify the container state using ozone admin container info.

9. Restart Remaining Datanodes & Validate Deletion Propagation
   Bring the other two datanodes back online.
   Confirm whether the container deletion is propagated to these nodes.
   If the container had a non-zero BCSID in an unstable cluster, a Force
   Delete may be triggered, ensuring its removal from all datanodes.
{code}
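
The three settings in the configuration step can be written out as Hadoop-style property entries. The sketch below is only an illustration: it assumes the settings are applied through a site XML file (the issue does not say how they are applied), and the helper name render_site_xml is hypothetical. The keys and values are the ones listed in the steps above.

```python
from xml.sax.saxutils import escape

# Keys and values copied from the configuration step of the test case.
FAST_TRANSITION_SETTINGS = {
    "hdds.scm.replication.thread.interval": "5s",
    "ozone.scm.datanode.pipeline.limit": "1",
    "hdds.scm.wait.time.after.safemode.exit": "5s",
}


def render_site_xml(settings):
    """Render a dict of settings as Hadoop-style <property> entries.

    Hypothetical helper; assumes the cluster reads these from a site XML file.
    """
    lines = ["<configuration>"]
    for name, value in settings.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_site_xml(FAST_TRANSITION_SETTINGS))
```

Keeping the settings in one dictionary makes it easy for a test to apply them before the run and revert them afterwards.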

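Several steps above wait for the container to pass through a sequence of states (for example CLOSING and then CLOSED). A minimal polling sketch for such a wait is shown below. It is not taken from the Ozone test framework: wait_for_state and the get_state callable are hypothetical, and how the state is actually read (CLI output, dn-container.log, or an API) is left to the caller.

```python
import time


def wait_for_state(get_state, expected, timeout_s=60.0, interval_s=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll get_state() until it returns `expected`.

    Returns the distinct states observed, in order, so a test can also
    check intermediate states such as CLOSING. Raises TimeoutError if the
    expected state is not seen before the deadline.
    get_state is a hypothetical caller-supplied function.
    """
    seen = []
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        # Record a state only when it differs from the previous one.
        if not seen or seen[-1] != state:
            seen.append(state)
        if state == expected:
            return seen
        if clock() >= deadline:
            raise TimeoutError(
                f"container did not reach {expected}; observed {seen}")
        sleep(interval_s)
```

A test could call wait_for_state(read_state, "CLOSED") after closing the pipeline and then check that "CLOSING" appears in the returned list, where read_state is whatever function the test uses to read the container state from a datanode.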




