sreejasahithi opened a new pull request, #10414:
URL: https://github.com/apache/ozone/pull/10414

   ## What changes were proposed in this pull request?
   Implement logic to traverse all storage volumes configured in 
**hdds.datanode.dir** and discover container directories present under the 
DataNode container storage hierarchy.
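
The traversal can be pictured roughly as follows. This is a minimal sketch and not the code from this PR: it assumes the on-disk layout `<volume>/hdds/<CID-...>/current/containerDir<N>/<containerId>` that appears in the test transcript further down, and the class and method names (`VolumeWalkSketch`, `parseVolumes`, `subDirs`, `findContainerDirs`) are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the implementation in this PR.
final class VolumeWalkSketch {

  // Splits a comma-separated hdds.datanode.dir value into volume roots.
  static List<Path> parseVolumes(String datanodeDirs) {
    List<Path> volumes = new ArrayList<>();
    for (String dir : datanodeDirs.split(",")) {
      String trimmed = dir.trim();
      if (!trimmed.isEmpty()) {
        volumes.add(Paths.get(trimmed));
      }
    }
    return volumes;
  }

  // Lists subdirectories of parent whose names match the glob.
  // Returns an empty list if parent is not a directory.
  static List<Path> subDirs(Path parent, String glob) {
    List<Path> result = new ArrayList<>();
    if (!Files.isDirectory(parent)) {
      return result;
    }
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(parent, glob)) {
      for (Path p : stream) {
        if (Files.isDirectory(p)) {
          result.add(p);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return result;
  }

  // Assumed layout: <volume>/hdds/<CID-...>/current/containerDir<N>/<containerId>
  static List<Path> findContainerDirs(Path volume) {
    List<Path> found = new ArrayList<>();
    for (Path cluster : subDirs(volume.resolve("hdds"), "CID-*")) {
      for (Path group : subDirs(cluster.resolve("current"), "containerDir*")) {
        for (Path candidate : subDirs(group, "*")) {
          // Only purely numeric directory names are treated as container IDs.
          if (candidate.getFileName().toString().matches("\\d+")) {
            found.add(candidate);
          }
        }
      }
    }
    return found;
  }
}
```

Calling `findContainerDirs` once per entry returned by `parseVolumes` would yield every candidate container directory on the DataNode. Skipping non-numeric directory names is a choice made for this sketch only.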
   
   For each discovered container directory:
   
   - Extract the container ID from the directory name.
   - Collect the container directory path, storage volume, and directory size.
   - Determine the metadata status:
     - **MISSING_METADATA** if metadata/{containerId}.container does not exist.
     - **INVALID_METADATA** if the metadata file exists but cannot be parsed, or if the container ID stored in the metadata does not match the directory-name container ID.
     - **VALID** otherwise.
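
The status rules above could be sketched as below. This is an illustration under stated assumptions, not the PR's code: it assumes the metadata file carries a `containerID: <number>` line and reads it with a plain line scan, whereas the actual change presumably uses Ozone's own container metadata parser. `MetadataStatusSketch`, `parseContainerId`, and `classify` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical sketch, not the implementation in this PR.
final class MetadataStatusSketch {

  enum MetadataStatus { VALID, MISSING_METADATA, INVALID_METADATA }

  // Assumption: the metadata file contains a "containerID: <number>" line.
  // Returns empty if no such line exists or its value is not a number.
  static OptionalLong parseContainerId(List<String> lines) {
    String key = "containerID:";
    for (String line : lines) {
      String trimmed = line.trim();
      if (trimmed.startsWith(key)) {
        try {
          return OptionalLong.of(Long.parseLong(trimmed.substring(key.length()).trim()));
        } catch (NumberFormatException e) {
          return OptionalLong.empty();
        }
      }
    }
    return OptionalLong.empty();
  }

  // Throws NumberFormatException if the directory name is not a number.
  static MetadataStatus classify(Path containerDir) {
    // The container ID comes from the directory name.
    long dirId = Long.parseLong(containerDir.getFileName().toString());
    Path metadataFile = containerDir.resolve("metadata").resolve(dirId + ".container");
    if (!Files.exists(metadataFile)) {
      return MetadataStatus.MISSING_METADATA;
    }
    OptionalLong storedId;
    try {
      storedId = parseContainerId(Files.readAllLines(metadataFile));
    } catch (IOException e) {
      // An unreadable file is treated as unparseable.
      return MetadataStatus.INVALID_METADATA;
    }
    if (!storedId.isPresent() || storedId.getAsLong() != dirId) {
      return MetadataStatus.INVALID_METADATA;
    }
    return MetadataStatus.VALID;
  }
}
```

In this sketch a directory named `2` with no `metadata/2.container` file classifies as `MISSING_METADATA`, which is the case exercised in the manual test below.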
   
   Store the results as a mapping:
   containerId -> List\<ContainerOccurrence\>
   where each occurrence contains the container directory path, volume, size, 
and metadata status.
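
One way to picture this mapping is sketched below. The field names, the string-typed status, and the `OccurrenceIndexSketch` wrapper are assumptions made for illustration; the `ContainerOccurrence` class in the PR may look different.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the implementation in this PR.
final class OccurrenceIndexSketch {

  // One on-disk occurrence of a container directory.
  static final class ContainerOccurrence {
    final String containerDir;
    final String volume;
    final long sizeBytes;
    final String metadataStatus;

    ContainerOccurrence(String containerDir, String volume, long sizeBytes, String metadataStatus) {
      this.containerDir = containerDir;
      this.volume = volume;
      this.sizeBytes = sizeBytes;
      this.metadataStatus = metadataStatus;
    }
  }

  // containerId -> all occurrences found for that ID, sorted by ID.
  private final Map<Long, List<ContainerOccurrence>> byContainerId = new TreeMap<>();

  // Appends an occurrence, creating the list on first sight of the ID.
  void add(long containerId, ContainerOccurrence occurrence) {
    byContainerId.computeIfAbsent(containerId, id -> new ArrayList<>()).add(occurrence);
  }

  // Returns the recorded occurrences, or an empty list for an unknown ID.
  List<ContainerOccurrence> occurrencesOf(long containerId) {
    return byContainerId.getOrDefault(containerId, Collections.<ContainerOccurrence>emptyList());
  }
}
```

With the data from the manual test below, `add` would be called twice for container ID 2: once for the copy under `/data/hdds0` and once for the copy under `/data/hdds1`.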
   
   Use this mapping to identify duplicate container directories by detecting 
container IDs associated with more than one on-disk occurrence across storage 
volumes on the same DataNode.
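
Given such a mapping, duplicate detection reduces to keeping the entries with more than one occurrence. The sketch below is generic over the occurrence type so that it stands alone; `DuplicateFilterSketch` and `findDuplicates` are hypothetical names, not identifiers from this PR.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the implementation in this PR.
final class DuplicateFilterSketch {

  // Keeps only the container IDs that have more than one occurrence,
  // preserving the iteration order of the input map.
  static <T> Map<Long, List<T>> findDuplicates(Map<Long, List<T>> byContainerId) {
    Map<Long, List<T>> duplicates = new LinkedHashMap<>();
    for (Map.Entry<Long, List<T>> entry : byContainerId.entrySet()) {
      if (entry.getValue().size() > 1) {
        duplicates.put(entry.getKey(), entry.getValue());
      }
    }
    return duplicates;
  }
}
```

The size of the returned map corresponds to the count printed in the "Duplicate container directories on this DataNode" line of the command output shown below.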
   
   ## What is the link to the Apache JIRA
   
   [HDDS-15455](https://issues.apache.org/jira/browse/HDDS-15455)
   
   ## How was this patch tested?
   Added tests
   Green CI : https://github.com/sreejasahithi/ozone/actions/runs/26815325532
   
   In a docker ozone-ha cluster, with OZONE-SITE.XML_hdds.datanode.dir=/data/hdds0,/data/hdds1,/data/hdds2 set:
   ```
   bash-5.1$ ozone getconf -confKey hdds.datanode.dir
   /data/hdds0,/data/hdds1,/data/hdds2
   
   bash-5.1$ ozone sh volume create /vol1
   bash-5.1$ ozone sh bucket create /vol1/bucket1
   bash-5.1$ ozone freon ockg -n 100
   
   bash-5.1$ cd /data/hdds0/hdds/CID-25a9e273-8a28-47b1-8c35-143b8b8206a1/current/containerDir0
   bash-5.1$ ls
   2 41 42
   bash-5.1$ cd /data/hdds1/hdds/CID-25a9e273-8a28-47b1-8c35-143b8b8206a1/current/containerDir0
   bash-5.1$ ls
   28 31        6
   bash-5.1$ cd /data/hdds2/hdds/CID-25a9e273-8a28-47b1-8c35-143b8b8206a1/current/containerDir0
   bash-5.1$ ls
   17 24        37
   ```
   Then did the following to create a duplicate:
   
   ```
   CLUSTER="CID-25a9e273-8a28-47b1-8c35-143b8b8206a1"
   CID=2
   PATH_A="/data/hdds0/hdds/${CLUSTER}/current/containerDir0/${CID}"
   PATH_B="/data/hdds1/hdds/${CLUSTER}/current/containerDir0/${CID}"
   
   # partial copy ONLY — chunks, no metadata
   mkdir -p "$PATH_B/chunks"
   cp -a "$PATH_A/chunks/." "$PATH_B/chunks/"
   
   # verify: metadata should NOT exist on hdds1
   ls "$PATH_A/metadata/2.container" # exists
   ls "$PATH_B/metadata" 2>&1 # should fail
   ```
   Ran the analyze command, which correctly identifies the duplicate:
   ```
   bash-5.1$ ozone debug datanode container analyze
   Duplicate container directories on this DataNode: 1
   Container 2 (2 occurrences):
    volume=/data/hdds1
    path=/data/hdds1/hdds/CID-25a9e273-8a28-47b1-8c35-143b8b8206a1/current/containerDir0/2
    status=MISSING_METADATA size=30720 bytes
    volume=/data/hdds0
    path=/data/hdds0/hdds/CID-25a9e273-8a28-47b1-8c35-143b8b8206a1/current/containerDir0/2
    status=VALID size=31286 bytes
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

