sreejasahithi opened a new pull request, #10414:
URL: https://github.com/apache/ozone/pull/10414
## What changes were proposed in this pull request?
Implement logic to traverse all storage volumes configured in
**hdds.datanode.dir** and discover container directories present under the
DataNode container storage hierarchy.
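
As a rough illustration of the traversal, the sketch below walks the layout visible in the test transcript further down (`<volume>/hdds/CID-*/current/containerDir*/<containerId>`). The class and method names (`ContainerDirFinderSketch`, `subDirs`, `findContainerDirs`) are hypothetical and are not taken from the patch; treating any all-digit directory name as a container ID is likewise an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: not the classes added by this PR.
final class ContainerDirFinderSketch {

  /** Sub-directories of parent whose name matches the glob; empty if parent is absent. */
  static List<Path> subDirs(Path parent, String glob) {
    List<Path> out = new ArrayList<>();
    if (!Files.isDirectory(parent)) {
      return out;
    }
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(parent, glob)) {
      for (Path p : stream) {
        if (Files.isDirectory(p)) {
          out.add(p);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out;
  }

  /** ASSUMPTION: a container directory is named with decimal digits only. */
  static boolean isContainerId(String name) {
    return name.matches("\\d+");
  }

  /** Collects container directories under every configured volume root. */
  static List<Path> findContainerDirs(List<Path> volumeRoots) {
    List<Path> found = new ArrayList<>();
    for (Path volume : volumeRoots) {
      for (Path cluster : subDirs(volume.resolve("hdds"), "CID-*")) {
        for (Path group : subDirs(cluster.resolve("current"), "containerDir*")) {
          for (Path dir : subDirs(group, "*")) {
            if (isContainerId(dir.getFileName().toString())) {
              found.add(dir);
            }
          }
        }
      }
    }
    return found;
  }

  public static void main(String[] args) {
    // Each argument is one volume root, e.g. /data/hdds0 /data/hdds1 /data/hdds2.
    List<Path> roots = new ArrayList<>();
    for (String arg : args) {
      roots.add(Paths.get(arg));
    }
    findContainerDirs(roots).forEach(System.out::println);
  }
}
```

A missing or unreadable level simply yields no directories for that volume, so one bad volume does not stop the scan of the others.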
For each discovered container directory:
- Extract the container ID from the directory name.
- Collect the container directory path, storage volume, and directory size.
- Determine the metadata status:
  - **MISSING_METADATA** if metadata/{containerId}.container does not exist.
  - **INVALID_METADATA** if the metadata file exists but cannot be parsed, or if the container ID stored in the metadata does not match the directory-name container ID.
  - **VALID** otherwise.
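
The three status rules can be sketched as follows. This is a simplified stand-in, not the patch's code: it assumes the `.container` file stores the ID on a line of the form `containerID: <number>` and scans for that line instead of using a real YAML parser. The names `MetadataStatusSketch`, `statusFor`, and `classify` are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Illustrative only: not the classes added by this PR.
final class MetadataStatusSketch {
  enum Status { VALID, MISSING_METADATA, INVALID_METADATA }

  // ASSUMPTION: the stored ID appears as "containerID: <number>" on its own line.
  static final String ID_KEY = "containerID:";

  /** Pure rule check; metadataLines == null means the .container file is absent. */
  static Status statusFor(long dirId, List<String> metadataLines) {
    if (metadataLines == null) {
      return Status.MISSING_METADATA;
    }
    for (String line : metadataLines) {
      String trimmed = line.trim();
      if (trimmed.startsWith(ID_KEY)) {
        try {
          long storedId = Long.parseLong(trimmed.substring(ID_KEY.length()).trim());
          return storedId == dirId ? Status.VALID : Status.INVALID_METADATA;
        } catch (NumberFormatException e) {
          return Status.INVALID_METADATA; // ID present but not a number
        }
      }
    }
    return Status.INVALID_METADATA; // file exists but carries no ID
  }

  /** Applies the rules to an on-disk container directory named by its ID. */
  static Status classify(Path containerDir) {
    long dirId = Long.parseLong(containerDir.getFileName().toString());
    Path metadataFile = containerDir.resolve("metadata").resolve(dirId + ".container");
    if (!Files.isRegularFile(metadataFile)) {
      return statusFor(dirId, null);
    }
    try {
      return statusFor(dirId, Files.readAllLines(metadataFile, StandardCharsets.UTF_8));
    } catch (IOException e) {
      return Status.INVALID_METADATA; // unreadable counts as unparseable here
    }
  }

  public static void main(String[] args) {
    for (String arg : args) {
      System.out.println(arg + " -> " + classify(Paths.get(arg)));
    }
  }
}
```

Keeping the rule check separate from file access makes the three outcomes easy to unit test without touching disk.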
Store the results as a mapping:
containerId -> List\<ContainerOccurrence\>
where each occurrence contains the container directory path, volume, size,
and metadata status.
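
One possible shape for this mapping is sketched below. Only the name `ContainerOccurrence` and its four fields come from the description above; the field types, the `record` helper, and computing the size by summing regular-file lengths are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Illustrative only: not the classes added by this PR.
final class OccurrenceIndexSketch {
  enum Status { VALID, MISSING_METADATA, INVALID_METADATA }

  /** One on-disk copy of a container directory. */
  static final class ContainerOccurrence {
    final Path path;
    final Path volume;
    final long sizeBytes;
    final Status status;

    ContainerOccurrence(Path path, Path volume, long sizeBytes, Status status) {
      this.path = path;
      this.volume = volume;
      this.sizeBytes = sizeBytes;
      this.status = status;
    }
  }

  /** ASSUMPTION: directory size is the sum of the lengths of all regular files below it. */
  static long directorySize(Path dir) {
    try (Stream<Path> files = Files.walk(dir)) {
      return files.filter(Files::isRegularFile)
          .mapToLong(p -> p.toFile().length())
          .sum();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Appends an occurrence to the list kept for its container ID. */
  static void record(Map<Long, List<ContainerOccurrence>> index,
      long containerId, ContainerOccurrence occurrence) {
    index.computeIfAbsent(containerId, id -> new ArrayList<>()).add(occurrence);
  }

  public static void main(String[] args) {
    Map<Long, List<ContainerOccurrence>> index = new TreeMap<>();
    // Values copied from the test output in this description.
    record(index, 2L, new ContainerOccurrence(
        Paths.get("/data/hdds0/hdds/CID-25a9e273-8a28-47b1-8c35-143b8b8206a1/current/containerDir0/2"),
        Paths.get("/data/hdds0"), 31286L, Status.VALID));
    record(index, 2L, new ContainerOccurrence(
        Paths.get("/data/hdds1/hdds/CID-25a9e273-8a28-47b1-8c35-143b8b8206a1/current/containerDir0/2"),
        Paths.get("/data/hdds1"), 30720L, Status.MISSING_METADATA));
    index.forEach((id, list) -> System.out.println(id + " -> " + list.size() + " occurrence(s)"));
  }
}
```

A sorted map is used here only so that any report comes out in container-ID order; a plain hash map would serve the lookup equally well.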
Use this mapping to identify duplicate container directories by detecting
container IDs associated with more than one on-disk occurrence across storage
volumes on the same DataNode.
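
Given such a mapping, duplicate detection reduces to keeping the entries with more than one occurrence. The sketch below is generic over the occurrence type so that it stands alone; `DuplicateFinderSketch` and `duplicates` are hypothetical names, not the patch's API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: not the classes added by this PR.
final class DuplicateFinderSketch {

  /** Returns only the container IDs that map to more than one on-disk occurrence. */
  static <T> Map<Long, List<T>> duplicates(Map<Long, List<T>> index) {
    Map<Long, List<T>> result = new TreeMap<>();
    for (Map.Entry<Long, List<T>> entry : index.entrySet()) {
      if (entry.getValue().size() > 1) {
        result.put(entry.getKey(), entry.getValue());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Occurrences are plain strings here; a real index would hold richer objects.
    Map<Long, List<String>> index = new TreeMap<>();
    index.put(2L, Arrays.asList("/data/hdds1", "/data/hdds0"));
    index.put(41L, Arrays.asList("/data/hdds0"));
    Map<Long, List<String>> dups = duplicates(index);
    System.out.println("Duplicate container directories: " + dups.size());
    dups.forEach((id, where) -> System.out.println("Container " + id + " on " + where));
  }
}
```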
## What is the link to the Apache JIRA
[HDDS-15455](https://issues.apache.org/jira/browse/HDDS-15455)
## How was this patch tested?
Added tests.
Green CI: https://github.com/sreejasahithi/ozone/actions/runs/26815325532
Manually verified in a Docker ozone-ha cluster with
OZONE-SITE.XML_hdds.datanode.dir=/data/hdds0,/data/hdds1,/data/hdds2 set:
```
bash-5.1$ ozone getconf -confKey hdds.datanode.dir
/data/hdds0,/data/hdds1,/data/hdds2
bash-5.1$ ozone sh volume create /vol1
bash-5.1$ ozone sh bucket create /vol1/bucket1
bash-5.1$ ozone freon ockg -n 100
bash-5.1$ cd /data/hdds0/hdds/CID-25a9e273-8a28-47b1-8c35-143b8b8206a1/current/containerDir0
bash-5.1$ ls
2 41 42
bash-5.1$ cd /data/hdds1/hdds/CID-25a9e273-8a28-47b1-8c35-143b8b8206a1/current/containerDir0
bash-5.1$ ls
28 31 6
bash-5.1$ cd /data/hdds2/hdds/CID-25a9e273-8a28-47b1-8c35-143b8b8206a1/current/containerDir0
bash-5.1$ ls
17 24 37
```
Then ran the following to create a duplicate of container 2 on a second volume:
```
CLUSTER="CID-25a9e273-8a28-47b1-8c35-143b8b8206a1"
CID=2
PATH_A="/data/hdds0/hdds/${CLUSTER}/current/containerDir0/${CID}"
PATH_B="/data/hdds1/hdds/${CLUSTER}/current/containerDir0/${CID}"
# partial copy ONLY — chunks, no metadata
mkdir -p "$PATH_B/chunks"
cp -a "$PATH_A/chunks/." "$PATH_B/chunks/"
# verify: metadata should NOT exist on hdds1
ls "$PATH_A/metadata/2.container" # exists
ls "$PATH_B/metadata" 2>&1 # should fail
```
Ran the analyze command, which correctly identifies the duplicate:
```
bash-5.1$ ozone debug datanode container analyze
Duplicate container directories on this DataNode: 1
Container 2 (2 occurrences):
volume=/data/hdds1
path=/data/hdds1/hdds/CID-25a9e273-8a28-47b1-8c35-143b8b8206a1/current/containerDir0/2
status=MISSING_METADATA size=30720 bytes
volume=/data/hdds0
path=/data/hdds0/hdds/CID-25a9e273-8a28-47b1-8c35-143b8b8206a1/current/containerDir0/2
status=VALID size=31286 bytes
```