Kodali Bhavya Sree created HDDS-14697:
-----------------------------------------
Summary: Data integrity is missing for the snapshot after
de-commissioning Leader OM node. Below are the steps followed:
Key: HDDS-14697
URL: https://issues.apache.org/jira/browse/HDDS-14697
Project: Apache Ozone
Issue Type: Bug
Components: Ozone Manager
Affects Versions: 2.0.0
Reporter: Kodali Bhavya Sree
Snapshot data integrity validation fails after decommissioning the leader OM
node. The steps followed are below:
{code:java}
1. Create a volume and bucket with different params. Generate keys in the
bucket.
2. Calculate the checksums of all the files.
3. Create two snapshots, then delete one of them.
4. Decommission the leader OM node.
5. Validate the checksums of the files.
6. Create a snapshot after decommissioning and calculate the snapdiff against
the snapshot created before decommissioning.{code}
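The checksum steps (2 and 5) can be sketched as follows. This is a minimal illustration, not the test framework's actual code: the class name {{ChecksumManifest}}, its methods, and the choice of SHA-256 are assumptions. It presumes the keys have already been copied to local directories (for example with {{ozone fs -get}}).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: builds a path -> checksum manifest for a local directory
// and compares two manifests (e.g. before and after OM decommissioning).
final class ChecksumManifest {

    /** Returns relative path -> SHA-256 hex digest for every regular file under root. */
    static Map<String, String> build(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<String, String> manifest = new TreeMap<>();
        for (Path file : files) {
            manifest.put(root.relativize(file).toString(), sha256(file));
        }
        return manifest;
    }

    /** Streams the file through SHA-256 so large keys are not loaded into memory. */
    static String sha256(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Returns paths that are missing on either side or whose checksums differ. */
    static Set<String> mismatches(Map<String, String> expected, Map<String, String> actual) {
        Set<String> bad = new TreeSet<>();
        for (Map.Entry<String, String> entry : expected.entrySet()) {
            if (!entry.getValue().equals(actual.get(entry.getKey()))) {
                bad.add(entry.getKey());
            }
        }
        for (String path : actual.keySet()) {
            if (!expected.containsKey(path)) {
                bad.add(path);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: directory copied before decommissioning; args[1]: directory copied after.
        Set<String> bad = mismatches(build(Paths.get(args[0])), build(Paths.get(args[1])));
        System.out.println(bad.isEmpty() ? "All checksums match" : "Mismatched files: " + bad);
    }
}
```

An empty result from {{mismatches}} means every file has the same digest on both sides. In the failure reported here the copy itself fails, so the comparison is never reached.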
Checksum validation of snapshots fails after decommissioning. The failure
occurs at the step below, which copies all objects under snapshot
{{snap-j0vp8}} of that bucket into a local directory:
{code:java}
2026-02-22 09:46:51,203|INFO|MainThread|machine.py:190 -
run()||GUID=c7d2cbe1-f9e1-4074-a0fe-00b4d295bbbe|RUNNING: ssh -l root -i
/tmp/hw-qe-keypair.pem -q -o StrictHostKeyChecking=no -o
UserKnownHostsFile=/dev/null quasar-nntvzt-3.vpc.cloudera.com "export
KRB5CCNAME=/hwqe/hadoopqe/artifacts/kerberosTickets/hrt_qa.kerberos.ticket;
export OZONE_LOGLEVEL=INFO;/opt/cloudera/parcels/CDH/bin/ozone fs -get
ofs://ozone1771736427/vol-test-workload-om-decommission-recommission-1771752191/buck-test-workload-om-decommission-recommission-1771752191/.snapshot/snap-j0vp8/*
/test_master_node_decommissioning_om_workload/workload_local1771752225"{code}
The following NullPointerException is seen:
{code:java}
2026-02-22 09:46:53,571|INFO|MainThread|machine.py:205 -
run()||GUID=c7d2cbe1-f9e1-4074-a0fe-00b4d295bbbe|26/02/22 01:46:53 INFO
retry.RetryInvocationHandler: com.google.protobuf.ServiceException:
org.apache.hadoop.ipc_.RemoteException(java.lang.IllegalStateException):
java.lang.NullPointerException: Cannot invoke
"org.apache.hadoop.ozone.om.snapshot.OmSnapshotLocalDataManager$SnapshotVersionsMeta.getVersion()"
because the return value of
"org.apache.hadoop.ozone.om.snapshot.OmSnapshotLocalDataManager$ReadableOmSnapshotLocalDataMetaProvider.getMeta()"
is null{code}
where definition of getMeta() is below:
{code:java}
public synchronized SnapshotVersionsMeta getMeta() throws IOException {
  if (closed) {
    throw new IOException("Resource has already been closed.");
  }
  return meta;
}
{code}
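As the definition above shows, {{getMeta()}} checks only the {{closed}} flag and returns {{meta}} as-is, so a caller that chains {{getMeta().getVersion()}} hits the NPE whenever {{meta}} was never populated. The sketch below reproduces that pattern with hypothetical stand-in classes ({{VersionsMeta}}, {{MetaProvider}}, and both helper methods are illustrative names, not Ozone code) and shows one possible caller-side guard. It is not a proposed fix: the report does not establish why {{meta}} is null after decommissioning.

```java
import java.io.IOException;

// Self-contained illustration; none of these types are the real Ozone classes.
final class MetaNullGuardSketch {

    /** Hypothetical stand-in for SnapshotVersionsMeta; only getVersion() is modeled. */
    static final class VersionsMeta {
        private final int version;

        VersionsMeta(int version) {
            this.version = version;
        }

        int getVersion() {
            return version;
        }
    }

    /** Hypothetical stand-in for the provider; getMeta() mirrors the method quoted above. */
    static final class MetaProvider {
        private final VersionsMeta meta; // assumed to be null when metadata was never loaded
        private boolean closed;

        MetaProvider(VersionsMeta meta) {
            this.meta = meta;
        }

        synchronized VersionsMeta getMeta() throws IOException {
            if (closed) {
                throw new IOException("Resource has already been closed.");
            }
            return meta;
        }
    }

    /** Unchecked chain, as in the stack trace: throws NullPointerException if meta is null. */
    static int versionUnchecked(MetaProvider provider) throws IOException {
        return provider.getMeta().getVersion();
    }

    /** Guarded access: reports the missing metadata as a descriptive IOException. */
    static int versionChecked(MetaProvider provider, String snapshotName) throws IOException {
        VersionsMeta meta = provider.getMeta();
        if (meta == null) {
            throw new IOException("No local version metadata for snapshot " + snapshotName);
        }
        return meta.getVersion();
    }
}
```

A guard like this would only make the failure easier to diagnose; the snapshot read would still fail until the underlying cause of the missing metadata is addressed.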
Upstream PR where these changes went in:
[+https://github.infra.cloudera.com/CDH/ozone/commit/1dda9abfa979f3282d3355e154ce025f76d48665+]