[
https://issues.apache.org/jira/browse/HDDS-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18089229#comment-18089229
]
Ryan Blough commented on HDDS-14435:
------------------------------------
I found what look like two problems that are actually on the
snapshotUsedNamespace side:
1. In OMKeyDeleteRequestWithFSO.java, it looks like we put a tombstone on the
directory, but the corresponding snapshotUsedNamespace update is skipped
by the empty-key check:
{code:java}
long quotaReleased = sumBlockLengths(omKeyInfo);
// Empty entries won't be added to deleted table so this key shouldn't
// get added to snapshotUsed space.
boolean isKeyNonEmpty = !OmKeyInfo.isKeyEmpty(omKeyInfo);
omBucketInfo.decrUsedBytes(quotaReleased, isKeyNonEmpty);
-> omBucketInfo.decrUsedNamespace(1L, isKeyNonEmpty);{code}
https://github.com/apache/ozone/blob/cb29f193ea1d1945a8b908fbf8ee1a39bc00e5e4/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMKeyDeleteRequestWithFSO.java#L161-L165
This is the only namespace update in the file, so it looks to me like any
time a user deletes a directory, the deletion is not reflected in
snapshotUsedNamespace, because directories don't have blocks.
2. In OMDirectoriesPurgeRequestWithFSO.java, it looks like we always decrement
snapshotUsedNamespace on purge, with no attendant checks of whether there even
is a snapshot:
{code:java}
if (path.hasDeletedDir()) {
  deletedDirNames.add(path.getDeletedDir());
  BucketNameInfo bucketNameInfo = volumeBucketIdMap.get(
      new VolumeBucketId(path.getVolumeId(), path.getBucketId()));
  OmBucketInfo omBucketInfo = getBucketInfo(omMetadataManager,
      bucketNameInfo.getVolumeName(), bucketNameInfo.getBucketName());
  if (omBucketInfo != null && omBucketInfo.getObjectID() == path.getBucketId()) {
--> omBucketInfo.purgeSnapshotUsedNamespace(1);
    volBucketInfoMap.put(Pair.of(omBucketInfo.getVolumeName(),
        omBucketInfo.getBucketName()), omBucketInfo);
  }
  numDirsDeleted++;
} {code}
https://github.com/apache/ozone/blob/cb29f193ea1d1945a8b908fbf8ee1a39bc00e5e4/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMDirectoriesPurgeRequestWithFSO.java#L202
As a consequence, snapshotUsedNamespace commonly ends up hugely negative
on buckets with no snapshots at all. For example, I know of a large cluster
(~200 DNs) where 68 buckets have negative snapshotUsedNamespace values, yet
only 6 buckets in the cluster have snapshots.
These wrong snapshotUsedNamespace counts are reflected in the bucket info
output, because the value it displays is the sum of AOS usedNamespace +
snapshotUsedNamespace. So this problem wrecks namespace quotas even on FSO
buckets without snapshots:
{code:java}
public long getTotalBucketNamespace() {
  return usedNamespace + snapshotUsedNamespace;
} {code}
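To make the drift concrete, here is a minimal sketch of how the two code paths could combine for a single directory. BucketCounters and NamespaceDriftSketch are hypothetical stand-ins, not the real OmBucketInfo; I'm assuming decrUsedNamespace only moves the count into snapshotUsedNamespace when its flag is true, and that purgeSnapshotUsedNamespace decrements unconditionally, which is how I read the snippets above.

```java
// Hypothetical stand-in for the two counters on OmBucketInfo; not the real class.
class BucketCounters {
  long usedNamespace;
  long snapshotUsedNamespace;

  // Assumed behavior: the count moves into snapshotUsedNamespace only when
  // the flag is true (the isKeyNonEmpty argument in the delete request).
  void decrUsedNamespace(long n, boolean moveToSnapshotUsed) {
    usedNamespace -= n;
    if (moveToSnapshotUsed) {
      snapshotUsedNamespace += n;
    }
  }

  // Assumed behavior: unconditional decrement, as called from the purge path.
  void purgeSnapshotUsedNamespace(long n) {
    snapshotUsedNamespace -= n;
  }

  long getTotalBucketNamespace() {
    return usedNamespace + snapshotUsedNamespace;
  }
}

class NamespaceDriftSketch {
  public static void main(String[] args) {
    BucketCounters bucket = new BucketCounters();
    bucket.usedNamespace = 1;              // one directory exists in the bucket

    // Delete: a directory has no blocks, so isKeyNonEmpty would be false
    // and nothing is added to snapshotUsedNamespace.
    bucket.decrUsedNamespace(1L, false);

    // Purge: decrements snapshotUsedNamespace regardless.
    bucket.purgeSnapshotUsedNamespace(1);

    System.out.println("usedNamespace=" + bucket.usedNamespace
        + " snapshotUsedNamespace=" + bucket.snapshotUsedNamespace
        + " total=" + bucket.getTotalBucketNamespace());
  }
}
```

Under those assumptions, each directory delete followed by a purge leaves snapshotUsedNamespace one lower than it started, which would match the negative values seen on buckets that never had a snapshot.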
> usedNamespace is being decremented incorrectly
> ----------------------------------------------
>
> Key: HDDS-14435
> URL: https://issues.apache.org/jira/browse/HDDS-14435
> Project: Apache Ozone
> Issue Type: Bug
> Affects Versions: 2.1.0
> Reporter: Wei-Chiu Chuang
> Assignee: Aswin Shakil
> Priority: Major
>
> [~sarvekshayr] reported that HDDS-13756 seems to have caused a regression:
> https://github.com/apache/ozone/pull/9115#discussion_r2685116012
> {noformat}
> After this PR, observed an issue where usedBytes is high while usedNamespace
> is 0, even though the bucket has no keys.
> It appears that usedNamespace is being decremented incorrectly. In the
> overwrite case, the namespace seems to be reduced twice, whereas it should
> be done once, by 1.
> Since the bucket has no data, this likely indicates a problem in how
> namespace is being updated during the commit or delete flow.
> {noformat}
> [~aswinshakil] [~smeng]