[ https://issues.apache.org/jira/browse/HDDS-14654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sadanand Shenoy updated HDDS-14654:
-----------------------------------
Description:
When a purged snapshot’s YAML is deleted (all versions removed), the orphan
check can still schedule that snapshot again.
The next run then tries to load the missing YAML and throws NoSuchFileException.
{code:java}
java.nio.file.NoSuchFileException: <snapshots_dir>/om.db-<snapshotId>.yaml
at java.nio.file.Files.newInputStream(Files.java:...)
at org.apache.hadoop.ozone.util.YamlSerializer.load(YamlSerializer.java:73)
at org.apache.hadoop.ozone.om.snapshot.OmSnapshotLocalDataManager$ReadableOmSnapshotLocalDataProvider.lambda$0(OmSnapshotLocalDataManager.java:653)
at org.apache.hadoop.ozone.om.snapshot.OmSnapshotLocalDataManager$ReadableOmSnapshotLocalDataProvider.initialize(OmSnapshotLocalDataManager.java:655)
at org.apache.hadoop.ozone.om.snapshot.OmSnapshotLocalDataManager$ReadableOmSnapshotLocalDataProvider.<init>(OmSnapshotLocalDataManager.java:614)
at org.apache.hadoop.ozone.om.snapshot.OmSnapshotLocalDataManager$WritableOmSnapshotLocalDataProvider.<init>(OmSnapshotLocalDataManager.java:829)
at org.apache.hadoop.ozone.om.snapshot.OmSnapshotLocalDataManager.checkOrphanSnapshotVersions(OmSnapshotLocalDataManager.java:408)
at org.apache.hadoop.ozone.om.snapshot.OmSnapshotLocalDataManager.checkOrphanSnapshotVersions(OmSnapshotLocalDataManager.java:399)
at org.apache.hadoop.ozone.om.snapshot.OmSnapshotLocalDataManager.lambda$0(OmSnapshotLocalDataManager.java:383)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:...)
at java.util.concurrent.FutureTask.run(FutureTask.java:...)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:...)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:...)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:...)
at java.lang.Thread.run(Thread.java:...)
{code}
Root cause
In checkForOphanVersionsAndIncrementCount, incrementOrphanCheckCount(snapshotId)
is called whenever isPurgeTransactionSet is true or the version changes, even
if the YAML was just deleted (no versions remain). The snapshot therefore stays
in snapshotToBeCheckedForOrphans even though there is nothing left to check, and
the next run tries to load the deleted file.
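One possible guard against the rescheduling described above is sketched here. This is a minimal illustration, not the actual OmSnapshotLocalDataManager code or the fix in the pull request: the class OrphanCheckSketch, the method maybeScheduleOrphanCheck, its parameters, and the map-based stand-in for snapshotToBeCheckedForOrphans are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch; names and structure are assumptions, not Ozone's code. */
final class OrphanCheckSketch {
  // Stand-in for snapshotToBeCheckedForOrphans: snapshot ID -> pending check count.
  private final Map<UUID, Integer> snapshotToBeCheckedForOrphans = new HashMap<>();

  /** Stand-in for incrementOrphanCheckCount(snapshotId). */
  void incrementOrphanCheckCount(UUID snapshotId) {
    snapshotToBeCheckedForOrphans.merge(snapshotId, 1, Integer::sum);
  }

  /**
   * Schedules another orphan check only while versions remain.
   * remainingVersions is an assumed view of the versions still recorded
   * for the snapshot after the current check has run.
   */
  void maybeScheduleOrphanCheck(UUID snapshotId, List<Integer> remainingVersions,
      boolean isPurgeTransactionSet, boolean versionChanged) {
    if (remainingVersions.isEmpty()) {
      // The YAML was just deleted, so a later run would have no file to load.
      // Drop the snapshot instead of rescheduling it.
      snapshotToBeCheckedForOrphans.remove(snapshotId);
      return;
    }
    if (isPurgeTransactionSet || versionChanged) {
      incrementOrphanCheckCount(snapshotId);
    }
  }

  /** Returns true if the snapshot is still queued for an orphan check. */
  boolean isScheduled(UUID snapshotId) {
    return snapshotToBeCheckedForOrphans.containsKey(snapshotId);
  }
}
```

With a guard like this, a snapshot whose last version was removed is no longer queued, so the scheduled task never reaches the YAML load for it.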
was:
When a purged snapshot’s YAML is deleted (all versions removed), the orphan
check can still schedule that snapshot again.
The next run then tries to load the missing YAML and throws NoSuchFileException.
Root cause
In checkForOphanVersionsAndIncrementCount, we always call
incrementOrphanCheckCount(snapshotId) when isPurgeTransactionSet or the version
changes, even if the YAML was just deleted (empty versions). That keeps the
snapshot in snapshotToBeCheckedForOrphans even though there is nothing left to
check.
> NoSuchFileException when orphan check runs after purged snapshot YAML is
> deleted
> --------------------------------------------------------------------------------
>
> Key: HDDS-14654
> URL: https://issues.apache.org/jira/browse/HDDS-14654
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Sadanand Shenoy
> Assignee: Sadanand Shenoy
> Priority: Major
> Labels: pull-request-available