Dieter De Paepe created HBASE-29604:
---------------------------------------
Summary: BackupHFileCleaner uses flawed time based check
Key: HBASE-29604
URL: https://issues.apache.org/jira/browse/HBASE-29604
Project: HBase
Issue Type: Bug
Components: backup&restore
Affects Versions: 3.0.0-beta-1, 2.6.0, 4.0.0-alpha-1
Reporter: Dieter De Paepe
BackupHFileCleaner is responsible for preventing the cleanup of bulkloaded
HFiles that are still required by the backup & restore mechanism. It does this
using 2 checks:
* The backupsystemtable stores which HFile bulk loads are required for the
next incremental backup. Any HFile present here cannot be deleted.
* A time-based check prevents recently created HFiles from being
deleted. The intention is to avoid deleting HFiles newer than the previous
run of the cleaner. I believe this is to avoid race conditions between the
cleaner and entries in the backupsystemtable that get created while the
cleaner is running.
In a single-threaded context, this works correctly.
However, the cleaner is actually used concurrently in the
hfile_cleaner-dir-scan-pool to scan multiple subdirectories in
`CleanerChore#traverseAndDelete` (line 492). This means the time-based check is
not guaranteed to protect recently created HFiles. This creates a (small)
chance of data loss (in a backup) if an HFile is wrongfully deleted.
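
To make the failure mode concrete, here is a minimal sketch of one possible
interleaving. It is a simplified, hypothetical model and not the actual
BackupHFileCleaner code: the class name, the field names, the three-step split
(snapshot, shift timestamps, filter) and all timestamps are assumptions chosen
for illustration. The two "threads" are simulated by calling the steps in an
interleaved order, so the outcome is deterministic.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model; NOT the actual BackupHFileCleaner code.
class TimeCheckRaceSketch {
    // Shared mutable state on one delegate instance used by all scan threads.
    private long prevRead;
    private long secondPrevRead = 0L;

    TimeCheckRaceSketch(long lastRead) {
        this.prevRead = lastRead;
    }

    // Step 1: snapshot the HFiles currently registered in the (modelled) backup table.
    Set<String> snapshot(Set<String> backupTable) {
        return new HashSet<>(backupTable);
    }

    // Step 2: shift the timestamps; files modified at or after the returned cutoff are kept.
    long shiftAndGetCutoff(long now) {
        secondPrevRead = prevRead;
        prevRead = now;
        return secondPrevRead;
    }

    // Step 3: a file is deletable only if it is unreferenced and older than the cutoff.
    static boolean isDeletable(String file, long modTime, Set<String> referenced, long cutoff) {
        return !referenced.contains(file) && modTime < cutoff;
    }

    // One caller at a time: the cutoff is the previous read (t=10), so the new file survives.
    static boolean sequentialDeletes() {
        TimeCheckRaceSketch cleaner = new TimeCheckRaceSketch(10L);
        Set<String> backupTable = new HashSet<>();
        // HFile written at t=19; its backup table entry only appears at t=21.
        Set<String> seen = cleaner.snapshot(backupTable);   // read at t=20
        long cutoff = cleaner.shiftAndGetCutoff(20L);       // cutoff = 10
        return isDeletable("hfile-1", 19L, seen, cutoff);
    }

    // Two callers interleaved: B shifts the timestamps between A's read and A's shift.
    static boolean interleavedDeletes() {
        TimeCheckRaceSketch cleaner = new TimeCheckRaceSketch(10L);
        Set<String> backupTable = new HashSet<>();
        Set<String> seenByA = cleaner.snapshot(backupTable); // A reads at t=20
        backupTable.add("hfile-1");                          // entry registered at t=21
        cleaner.snapshot(backupTable);                       // B reads at t=23
        cleaner.shiftAndGetCutoff(23L);                      // B shifts: prevRead = 23
        long cutoffA = cleaner.shiftAndGetCutoff(24L);       // A shifts: cutoff = 23
        return isDeletable("hfile-1", 19L, seenByA, cutoffA);
    }

    public static void main(String[] args) {
        System.out.println("sequential deletes hfile-1:  " + sequentialDeletes());
        System.out.println("interleaved deletes hfile-1: " + interleavedDeletes());
    }
}
```

In this model, sequentialDeletes() returns false and interleavedDeletes()
returns true: thread A ends up filtering with a cutoff that is newer than its
own snapshot of the backup table, so the unregistered HFile is no longer
protected by the time-based check.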
I also strongly suggest adding a note to FileCleanerDelegate stating that
implementations should be thread-safe.