HemaKumar created HBASE-27404:
---------------------------------
Summary: Long running ExportSnapshot fails with Can't find hfile
Exception.
Key: HBASE-27404
URL: https://issues.apache.org/jira/browse/HBASE-27404
Project: HBase
Issue Type: Bug
Components: snapshots
Reporter: HemaKumar
ExportSnapshot jobs that run longer than the destination cluster's
hbase.master.hfilecleaner.ttl value are failing with {_}Can't find hfile:
<hfile> in the real or archive folders{_}. HFiles copied to the archive folder
are getting deleted on the destination cluster by the SnapshotHFileCleaner cleaner.
# ExportSnapshot copies archived HFiles to the destination cluster's archive folder.
# The manifest of an in-progress ExportSnapshot stays in
/hbase/.hbase-snapshot/.tmp until the export completes.
# The SnapshotHFileCleaner flow ignores the /hbase/.hbase-snapshot/.tmp
directory when it looks for snapshot reference files:
{code:java}
private void refreshCache() throws IOException {
  // just list the snapshot directory directly, do not check the modification time for the root
  // snapshot directory, as some file system implementations do not modify the parent directory's
  // modTime when there are new sub items, for example, S3.
  FileStatus[] snapshotDirs = FSUtils.listStatus(fs, snapshotDir,
    p -> !p.getName().equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME));
  // ... (rest of the method omitted)
}
{code}
# Because SnapshotHFileCleaner misses the in-progress snapshot's references,
TimeToLiveHFileCleaner marks the HFiles of the in-progress ExportSnapshot for
deletion once they are older than hbase.master.hfilecleaner.ttl (that is, they
were copied more than hbase.master.hfilecleaner.ttl ago).
# This is causing ExportSnapshot to fail at the verification stage.
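The sequence above can be illustrated with a toy model of the cleaner chain. This is only a sketch, not HBase code: the class and method names (CleanerChainSketch, ttlCleaner, snapshotCleaner, isDeletable) are made up for illustration, and it assumes that an archived file is removed only when every cleaner in the chain reports it as deletable.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Toy model of the cleaner chain. Hypothetical names, not the HBase classes. */
final class CleanerChainSketch {

    /** An archived HFile and the time (ms) at which it was copied into the archive. */
    static final class ArchivedFile {
        final String name;
        final long modTimeMs;

        ArchivedFile(String name, long modTimeMs) {
            this.name = name;
            this.modTimeMs = modTimeMs;
        }
    }

    /** TTL rule: deletable once the file is older than ttlMs. */
    static Predicate<ArchivedFile> ttlCleaner(long nowMs, long ttlMs) {
        return f -> nowMs - f.modTimeMs > ttlMs;
    }

    /** Snapshot rule: deletable unless a snapshot the cleaner can see references the file. */
    static Predicate<ArchivedFile> snapshotCleaner(Set<String> visibleRefs) {
        return f -> !visibleRefs.contains(f.name);
    }

    /** Assumption: a file is deleted only if every cleaner in the chain agrees. */
    static boolean isDeletable(ArchivedFile f, List<Predicate<ArchivedFile>> chain) {
        return chain.stream().allMatch(c -> c.test(f));
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        long ttl = 300_000L;
        // Copied 600 s ago by an export that is still running.
        ArchivedFile hfile = new ArchivedFile("hfile-1", now - 600_000L);

        // .tmp is skipped, so the in-progress manifest contributes no references.
        Set<String> refsWithoutTmp = Collections.emptySet();
        // What the cleaner would see if the in-progress manifest were read too.
        Set<String> refsWithTmp = new HashSet<>(Arrays.asList("hfile-1"));

        System.out.println("deletable, .tmp ignored: " + isDeletable(hfile,
            Arrays.asList(ttlCleaner(now, ttl), snapshotCleaner(refsWithoutTmp))));
        System.out.println("deletable, .tmp scanned: " + isDeletable(hfile,
            Arrays.asList(ttlCleaner(now, ttl), snapshotCleaner(refsWithTmp))));
    }
}
```

In this model the only difference between the two calls is whether the in-progress export's references are visible to the snapshot rule; the TTL rule behaves the same in both.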
Workaround:
Increase hbase.master.hfilecleaner.ttl on the destination cluster to a value
larger than the ExportSnapshot job's run time.
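As a rough aid for picking the value, the snippet below derives a TTL from an expected export duration plus a safety margin. It is a sketch under assumptions: the property is taken to be expressed in milliseconds, the 6 hour and 1 hour figures are placeholders, and HFileCleanerTtlSketch and ttlMillisFor are hypothetical names.

```java
import java.time.Duration;

/** Hypothetical helper for choosing a TTL; not part of HBase. */
final class HFileCleanerTtlSketch {
    static final String TTL_KEY = "hbase.master.hfilecleaner.ttl";

    /** TTL in milliseconds covering the expected export time plus a safety margin. */
    static long ttlMillisFor(Duration expectedExportTime, Duration safetyMargin) {
        return expectedExportTime.plus(safetyMargin).toMillis();
    }

    public static void main(String[] args) {
        // Placeholder figures: a 6 hour export with 1 hour of headroom.
        long ttl = ttlMillisFor(Duration.ofHours(6), Duration.ofHours(1));
        System.out.println(TTL_KEY + "=" + ttl);
    }
}
```

A larger TTL also keeps every other archived HFile around for longer, so the destination cluster needs enough spare storage for that.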
I think this issue needs to be fixed in SnapshotHFileCleaner flow so that
long-running ExportSnapshot jobs can succeed.
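One possible direction, sketched against the local filesystem with java.nio instead of the Hadoop FileSystem API, is to also list the snapshot directories under .tmp when collecting references. This is not a proposed patch: SnapshotDirListingSketch and listSnapshotDirs are hypothetical names, and the real change would have to live in the SnapshotHFileCleaner cache refresh.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical listing helper; uses local paths as a stand-in for HDFS. */
final class SnapshotDirListingSketch {
    static final String TMP_DIR_NAME = ".tmp";

    /**
     * Lists completed snapshot directories under snapshotRoot and, when
     * includeInProgress is true, the in-progress ones under .tmp as well.
     */
    static List<Path> listSnapshotDirs(Path snapshotRoot, boolean includeInProgress) {
        List<Path> result = new ArrayList<>();
        try {
            // Completed snapshots: every child except the .tmp directory itself.
            try (DirectoryStream<Path> top = Files.newDirectoryStream(snapshotRoot)) {
                for (Path p : top) {
                    if (!p.getFileName().toString().equals(TMP_DIR_NAME)) {
                        result.add(p);
                    }
                }
            }
            // In-progress snapshots: children of .tmp, if requested and present.
            Path tmp = snapshotRoot.resolve(TMP_DIR_NAME);
            if (includeInProgress && Files.isDirectory(tmp)) {
                try (DirectoryStream<Path> inProgress = Files.newDirectoryStream(tmp)) {
                    for (Path p : inProgress) {
                        result.add(p);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

With includeInProgress set to false the listing matches the current filter, and with true the manifest of a running export becomes visible to whatever reads the references.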