Vinayak Hegde created HBASE-29406:
-------------------------------------
Summary: Skip Copying Bulkloaded Files to Backup Location in
Continuous Backup
Key: HBASE-29406
URL: https://issues.apache.org/jira/browse/HBASE-29406
Project: HBase
Issue Type: Task
Components: backup&restore
Reporter: Vinayak Hegde
Context:
In our current continuous backup design, we stream WALs and bulkloaded files to
the backup location after a full backup. However, we've observed that replaying
bulkloaded files during PITR (Point-In-Time Recovery) is not practical due to
performance overhead.
As a result, we are considering an alternative approach: asking users to take an
incremental backup immediately after performing a bulkload. This would ensure
that bulkloaded files are backed up through the existing incremental backup
mechanism, making it unnecessary to stream them separately.
Why we may skip copying bulkloaded files:
* Bulkloaded files are not reused in the PITR or incremental backup recovery
paths.
* An incremental backup triggered by the user after a bulkload will naturally
include these files.
* Avoids duplicate effort: currently, the files may be copied twice, once via
continuous backup and again during incremental backup.
* Reduces performance impact, especially since the current implementation
synchronously copies bulkloaded files via the replication endpoint.
Goal of this JIRA:
* Track the design and implementation of changes required to stop copying
bulkloaded files during continuous backup.
* Evaluate and update affected areas such as the replication endpoint, backup
streaming logic, and any related metadata handling.
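
As a rough illustration of the goals above, the sketch below shows one way the
streaming path could keep forwarding WAL entries while no longer copying the
files referenced by bulkload markers. All names in it (ContinuousBackupSketch,
Entry, BackupSink, copyBulkLoadedFiles) are hypothetical stand-ins, not HBase
classes or configuration keys, and keeping the marker entry itself in the WAL
stream is an assumption rather than a decision recorded in this issue.

```java
import java.util.ArrayList;
import java.util.List;

public class ContinuousBackupSketch {

    /** Hypothetical stand-in for one WAL entry; not an HBase type. */
    static final class Entry {
        final String table;
        // Files referenced by a bulkload marker; empty for ordinary edits.
        final List<String> bulkLoadedFiles;

        Entry(String table, List<String> bulkLoadedFiles) {
            this.table = table;
            this.bulkLoadedFiles = bulkLoadedFiles;
        }

        boolean isBulkLoadMarker() {
            return !bulkLoadedFiles.isEmpty();
        }
    }

    /** Hypothetical backup target that only records what was sent to it. */
    static final class BackupSink {
        final List<Entry> walEntries = new ArrayList<>();
        final List<String> copiedFiles = new ArrayList<>();
    }

    /**
     * Streams a batch to the sink. With copyBulkLoadedFiles == true this
     * mimics the current behaviour described in the issue (files copied
     * inline with the WAL stream); with false it mimics the proposal.
     */
    static void replicate(List<Entry> batch, BackupSink sink,
                          boolean copyBulkLoadedFiles) {
        for (Entry entry : batch) {
            // Assumption: every entry, including the marker, is still streamed.
            sink.walEntries.add(entry);
            if (copyBulkLoadedFiles && entry.isBulkLoadMarker()) {
                sink.copiedFiles.addAll(entry.bulkLoadedFiles);
            }
        }
    }
}
```

With copyBulkLoadedFiles set to false, the bulkloaded files would reach the
backup location only through the incremental backup taken after the bulkload.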