Vinayak Hegde created HBASE-29406:
-------------------------------------

             Summary: Skip Copying Bulkloaded Files to Backup Location in Continuous Backup
                 Key: HBASE-29406
                 URL: https://issues.apache.org/jira/browse/HBASE-29406
             Project: HBase
          Issue Type: Task
          Components: backup&restore
            Reporter: Vinayak Hegde


Context:
In our current continuous backup design, we stream WALs and bulkloaded files to 
the backup location after a full backup. However, we've observed that replaying 
bulkloaded files during PITR (Point-In-Time Recovery) is not practical due to 
performance overhead.

As a result, we are considering an alternative approach: asking users to take 
an incremental backup immediately after performing a bulkload. This would 
ensure that bulkloaded files are backed up through the existing incremental 
backup mechanism, making it unnecessary to stream them separately.

Why we may skip copying bulkloaded files:
 * The bulkloaded files copied by continuous backup are not used by the PITR 
or incremental backup recovery paths.

 * An incremental backup triggered by the user right after a bulkload will 
naturally include these files.

 * Avoids duplicate effort: currently, the files may be copied both via 
continuous backup and again during incremental backup.

 * Reduces performance impact, especially since the current implementation 
synchronously copies bulkloaded files via the replication endpoint.

Goal of this JIRA:
 * Track the design and implementation of changes required to stop copying 
bulkloaded files during continuous backup.

 * Evaluate and update affected areas such as the replication endpoint, backup 
streaming logic, and any related metadata handling.
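
As a rough illustration of the intended behaviour change, the sketch below 
models the decision the backup replication endpoint would make for each WAL 
batch. It is not HBase code: WalEntry, filesToCopy and the copyBulkLoadedFiles 
flag are hypothetical names used only for this example. Whether the bulkload 
marker entry itself should still be written to the backup WAL is left open 
here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BulkLoadSkipSketch {

    /** Hypothetical stand-in for one WAL entry seen by the backup replication endpoint. */
    static final class WalEntry {
        final long sequenceId;
        final List<String> bulkLoadedFiles; // empty for ordinary edits

        WalEntry(long sequenceId, List<String> bulkLoadedFiles) {
            this.sequenceId = sequenceId;
            this.bulkLoadedFiles = bulkLoadedFiles;
        }

        /** In this model, an entry is a bulkload marker if it references any files. */
        boolean isBulkLoadMarker() {
            return !bulkLoadedFiles.isEmpty();
        }
    }

    /**
     * Returns the bulkloaded files the endpoint would copy for a batch.
     * copyBulkLoadedFiles = true models the current behaviour;
     * copyBulkLoadedFiles = false models the behaviour proposed in this issue.
     */
    static List<String> filesToCopy(List<WalEntry> batch, boolean copyBulkLoadedFiles) {
        List<String> files = new ArrayList<>();
        if (!copyBulkLoadedFiles) {
            // Proposed: leave these files to the next incremental backup.
            return files;
        }
        for (WalEntry entry : batch) {
            if (entry.isBulkLoadMarker()) {
                files.addAll(entry.bulkLoadedFiles);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        List<WalEntry> batch = Arrays.asList(
            new WalEntry(1L, Collections.emptyList()),
            new WalEntry(2L, Arrays.asList("hfile-a", "hfile-b")),
            new WalEntry(3L, Collections.emptyList()));

        System.out.println("current:  " + filesToCopy(batch, true));
        System.out.println("proposed: " + filesToCopy(batch, false));
    }
}
```

With the flag off, no bulkloaded file is copied synchronously from the 
endpoint, and the files are expected to reach the backup location through the 
incremental backup taken after the bulkload.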



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
