[ https://issues.apache.org/jira/browse/HBASE-29406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tak-Lon (Stephen) Wu resolved HBASE-29406.
------------------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

Patch merged into the feature branch.

> Skip Copying Bulkloaded Files to Backup Location in Continuous Backup
> ---------------------------------------------------------------------
>
>                 Key: HBASE-29406
>                 URL: https://issues.apache.org/jira/browse/HBASE-29406
>             Project: HBase
>          Issue Type: Task
>          Components: backup&restore
>            Reporter: Vinayak Hegde
>            Assignee: Vinayak Hegde
>            Priority: Major
>              Labels: HBASE-28957, pull-request-available
>             Fix For: HBASE-28957
>
>
> Context:
> In our current continuous backup design, we stream WALs and bulkloaded files to the backup location after a full backup. However, we've observed that replaying bulkloaded files during PITR (Point-In-Time Recovery) is not practical because of the performance overhead.
> As a result, we are considering an alternative approach: asking users to take an incremental backup immediately after performing a bulkload. This would ensure that bulkloaded files are backed up through the existing incremental backup mechanism, making it unnecessary to stream them separately.
> Why we may skip copying bulkloaded files:
> * Bulkloaded files are not reused in the PITR or incremental backup recovery paths.
> * An incremental backup triggered by the user after the bulkload will naturally include these files.
> * It avoids duplicate work: currently, the same files may be copied twice, once by continuous backup and again by the incremental backup.
> * It reduces the performance impact, especially since the current implementation copies bulkloaded files synchronously via the replication endpoint.
> Goal of this JIRA:
> * Track the design and implementation of the changes required to stop copying bulkloaded files during continuous backup.
> * Evaluate and update the affected areas, such as the replication endpoint, the backup streaming logic, and any related metadata handling.
-- This message was sent by Atlassian Jira (v8.20.10#820010)