[ https://issues.apache.org/jira/browse/HBASE-28706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875866#comment-17875866 ]
Dieter De Paepe commented on HBASE-28706:
-----------------------------------------

I designated this ticket (and several others that could result in data loss) as blockers, per the definitions provided in the issue submission form. Personally, I have no objection to releasing 2.6.1 with some blockers still open that only affect multi-root backups, given that many issues for single-root backups have been fixed.

> Tracking of bulk-loads for backup does not work for multi-root backups
> ----------------------------------------------------------------------
>
>                 Key: HBASE-28706
>                 URL: https://issues.apache.org/jira/browse/HBASE-28706
>             Project: HBase
>          Issue Type: Bug
>          Components: backup&restore
>    Affects Versions: 2.6.0, 3.0.0, 4.0.0-alpha-1
>            Reporter: Dieter De Paepe
>            Priority: Blocker
>
> I haven't been able to test this yet, but I strongly suspect that
> IncrementalTableBackupClient#handleBulkLoad will delete the records of the
> files that were bulk loaded, even if those records are still needed for
> backups in other backup roots.
> I base this on the observation that the WALs to be kept around, and backup
> metadata in general, are all tracked per individual backup root. For the
> tracking of bulk loads, this is not the case.
> The result would be data loss (i.e. of the bulk-loaded data) when taking
> backups across different backup roots.
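> Conceptually, the fix would be to key the bulk-load records by backup root,
> analogous to how WAL retention is tracked. A minimal sketch of that idea
> follows; the class and method names are hypothetical, not the actual HBase
> API:
> {code:java}
> // Hypothetical sketch: bulk-load records are kept per backup root, so that
> // finishing an incremental backup for one root removes only that root's
> // records and leaves the records of all other roots intact.
> import java.util.HashMap;
> import java.util.HashSet;
> import java.util.Map;
> import java.util.Set;
>
> public class PerRootBulkLoadTracker {
>   // backup root -> paths of bulk-loaded HFiles not yet covered by an
>   // incremental backup of that root
>   private final Map<String, Set<String>> pendingByRoot = new HashMap<>();
>
>   // On bulk load: register the file under every known backup root.
>   public void register(Set<String> backupRoots, String hfilePath) {
>     for (String root : backupRoots) {
>       pendingByRoot.computeIfAbsent(root, r -> new HashSet<>()).add(hfilePath);
>     }
>   }
>
>   // After an incremental backup of one root: drain only that root's records.
>   public Set<String> drain(String backupRoot) {
>     Set<String> drained = pendingByRoot.remove(backupRoot);
>     return drained == null ? Set.of() : drained;
>   }
> }
> {code}
> With bookkeeping of that shape, handleBulkLoad for file:/tmp/backup1 would no
> longer drop the records that a later incremental backup of file:/tmp/backup2
> still needs.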
> Edit: this is a minimal test to reproduce the issue on the master branch.
> First, enable backups by adding this to hbase-site.xml:
> {code:java}
> <property>
>   <name>hbase.backup.enable</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hbase.master.logcleaner.plugins</name>
>   <value>org.apache.hadoop.hbase.master.cleaner.TimeToLiveLogCleaner,org.apache.hadoop.hbase.master.cleaner.TimeToLiveProcedureWALCleaner,org.apache.hadoop.hbase.master.cleaner.TimeToLiveMasterLocalStoreWALCleaner,org.apache.hadoop.hbase.backup.master.BackupLogCleaner</value>
> </property>
> <property>
>   <name>hbase.procedure.master.classes</name>
>   <value>org.apache.hadoop.hbase.backup.master.LogRollMasterProcedureManager</value>
> </property>
> <property>
>   <name>hbase.procedure.regionserver.classes</name>
>   <value>org.apache.hadoop.hbase.backup.regionserver.LogRollRegionServerProcedureManager</value>
> </property>
> <property>
>   <name>hbase.coprocessor.region.classes</name>
>   <value>org.apache.hadoop.hbase.backup.BackupObserver</value>
> </property>
> <property>
>   <name>hbase.fs.tmp.dir</name>
>   <value>file:/tmp/hbase-tmp</value>
> </property>
> {code}
> Next, execute:
> {code:java}
> # Create an HFile (on local storage)
> echo -e 'row1\tvalue1' > /tmp/hfile_data
> bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:q1 -Dimporttsv.bulk.output=/tmp/bulk-output table1 /tmp/hfile_data
>
> # Create a table, and 2 full backups (using different roots) of the empty table
> echo "create 'table1', 'cf'" | bin/hbase shell -n
> bin/hbase backup create full file:/tmp/backup1 -t table1
> bin/hbase backup create full file:/tmp/backup2 -t table1
>
> # Bulk load the HFile into the table; a scan confirms it is loaded
> bin/hbase completebulkload /tmp/bulk-output table1
> echo "scan 'table1'" | bin/hbase shell
>
> # Take an incremental backup for each backup root
> bin/hbase backup create incremental file:/tmp/backup1 -t table1
> export BACKUP_ID1=$(bin/hbase backup history | head -n1 | tail -n -1 | grep -o -P "backup_\d+")
> bin/hbase backup create incremental file:/tmp/backup2 -t table1
> export BACKUP_ID2=$(bin/hbase backup history | head -n1 | tail -n -1 | grep -o -P "backup_\d+")
>
> # Restore root 1: bulk-loaded data is present
> bin/hbase restore file:/tmp/backup1 $BACKUP_ID1 -t "table1" -m "table1-backup1"
> echo "scan 'table1-backup1'" | bin/hbase shell
>
> # Restore root 2: bulk-loaded data is missing
> bin/hbase restore file:/tmp/backup2 $BACKUP_ID2 -t "table1" -m "table1-backup2"
> echo "scan 'table1-backup2'" | bin/hbase shell
> {code}
> Output of the final commands for reference:
> {code:java}
> hbase:001:0> scan 'table1-backup1'
> ROW                  COLUMN+CELL
>  row1                column=cf:q1, timestamp=2024-08-02T14:43:24.403, value=value1
> 1 row(s)
>
> hbase:001:0> scan 'table1-backup2'
> ROW                  COLUMN+CELL
> 0 row(s)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)