[
https://issues.apache.org/jira/browse/HBASE-29003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ray Mattingly resolved HBASE-29003.
-----------------------------------
Fix Version/s: 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.6.3
Resolution: Fixed
Thank you [~dieterdp_ng]. Please note the creation of
https://issues.apache.org/jira/browse/HBASE-29271, which I believe we should
tackle as a follow-up to this work.
> Proper bulk load tracking
> -------------------------
>
> Key: HBASE-29003
> URL: https://issues.apache.org/jira/browse/HBASE-29003
> Project: HBase
> Issue Type: Bug
> Components: backup&restore
> Affects Versions: 2.6.0, 3.0.0, 4.0.0-alpha-1, 2.6.1
> Reporter: Dieter De Paepe
> Assignee: Dieter De Paepe
> Priority: Critical
> Labels: pull-request-available
> Fix For: 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.6.3
>
>
> As part of the incremental backup mechanism, HBase tracks which files were
> bulk-loaded (since the last backup).
> This data is stored in the backup:system_bulk table. Entries are added when a
> bulk load occurs through the BackupObserver co-processor. Entries are deleted
> when an incremental backup is completed.
> There are 2 flaws in this implementation:
> 1) Performing a full backup should clear the list. Imagine the following scenario:
> * Create a full backup B1 of table T.
> * Perform a bulk load L1.
> * Take a full backup B2 of table T.
> * Take an incremental backup of table T.
> ** The data stored for this backup will include L1, even though that data is
> already present due to B2. (This is an inefficiency, not a real error.)
> 2) Performing a table deletion should clear the list of bulk-loaded files.
> Imagine the following scenario:
> * Create a full backup of table T.
> * Perform a bulk-load B1 into T.
> * Disable, delete and recreate T.
> * Create an incremental backup (taking a full backup instead is similar to
> the previous case)
> ** The backup will contain B1, even though it doesn't belong there.
>
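
The two flaws above can be illustrated with a toy in-memory model. This is only a sketch, not HBase code: the class, method and flag names are hypothetical, and the real state lives in the backup:system_bulk table rather than in a dictionary. The two flags switch between the behaviour described in this issue (nothing is cleared) and the behaviour the issue asks for (full backups and table deletions clear the list).

```python
# Toy model only: none of these names exist in HBase.
class BulkLoadTracker:
    """Tracks bulk-loaded files per table since the last backup."""

    def __init__(self, clear_on_full_backup, clear_on_table_delete):
        self.clear_on_full_backup = clear_on_full_backup
        self.clear_on_table_delete = clear_on_table_delete
        self.pending = {}  # table name -> list of bulk-loaded file names

    def bulk_load(self, table, hfile):
        # Mirrors "entries are added when a bulk load occurs".
        self.pending.setdefault(table, []).append(hfile)

    def full_backup(self, table):
        # Flaw 1: without this clear, old bulk loads stay registered.
        if self.clear_on_full_backup:
            self.pending.pop(table, None)

    def delete_table(self, table):
        # Flaw 2: without this clear, files of a dropped table stay registered.
        if self.clear_on_table_delete:
            self.pending.pop(table, None)

    def incremental_backup(self, table):
        # Returns the bulk-loaded files this backup would include,
        # then forgets them ("entries are deleted when an incremental
        # backup is completed").
        return self.pending.pop(table, [])


def scenario_full_backup(tracker):
    tracker.full_backup("T")      # B1
    tracker.bulk_load("T", "L1")
    tracker.full_backup("T")      # B2 already contains L1
    return tracker.incremental_backup("T")


def scenario_table_recreated(tracker):
    tracker.full_backup("T")
    tracker.bulk_load("T", "B1")
    tracker.delete_table("T")     # disable, delete and recreate T
    return tracker.incremental_backup("T")


def described():
    return BulkLoadTracker(clear_on_full_backup=False, clear_on_table_delete=False)


def requested():
    return BulkLoadTracker(clear_on_full_backup=True, clear_on_table_delete=True)


print(scenario_full_backup(described()))      # ['L1']
print(scenario_table_recreated(described()))  # ['B1']
print(scenario_full_backup(requested()))      # []
print(scenario_table_recreated(requested()))  # []
```

In this model the first two results show the stale entries leaking into the incremental backup, and the last two show the list being empty once full backups and table deletions clear it.
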
> Note that this *can also cause backup corruption* after a backup restore
> (which is how we encountered this issue), which makes this problem less niche
> than the above scenarios indicate. Backup restore effectively uses bulk loads
> as well, so users could run into the following scenario, where they are trying
> to recover from data corruption:
> * Create an environment with backup B1 (time t) and backup B2 (time t2 > t).
> * Users notice data corruption, and restore backup B2 after clearing the
> table
> * Users notice the data corruption is already present in B2, and restore
> backup B1 after clearing the table.
> * Users find data corruption solved, and resume regular backup cycle from
> here on.
> ** Any incremental backup taken will contain the (possibly corrupt) data
> from B2 (due to the restore operation using bulk operations). The backups
> will be affected until a FULL backup is taken after an incremental backup (so
> this could span a period of weeks assuming bi-weekly/monthly full backups).
> A minimal reproduction example:
> {code:java}
> echo "create 'table', 'cf'; put 'table', 'row1', 'cf:a', 'value1', 1400523142819" | bin/hbase shell -n
> bin/hbase backup create full file:/tmp/backup -t table -i
> echo "disable 'table'; drop 'table'" | bin/hbase shell -n
> # Empty
> echo "scan 'backup:system_bulk'" | bin/hbase shell -n
> bin/hbase restore file:/tmp/backup backup_1732787972748 -t "table"
> # 1 entry
> echo "scan 'backup:system_bulk'" | bin/hbase shell -n
> echo "disable 'table'; drop 'table'" | bin/hbase shell -n
> # 1 entry
> echo "scan 'backup:system_bulk'" | bin/hbase shell -n
> echo "create 'table', 'cf'; put 'table', 'row1', 'cf:b', 'value2', 1400523142819" | bin/hbase shell -n
> bin/hbase backup create full file:/tmp/backup -t table -i
> echo "scan 'backup:system_bulk'" | bin/hbase shell -n
> echo "put 'table', 'row1', 'cf:b', 'value3', 1400523142819" | bin/hbase shell -n
> bin/hbase backup create incremental file:/tmp/backup -t table -i
> # Empty
> echo "scan 'backup:system_bulk'" | bin/hbase shell -n
> echo "disable 'table'; drop 'table'" | bin/hbase shell -n
> bin/hbase restore file:/tmp/backup backup_1732788098586 -t "table"
> # Will contain "value1" (unexpected) and "value3" (expected)
> echo "scan 'table'" | bin/hbase shell -n
> {code}