[ 
https://issues.apache.org/jira/browse/HBASE-29003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-29003:
------------------------------
    Fix Version/s:     (was: 4.0.0-alpha-1)

> Proper bulk load tracking
> -------------------------
>
>                 Key: HBASE-29003
>                 URL: https://issues.apache.org/jira/browse/HBASE-29003
>             Project: HBase
>          Issue Type: Bug
>          Components: backup&restore
>    Affects Versions: 2.6.0, 3.0.0, 4.0.0-alpha-1, 2.6.1
>            Reporter: Dieter De Paepe
>            Assignee: Dieter De Paepe
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 2.7.0, 3.0.0-beta-2, 2.6.3
>
>
> As part of the incremental backup mechanism, HBase tracks which files were 
> bulk-loaded (since the last backup).
> This data is stored in the backup:system_bulk table. Entries are added when a 
> bulk load occurs through the BackupObserver co-processor. Entries are deleted 
> when an incremental backup is completed.
> There are 2 flaws in this implementation:
> 1) Performing a full backup should clear the list. Imagine the following scenario:
>  * Create a full backup B1 of table T.
>  * Perform a bulk load L1.
>  * Take a full backup B2 of table T.
>  * Take an incremental backup of table T.
>  ** The data stored for this backup will include L1, even though that data is 
> already present due to B2. (This is an inefficiency, not a real error.)
> 2) Performing a table deletion should clear the list of bulk-loaded files. 
> Imagine the following scenario:
>  * Create a full backup of table T.
>  * Perform a bulk-load B1 into T.
>  * Disable, delete and recreate T.
>  * Create an incremental backup (taking a full backup instead is similar to 
> the previous case)
>  ** The backup will contain B1, even though it doesn't belong there.
>  
> Note that this *can also cause backup corruption* after a backup restore 
> (which is how we encountered this issue), which makes this problem less niche 
> than the above scenarios indicate. Backup restore effectively uses bulk loads 
> as well, so users could run into the following scenario, where they are trying 
> to recover from data corruption:
>  * Create an environment with backup B1 (time t) and backup B2 (time t2 > t).
>  * Users notice data corruption, and restore backup B2 after clearing the 
> table
>  * Users notice the data corruption is already present in B2, and restore 
> backup B1 after clearing the table.
>  * Users find data corruption solved, and resume regular backup cycle from 
> here on.
>  ** Any incremental backup taken will contain the (possibly corrupt) data 
> from B2 (due to the restore operation using bulk operations). The backups 
> will be affected until a FULL backup is taken after an incremental backup (so 
> this could span a period of weeks assuming bi-weekly/monthly full backups).
> A minimal reproduction example:
> {code:java}
> echo "create 'table', 'cf'; put 'table', 'row1', 'cf:a', 'value1', 1400523142819" | bin/hbase shell -n
> bin/hbase backup create full file:/tmp/backup -t table -i
> echo "disable 'table'; drop 'table'" | bin/hbase shell -n
> # Empty
> echo "scan 'backup:system_bulk'" | bin/hbase shell -n
> bin/hbase restore file:/tmp/backup backup_1732787972748 -t "table"
> # 1 entry
> echo "scan 'backup:system_bulk'" | bin/hbase shell -n
> echo "disable 'table'; drop 'table'" | bin/hbase shell -n
> # 1 entry
> echo "scan 'backup:system_bulk'" | bin/hbase shell -n
> echo "create 'table', 'cf'; put 'table', 'row1', 'cf:b', 'value2', 1400523142819" | bin/hbase shell -n
> bin/hbase backup create full file:/tmp/backup -t table -i
> echo "scan 'backup:system_bulk'" | bin/hbase shell -n
> echo "put 'table', 'row1', 'cf:b', 'value3', 1400523142819" | bin/hbase shell -n
> bin/hbase backup create incremental file:/tmp/backup -t table -i
> # Empty
> echo "scan 'backup:system_bulk'" | bin/hbase shell -n
> echo "disable 'table'; drop 'table'" | bin/hbase shell -n
> bin/hbase restore file:/tmp/backup backup_1732788098586 -t "table"
> # Will contain "value1" (unexpected) and "value3" (expected)
> echo "scan 'table'" | bin/hbase shell -n
>  {code}
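
The two flaws described above can also be seen without a cluster in a small toy model. The sketch below is not HBase code: it assumes a single table, replaces the backup:system_bulk table with a plain shell variable, and every function name in it is hypothetical. Setting FIX_ENABLED=1 models the behaviour the issue asks for (a full backup and a table deletion both clear the list); the actual fix may be implemented differently.

```shell
# Toy model of the tracking described above. NOT HBase code: every name here is
# hypothetical, and "tracked" stands in for the rows of backup:system_bulk for
# a single table.
FIX_ENABLED=0   # 1 = model the behaviour the issue asks for
tracked=""      # bulk-loaded files recorded since the last backup
included=""     # files picked up by the most recent incremental backup

bulk_load() { tracked="${tracked:+$tracked }$1"; }           # observer records the file
incremental_backup() { included="$tracked"; tracked=""; }    # consumes and clears the list
full_backup() { if [ "$FIX_ENABLED" = 1 ]; then tracked=""; fi; }   # today: list untouched
drop_table() { if [ "$FIX_ENABLED" = 1 ]; then tracked=""; fi; }    # today: list untouched

# Flaw 1: bulk load L1 sits between two full backups.
scenario_full_backup() {
  FIX_ENABLED=$1; tracked=""
  full_backup; bulk_load L1; full_backup
  incremental_backup
  echo "$included"
}

# Flaw 2: bulk load B1, then the table is dropped and recreated.
scenario_drop_table() {
  FIX_ENABLED=$1; tracked=""
  full_backup; bulk_load B1; drop_table
  incremental_backup
  echo "$included"
}

echo "flaw 1: current=[$(scenario_full_backup 0)] fixed=[$(scenario_full_backup 1)]"
echo "flaw 2: current=[$(scenario_drop_table 0)] fixed=[$(scenario_drop_table 1)]"
```

In this model a restore behaves like bulk_load, which is why the restore scenario in the description leaks data from one backup into later incremental backups.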



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
