[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mallikarjun updated HBASE-25891:
--------------------------------
    Summary: Remove dependence on storing WAL filenames for backup  (was: Remove dependence storing WAL filenames for backup)

> Remove dependence on storing WAL filenames for backup
> -----------------------------------------------------
>
>                 Key: HBASE-25891
>                 URL: https://issues.apache.org/jira/browse/HBASE-25891
>             Project: HBase
>          Issue Type: Improvement
>          Components: backup&restore
>    Affects Versions: 3.0.0-alpha-1
>            Reporter: Mallikarjun
>            Assignee: Mallikarjun
>            Priority: Major
>             Fix For: 3.0.0-alpha-2
>
>
> Context:
> Currently, WAL file names are stored in the `backup:system` meta table:
> {code:java}
> wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258
> wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175
> wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1
> wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258
> wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280
> wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1
> {code}
> Also, every backup (incremental and full) performs a log roll just before
> taking the backup and stores the timestamp at which the log roll was
> performed, per regionserver per backup, in the following format:
>
> {code:java}
> rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020 column=meta:rs-log-ts, timestamp=1622887363301,value=\x00\x00\x01y\xDB\x81ar
> rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020 column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP
> rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020 column=meta:rs-log-ts, timestamp=1622887363275, value=\x00\x00\x01y\xDB\x81\x85
> {code}
>
>
> There are 2 cases in which the WAL references stored in `backup:system` are
> used.
> 1. To clean up WALs for which a backup has already been taken, using
> `BackupLogCleaner`.
> Since the log roll timestamp is stored per regionserver as part of each
> backup, we can check all previous successful backups and identify which logs
> are to be retained and which are to be cleaned up, as follows:
>  * Identify the latest successful backup performed per table.
>  * For each backup identified above, identify the oldest log roll timestamp
> per regionserver per table.
>  * All WALs older than the oldest log roll timestamp for any backed-up table
> can be removed by `BackupLogCleaner`.
>
> 2. During incremental backup, to check the system table for duplicate WALs,
> so that no WAL is backed up again.
>  * Incremental backup already identifies which WALs need to be backed up
> using the `rslogts:` entries mentioned above.
>  * Additionally, it checks `wals:` to ensure no logs are backed up a second
> time. This check is redundant and provides no extra benefit.
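
The retention rule from case 1 can be sketched roughly as below. This is only an illustration of the steps listed in the description, not the actual `BackupLogCleaner` code: the class name `WalRetentionSketch`, its methods, and the map-based inputs are all hypothetical. It also assumes, based on the sample rows above, that the number after the last `.` in a WAL file name is that file's timestamp in milliseconds, and that a WAL from a regionserver with no recorded log roll should be kept.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names and data shapes are assumptions, not HBase APIs.
public class WalRetentionSketch {

    // Input: table -> (regionserver -> log roll timestamp of the latest
    // successful backup of that table). Output: regionserver -> oldest such
    // timestamp across all tables, used as the deletion cutoff.
    static Map<String, Long> oldestRollTsPerServer(
            Map<String, Map<String, Long>> latestBackupPerTable) {
        Map<String, Long> cutoff = new HashMap<>();
        for (Map<String, Long> rollTsByServer : latestBackupPerTable.values()) {
            for (Map.Entry<String, Long> e : rollTsByServer.entrySet()) {
                // Keep the minimum timestamp seen for this regionserver.
                cutoff.merge(e.getKey(), e.getValue(), Long::min);
            }
        }
        return cutoff;
    }

    // Assumption: the suffix after the last '.' in the WAL file name is the
    // WAL timestamp, as in "...%2C1614844389000.1621996160175".
    static long walTimestamp(String walFileName) {
        int dot = walFileName.lastIndexOf('.');
        return Long.parseLong(walFileName.substring(dot + 1));
    }

    // A WAL is deletable only if it is older than the cutoff for its
    // regionserver. Unknown regionservers are retained (assumption).
    static boolean canDelete(String server, String walFileName,
                             Map<String, Long> cutoff) {
        Long ts = cutoff.get(server);
        return ts != null && walTimestamp(walFileName) < ts;
    }
}
```

With this shape, no `wals:` rows are consulted at all: the decision depends only on the `rslogts:`-style timestamps, which is the point of the proposal.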