[ 
https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382994#comment-17382994
 ] 

Mallikarjun commented on HBASE-25891:
-------------------------------------

This will be a much needed enhancement in terms of being able to extend this 
functionality to rsgroups and things to come for hbase backup restore. Esp 
before hbase 3.0 goes live, as this changes information stored in meta. Would 
like someone to spend time in reviewing this. [~anoop.hbase] [~zhangduo]

> Remove dependence storing WAL filenames for backup
> --------------------------------------------------
>
>                 Key: HBASE-25891
>                 URL: https://issues.apache.org/jira/browse/HBASE-25891
>             Project: HBase
>          Issue Type: Improvement
>          Components: backup&restore
>    Affects Versions: 3.0.0-alpha-1
>            Reporter: Mallikarjun
>            Assignee: Mallikarjun
>            Priority: Major
>             Fix For: 3.0.0-alpha-2
>
>
> Context:
> Currently WAL logs are stored in `backup:system` meta table 
> {code:java}
> // code placeholder
> wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, 
> timestamp=1622003479895, value=backup_1622003358258 
> wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, 
> timestamp=1622003479895, 
> value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175
>  wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, 
> timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 
> wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, 
> timestamp=1622003479895, value=backup_1622003358258 
> wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, 
> timestamp=1622003479895, 
> value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280
>  wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, 
> timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 
> {code}
> Also, Every backup (Incremental and Full) performs a log roll just before 
> taking backup and stores what was the timestamp at which log roll was 
> performed per regionserver per backup using following format. 
>  
> {code:java}
> // code placeholder
> rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020 
> column=meta:rs-log-ts, timestamp=1622887363301,value=\x00\x00\x01y\xDB\x81ar
> rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020 
> column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP
> rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020 
> column=meta:rs-log-ts, timestamp=1622887363275, 
> value=\x00\x00\x01y\xDB\x81\x85
> {code}
>  
>  
> There are 2 cases for which WAL log refrences stored in `backup:system` and 
> are being used. 
> 1. To cleanup WAL's for which backup is already taken using 
> `BackupLogCleaner` 
> Since log roll timestamp is stored as part of backup per regionserver. We can 
> check all previous successfull backup's and then identify which logs are to 
> be retained and which ones are to be cleaned up as follows
>  * Identify which are the latest successful backups performed per table.
>  * Per backup identified above, identify what is the oldest log rolled 
> timestamp perfomed per regionserver per table. 
>  * All those WAL's which are older than oldest log rolled timestamp perfomed 
> for any table backed can be removed by `BackupLogCleaner` 
>  
> 2. During incremental backup, to check system table if there are any 
> duplicate WAL's for which backup is taken again. 
>  * Incremental backup already identifies which all WAL's to be backed up 
> using `rslogts:` mentioned above.
>  * Additionally it checks `wals:` to ensure no logs are backuped for second 
> time. And this is redundant and not seen any extra benefit. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to