[ 
https://issues.apache.org/jira/browse/HBASE-25784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mallikarjun updated HBASE-25784:
--------------------------------
    Summary: Support for Parallel Backups enabling multi tenancy with rsgroups  
(was: Support for Parallel Backups enabling multi tenancy)

> Support for Parallel Backups enabling multi tenancy with rsgroups
> -----------------------------------------------------------------
>
>                 Key: HBASE-25784
>                 URL: https://issues.apache.org/jira/browse/HBASE-25784
>             Project: HBase
>          Issue Type: Umbrella
>          Components: backup&restore
>            Reporter: Mallikarjun
>            Assignee: Mallikarjun
>            Priority: Major
>              Labels: backup
>
> *Problem 1:* 
> With the existing design, incremental and full backups cannot run in 
> parallel, which degrades the achievable RPO whenever a full backup runs for 
> a long time, especially for large tables.
>  
> Example: 
> Expectation: Say you have a large 10 TB table, your RPO is 60 minutes, and 
> you can ship the backup to the remote site at 800 Mbps. You are allowed to 
> take a full backup once a week, and all other backups should be incremental 
> backups.
>  
> Shortcoming: With the above design, one can't run parallel backups, so 
> while a full backup is running (which takes roughly 25 hours) no 
> incremental backup can be taken, and that is a breach of your RPO. 
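
As a rough sanity check on the numbers in this example, the transfer time can be estimated directly. This is only a sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/s), a fully utilized link, and no compression or protocol overhead, none of which are stated in the issue. The function name is made up for illustration.

```python
def transfer_hours(size_tb: float, link_mbps: float) -> float:
    """Hours needed to ship size_tb terabytes over a link_mbps link."""
    bits = size_tb * 1e12 * 8            # decimal terabytes -> bits
    seconds = bits / (link_mbps * 1e6)   # megabits per second -> bits per second
    return seconds / 3600


full_backup_hours = transfer_hours(10, 800)
rpo_hours = 1.0
# Whole 60-minute RPO windows that elapse while the full backup is running.
missed_windows = int(full_backup_hours // rpo_hours)
print(f"{full_backup_hours:.1f} h, {missed_windows} RPO windows")
```

Under these assumptions the estimate comes out at about 28 hours, the same order as the roughly 25 hours quoted above. Either way, more than a full day of hourly incremental backups is blocked.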
>  
> *Proposed Solution:* Barring some critical sections, such as modifying the 
> state of the backup in the meta tables, everything else can happen in 
> parallel. Incremental backups should be able to run on top of an older 
> successful full / incremental backup, and the completion time of a backup, 
> rather than its start time, should be used for ordering. I have not worked 
> out the full redesign yet, and will do so if this proposal seems acceptable 
> to the community.
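
One way to picture this proposal is sketched below. It is a toy model, not HBase code: `BackupRecord` and `BackupCatalog` are hypothetical names, and an in-memory list stands in for the backup state kept in the meta tables. Only the state update is guarded by a lock, and the base for the next incremental backup is chosen by completion time.

```python
import threading
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BackupRecord:              # hypothetical record, not an HBase class
    backup_id: str
    kind: str                    # "full" or "incremental"
    start_ts: int
    complete_ts: Optional[int]   # None while still running (or failed)


class BackupCatalog:
    """Toy stand-in for the backup state kept in the meta tables."""

    def __init__(self) -> None:
        self._lock = threading.Lock()        # guards only the state update
        self._records: List[BackupRecord] = []

    def record(self, rec: BackupRecord) -> None:
        with self._lock:                     # the short critical section
            self._records.append(rec)

    def latest_completed(self) -> Optional[BackupRecord]:
        """Most recently completed backup, ordered by completion time."""
        with self._lock:
            done = [r for r in self._records if r.complete_ts is not None]
        return max(done, key=lambda r: r.complete_ts, default=None)


catalog = BackupCatalog()
catalog.record(BackupRecord("full-1", "full", 0, 100))
catalog.record(BackupRecord("full-2", "full", 200, None))   # long full backup, still running
catalog.record(BackupRecord("incr-1", "incremental", 210, 220))
base = catalog.latest_completed()
print(base.backup_id)
```

Here the next incremental backup would build on `incr-1` even though `full-2` started earlier, because `full-2` has not completed yet.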
>  
> *Problem 2:*
> With only one backup at a time, this breaks down easily in a multi-tenant 
> system. This poses the following problems:
>  * Admins will not be able to achieve the required RPOs for their tables 
> because they depend on the other tenants present in the system: one tenant 
> has no control over the other tenants' table sizes, and hence over the 
> duration of their backups.
>  * The management overhead of setting up the right sequence to achieve the 
> required RPOs for different tenants could be very high.
> *Proposed Solution:* Same as the previous proposal.
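
The dependence between tenants can be illustrated with a deliberately simple model. It assumes backups are taken round robin, one at a time, so a tenant's next backup cannot complete until every tenant's backup has run once. The tenant names and durations are invented for illustration.

```python
from typing import Dict


def serial_cycle_minutes(durations_min: Dict[str, float]) -> float:
    """Length of one round-robin cycle when only one backup runs at a time."""
    return sum(durations_min.values())


def meets_rpo(gap_min: float, rpo_min: float) -> bool:
    """True if the gap between two consecutive backups stays within the RPO."""
    return gap_min <= rpo_min


# Hypothetical tenants: a small table and a 25-hour full backup of a large one.
durations = {"tenant_a": 10.0, "tenant_b": 25 * 60.0}
rpo = 60.0

serial_gap = serial_cycle_minutes(durations)   # tenant_a waits for tenant_b too
parallel_gap = durations["tenant_a"]           # tenant_a depends only on itself
print(meets_rpo(serial_gap, rpo), meets_rpo(parallel_gap, rpo))
```

In this model tenant_a misses its 60-minute RPO under serial execution purely because of tenant_b's backup duration, and meets it once its backups can run independently.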
>  
> *Problem 3:* 
> Incremental backup works on WALs, and 
> org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WALs are 
> never cleaned up until the next backup (full / incremental) is taken. This 
> poses the following problem:
>  * WALs can grow unbounded if there are transient problems, such as the 
> backup site facing issues, until the next scheduled backup completes 
> successfully.
> *Proposed Solution:* I can't think of anything better, but I see this as a 
> potential problem. Also, one can force a full backup if the required WAL 
> files are missing for any other reason, not necessarily those mentioned 
> above. 
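
The growth described in Problem 3 can be sketched with a simplified model of the retention rule. This is not the actual BackupLogCleaner logic: it only assumes that a WAL may be cleaned once a successful backup at or after its timestamp exists, and that one WAL is rolled per hour. The function name and timestamps are illustrative.

```python
from typing import List, Optional


def deletable_wals(wal_timestamps: List[int],
                   last_successful_backup_ts: Optional[int]) -> List[int]:
    """WALs covered by the last successful backup; all others must be kept."""
    if last_successful_backup_ts is None:
        return []                        # no backup yet, nothing may be cleaned
    return [ts for ts in wal_timestamps if ts <= last_successful_backup_ts]


wals = list(range(1, 9))                 # one WAL rolled per hour, hours 1..8
last_ok = 2                              # backups have been failing since hour 2
cleanable = deletable_wals(wals, last_ok)
retained = [ts for ts in wals if ts not in cleanable]
print(len(retained))
```

As long as `last_ok` does not advance, every newly rolled WAL is added to `retained`, which is the unbounded growth the bullet above describes.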
>  
> Proposed Design.
> !https://i.ibb.co/vVV1BTs/Backup-Activity-Diagram.png|width=322,height=414!


