Attached as image. Please let me know if it is available now.

---
Mallikarjun
On Mon, Jan 25, 2021 at 10:32 AM Sean Busbey <bus...@apache.org> wrote:
> Hi!
>
> Thanks for the write-up. Unfortunately, your image for the existing
> design didn't come through. Could you post it to some host and link it
> here?
>
> On Sun, Jan 24, 2021 at 3:12 AM Mallikarjun <mallik.v.ar...@gmail.com>
> wrote:
> >
> > Existing Design:
> >
> > Problem 1:
> >
> > With this design, incremental and full backups can't run in parallel,
> > which degrades RPOs whenever a full backup runs for a long time,
> > especially for large tables.
> >
> > Example:
> >
> > Expectation: Say you have a big table of 10 TB, your RPO is 60 minutes,
> > and you are allowed to ship the remote backup at 800 Mbps. You are
> > allowed to take a full backup once a week, and the rest should be
> > incremental backups.
> >
> > Shortcoming: With the above design, backups can't run in parallel.
> > Whenever a full backup is running (which takes roughly 25 hours), you
> > are not allowed to take incremental backups, and that breaches your RPO.
> >
> > Proposed Solution: Apart from critical sections such as modifying the
> > state of a backup in the meta tables, everything else can happen in
> > parallel. Incremental backups should be able to run based on older
> > successful full / incremental backups, and backups should be ordered by
> > their completion time instead of their start time. I have not worked on
> > the full redesign yet, and will do so if the community finds this
> > proposal acceptable.
> >
> > Problem 2:
> >
> > Allowing only one backup at a time breaks down easily in a multi-tenant
> > system. This poses the following problems:
> >
> > Admins will not be able to achieve the required RPOs for their tables
> > because they depend on the other tenants in the system: one tenant has
> > no control over other tenants' table sizes, and hence over the duration
> > of their backups.
> >
> > Setting up the right sequence of backups to achieve the required RPOs
> > for different tenants could be very hard and adds management overhead.
> >
> > Proposed Solution: Same as the previous proposal.
> >
> > Problem 3:
> >
> > Incremental backup works on WALs, and
> > org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WALs
> > are never cleaned up until the next backup (full / incremental) is
> > taken. This poses the following problem:
> >
> > WALs can grow unbounded when there are transient problems, such as the
> > backup site facing issues, until the next scheduled backup succeeds.
> >
> > Proposed Solution: I can't think of anything better, but I see this as
> > a potential problem. Also, one can force a full backup if the required
> > WAL files are missing for any reason, not necessarily the ones
> > mentioned above.
> >
> > ---
> > Mallikarjun
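The "roughly 25 hours" figure in the Problem 1 example can be sanity-checked with simple arithmetic. This is a minimal sketch, assuming decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bit/s) and that the whole 800 Mbps link is available for the copy with no compression or protocol overhead; the class and method names are hypothetical. Under these assumptions the transfer takes about 27.8 hours (10^5 seconds), the same range as the quoted estimate, and about 27 consecutive hourly RPO windows pass with no incremental backup possible.

```java
// Back-of-the-envelope check of the Problem 1 example (hypothetical helper).
// Assumes decimal units and the full 800 Mbps link for the whole copy,
// with no compression or protocol overhead.
public class FullBackupDuration {

    /** Hours needed to ship `bytes` over a link of `bitsPerSecond`. */
    static double transferHours(double bytes, double bitsPerSecond) {
        return bytes * 8 / bitsPerSecond / 3600;
    }

    /** Whole RPO windows that elapse while a full backup blocks all other backups. */
    static long blockedRpoWindows(double hours, double rpoMinutes) {
        return (long) Math.floor(hours * 60 / rpoMinutes);
    }

    public static void main(String[] args) {
        double hours = transferHours(10e12, 800e6);   // 10 TB table, 800 Mbps link
        long windows = blockedRpoWindows(hours, 60);  // 60-minute RPO
        System.out.printf("Full backup transfer time: %.1f hours%n", hours);
        System.out.printf("Hourly RPO windows with no incremental possible: %d%n", windows);
    }
}
```

Real throughput to a remote site is usually below line rate, so the actual duration, and the number of missed RPO windows, would typically be higher.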
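The proposed solution to Problem 1 serializes only the state changes and orders backups by completion time. A minimal sketch of that rule, assuming an in-memory list stands in for the backup meta table; every name here (`ParallelBackupSketch`, `BackupRecord`, `markCompleted`, `baseForIncremental`) is hypothetical and is not the HBase backup API. A long-running full backup is simply absent from the completed list until it finishes, so incremental backups keep building on the most recently completed backup in the meantime.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the proposal; none of these names are HBase backup APIs.
public class ParallelBackupSketch {

    enum Type { FULL, INCREMENTAL }

    // `start` is kept for reference but deliberately not used for ordering.
    record BackupRecord(String id, Type type, Instant start, Instant completion) {}

    // Guards only the shared backup state (standing in for the meta table).
    // Data copies for full and incremental backups run outside this lock.
    private final ReentrantLock metaLock = new ReentrantLock();
    private final List<BackupRecord> completed = new ArrayList<>();

    /** Critical section: publish a finished backup. */
    void markCompleted(BackupRecord record) {
        metaLock.lock();
        try {
            completed.add(record);
        } finally {
            metaLock.unlock();
        }
    }

    /**
     * Base for a new incremental backup: the successful backup (full or
     * incremental) with the latest completion time. Returns empty until at
     * least one full backup has completed.
     */
    Optional<BackupRecord> baseForIncremental() {
        metaLock.lock();
        try {
            boolean hasFull = completed.stream().anyMatch(r -> r.type() == Type.FULL);
            if (!hasFull) {
                return Optional.empty();
            }
            return completed.stream().max(Comparator.comparing(BackupRecord::completion));
        } finally {
            metaLock.unlock();
        }
    }
}
```

The ordering choice matters when a full backup starts before an incremental but finishes after it: with completion-time ordering the full backup becomes the newer base once it lands, whereas start-time ordering would still treat the incremental as newer. Whether that is the right semantics for restore chains is one of the details the full redesign would have to settle.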
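Problem 3 mentions forcing a full backup when the WAL files an incremental backup needs are missing. A minimal sketch of that decision, assuming the caller already knows which WAL files the next incremental would read and which ones still exist on the filesystem; `BackupTypeChooser` and `choose` are hypothetical names and do not correspond to `BackupLogCleaner` or any other HBase class.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of HBase: choose the backup type for the next run.
public class BackupTypeChooser {

    enum Type { FULL, INCREMENTAL }

    /**
     * @param requiredWals  WAL files written since the last successful backup,
     *                      which an incremental backup would have to read
     * @param availableWals WAL files that still exist on the filesystem
     * @return INCREMENTAL if every required WAL is still present, otherwise FULL
     */
    static Type choose(List<String> requiredWals, Set<String> availableWals) {
        return availableWals.containsAll(requiredWals) ? Type.INCREMENTAL : Type.FULL;
    }
}
```

With this fallback in place, a cleaner could in principle be allowed to drop old WALs once they exceed some retention budget, at the cost of the next backup being a full one; whether that trade-off is acceptable is left open in the proposal.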