[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274762#comment-16274762 ]
Mike Drob commented on HBASE-17852:
-----------------------------------

bq. This is much easier to fix than concurrent backup sessions support, because restore does not access the meta table.

Restore doesn't need to update the set of backup files (to remove references to files that are no longer referenced)? If we backup, add data, take an incremental backup, add data, restore to the first backup, add data, and take another incremental backup, will this all work correctly without the restore ever having to update any backup state? Where do I look to see how this works?

bq. No, this is a client-side operation. Can someone queue hbck?

It's client-side... kind of. We're encouraging folks to automate these operations, so the comparison to hbck isn't quite the same.

bq. "manual cleanup" is only running the hbase backup repair command. I don't feel like that is too onerous, and it goes back to my original feelings (an acceptable limitation to get this into the hands of users).

Yea, this is probably OK. I thought we still had a pretty hairy situation here.

bq. Specifically, the client does a checkAndPut to specific coordinates in the backup table and throws an exception when that fails. Remember that backups are client-driven (per a design review from a long time ago), so queuing is tough to reason about (we have no "centralized" execution system to use). At a glance, it seems pretty straightforward to add some retry/backoff semantics to BackupSystemTable#startBackupExclusiveOperation(). It isn't exactly a "queue", but it would ease the pain you allude to.

Yea, retry would be good. File a JIRA?
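The retry/backoff idea floated above can be sketched in plain Java. This is a minimal sketch under stated assumptions, not HBase's actual API: the ExclusiveOperation interface stands in for BackupSystemTable#startBackupExclusiveOperation() (which throws when the checkAndPut on the lock row fails), and the attempt count and backoff parameters are illustrative.

```java
import java.io.IOException;

public class ExclusiveOperationRetry {

    /**
     * Hypothetical stand-in for BackupSystemTable#startBackupExclusiveOperation():
     * throws IOException when another backup operation already holds the lock row.
     */
    public interface ExclusiveOperation {
        void start() throws IOException;
    }

    /**
     * Retries the exclusive-operation acquire with exponential backoff instead of
     * failing immediately on first contention. Not a queue, but it eases contention
     * between concurrent clients racing on the checkAndPut.
     */
    public static void startWithRetry(ExclusiveOperation op, int maxAttempts, long initialBackoffMs)
            throws IOException, InterruptedException {
        long backoff = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                op.start();
                return; // acquired the exclusive operation "lock"
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up; caller must retry later or run backup repair
                }
                Thread.sleep(backoff);
                backoff = Math.min(backoff * 2, 30_000L); // exponential backoff, capped
            }
        }
    }
}
```

The backoff cap keeps a long-running backup from stalling waiting clients indefinitely; after maxAttempts the original exception still propagates, so the existing failure path is unchanged.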
> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-17852
>                 URL: https://issues.apache.org/jira/browse/HBASE-17852
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>             Fix For: 2.0.0
>
>         Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, HBASE-17852-v9.patch
>
>
> The rollback-via-snapshot design approach implemented in this ticket:
> # Before a backup create/delete/merge starts, we take a snapshot of the backup meta-table (the backup system table). This procedure is lightweight because the meta table is small and usually fits in a single region.
> # When an operation fails on the server side, we handle the failure by cleaning up partial data in the backup destination, then restoring the backup meta-table from the snapshot.
> # When an operation fails on the client side (abnormal termination, for example), the next time the user tries a create/merge/delete they will see an error message saying that the system is in an inconsistent state and repair is required; they will need to run the backup repair tool.
> # To avoid multiple writers to the backup system table (the backup client and BackupObservers), we introduce a small table ONLY to keep the listing of bulk loaded files. All backup observers work only with this new table. The reason: in case of a failure during backup create/delete/merge/restore, when the system performs the automatic rollback, data written by backup observers during the failed operation could be lost. This is what we try to avoid.
> # The second table keeps only bulk load related references. We do not care about the consistency of this table, because bulk load is an idempotent operation and can be repeated after a failure.
> Partially written data in the second table does not affect the BackupHFileCleaner plugin, because this data (the list of bulk loaded files) corresponds to files which have not yet been loaded successfully and hence are not visible to the system.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
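The rollback-via-snapshot flow in the description above can be sketched in plain Java. MetaTable, BackupOperation, and every method name here are hypothetical stand-ins for illustration, not the actual HBase backup implementation; the point is only the ordering of snapshot, cleanup, and restore.

```java
import java.io.IOException;

public class RollbackViaSnapshot {

    /** Hypothetical view of the backup meta-table's snapshot facilities. */
    public interface MetaTable {
        void snapshot(String name) throws IOException;
        void restoreSnapshot(String name) throws IOException;
    }

    /** Hypothetical backup create/delete/merge operation. */
    public interface BackupOperation {
        void run() throws IOException;
        void cleanupPartialData() throws IOException; // remove partial files in backup destination
    }

    /**
     * Runs the operation guarded by a meta-table snapshot. On a server-side
     * failure, partial backup data is cleaned up and the meta table is rolled
     * back to its pre-operation state before the failure is rethrown.
     */
    public static void runWithRollback(MetaTable meta, BackupOperation op) throws IOException {
        String snapshotName = "backup-meta-snapshot"; // illustrative name
        meta.snapshot(snapshotName); // step 1: lightweight snapshot of the small meta table
        try {
            op.run();
        } catch (IOException e) {
            op.cleanupPartialData();            // step 2a: remove partial data in backup destination
            meta.restoreSnapshot(snapshotName); // step 2b: restore meta table from the snapshot
            throw e;                            // surface the original failure to the client
        }
    }
}
```

Note that a client-side crash skips this catch block entirely, which is exactly why the design falls back to the error message and the manual backup repair tool in that case.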