[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638081#comment-15638081
 ] 

Devaraj Das commented on HBASE-14417:
-------------------------------------

A summary of some internal discussions on a high-level flow that does not 
use ZooKeeper (ZK):
1. Client updates the hbase:backup table with a set of paths that are to be 
bulkloaded (if the tables in question have been fully backed up at least once 
in the past)
2. Client performs the bulkload of the data. If the client fails before the 
bulkload fully completes, the chore in (5) takes care of cleaning up the 
unneeded entries from hbase:backup
3. An HFileCleaner makes sure that the paths registered in (1) are retained 
until the next incremental backup
4. As part of the incremental backup, the hbase:backup table is updated to 
record the location to which the previously bulkloaded file was copied
5. A chore runs periodically (in the BackupController) and removes entries 
from the hbase:backup table if the corresponding paths still do not exist in 
the filesystem after a configured time period (default, say, 24 hours). The 
bulkload timeout is assumed to be much smaller than this, so every bulkload 
that is going to complete successfully will have completed by then.
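
To make the bookkeeping in steps (1), (3), (4) and (5) concrete, here is a 
minimal in-memory sketch in Java. It is only an illustration under stated 
assumptions, not the proposed patch: the class BulkLoadRegistry, its method 
names, and the use of a plain Set of paths in place of a real filesystem 
existence check are all hypothetical, and the map stands in for rows of the 
hbase:backup table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for the bulkload rows of hbase:backup. */
final class BulkLoadRegistry {
    /** One registered path; backupLocation stays null until step (4) runs. */
    private static final class Entry {
        final long registeredAtMs;
        String backupLocation;
        Entry(long registeredAtMs) { this.registeredAtMs = registeredAtMs; }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long expiryMs; // the "configured time period" of step (5)

    BulkLoadRegistry(long expiryMs) { this.expiryMs = expiryMs; }

    /** Step (1): record the paths before the bulkload starts. */
    void register(Set<String> paths, long nowMs) {
        for (String path : paths) {
            entries.putIfAbsent(path, new Entry(nowMs));
        }
    }

    /** Step (3): a cleaner may delete a file only if it is not being held. */
    boolean isDeletable(String path) {
        Entry e = entries.get(path);
        return e == null || e.backupLocation != null;
    }

    /** Step (4): the incremental backup records where the file was copied. */
    void markBackedUp(String path, String backupLocation) {
        Entry e = entries.get(path);
        if (e != null) {
            e.backupLocation = backupLocation;
        }
    }

    /**
     * Step (5): drop entries whose current location is still missing once
     * the expiry period has passed. existingPaths is an assumed substitute
     * for a filesystem existence check. Returns the number of entries removed.
     */
    int expire(Set<String> existingPaths, long nowMs) {
        int before = entries.size();
        entries.entrySet().removeIf(en -> {
            Entry e = en.getValue();
            String location = e.backupLocation != null ? e.backupLocation : en.getKey();
            return !existingPaths.contains(location)
                && nowMs - e.registeredAtMs >= expiryMs;
        });
        return before - entries.size();
    }

    boolean contains(String path) { return entries.containsKey(path); }
}
```

In this sketch a client that dies between (1) and (2) leaves an entry whose 
path never appears, and expire() drops it once expiryMs has elapsed. An entry 
whose file does exist is held by isDeletable() until markBackedUp() is called.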
Thoughts?

> Incremental backup and bulk loading
> -----------------------------------
>
>                 Key: HBASE-14417
>                 URL: https://issues.apache.org/jira/browse/HBASE-14417
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 2.0.0
>            Reporter: Vladimir Rodionov
>            Assignee: Ted Yu
>            Priority: Critical
>              Labels: backup
>             Fix For: 2.0.0
>
>         Attachments: 14417.v1.txt, 14417.v11.txt, 14417.v13.txt, 
> 14417.v2.txt, 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 14417.v25.txt, 
> 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create a new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



