[ https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638081#comment-15638081 ]
Devaraj Das commented on HBASE-14417:
-------------------------------------
A summary of some internal discussions on a high-level flow that doesn't use
ZooKeeper (ZK):
1. Client updates the hbase:backup table with a set of paths that are to be
bulkloaded (if the tables in question have been fully backed up at least once
in the past)
2. Client performs the bulkload of the data. If the client fails before the
bulkload fully completes, the cleaner chore in (5) takes care of cleaning up
the unneeded entries from hbase:backup
3. An HFileCleaner makes sure that the paths registered in (1) are retained
until the next incremental backup
4. As part of the incremental backup, the hbase:backup table is updated to
reflect the location to which the earlier bulkloaded file was copied
5. A chore runs periodically (in the BackupController) and eliminates entries
from the hbase:backup table if the corresponding paths still don't exist in
the filesystem after a configured time period (default, say, 24 hours). The
bulkload timeout is assumed to be much smaller than this, so every bulkload
that is going to complete successfully will have completed by then.
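
To make the flow concrete, here is a rough in-memory sketch of the bookkeeping
in steps (1), (3), (4) and (5). It is only an illustration under assumptions:
the class and method names (BulkLoadRegistrySketch, register, isDeletable,
recordBackedUp, runChore) are hypothetical and not HBase APIs, a HashMap stands
in for the hbase:backup table, and the filesystem check is passed in as a
predicate. It also assumes that once an incremental backup has copied a file,
the cleaner no longer needs to hold the original path.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical in-memory stand-in for the bulkload entries in hbase:backup.
// None of these names come from the HBase code base.
final class BulkLoadRegistrySketch {
    // path -> time (ms) at which the client registered it in step (1)
    private final Map<String, Long> registeredAt = new HashMap<>();
    // path -> location the incremental backup copied it to in step (4)
    private final Map<String, String> backedUpTo = new HashMap<>();
    private final long expiryMillis;

    BulkLoadRegistrySketch(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    // Step (1): record the paths before the bulkload starts.
    void register(Set<String> paths, long nowMillis) {
        for (String path : paths) {
            registeredAt.put(path, nowMillis);
        }
    }

    // Step (3): a cleaner may delete a file only if it is not being held.
    boolean isDeletable(String path) {
        return !registeredAt.containsKey(path);
    }

    // Step (4): the incremental backup copied the file; remember where it
    // went and (by assumption) stop holding the original path.
    void recordBackedUp(String path, String backupLocation) {
        if (registeredAt.remove(path) != null) {
            backedUpTo.put(path, backupLocation);
        }
    }

    // Returns the backup location recorded in step (4), or null if none.
    String backupLocation(String path) {
        return backedUpTo.get(path);
    }

    // Step (5): drop entries whose path still does not exist after the
    // expiry period. Returns the number of entries removed.
    int runChore(Predicate<String> existsInFilesystem, long nowMillis) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = registeredAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            boolean expired = nowMillis - entry.getValue() >= expiryMillis;
            if (expired && !existsInFilesystem.test(entry.getKey())) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

The chore only removes an entry when both conditions hold (expired and path
missing), so an entry for a bulkload that is still in flight survives as long
as the expiry period is comfortably larger than the bulkload timeout.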
Thoughts?
> Incremental backup and bulk loading
> -----------------------------------
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
> Issue Type: New Feature
> Affects Versions: 2.0.0
> Reporter: Vladimir Rodionov
> Assignee: Ted Yu
> Priority: Critical
> Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417.v1.txt, 14417.v11.txt, 14417.v13.txt,
> 14417.v2.txt, 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 14417.v25.txt,
> 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading
> bypasses WALs for obvious reasons, breaking incremental backups. The only way
> to continue backups after bulk loading is to create a new full backup of the
> table. This may not be feasible for customers who do bulk loading regularly
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE