Inline

On Mon, Nov 30, 2020, 9:56 PM Andrew Purtell <andrew.purt...@gmail.com>
wrote:

> If we have unmaintained and unreleased, and unreleasable, code sitting in
> the tree it should be moved out or deleted.
>
> > We could use replication to backup the WALs, and use Snapshot and
> > ExportSnapshot to backup the HFiles.
>
> This is essentially how we have implemented backup at the $dayjob.
>
> However, while this will move the data, verifying the integrity of the
> backup remains an exercise for the user. (I don’t know the hbase-backup
> solution well but imagine if being in tree it does some trick to support
> verification?) For verification in our own solution we enable some schema
> options and manage compactions with a coprocessor such that within a
> sliding window of time (our backup SLA) it is possible to do digest based
> comparisons of each site-pair’s data, including deleted data and
> tombstones. Digest site A, then digest site B, only the data in the window,
> then compare, and if good we have proven equivalence and can move the far
> (older) edge of the window leftward, so normal compaction activity can
> resume beyond the new right side boundary. This is site and somewhat
> application specific, and the age of the design is showing for various
> reasons that would make this email long if discussed in depth. I only
> wanted to talk about this briefly to illustrate that verification is a
> harder problem. Data movement is only the first aspect of a complete backup
> solution.
>
> That said, for years we have been considering a move away from snapshot
> based backup to a WAL based backup model, or more recently one based on
> publication and consumption of change data capture. HBase replication has
> some rough edges if used as a source of change capture data. I have some
> thoughts on this: essentially, what could be a design doc for the causal
> replication JIRA. If we had an officially supported change capture device
> (could be in operator tools) then backup and restore could be implemented
> as tooling built on a foundation of retention and replay of change stream
> data (also in operator tools).
>

Backing backup with change data capture will impose certain requirements
that might be unnecessary:

* An online system is always needed to capture the changes, unlike
point-in-time backup solutions
* There is a continuous data outflow from the HBase cluster, which reduces
the flexibility to scale the cluster and the backup systems
* Not every deployment / tenant will require a very low recovery point
objective, and off-peak backups can't be configured


The current solution seems elegant to me: WALs are retained until a backup
is performed, and the WAL references are then cleared with respect to that
backup.
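
To make the retention idea concrete, here is a minimal sketch in plain Java.
It is not the hbase-backup code: the class WalRetentionTracker, its methods,
and the names "wal-1" and "nightly" are all hypothetical, and it only models
the bookkeeping (a WAL stays until every backup set that needs it has
completed a backup covering it).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not an HBase API. */
class WalRetentionTracker {
    // WAL file name -> ids of the backup sets that still need this WAL.
    private final Map<String, Set<String>> pending = new HashMap<>();

    /** A WAL was rolled: remember which backup sets must see it before cleanup. */
    void walRolled(String walName, Set<String> backupSets) {
        pending.put(walName, new HashSet<>(backupSets));
    }

    /** A backup finished: clear that backup set's references to the WALs it covered. */
    void backupCompleted(String backupSet, List<String> coveredWals) {
        for (String wal : coveredWals) {
            Set<String> refs = pending.get(wal);
            if (refs == null) {
                continue; // WAL was never tracked, nothing to clear
            }
            refs.remove(backupSet);
            if (refs.isEmpty()) {
                pending.remove(wal);
            }
        }
    }

    /** A cleaner may delete a WAL only when no backup set still references it. */
    boolean isDeletable(String walName) {
        Set<String> refs = pending.get(walName);
        return refs == null || refs.isEmpty();
    }
}
```

With this model, a WAL registered for the "nightly" backup set is not
deletable until backupCompleted("nightly", ...) has been called with that
WAL in the list. Nothing has to stream out of the cluster in between.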


> > On Nov 28, 2020, at 7:48 PM, 张铎 <palomino...@gmail.com> wrote:
> >
> > I'm afraid it is not easy to be moved to hbase-operator-tools.
> >
> > You can see the code under the org.apache.hadoop.hbase.backup.master
> > package, we need to set up log cleaner at master side, and also,
> > the LogRollMasterProcedureManager class needs MasterServices(not
> > MasterRpcServices), which means it must be used in the same process with
> > HMaster.
> >
> > And I'm OK with purging this feature, especially if there is no developer
> > who wants to maintain it.
> >
> > For me, I suspect that the backup feature could be done more separately
> > with the main cluster. We could use replication to backup the WALs, and
> > use Snapshot and ExportSnapshot to backup the HFiles. The feature could be
> > done as a separated project.
> >
> > Thanks.
> >
> > On Fri, Nov 20, 2020 at 10:18 AM, Stack <st...@duboce.net> wrote:
> >
> >> It strikes me as work that has been abandoned with no supporting developer.
> >> It has had no improvement and few commits other than adjustment because a
> >> backing dependency has changed since original contribution. It has not been
> >> included in a release so has no users as yet. Does anyone use it or want
> >> it? If not, I suggest we remove it.  I could file an issue for it to be
> >> added to hbase-operator-tools for some gallant dev to pick up if they
> >> wanted to use this backup work? (I could help w/ the migration).
> >>
> >> What do others think?
> >>
> >> S
> >>
>
