ruanhui created HBASE-26323:
-------------------------------
Summary: introduce a SnapshotProcedure
Key: HBASE-26323
URL: https://issues.apache.org/jira/browse/HBASE-26323
Project: HBase
Issue Type: New Feature
Components: proc-v2, snapshots
Reporter: ruanhui
Currently,snapshot in hbase uses zk as coordinator. It has some limitations,
a. Snapshot maybe fails when there are region server crashes.
b. Snapshot maybe failed when master restarts.
c. Only one snapshot per table can be taken in a time.
d. Snapshot verify will be handled by master, which may take long time when
our table has a large number of regions, for example 10000.
Since we have procedure v2 framework now, it is possible to solve the above
problems. So here is a procedure2-based snapshot implementation. It has some
goals,
a. Snapshot can continue when there are region server crashes.
b. Snapshot can continue when master restarts.
c. More than one snapshot per table can be taken in a time.
d. We can use region servers to verify snapshot to accelerate procedure.
Here are some details about implementation.
*SnapshotProcedure*
SnapshotProcedure is used to take snapshot on a table. It acquires shared
table lock on the snapshot table and hold the shared lock during suspend and
yield.
*SnapshotRegionProcedure*
SnapshotRegionProcedure is used to take snapshot on a specific region of the
snapshot table. It acquires exclusive region lock and releases lock during
suspend and yield. Before dispatch remote snapshot operations to region server,
it will check target region in RIT or not. If target region is in RIT, it will
sleep some time and retry.
*SnapshotVerifyProcedure*
SnapshotVerifyProcedure is used to send snapshot verify request to region
server. If snapshot is corrupted, it will notify parent snapshot to retry. When
remote region server is crashed, it will choose another online server and retry.
I would be very grateful for any advice and guidance. Is anyone interested in
taking a look?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)