ruanhui created HBASE-26323:
-------------------------------

             Summary: introduce a SnapshotProcedure
                 Key: HBASE-26323
                 URL: https://issues.apache.org/jira/browse/HBASE-26323
             Project: HBase
          Issue Type: New Feature
          Components: proc-v2, snapshots
            Reporter: ruanhui


Currently, snapshots in HBase use ZooKeeper (zk) as the coordinator, which has 
some limitations:
 a. A snapshot may fail when a region server crashes.
 b. A snapshot may fail when the master restarts.
 c. Only one snapshot per table can be taken at a time.
 d. Snapshot verification is handled by the master, which may take a long time 
when the table has a large number of regions, for example 10000.

 

Since we now have the procedure v2 framework, it is possible to solve the above 
problems. So here is a procedure-v2-based snapshot implementation. Its goals 
are:
 a. A snapshot can continue when a region server crashes.
 b. A snapshot can continue when the master restarts.
 c. More than one snapshot per table can be taken at a time.
 d. Region servers can be used to verify the snapshot, to speed up the procedure.

 

Here are some details about the implementation.
 *SnapshotProcedure*
 SnapshotProcedure is used to take a snapshot of a table. It acquires a shared 
table lock on the table being snapshotted and holds the shared lock during 
suspend and yield.
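To illustrate why a shared table lock fits goal (c), here is a minimal Java 
sketch of shared/exclusive lock semantics. It is only a toy model: the class 
TableLockSketch and all of its methods are hypothetical names made up for this 
example and are not HBase's actual lock classes.

```java
/** Hypothetical toy model of a shared/exclusive lock; not HBase code. */
final class TableLockSketch {
    // Number of holders of the shared lock (e.g. concurrent snapshots).
    private int sharedHolders = 0;
    // True while some holder owns the exclusive lock.
    private boolean exclusiveHeld = false;

    /** Succeeds unless the exclusive lock is held; many holders may share. */
    synchronized boolean tryShared() {
        if (exclusiveHeld) {
            return false;
        }
        sharedHolders++;
        return true;
    }

    /** Succeeds only when there are no shared and no exclusive holders. */
    synchronized boolean tryExclusive() {
        if (exclusiveHeld || sharedHolders > 0) {
            return false;
        }
        exclusiveHeld = true;
        return true;
    }

    synchronized void releaseShared() {
        if (sharedHolders == 0) {
            throw new IllegalStateException("shared lock is not held");
        }
        sharedHolders--;
    }

    synchronized void releaseExclusive() {
        if (!exclusiveHeld) {
            throw new IllegalStateException("exclusive lock is not held");
        }
        exclusiveHeld = false;
    }
}
```

In this model two snapshots of the same table can both call tryShared() 
successfully, while an operation that needs tryExclusive() has to wait until 
both have released.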
 *SnapshotRegionProcedure*
 SnapshotRegionProcedure is used to take a snapshot of a specific region of the 
table being snapshotted. It acquires an exclusive region lock and releases the 
lock during suspend and yield. Before dispatching the remote snapshot operation 
to a region server, it checks whether the target region is in transition (RIT). 
If it is, the procedure sleeps for some time and retries.
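The check-then-retry behaviour described above could be sketched roughly as 
follows. This is an illustration only: RegionSnapshotRetrySketch, its methods, 
and the capped exponential backoff are assumptions made for this example (the 
description only says "sleep some time"), and a real procedure would presumably 
suspend itself rather than block a thread with Thread.sleep.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of "skip while in RIT, then dispatch"; not HBase code. */
final class RegionSnapshotRetrySketch {
    // Stand-in for the master's view of regions in transition (RIT).
    private final Set<String> regionsInTransition = ConcurrentHashMap.newKeySet();
    private final long baseMillis;
    private final long maxMillis;

    RegionSnapshotRetrySketch(long baseMillis, long maxMillis) {
        this.baseMillis = baseMillis;
        this.maxMillis = maxMillis;
    }

    void markInTransition(String region) {
        regionsInTransition.add(region);
    }

    void clearInTransition(String region) {
        regionsInTransition.remove(region);
    }

    /** Assumed backoff policy: base * 2^attempt, capped at maxMillis. */
    static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis << Math.min(attempt, 20);
        return Math.min(delay, maxMillis);
    }

    /**
     * Runs dispatch once the region is no longer in transition.
     * Returns how many attempts were skipped because of RIT,
     * or -1 if the region was still in transition after maxAttempts
     * (or the waiting thread was interrupted).
     */
    int dispatchWhenStable(String region, int maxAttempts, Runnable dispatch) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (!regionsInTransition.contains(region)) {
                dispatch.run();
                return attempt;
            }
            try {
                Thread.sleep(backoffMillis(attempt, baseMillis, maxMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
        return -1;
    }
}
```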
 *SnapshotVerifyProcedure*
 SnapshotVerifyProcedure is used to send a snapshot verify request to a region 
server. If the snapshot is corrupted, it notifies the parent snapshot procedure 
to retry. If the remote region server has crashed, it chooses another online 
server and retries.
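The failover behaviour of the verify step might look something like the sketch 
below. All names here (SnapshotVerifySketch, VerifyCall, ServerCrashedException, 
Result) are hypothetical and exist only for this illustration. The sketch 
assumes the list of online servers is known up front and simply tries them in 
order, which is one possible way to "choose another online server".

```java
import java.util.List;

/** Hypothetical sketch of verify-with-failover; not HBase code. */
final class SnapshotVerifySketch {
    /** Outcome reported back to the (hypothetical) parent procedure. */
    enum Result { OK, CORRUPTED, NO_SERVER }

    /** Signals that the chosen region server crashed before answering. */
    static final class ServerCrashedException extends Exception {
        ServerCrashedException(String server) {
            super("server crashed: " + server);
        }
    }

    /** Stand-in for the remote verify request sent to one region server. */
    interface VerifyCall {
        /** Returns true if the snapshot is valid, false if it is corrupted. */
        boolean verify(String server) throws ServerCrashedException;
    }

    /**
     * Tries each online server in turn. A crash moves on to the next server;
     * a definite answer (valid or corrupted) is returned immediately so the
     * parent can decide whether to retry the snapshot.
     */
    static Result verify(List<String> onlineServers, VerifyCall call) {
        for (String server : onlineServers) {
            try {
                return call.verify(server) ? Result.OK : Result.CORRUPTED;
            } catch (ServerCrashedException e) {
                // This server is gone; fall through and try the next one.
            }
        }
        return Result.NO_SERVER;
    }
}
```

Note that a CORRUPTED result is not retried on another server in this sketch; 
it is handed back so that the parent can retry the snapshot itself, as 
described above.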

 

I would be very grateful for any advice and guidance. Is anyone interested in 
taking a look?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
