[ 
https://issues.apache.org/jira/browse/HBASE-26323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506278#comment-17506278
 ] 

Duo Zhang commented on HBASE-26323:
-----------------------------------

Is it easy to implement for branch-2? If so, just open a new issue to land the 
changes. Otherwise, let's revert the commit for branch-2, and reapply it with a 
new PR.

Thanks.

> Introduce a SnapshotProcedure
> -----------------------------
>
>                 Key: HBASE-26323
>                 URL: https://issues.apache.org/jira/browse/HBASE-26323
>             Project: HBase
>          Issue Type: New Feature
>          Components: proc-v2, snapshots
>            Reporter: ruanhui
>            Assignee: ruanhui
>            Priority: Major
>             Fix For: 2.6.0, 3.0.0-alpha-3
>
>
> Currently, snapshots in HBase use ZooKeeper as the coordinator. This has some 
> limitations:
>  a. A snapshot may fail when a region server crashes.
>  b. A snapshot may fail when the master restarts.
>  c. Only one snapshot per table can be taken at a time.
>  d. Snapshot verification is handled by the master, which may take a long 
> time when a table has a large number of regions, for example 10000.
>  
> Since we now have the procedure v2 framework, it is possible to solve the 
> above problems. So here is a procedure-v2-based snapshot implementation. It 
> has the following goals:
>  a. A snapshot can continue when a region server crashes.
>  b. A snapshot can continue when the master restarts.
>  c. More than one snapshot per table can be taken at a time.
>  d. Region servers can verify the snapshot, which speeds up the procedure.
>  
> Here are some details about the implementation.
>  *SnapshotProcedure*
>  SnapshotProcedure is used to take a snapshot of a table. It acquires the 
> shared table lock on the snapshot table and holds the shared lock during 
> suspend and yield. 
>  *SnapshotRegionProcedure*
>  SnapshotRegionProcedure is used to take a snapshot of a specific region of 
> the snapshot table. It acquires the exclusive region lock and releases the 
> lock during suspend and yield. Before dispatching the remote snapshot 
> operation to a region server, it checks whether the target region is in 
> transition (RIT). If it is, the procedure sleeps for some time and retries.
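
The check-sleep-retry step described above could be sketched roughly as 
follows. This is only an illustration in plain Java, not code from the patch: 
the class name, the isInTransition predicate, and the attempt limit are all 
hypothetical, and a real procedure would suspend (releasing the region lock, 
as described above) rather than block a thread.

```java
import java.util.function.Predicate;

// Hypothetical sketch: none of these names come from the HBase code base.
final class RegionSnapshotRetrySketch {

    /**
     * Polls until the region is no longer in transition, then reports on which
     * attempt the snapshot operation could be dispatched.
     *
     * @return the 1-based attempt number, or -1 if the region was still in
     *         transition after maxAttempts checks or the wait was interrupted
     */
    static int dispatchWhenStable(String region,
                                  Predicate<String> isInTransition,
                                  int maxAttempts,
                                  long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (!isInTransition.test(region)) {
                // Region is stable: this is where the remote call would go.
                return attempt;
            }
            try {
                // Stand-in for "sleep some time and retry".
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
        return -1;
    }
}
```

For example, with a predicate that reports the region as in transition for the 
first two checks, dispatchWhenStable returns 3.
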
>  *SnapshotVerifyProcedure*
>  SnapshotVerifyProcedure is used to send a snapshot verify request to a 
> region server. If the snapshot is corrupted, it notifies the parent snapshot 
> procedure so that it can retry. When the remote region server has crashed, 
> it chooses another online server and retries.
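
The failover behaviour described above (retry on another online server after a 
crash, report corruption to the parent) could be sketched as follows. Again 
this is only an assumption-laden illustration in plain Java: the Result enum, 
the verifier function, and the server list are hypothetical stand-ins, not 
the actual HBase types.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from the HBase code base.
final class SnapshotVerifyRetrySketch {

    /** Possible outcomes of asking one server to verify the snapshot. */
    enum Result { OK, CORRUPTED, SERVER_CRASHED }

    /**
     * Tries the online servers in order. A crashed server is skipped and the
     * next one is tried; OK or CORRUPTED is returned as soon as any server
     * answers, so the caller (the parent procedure) can decide to retry.
     *
     * @return SERVER_CRASHED only if every listed server crashed
     */
    static Result verifyOnAnyServer(List<String> onlineServers,
                                    Function<String, Result> verifier) {
        for (String server : onlineServers) {
            Result result = verifier.apply(server);
            if (result != Result.SERVER_CRASHED) {
                return result;
            }
            // Server crashed: fall through and pick another online server.
        }
        return Result.SERVER_CRASHED;
    }
}
```

Returning CORRUPTED immediately, instead of trying further servers, mirrors 
the description: corruption is a matter for the parent procedure, while a 
crash only means the request should go somewhere else.
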
>  
> I would be very grateful for any advice and guidance. Is anyone interested in 
> taking a look?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
