[ 
https://issues.apache.org/jira/browse/HBASE-26323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423668#comment-17423668
 ] 

ruanhui commented on HBASE-26323:
---------------------------------

Sorry, I don't know why the pull request wasn't linked automatically. Can I just put it here?
https://github.com/apache/hbase/pull/3716

> introduce a SnapshotProcedure
> -----------------------------
>
>                 Key: HBASE-26323
>                 URL: https://issues.apache.org/jira/browse/HBASE-26323
>             Project: HBase
>          Issue Type: New Feature
>          Components: proc-v2, snapshots
>            Reporter: ruanhui
>            Assignee: ruanhui
>            Priority: Minor
>
> Currently, snapshots in HBase use ZooKeeper as the coordinator. This has some
> limitations:
>  a. A snapshot may fail when a region server crashes.
>  b. A snapshot may fail when the master restarts.
>  c. Only one snapshot per table can be taken at a time.
>  d. Snapshot verification is handled by the master, which may take a long
> time when a table has a large number of regions, for example 10000.
>  
> Since we now have the procedure v2 framework, it is possible to solve the
> above problems. So here is a procedure-v2-based snapshot implementation. It
> has the following goals:
>  a. A snapshot can continue when a region server crashes.
>  b. A snapshot can continue when the master restarts.
>  c. More than one snapshot per table can be taken at a time.
>  d. Region servers can be used to verify the snapshot, which speeds up the
> procedure.
>  
> Here are some details about the implementation.
>  *SnapshotProcedure*
>  SnapshotProcedure is used to take a snapshot of a table. It acquires a
> shared table lock on the snapshot table and holds the shared lock during
> suspend and yield.
>  *SnapshotRegionProcedure*
>  SnapshotRegionProcedure is used to take a snapshot of a specific region of
> the snapshot table. It acquires an exclusive region lock and releases the
> lock during suspend and yield. Before dispatching the remote snapshot
> operation to a region server, it checks whether the target region is in
> transition (RIT). If it is, the procedure sleeps for some time and retries.
>  *SnapshotVerifyProcedure*
>  SnapshotVerifyProcedure is used to send a snapshot verification request to
> a region server. If the snapshot is corrupted, it notifies the parent
> snapshot procedure to retry. When the remote region server has crashed, it
> chooses another online server and retries.
>  
> I would be very grateful for any advice and guidance. Is anyone interested in 
> taking a look?
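
To make the retry behaviour described above a bit more concrete, here is a minimal, self-contained sketch of the two decisions: waiting while the target region is in transition, and picking another online server after a crash. It is only an illustration, not code from the pull request. The class and method names (SnapshotRetrySketch, backoffMillis, waitUntilNotInTransition, chooseVerifyServer) are hypothetical, the capped exponential backoff is an assumption (the description only says "sleep some time"), and Thread.sleep stands in for whatever suspend mechanism the real procedures use.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration only; none of these names come from HBase.
final class SnapshotRetrySketch {

  // Assumed policy: the delay doubles on each attempt and is capped at maxMillis.
  static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
    long delay = baseMillis << Math.min(attempt, 20);
    return Math.min(delay, maxMillis);
  }

  // Returns true once the region is no longer in transition, or false if it
  // is still in transition after maxAttempts checks (or the thread is interrupted).
  static boolean waitUntilNotInTransition(String region,
      Predicate<String> isInTransition, int maxAttempts,
      long baseMillis, long maxMillis) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      if (!isInTransition.test(region)) {
        return true; // safe to dispatch the remote snapshot operation
      }
      try {
        // A real procedure would suspend instead of blocking a thread.
        Thread.sleep(backoffMillis(attempt, baseMillis, maxMillis));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return false;
  }

  // Picks the first online server that has not already failed for this
  // verification request; empty if every online server has been tried.
  static Optional<String> chooseVerifyServer(List<String> onlineServers,
      Set<String> failedServers) {
    return onlineServers.stream()
        .filter(server -> !failedServers.contains(server))
        .findFirst();
  }
}
```

For example, a caller would first call waitUntilNotInTransition for the region and dispatch only if it returns true. On a server crash it would add that server to failedServers and call chooseVerifyServer again.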



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
