[ https://issues.apache.org/jira/browse/HBASE-26323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423668#comment-17423668 ]
ruanhui commented on HBASE-26323:
---------------------------------

Sorry, I don't know why the pull request didn't link. Can I just put it here?
https://github.com/apache/hbase/pull/3716

> introduce a SnapshotProcedure
> -----------------------------
>
>                 Key: HBASE-26323
>                 URL: https://issues.apache.org/jira/browse/HBASE-26323
>             Project: HBase
>          Issue Type: New Feature
>          Components: proc-v2, snapshots
>            Reporter: ruanhui
>            Assignee: ruanhui
>            Priority: Minor
>
> Currently, snapshots in HBase use ZooKeeper as the coordinator. This has
> some limitations:
>  a. A snapshot may fail when a region server crashes.
>  b. A snapshot may fail when the master restarts.
>  c. Only one snapshot per table can be taken at a time.
>  d. Snapshot verification is handled by the master, which may take a long
> time when the table has a large number of regions, for example 10,000.
>
> Since we now have the procedure v2 framework, it is possible to solve the
> above problems. So here is a procedure-v2-based snapshot implementation. It
> has the following goals:
>  a. A snapshot can continue when a region server crashes.
>  b. A snapshot can continue when the master restarts.
>  c. More than one snapshot per table can be taken at a time.
>  d. Region servers can be used to verify the snapshot, which speeds up the
> procedure.
>
> Here are some details about the implementation.
> *SnapshotProcedure*
> SnapshotProcedure is used to take a snapshot of a table. It acquires a
> shared table lock on the snapshot table and holds the shared lock during
> suspend and yield.
> *SnapshotRegionProcedure*
> SnapshotRegionProcedure is used to take a snapshot of a specific region of
> the snapshot table. It acquires an exclusive region lock and releases the
> lock during suspend and yield. Before dispatching the remote snapshot
> operation to a region server, it checks whether the target region is in
> transition (RIT). If the target region is in RIT, it sleeps for some time
> and retries.
> *SnapshotVerifyProcedure*
> SnapshotVerifyProcedure is used to send a snapshot verify request to a
> region server. If the snapshot is corrupted, it notifies the parent
> snapshot procedure so that it can retry. If the remote region server has
> crashed, it chooses another online server and retries.
>
> I would be very grateful for any advice and guidance. Is anyone interested
> in taking a look?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)