[ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319165#comment-14319165 ]
Andrew Purtell commented on HBASE-13031:
----------------------------------------

Carrying over the counterpoint from [~vrodionov] from the mailing list discussion for reference here:
{quote}
I do not think there is a need for new API. Take a look at TableSnapshotInputFormat which you can customize to work with key ranges. It allows M/R over snapshots. You make a snapshot of a full table, then you run first batch of keys in M/R job then you delete snapshot and create new one ... repeat until last key range. You will need to control major compaction during this migration. How to output the data - is your choice: TableOutputFormat or HFileOutputFormat2
{quote}

> Ability to snapshot based on a key range
> ----------------------------------------
>
>                 Key: HBASE-13031
>                 URL: https://issues.apache.org/jira/browse/HBASE-13031
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: churro morales
>            Assignee: churro morales
>             Fix For: 2.0.0, 0.94.26, 1.1.0, 0.98.11
>
>
> Posted on the mailing list and seems like some people are interested. A
> little background for everyone.
> We have a very large table, we would like to snapshot and transfer the data
> to another cluster (compressed data is always better to ship). Our problem
> lies in the fact that it could take many weeks to transfer all of the data and
> during that time with major compactions, the data stored in dfs has the
> potential to double which would cause us to run out of disk space.
> So we were thinking about allowing the ability to snapshot a specific key
> range.
> Ideally I feel the approach is that the user would specify a start and stop
> key, those would be associated with a region boundary. If between the time
> the user submits the request and the snapshot is taken the boundaries change
> (due to merging or splitting of regions) the snapshot should fail.
> We would know which regions to snapshot and if those changed between when the
> request was submitted and the regions locked, the snapshot could simply fail
> and the user would try again, instead of potentially giving the user more /
> less than what they had anticipated. I was planning on storing the start /
> stop key in the SnapshotDescription and from there it looks pretty
> straightforward where we just have to change the verifier code to accommodate
> the key ranges.
> If this design sounds good to anyone, or if I am overlooking anything please
> let me know. Once we agree on the design, I'll write and submit the patches.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
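
For reference, the boundary check proposed in the description (start and stop keys must sit on region boundaries, and the snapshot fails if those boundaries move between request and lock) could look roughly like the sketch below. This is only an illustration under stated assumptions: Region, selectRegions and snapshotRegionsOrFail are hypothetical names, not HBase classes or methods, and the real change would presumably live in the snapshot verifier and SnapshotDescription as the description suggests. The sketch also assumes the region list is sorted by start key and contiguous, and that an empty key marks the start or end of the table.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class KeyRangeSnapshotSketch {

    /** Hypothetical stand-in for a region's [startKey, endKey) range; not an HBase class. */
    static final class Region {
        final byte[] startKey;
        final byte[] endKey; // assumption: empty array means "end of table"

        Region(String startKey, String endKey) {
            this.startKey = startKey.getBytes(StandardCharsets.UTF_8);
            this.endKey = endKey.getBytes(StandardCharsets.UTF_8);
        }
    }

    /**
     * Returns the regions that exactly cover [startKey, stopKey).
     * Throws IllegalArgumentException if either key is not a region boundary.
     * Assumes 'regions' is sorted by start key and contiguous.
     */
    static List<Region> selectRegions(List<Region> regions, byte[] startKey, byte[] stopKey) {
        List<Region> selected = new ArrayList<>();
        boolean opened = false;
        boolean closed = false;
        for (Region r : regions) {
            if (!opened && Arrays.equals(r.startKey, startKey)) {
                opened = true; // requested start key matches a region start
            }
            if (opened) {
                selected.add(r);
                if (Arrays.equals(r.endKey, stopKey)) {
                    closed = true; // requested stop key matches a region end
                    break;
                }
            }
        }
        if (!opened || !closed) {
            throw new IllegalArgumentException("key range does not align with region boundaries");
        }
        return selected;
    }

    /** True if both lists contain the same boundaries in the same order. */
    static boolean sameBoundaries(List<Region> a, List<Region> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (!Arrays.equals(a.get(i).startKey, b.get(i).startKey)
                    || !Arrays.equals(a.get(i).endKey, b.get(i).endKey)) {
                return false;
            }
        }
        return true;
    }

    /**
     * Compares the region layout seen when the request was submitted with the
     * layout seen once regions are locked. Throws IllegalStateException if a
     * split or merge changed the regions covering the range, so the caller can retry.
     */
    static List<Region> snapshotRegionsOrFail(List<Region> layoutAtRequest,
                                              List<Region> layoutAtLock,
                                              byte[] startKey, byte[] stopKey) {
        List<Region> expected = selectRegions(layoutAtRequest, startKey, stopKey);
        List<Region> actual;
        try {
            actual = selectRegions(layoutAtLock, startKey, stopKey);
        } catch (IllegalArgumentException e) {
            throw new IllegalStateException("region boundaries changed since the request", e);
        }
        if (!sameBoundaries(expected, actual)) {
            throw new IllegalStateException("region boundaries changed since the request");
        }
        return actual;
    }
}
```

Failing outright when the layouts differ, rather than widening or narrowing the range, matches the stated goal of never handing the user more or less data than they asked for.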