[ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288238#comment-13288238 ]
Jonathan Hsieh commented on HBASE-6055:
---------------------------------------

Jesse,

Thanks for answering the questions. A strong +1 for doing the simplest HBase timestamp-based approach first, and then looking into the more complicated version as an option afterwards. Maybe start a sub-issue for the point-in-time approach and move the discussion there? (I still have questions about it, but it might be better to ask them there.)

The main use case I care about is the ability to quickly "snapshot" without downtime and quickly recover it (ideally with no downtime, but possibly with a short downtime window). Although it is a "sloppy snapshot", conceptually it is pretty simple to define and I think the caveats are fairly well understood. I don't expect something with stronger consistency guarantees than what HBase currently offers, but I do expect something better (cheaper/faster) than the current closest thing, which is a CopyTable.

I have a bunch of new questions, some just asking for precision and some for clarification. It might be helpful to define terms at the beginning of the doc so it stays consistent?

- Hm.. how do you restore a snapshot from reference files if it hasn't been scanned/copied yet? Require scan/copy "materialization" of the snapshot first? (That means a slower restore, but it would probably be simplest for a first cut.)
- Snapshot restore needs to be "transactional" like snapshotting, right?
- What is "export"? Is this taking a snapshot, the materialization, the snapshot restore, or something else?
- If we restore snapshots to the same HBase instance, then in the dir structure you probably need .regioninfo files as well (they contain the region startkey/endkey info necessary to reconstitute META later).
- Is restoring to a separate instance in scope? If so, bulk loads can be expensive: if regions don't line up there will be a bunch of splitting that happens. Again, keeping the regioninfos and the snapshot's splits may be worthwhile.
- Where do the materialized versions of the snapshot reference files end up? In the snapshot dirs? Elsewhere?
-- This potentially gets a little trickier with markers as opposed to log rolls.
-- The HLog will have edits from regions not relevant to the table's regions. Not a huge problem, but maybe an optimization would be for the materialization step to do an "offline hlogsplit/flush" to keep just the data relevant to this table/region?

> Snapshots in HBase 0.96
> -----------------------
>
>                 Key: HBASE-6055
>                 URL: https://issues.apache.org/jira/browse/HBASE-6055
>             Project: HBase
>          Issue Type: New Feature
>          Components: client, master, regionserver, zookeeper
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: 0.96.0
>
>         Attachments: Snapshots in HBase.docx
>
>
> Continuation of HBASE-50 for the current trunk. Since the implementation has
> drastically changed, opening as a new ticket.