[ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288715#comment-13288715 ]
Jesse Yates commented on HBASE-6055:
------------------------------------

A couple of definitions going forward:
- materialization: the end result of taking a single snapshot, on the same cluster. It ends up in the .snapshot/[snapshot_name] directory
- export: sending the snapshot to another cluster or another part of the same cluster
- restore: taking an exported snapshot and converting the snapshot into an active table.

{quote}
Hm.. how do you restore a snapshot from reference files if it hasn't been scan/copied yet? Require scan/copy "materialization" of the snapshot first? (which means slower restore, but probably would likely be simplest for a first cut)
{quote}
Right now, you would run an M/R job (distcp) to copy the files over to another cluster or a backup part of your cluster. Since we are just storing references, the actual file copying will be necessary. This will be helped by using the actual "Reference" class for the HFiles (it is currently also being (mis)used for the WALs, but I don't think we actually need to keep the WALs - I'll comment in the timestamp ticket). Since they are just reference files, you could just use the regular HFile reader to load them into another table.

{quote}
Snapshot restore needs to be "transactional" like snapshotting right?
{quote}
Yeah, I guess. I don't really see this as a problem - just keep it to one restore at a time. But it would be all or nothing to get a table online.

{quote}
what is "export"? is this taking a snapshot or the materialization or the snapshot restore or something else?
{quote}
Export is taking a snapshot from the .snapshot/ directory and copying it somewhere else, possibly with a special snapshot distcp. I would consider restore as taking the exported snapshot and then 'hooking it back up' to another cluster (or the same) as a new table. You could fold the restore into the export, but they are in fact distinct.
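
To make the three terms concrete, here is a toy model on a local filesystem. It is not HBase code and uses no HBase API: it assumes a snapshot is a directory of small reference files under .snapshot/[snapshot_name], each holding the path of a real store file, and the function names `materialize`, `export`, and `restore` are hypothetical, chosen only to mirror the definitions above.

```python
import shutil
import tempfile
from pathlib import Path


def materialize(table_dir: Path, root: Path, name: str) -> Path:
    """Write one reference file per store file into .snapshot/<name>; no data is copied."""
    snap_dir = root / ".snapshot" / name
    snap_dir.mkdir(parents=True)
    for hfile in sorted(table_dir.iterdir()):
        # Toy stand-in for a reference: a text file holding the real file's path.
        (snap_dir / (hfile.name + ".ref")).write_text(str(hfile))
    return snap_dir


def export(snap_dir: Path, dest: Path) -> Path:
    """Follow each reference and copy the real file to dest (the distcp-like step)."""
    dest.mkdir(parents=True)
    for ref in sorted(snap_dir.glob("*.ref")):
        shutil.copy(Path(ref.read_text()), dest / ref.name[: -len(".ref")])
    return dest


def restore(exported: Path, new_table_dir: Path) -> Path:
    """Hook the exported files up as a new table directory, all or nothing."""
    staging = new_table_dir.with_name(new_table_dir.name + ".tmp")
    shutil.copytree(exported, staging)
    staging.rename(new_table_dir)  # a single rename publishes the table
    return new_table_dir


def demo() -> list:
    """Run materialize, export, and restore on a two-file table; return the restored file names."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        table = root / "table"
        table.mkdir()
        (table / "hfile1").write_text("row1")
        (table / "hfile2").write_text("row2")
        snap = materialize(table, root, "snap1")
        exported = export(snap, root / "backup" / "snap1")
        restored = restore(exported, root / "table_copy")
        return sorted(p.name for p in restored.iterdir())
```

In this sketch only `export` moves data, which matches the point that storing references makes the file copy unavoidable at export time, and `restore` stages its work so that the new table appears all at once or not at all.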

{quote}
If we restore snapshots to the same hbase instance, in dir structure, you probably need .regioninfo files as well. (contains region startkey/endkey info necessary to reconstitute META later).
{quote}
+1 I'll make sure that gets in

{quote}
Is restoring to a separate instance in scope? If so bulk loads can be expensive – if regions don't line up there will be a bunch of splitting that happens. Again, keeping the regioninfos and the snapshot's splits may be worthwhile.
{quote}
I'd say restore is part of this. Should be solved by having the region info. -1 for split/compact storms.

{quote}
Where do the materialized versions of the snapshot reference files end up? in the snapshot dirs? elsewhere?
{quote}
What do you mean by materialized? If you mean where the snapshot files end up after taking a snapshot: in the .snapshot directory. See my earlier comments on the structure.

{quote}
This potentially gets a little trickier with markers as opposed to log rolls.
{quote}
If we do a log roll, it's probably going to take a bit longer. Also, it's not going to be applicable to the timestamp approach: log rolling requires some kind of locking, which we should avoid, whereas the markers will be much faster.

{quote}
The HLog will have edits from regions not relevant to the table's regions. Not a huge problem but maybe an optimization would be that the materialization step will do an "offline hlogsplit/flush" to just keep the data relevant to this table/region?
{quote}
+1, assuming we need the HLogs. I think there is a minimally impactful way to avoid this altogether.

> Snapshots in HBase 0.96
> -----------------------
>
>                 Key: HBASE-6055
>                 URL: https://issues.apache.org/jira/browse/HBASE-6055
>             Project: HBase
>          Issue Type: New Feature
>          Components: client, master, regionserver, zookeeper
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: 0.96.0
>
>         Attachments: Snapshots in HBase.docx
>
>
> Continuation of HBASE-50 for the current trunk.
> Since the implementation has drastically changed, opening as a new ticket.