[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Jesse Yates (JIRA) Mon, 04 Jun 2012 10:34:25 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288715#comment-13288715
 ]


Jesse Yates commented on HBASE-6055:
------------------------------------

A couple of definitions going forward:
 - materialization: the end result of taking a single snapshot, on the same 
cluster. It ends up in the .snapshot/[snapshot_name] directory
 - export: sending the snapshot to another cluster or another part of the same 
cluster
 - restore: taking an exported snapshot and converting the snapshot into an 
active table.

{quote}
Hm.. how do you restore a snapshot from reference files if it hasn't been 
scanned/copied yet? Require scan/copy "materialization" of the snapshot first? 
(which means slower restore, but would likely be simplest for a first 
cut)
{quote}

Right now, you would run an M/R job that distcps the files over to another 
cluster or a backup part of your cluster. Since we are only storing references, 
the actual files still have to be copied at that point. This will be helped by 
using the actual "Reference" class for the HFiles (it is currently also being 
(mis)used for the WALs, but I don't think we actually need to keep the WALs - 
I'll comment in the timestamp ticket). Since they are just reference files, you 
could just use the regular HFile reader to load them into another table.
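
A rough sketch of the copy step, to show why references alone are not enough 
for an export. It is not HBase code: the local filesystem stands in for HDFS, 
and the `*.ref` file format (one line holding the path of the real store file) 
and the names `export_snapshot` and `demo_export` are assumptions made for 
illustration only.

```python
import shutil
import tempfile
from pathlib import Path


def export_snapshot(snapshot_dir: Path, dest_dir: Path) -> list:
    """Copy the files that a snapshot's reference files point to into dest_dir.

    Hypothetical layout: every *.ref file in snapshot_dir holds one line,
    the absolute path of the real store file it refers to.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for ref in sorted(snapshot_dir.glob("*.ref")):
        target = Path(ref.read_text().strip())
        # The reference is only a pointer; the export has to move real bytes.
        copied.append(Path(shutil.copy2(target, dest_dir / target.name)))
    return copied


def demo_export() -> list:
    """Build a tiny fake table and snapshot on local disk, then export it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        store = root / "table" / "region1" / "cf"
        store.mkdir(parents=True)
        (store / "hfile-a").write_text("rows a..m")
        (store / "hfile-b").write_text("rows n..z")

        # The snapshot itself holds references only, no data.
        snapshot = root / ".snapshot" / "snap1"
        snapshot.mkdir(parents=True)
        for hfile in store.iterdir():
            (snapshot / f"{hfile.name}.ref").write_text(str(hfile))

        copied = export_snapshot(snapshot, root / "backup" / "snap1")
        return sorted(p.name for p in copied)
```

In a real cluster the copy loop would be the distcp-style M/R job; the point 
here is only that the export walks the references and copies their targets.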

{quote}
Snapshot restore needs to be "transactional" like snapshotting right?
{quote}

Yeah, I guess. I don't really see this as a problem - just keep it to one 
restore at a time. But it would be all or nothing to get a table online.
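
One way to picture "one restore at a time" plus "all or nothing" is to stage 
the table under a temporary name and publish it with a single rename. This is 
only a sketch on a local filesystem, not the HBase implementation; 
`restore_table`, the `.tmp-` staging prefix and the in-process lock are all 
hypothetical.

```python
import os
import shutil
import tempfile
import threading
from pathlib import Path

_restore_lock = threading.Lock()  # allows a single restore at a time


def restore_table(exported_dir: Path, tables_root: Path, table_name: str) -> bool:
    """Stage the table files in a temp dir, then publish with one rename.

    Returns False if another restore is running or the table already exists.
    """
    if not _restore_lock.acquire(blocking=False):
        return False
    staging = tables_root / f".tmp-{table_name}"
    try:
        final = tables_root / table_name
        if final.exists():
            return False
        shutil.copytree(exported_dir, staging)
        # Readers see either no table or the complete table, never half of one.
        os.rename(staging, final)
        return True
    finally:
        # After a successful rename this is a no-op; after a failure it
        # removes the partially staged files.
        shutil.rmtree(staging, ignore_errors=True)
        _restore_lock.release()


def demo_restore() -> tuple:
    """Restore the same exported snapshot twice; the second attempt is refused."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        exported = root / "backup" / "snap1"
        exported.mkdir(parents=True)
        (exported / "hfile-a").write_text("rows a..m")
        tables = root / "tables"
        tables.mkdir()

        first = restore_table(exported, tables, "restored")
        second = restore_table(exported, tables, "restored")
        files = sorted(p.name for p in (tables / "restored").iterdir())
        leftovers = [p.name for p in tables.iterdir() if p.name.startswith(".tmp-")]
        return first, second, files, leftovers
```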

{quote}
what is "export"? is this taking a snapshot or the materialization or the 
snapshot restore or something else?
{quote}

Export is taking a snapshot from the .snapshot/ directory and sending it 
somewhere else, possibly with a special snapshot distcp. I would consider 
materialization as taking the exported snapshot and then 'hooking it back up' 
to another cluster (or the same) as a new table. You could throw in 
materialization of the exported snapshot, but they are in fact distinct.

{quote}
If we restore snapshots to the same hbase instance, in dir structure, you 
probably need .regioninfo files as well. (contains region startkey/endkey info 
necessary to reconstitute META later).
{quote}

+1, I'll make sure that gets in.

{quote}
Is restoring to a separate instance in scope? If so bulk loads can be expensive 
– if regions don't line up there will be a bunch of splitting that happens. 
Again, keeping the regioninfos and the snapshot's splits may be worthwhile.
{quote}

I'd say restore is part of this. Should be solved by having the region info. -1 
for split/compact storms.
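
To illustrate why keeping the region info helps: if the snapshot records each 
region's start and end key, the target table can be created pre-split on the 
same boundaries, so loaded files already line up with regions. A minimal 
sketch, assuming region info is available as `(start_key, end_key)` byte 
pairs; `split_keys_from_regions` is a made-up helper, not an HBase API.

```python
def split_keys_from_regions(regions):
    """Derive pre-split keys from the (start_key, end_key) pairs of a snapshot.

    An empty key (b"") marks the start of the first region and the end of the
    last one, which is the HBase convention for open-ended boundaries.
    """
    ordered = sorted(regions, key=lambda region: region[0])
    # Adjacent regions must share a boundary, otherwise the info is incomplete.
    for (_, end), (start, _) in zip(ordered, ordered[1:]):
        if end != start:
            raise ValueError(f"gap or overlap between {end!r} and {start!r}")
    # Every non-empty start key is a split point of the original table.
    return [start for start, _ in ordered if start != b""]
```

For example, regions `[(b"", b"g"), (b"g", b"m"), (b"m", b"")]` give the split 
keys `[b"g", b"m"]`, which recreate the same three regions on the target.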

{quote}
Where do the materialized versions of the snapshot reference files end up? in 
the snapshot dirs? elsewhere?
{quote}

What do you mean by materialized? After taking a snapshot, where do the snapshot 
files end up? In the .snapshot directory. See my earlier comments on the 
structure.

{quote}
This potentially gets a little trickier with markers as opposed to log rolls.
{quote}

If we do a log roll, it's probably going to take a bit longer. Also, it's not 
going to be applicable to the timestamp approach, since log rolling will 
necessitate some kind of locking, which we should avoid, whereas the 
markers will be much faster.

{quote}
The HLog will have edits from regions not relevant to the table's regions. Not 
a huge problem but maybe an optimization would be that the materialization step 
will do an "offline hlogsplit/flush" to just keep the data relevant to this 
table/region?
{quote}
{quote}

+1, assuming we need the HLogs. I think there is a minimally impactful way to 
avoid this altogether.
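
If the HLogs do turn out to be needed, the filtering idea from the quote could 
look roughly like this. It is a toy model only: the `Edit` record and 
`edits_for_table` are hypothetical and do not reflect the real HLog entry 
format.

```python
from collections import namedtuple

# Hypothetical stand-in for one log entry; the real HLog format differs.
Edit = namedtuple("Edit", ["table", "region", "row", "value"])


def edits_for_table(log, table):
    """Keep only the edits of one table, grouped per region in log order."""
    grouped = {}
    for edit in log:
        if edit.table == table:
            grouped.setdefault(edit.region, []).append(edit)
    return grouped
```

Given a log that mixes edits for tables `t1` and `t2`, `edits_for_table(log, 
"t1")` returns a dict keyed by region that holds only the `t1` edits.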


                
> Snapshots in HBase 0.96
> -----------------------
>
>                 Key: HBASE-6055
>                 URL: https://issues.apache.org/jira/browse/HBASE-6055
>             Project: HBase
>          Issue Type: New Feature
>          Components: client, master, regionserver, zookeeper
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: 0.96.0
>
>         Attachments: Snapshots in HBase.docx
>
>
> Continuation of HBASE-50 for the current trunk. Since the implementation has 
> drastically changed, opening as a new ticket.
