[ 
https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580992#action_12580992
 ] 

stack commented on HBASE-50:
----------------------------

Other ideas.  A command on the master would send a signal to all regionservers.
They would dump their in-memory content and tell the master when done.  They
would then take reads but no updates, blocking writes until they got the
all-clear from the master.  The master would then list the current content of
the filesystem and dump a listing of all files.  The all-files listing could
then be used as input for a distcp job.  The master would either wait until it
gets a prompt from the admin that the distcp was complete, or it would give the
all-clear right after dumping the catalog of all files.  In the latter case,
instead of deleting files on compaction or region delete, files would get a
'.deleted' suffix.  The running distcp, if it couldn't find the original file,
would look for the same file with the '.deleted' suffix and copy that instead.
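
A minimal sketch of the '.deleted' fallback, assuming a local directory stands
in for the DFS and a plain file copy stands in for distcp.  The names
soft_delete, resolve_source and copy_from_listing are hypothetical and exist
only for illustration; none of this is HBase or Hadoop API.

```python
import shutil
import tempfile
from pathlib import Path

DELETED_SUFFIX = ".deleted"  # suffix proposed in the comment above


def soft_delete(path: Path) -> Path:
    """Rename instead of deleting, so a running copy can still find the data."""
    target = path.with_name(path.name + DELETED_SUFFIX)
    path.rename(target)
    return target


def resolve_source(path: Path) -> Path:
    """Return the original file, or its '.deleted' twin if it was removed."""
    if path.exists():
        return path
    fallback = path.with_name(path.name + DELETED_SUFFIX)
    if fallback.exists():
        return fallback
    raise FileNotFoundError(path)


def copy_from_listing(listing, src_root: Path, dst_root: Path):
    """Copy every file named in the listing, using the fallback when needed."""
    copied = []
    for rel in listing:
        source = resolve_source(src_root / rel)
        dest = dst_root / rel  # the copy keeps the original name
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, dest)
        copied.append(rel)
    return copied


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "src", Path(tmp) / "dst"
        (src / "region1").mkdir(parents=True)
        (src / "region1" / "a").write_text("A")
        (src / "region1" / "b").write_text("B")

        # Step 1: take the file listing while updates are blocked.
        listing = sorted(
            p.relative_to(src).as_posix() for p in src.rglob("*") if p.is_file()
        )

        # Step 2: after the all-clear, a compaction "deletes" a listed file.
        soft_delete(src / "region1" / "b")

        # Step 3: the copy job still finds every file in the listing.
        copy_from_listing(listing, src, dst)
        return (
            listing,
            (dst / "region1" / "a").read_text(),
            (dst / "region1" / "b").read_text(),
        )
```

Calling demo() takes the listing, soft-deletes one listed file, and then copies
both files; the second one is read from its '.deleted' twin and written to the
destination under its original name.  Cleaning up the '.deleted' files once
the copy finishes is left out of the sketch.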

> Snapshot of table
> -----------------
>
>                 Key: HBASE-50
>                 URL: https://issues.apache.org/jira/browse/HBASE-50
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Billy Pearson
>            Priority: Minor
>
> Having an option to take a snapshot of a table would be very useful in 
> production.
> What I would like this option to do is merge all the data into one or more 
> files stored in the same folder on the dfs. This way we could save data in 
> case of a software bug in hadoop or user code. 
> The other advantage would be the ability to export a table to multiple 
> locations. Say I had a read_only table that must be online. I could take a 
> snapshot of it when needed, export it to a separate data center, and have it 
> loaded there. I would then have it online at multiple data centers for load 
> balancing and failover.
> I understand that hadoop removes the need for backups to protect against 
> failed servers, but this does not protect us from software bugs that might 
> delete or alter data in ways we did not plan. We should have a way to roll 
> back a dataset.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
