[ 
https://issues.apache.org/jira/browse/HADOOP-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556798#action_12556798
 ] 

Allen Wittenauer commented on HADOOP-2541:
------------------------------------------

There are multiple methods that are generally in use amongst storage folks. 

Some examples:

On NetApps, a .snapshot directory exists that contains timestamped pointers to 
the snapshots.  These snapshots are controlled by the admin via cron.  The 
directory structure 'mirrors' the file system it is capturing.  Thus you have:

$ find . -name startvnc
./bin/startvnc
./.snapshot/hourly.0/bin/startvnc
./.snapshot/hourly.1/bin/startvnc
./.snapshot/nightly.0/bin/startvnc
./.snapshot/hourly.2/bin/startvnc
./.snapshot/hourly.3/bin/startvnc
./.snapshot/hourly.4/bin/startvnc
./.snapshot/hourly.5/bin/startvnc
./.snapshot/nightly.1/bin/startvnc


While it is a tremendous benefit that users can easily get to their files, as 
this example shows, commands like find will also hit the snapshots, which can 
have some unexpected side effects.
NetApps do have a way to 'hide' the .snapshot directory such that the file 
system does not display it when doing an ls.  In the case of HDFS, a similar 
system could be used in place of the trash bin. 
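
For what it's worth, find can also be told to skip the snapshot tree on the 
client side.  A minimal sketch, assuming the directory is named .snapshot as in 
the listing above; the scratch tree built here is purely illustrative and not 
taken from a real filer:

```shell
# Build a throwaway tree that imitates the layout above (illustrative only).
demo=$(mktemp -d)
mkdir -p "$demo/bin" "$demo/.snapshot/hourly.0/bin" "$demo/.snapshot/nightly.0/bin"
touch "$demo/bin/startvnc" \
      "$demo/.snapshot/hourly.0/bin/startvnc" \
      "$demo/.snapshot/nightly.0/bin/startvnc"

# -prune stops find from descending into any directory named .snapshot;
# the -o branch then prints matches from the live file system only.
live_only=$(cd "$demo" && find . -name .snapshot -prune -o -name startvnc -print)
echo "$live_only"
```

This only helps people who remember to add the -prune clause, which is why 
hiding the directory on the server side is the friendlier default.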

Another popular option is to just store the snapshot 'offline' and only bring 
it online when you need to reference a file from it.  This is how ZFS works.  
You take a snapshot, and it is stored in the storage pool until you need it.  
When you do need it, you mount it up and access your files.  This is less 
friendly towards users, but does allow admins the protection that snapshotting 
provides.

It is worth pointing out that a) both systems use copy-on-write (COW) and b) 
both systems are 'lossy'.  You can only recover a file that was present when a 
snapshot was taken.  Any files created and removed after the time of the last 
snapshot are gone.
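
The 'lossy' part can be seen with a toy simulation.  This sketch uses plain 
cp -R copies as stand-ins for snapshots (so no COW is involved), and the 
directory and file names are made up for illustration:

```shell
# Scratch area with a "live" file system and a place to keep "snapshots".
fs=$(mktemp -d)
mkdir "$fs/live" "$fs/snap"
echo "kept" > "$fs/live/old.txt"

# "Snapshot" 1: a full copy stands in for a real COW snapshot.
cp -R "$fs/live" "$fs/snap/hourly.1"

# A file is created and then removed between the two snapshots.
echo "scratch" > "$fs/live/temp.txt"
rm "$fs/live/temp.txt"

# "Snapshot" 2, taken after the removal.
cp -R "$fs/live" "$fs/snap/hourly.0"

# Count how many snapshots hold each file.
lost=$(find "$fs/snap" -name temp.txt | wc -l | tr -d ' ')
kept=$(find "$fs/snap" -name old.txt | wc -l | tr -d ' ')
echo "temp.txt copies: $lost, old.txt copies: $kept"
```

old.txt can be pulled back from either snapshot, while temp.txt exists in 
neither, so no snapshot schedule short of continuous capture would recover it.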

> Online Snapshotting Capability
> ------------------------------
>
>                 Key: HADOOP-2541
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2541
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.15.1
>            Reporter: Allen Wittenauer
>
> Modern file systems have the ability to create snapshots of the running file 
> system without having to unmount.  HDFS should offer similar capabilities 
> to allow admins the ability to perform "online" backups of the file system 
> such that files can be recovered after deletions or, for extra bonus points, 
> catastrophic failures.
