[ 
https://issues.apache.org/jira/browse/HDFS-4529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586655#comment-13586655
 ] 

Colin Patrick McCabe commented on HDFS-4529:
--------------------------------------------

I just read the discussion on HDFS-4523.  I agree with atm.  Snapshots should 
be read-only.  That means they should not be modified after they are created.

If the sysadmin does not want to snapshot "temporary files", then he can create 
them under a non-snapshottable directory.  Perhaps most system administrators 
will configure /tmp as a non-snapshottable directory.

There is no additional complexity because we already have to deal with the 
issue of two INodes sharing the same block(s).  Any two versions of the same 
file INode, snapshotted at different times, will share the same blocks, 
regardless of how we implement the concat operation.
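
To illustrate the block-sharing point, here is a minimal toy model in Python. It is not NameNode code: the names Block, INodeFile, snapshot_copy and concat are hypothetical, and a snapshot is assumed to copy only the inode's block list, never the blocks. It suggests that a live inode and a snapshotted inode end up referencing the same blocks both for an ordinary append and for a concat.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Block:
    """A stored block; inodes share blocks by identity, not by copying."""
    block_id: int


@dataclass
class INodeFile:
    """Toy file inode: a name plus an ordered list of block references."""
    name: str
    blocks: List[Block] = field(default_factory=list)

    def snapshot_copy(self) -> "INodeFile":
        # Copy the metadata and the list, but not the blocks themselves.
        return INodeFile(self.name, list(self.blocks))


def concat(target: INodeFile, sources: List[INodeFile]) -> None:
    """Move the sources' blocks onto the end of the target, in order."""
    for src in sources:
        target.blocks.extend(src.blocks)
        src.blocks = []


def shared_ids(a: INodeFile, b: INodeFile) -> List[int]:
    """Ids of blocks in `a` that are the very same objects as blocks in `b`."""
    return [x.block_id for x in a.blocks if any(x is y for y in b.blocks)]


# Case 1: an ordinary file is snapshotted, then appended to.
f = INodeFile("data", [Block(1), Block(2)])
f_snap = f.snapshot_copy()
f.blocks.append(Block(3))
shared_plain = shared_ids(f, f_snap)

# Case 2: a transient part is snapshotted, then concatenated into a target.
part0 = INodeFile("part-0", [Block(10)])
part1 = INodeFile("part-1", [Block(11)])
part1_snap = part1.snapshot_copy()
concat(part0, [part1])
shared_concat = shared_ids(part0, part1_snap)

print(shared_plain)   # blocks shared by the live file and its snapshot
print(shared_concat)  # blocks shared by the concat target and the snapshot
```

In both cases the sharing comes from the snapshot itself, so concat does not introduce a new kind of aliasing in this model.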
                
> Decide the semantic of concat with snapshots
> --------------------------------------------
>
>                 Key: HDFS-4529
>                 URL: https://issues.apache.org/jira/browse/HDFS-4529
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>
> The use case of concat is for copying large files across clusters using the 
> following steps.
> - Step 1: The blocks of a file in the source cluster are copied in parallel 
> to transient files in the destination cluster.
> - Step 2: Then the transient files in the destination cluster are 
> concatenated in order to obtain the original file.
> If a snapshot is taken in the destination cluster before Step 2, some 
> transient files may be captured in the snapshot.  Then what should happen?  
> The following are some alternatives:
> * (1) fail concat and keep the transient files in the snapshots;
> * (2) allow concat and keep the transient files in the snapshots;
> * (3) allow concat but remove the transient files from all snapshots.
> None of the solutions above is perfect.  Here are their drawbacks:
> For (1) and (2), the transient files will remain in the system until the 
> snapshots are deleted.  This is inefficient for the system since the files 
> are known to be transient.  (1) may force users to create files under 
> some non-snapshottable tmp directory in the first place.  However, that 
> complicates user applications, and existing applications may need to 
> be updated for the new policy.  Also, a non-snapshottable directory may not 
> exist, since the admin may set the system root directory to be snapshottable.  
> For (3), the problem is that it seems to break the read-only snapshot 
> contract: some files that appear in a snapshot may disappear later on.
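
The three alternatives in the description can be sketched with a toy Python model. This is only an illustration under stated assumptions, not HDFS code: the namespace is a dict from file name to block ids, a snapshot is a frozen copy of that dict, and ConcatPolicy, SnapshotConflict, concat and scenario are hypothetical names.

```python
from enum import Enum
from typing import Dict, List

Namespace = Dict[str, List[int]]  # file name -> ordered block ids


class ConcatPolicy(Enum):
    FAIL = 1    # alternative (1): refuse concat, keep files in snapshots
    KEEP = 2    # alternative (2): allow concat, keep files in snapshots
    REMOVE = 3  # alternative (3): allow concat, drop files from snapshots


class SnapshotConflict(Exception):
    """Raised under FAIL when a source file is captured in a snapshot."""


def concat(live: Namespace, snapshots: Dict[str, Namespace],
           target: str, sources: List[str], policy: ConcatPolicy) -> None:
    # Sources that appear in at least one snapshot.
    captured = [s for s in sources
                if any(s in snap for snap in snapshots.values())]
    if captured and policy is ConcatPolicy.FAIL:
        raise SnapshotConflict(f"sources in a snapshot: {captured}")
    # Append each source's blocks to the target and remove the source.
    for s in sources:
        live[target].extend(live.pop(s))
    if policy is ConcatPolicy.REMOVE:
        # This is the step that modifies an existing snapshot.
        for snap in snapshots.values():
            for s in captured:
                snap.pop(s, None)


def scenario():
    """A target plus two transient files, all captured by snapshot 's1'."""
    live = {"big.dat": [1], "tmp-1": [2], "tmp-2": [3]}
    snapshots = {"s1": {name: list(blocks) for name, blocks in live.items()}}
    return live, snapshots


for policy in ConcatPolicy:
    live, snapshots = scenario()
    try:
        concat(live, snapshots, "big.dat", ["tmp-1", "tmp-2"], policy)
        outcome = "ok"
    except SnapshotConflict:
        outcome = "failed"
    print(policy.name, outcome, sorted(live), sorted(snapshots["s1"]))
```

In this model only REMOVE changes the contents of an existing snapshot, while FAIL and KEEP leave the transient files in the snapshot until it is deleted.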

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
