[jira] [Commented] (HDFS-7056) Snapshot support for truncate

Konstantin Shvachko (JIRA) Sat, 11 Oct 2014 13:59:49 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168334#comment-14168334
 ]


Konstantin Shvachko commented on HDFS-7056:
-------------------------------------------

Here are some details of the implementation. LMK if it sounds reasonable. I'll 
update the design doc accordingly.

When a file is in a snapshot and the truncate point is not on a block boundary, 
a new last block is created for the truncated file. It holds the part of the 
original last block that remains after truncation. The original block remains 
unchanged in the snapshot copy of the file.
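
A minimal sketch of the copy step, assuming a replica is a plain local file. 
The class and method names (CopyOnTruncateSketch, copyPrefix) are hypothetical 
and are not DataNode APIs. A real DN would also have to handle the checksum 
meta file, which is ignored here.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not the actual DataNode code.
final class CopyOnTruncateSketch {

    /**
     * Copies the first newLength bytes of originalReplica into a new file,
     * leaving the original replica untouched for the snapshot.
     */
    static void copyPrefix(Path originalReplica, Path newReplica, long newLength)
            throws IOException {
        try (FileChannel in = FileChannel.open(originalReplica,
                     StandardOpenOption.READ);
             FileChannel out = FileChannel.open(newReplica,
                     StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            if (newLength < 0 || newLength > in.size()) {
                throw new IllegalArgumentException(
                        "newLength out of range: " + newLength);
            }
            long pos = 0;
            // transferTo may move fewer bytes than requested, so loop.
            while (pos < newLength) {
                long n = in.transferTo(pos, newLength - pos, out);
                if (n <= 0) {
                    throw new IOException("Copy stalled at offset " + pos);
                }
                pos += n;
            }
        }
    }
}
```

When the file is not in any snapshot, truncating the replica in place avoids 
this copy.
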
There are two main parts in implementing this.
# The truncate recovery on a DN will copy a part of the last block replica into 
a new block file, instead of just truncating the replica as HDFS-3107 does now. 
The truncating logic can be kept as an optimization for the case when the file 
is not in a snapshot.
# SnapshotCopy of INodeFile should be extended with a list of blocks, 
referencing the blocks that composed the file when the snapshot was taken. When 
there are multiple snapshots of the same file, each snapshot copy may have a 
different list of blocks if the file has been truncated and appended between 
the snapshots. As part of this change we will need to adjust the logic for 
deleting a file and deleting a snapshot, because some blocks may need to be 
invalidated and others may not. Here is an overview of operations related to 
the change.
#* There is no change in createSnapshot operation. The snapshot diffs will be 
introduced when the files are actually modified.
#* Append does not change, because the file has its own list of blocks, which 
is separate from the lists of the snapshot copies.
#* File delete should check if a block belongs to a snapshot before sending it 
to the invalidates queue. Only the initial prefix of blocks common with the 
latest snapshot should be retained in BlocksMap. Therefore removeFile should 
find the common prefix of blocks with the latest snapshot and invalidate the 
rest of them.
#* Deleting a snapshot copy of a file should check if a block belongs to 
another snapshot before sending it to the invalidates queue. It is not 
necessary to check all snapshots; only the previous and the next snapshots 
should be checked for blocks in common with the snapshot being deleted. 
Therefore, when removing a snapshot, one should find the prefix of blocks of 
that snapshot which is in common with either the previous or the next snapshot. 
The rest of the blocks can be invalidated.
#* I would propose to copy the entire list of blocks to the snapshot copy. This 
will simplify the implementation. We can optimize this later by storing 
references only to the blocks that are different between the current state of 
the file and the snapshot.
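
The common-prefix rules for file delete and snapshot delete can be sketched as 
follows. This is an illustration only: block lists are modeled as lists of 
block IDs, the names (SnapshotBlockPrefix and its methods) are hypothetical 
and not NameNode APIs, and a null list stands for "no such snapshot". Treating 
the current file as the "next" neighbor of the latest snapshot is an 
assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the actual NameNode code.
final class SnapshotBlockPrefix {

    /** Length of the longest common prefix of two block-ID lists. */
    static int commonPrefixLength(List<Long> a, List<Long> b) {
        int n = Math.min(a.size(), b.size());
        int i = 0;
        while (i < n && a.get(i).equals(b.get(i))) {
            i++;
        }
        return i;
    }

    /** File delete: invalidate blocks past the prefix shared with the latest snapshot. */
    static List<Long> blocksToInvalidateOnFileDelete(
            List<Long> fileBlocks, List<Long> latestSnapshotBlocks) {
        int keep = latestSnapshotBlocks == null
                ? 0 : commonPrefixLength(fileBlocks, latestSnapshotBlocks);
        return new ArrayList<>(fileBlocks.subList(keep, fileBlocks.size()));
    }

    /**
     * Snapshot delete: invalidate blocks past the longer of the prefixes
     * shared with the previous and the next snapshot.
     */
    static List<Long> blocksToInvalidateOnSnapshotDelete(
            List<Long> prev, List<Long> deleted, List<Long> next) {
        int keepPrev = prev == null ? 0 : commonPrefixLength(deleted, prev);
        int keepNext = next == null ? 0 : commonPrefixLength(deleted, next);
        int keep = Math.max(keepPrev, keepNext);
        return new ArrayList<>(deleted.subList(keep, deleted.size()));
    }
}
```

For example, if a snapshot holds blocks 1, 2, 3 and a truncate inside block 3 
replaced it with block 4 in the file, deleting the file invalidates only 
block 4, while blocks 1, 2 and 3 stay with the snapshot.
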

> Snapshot support for truncate
> -----------------------------
>
>                 Key: HDFS-7056
>                 URL: https://issues.apache.org/jira/browse/HDFS-7056
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 3.0.0
>            Reporter: Konstantin Shvachko
>
> Implementation of truncate in HDFS-3107 does not allow truncating files which 
> are in a snapshot. It is desirable to be able to truncate and still keep the 
> old state of the file in the snapshot.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
