[ https://issues.apache.org/jira/browse/HADOOP-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530219 ]
dhruba borthakur commented on HADOOP-1700:
------------------------------------------
{noformat}
Here is a slightly more detailed description of a proposal to support appending
writes to files.
There is a DataGenerationStamp associated with every block. It is persisted by
the namenode and by the datanode(s).
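To make the bookkeeping concrete, here is a minimal sketch of the per-block
state that both sides would persist. The class and field names are
hypothetical, not existing Hadoop code.

// Hypothetical sketch of the per-block metadata that the Namenode (in the
// BlocksMap) and each Datanode would keep in persistent storage.
class BlockRecord {
  final long blockId;        // immutable id chosen by the Namenode
  long dataGenerationStamp;  // starts at 0, bumped on every pipeline recovery

  BlockRecord(long blockId, long dataGenerationStamp) {
    this.blockId = blockId;
    this.dataGenerationStamp = dataGenerationStamp;
  }
}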
*The Writer*
1. The Client asks the Namenode for a new block. The Namenode generates a
   new blockid and associates a DataGenerationStamp of 0 with this block.
   It persists the blockid in the inode and the DataGenerationStamp in the
   BlocksMap. The Namenode returns the blockid, the DataGenerationStamp and
   the block locations to the Client.
2. The Client sends the blockid and the DataGenerationStamp to all the
   datanodes in the pipeline. The Datanodes record the blockid and the
   DataGenerationStamp persistently and return. In case of error, go to
   Step 3a.
3. The Client then starts streaming data to the Datanodes in the pipeline.
   If the Client notices that any datanode in the pipeline has encountered
   an error, it runs the following recovery steps (see the sketch after
   Step 5):
   3a. The Client removes the bad datanode from the pipeline.
   3b. The Client requests a new DataGenerationStamp for this block from the
       Namenode. The Client also informs the Namenode of the bad Datanode.
   3c. The Namenode removes the bad datanode as a valid block location for
       this block. The Namenode increments the current DataGenerationStamp
       by one, persists it, and returns it to the Client.
   3d. The Client sends the new DataGenerationStamp to all remaining
       datanodes in the pipeline.
   3e. The Datanodes receive the new DataGenerationStamp and persist it.
   3f. The Client can now continue; go back to Step 3 above.
4. A Datanode sends a block confirmation to the Namenode when the full
   block has been received. The block confirmation carries the blockid and
   the DataGenerationStamp.
5. The Namenode receives a block confirmation from a Datanode. If the
   DataGenerationStamp does not match the value stored in the BlocksMap,
   the Namenode refuses to consider that Datanode a valid replica location
   and sends a block delete command to that Datanode.
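Here is a minimal Java sketch of the recovery handshake in steps 3a-3f.
Every type and method name below (AppendClient, nextGenerationStamp,
updateGenerationStamp) is an assumption made for illustration, not an
existing Hadoop API.

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the Client-side recovery in steps 3a-3f.
class AppendClient {
  private final Namenode namenode;       // assumed RPC stub, not a real API
  private final List<Datanode> pipeline; // current write pipeline
  private final long blockId;
  private long stamp;                    // current DataGenerationStamp

  AppendClient(Namenode nn, List<Datanode> pipeline, long blockId, long stamp) {
    this.namenode = nn;
    this.pipeline = new ArrayList<>(pipeline);
    this.blockId = blockId;
    this.stamp = stamp;
  }

  // Called when a datanode in the pipeline reports an error.
  void recover(Datanode bad) {
    pipeline.remove(bad);                                // step 3a
    stamp = namenode.nextGenerationStamp(blockId, bad);  // steps 3b and 3c
    for (Datanode dn : pipeline) {
      dn.updateGenerationStamp(blockId, stamp);          // steps 3d and 3e
    }
    // step 3f: the Client resumes streaming with the new stamp
  }

  interface Namenode {
    // Drops 'bad' as a replica location, increments and persists the
    // stamp, and returns the new value (step 3c).
    long nextGenerationStamp(long blockId, Datanode bad);
  }

  interface Datanode {
    // Persists the new stamp for the block (step 3e).
    void updateGenerationStamp(long blockId, long newStamp);
  }
}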
*Reader (concurrent reading while file is being appended to)*
1. A reader that opens a file gets the list of blocks from the Namenode.
   Each block entry carries the block locations and the DataGenerationStamp.
2. The Client sends the DataGenerationStamp along with every read request
   to a datanode. The datanode refuses to serve the data if the
   DataGenerationStamp does not match the value in its persistent store;
   in that case the Client fails over to other datanodes (see the sketch
   below).
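A minimal sketch of the datanode-side check in reader step 2. The handler
class and the in-memory map standing in for the datanode's persistent store
are assumptions for illustration.

import java.util.Map;

// Hypothetical sketch: a datanode serves a read only if the stamp the
// client presents matches the stamp it has persisted for the block.
class DatanodeReadHandler {
  private final Map<Long, Long> persistedStamps;  // blockId -> stamp

  DatanodeReadHandler(Map<Long, Long> persistedStamps) {
    this.persistedStamps = persistedStamps;
  }

  byte[] read(long blockId, long clientStamp, long offset, int len) {
    Long stored = persistedStamps.get(blockId);
    if (stored == null || stored != clientStamp) {
      // Mismatch: refuse to serve; the client will fail over to another
      // datanode that holds a replica with the expected stamp.
      throw new IllegalStateException("stale DataGenerationStamp for block " + blockId);
    }
    return readFromLocalReplica(blockId, offset, len);
  }

  private byte[] readFromLocalReplica(long blockId, long offset, int len) {
    return new byte[len];  // placeholder for the actual block read
  }
}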
This algorithm came out of a discussion with Sameer. It does not solve the
problem of duplicate blockids that can result when datanodes that were down
for a long time reappear.
{noformat}
> Append to files in HDFS
> -----------------------
>
> Key: HADOOP-1700
> URL: https://issues.apache.org/jira/browse/HADOOP-1700
> Project: Hadoop
> Issue Type: New Feature
> Components: dfs
> Reporter: stack
>
> The ability to append to files in HDFS has been requested a couple of
> times on the list of late. For one example, see
> http://www.nabble.com/HDFS%2C-appending-writes-status-tf3848237.html#a10916193.
> Other mail describes folks' workarounds because this feature is lacking:
> e.g. http://www.nabble.com/Loading-data-into-HDFS-tf4200003.html#a12039480
> (Later on this thread, Jim Kellerman re-raises the HBase need of this
> feature). HADOOP-337 'DFS files should be appendable' makes mention of file
> append but it was opened early in the life of HDFS when the focus was more on
> implementing the basics rather than adding new features. Interest fizzled.
> Because HADOOP-337 is also a bit of a grab-bag -- it includes truncation and
> being able to concurrently read/write -- rather than try to breathe new life
> into HADOOP-337, here is a new issue focused on file append.
> Ultimately, being able to do as the Google GFS paper describes -- having
> multiple concurrent clients making 'Atomic Record Append' to a single file --
> would be sweet, but at least for a first cut at this feature, IMO, a single
> client appending to a single HDFS file, letting the application manage the
> access, would be sufficient.