[ https://issues.apache.org/jira/browse/HADOOP-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525203 ]

Doug Cutting commented on HADOOP-1700:
--------------------------------------

> If a Datanode goes down with garbage blocks, and some of those blockids are 
> assigned to new files [ ... ]

How is this related to append?  Block id collisions are improbable events, 
though perhaps ones we should concern ourselves with more than we do.  You 
seem to be arguing that we should minimize block id allocation in order to 
address this, but that seems like a tiny bandaid, and mostly irrelevant to 
the present discussion. 
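
To make the hazard concrete, here is a toy sketch (illustrative Java, not 
actual HDFS code; all names are made up) of a namenode-side allocator that 
draws random block ids and retries on collision:

    import java.util.HashSet;
    import java.util.Random;
    import java.util.Set;

    // Toy namenode-side block id allocator, for illustration only.
    class BlockIdAllocator {
      // Ids known to the live namespace.  Garbage blocks sitting on a
      // down datanode are NOT here, which is the hazard above: their
      // ids can be re-allocated to new files, and the stale replicas
      // mistaken for the new blocks when the node returns.
      private final Set<Long> idsInUse = new HashSet<Long>();
      private final Random rand = new Random();

      // Draw random 64-bit ids until one is unused; Set.add() returns
      // false on a duplicate, so the loop retries on collision.
      synchronized long allocate() {
        long id;
        do {
          id = rand.nextLong();
        } while (!idsInUse.add(id));
        return id;
      }
    }

Minimizing how many ids we allocate only shrinks the odds of such a 
collision; it doesn't remove the window.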

The proposal I'd made was that new block ids created by appends should only 
be reported to the namenode when they're complete, and that the namenode 
should not commit a file's new block id list until all of its block ids are 
reported complete, making appends atomic.  Such an approach would inhibit 
rapid visibility of appends, which may be required, and that is a vote 
against it.  So we may have to go with something like non-persistent block 
timestamps, but I still don't see your point about non-persistent timestamps 
substantially reducing corruption.  Rather, they seem to create new 
opportunities for corruption that may require new mechanisms to prevent, and 
that may just be the price we have to pay.
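
For concreteness, a minimal sketch of the commit rule described above 
(illustrative Java, not a patch; the class and method names are made up):

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Tracks block ids created by an in-progress append.  The file's
    // new block list is committed only once every one of them has been
    // reported complete, so readers never see a partial append.
    class AppendCommitTracker {
      // Pending block ids per file, keyed by file path.
      private final Map<String, Set<Long>> pending =
          new HashMap<String, Set<Long>>();

      // Called when an append allocates new blocks for a file.
      synchronized void startAppend(String path, List<Long> newBlockIds) {
        pending.put(path, new HashSet<Long>(newBlockIds));
      }

      // Called as each block is reported complete.  Returns true only
      // when the whole new block list may be committed atomically.
      synchronized boolean blockComplete(String path, long blockId) {
        Set<Long> remaining = pending.get(path);
        if (remaining == null) {
          return false;             // no append in progress for this file
        }
        remaining.remove(blockId);
        if (remaining.isEmpty()) {
          pending.remove(path);     // all blocks complete: commit now
          return true;
        }
        return false;
      }
    }

The cost is exactly the one noted above: nothing is visible to readers until 
blockComplete() sees the last block, so rapid visibility of appends is lost.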


> Append to files in HDFS
> -----------------------
>
>                 Key: HADOOP-1700
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1700
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: stack
>
> The request for being able to append to files in HDFS has been raised a 
> couple of times on the list of late.  For one example, see 
> http://www.nabble.com/HDFS%2C-appending-writes-status-tf3848237.html#a10916193.
>   Other mail describes folks' workarounds because this feature is lacking: 
> e.g. http://www.nabble.com/Loading-data-into-HDFS-tf4200003.html#a12039480 
> (Later on this thread, Jim Kellerman re-raises the HBase need for this 
> feature).  HADOOP-337 'DFS files should be appendable' makes mention of file 
> append, but it was opened early in the life of HDFS when the focus was more 
> on implementing the basics than on adding new features, and interest 
> fizzled.  Because HADOOP-337 is also a bit of a grab-bag -- it includes 
> truncation and being able to concurrently read/write -- rather than try to 
> breathe new life into HADOOP-337, here is a new issue focused on file 
> append.  Ultimately, being able to do as the Google GFS paper describes, 
> with multiple concurrent clients making 'Atomic Record Append' to a single 
> file, would be sweet, but at least for a first cut at this feature, IMO, a 
> single client appending to a single HDFS file, with the application 
> managing access, would be sufficient.
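
For illustration, a rough usage sketch of the single-writer first cut the 
description asks for.  The append() call on FileSystem is assumed (no such 
API exists yet), and the path is made up:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // One client re-opens an existing file for append; the application
    // is responsible for ensuring there is only ever one such writer.
    public class AppendExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/logs/events.log");   // illustrative path

        FSDataOutputStream out = fs.append(file);   // assumed API
        try {
          out.write("one more record\n".getBytes("UTF-8"));
        } finally {
          out.close();  // close makes the appended bytes durable
        }
      }
    }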

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
