[jira] Commented: (HADOOP-1700) Append to files in HDFS

Jim Kellerman (JIRA) Thu, 15 Nov 2007 21:10:13 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542956
 ]


Jim Kellerman commented on HADOOP-1700:
---------------------------------------

Comments on the document.

In the Requirements section:

> There can be many simultaneous writers reading the same file at the same time.

Do you mean readers here instead of writers?


As far as I understand the internals of the DFS, this appears to satisfy our
requirements. I think the section entitled "File Deletions and Renames"
addresses our needs, but let me be sure by asking the following:

A process is writing to a file and has done flushes. Another process detects 
that the writer is now dead and reads the file (up to the point it was last 
flushed) and then deletes the file (because it knows the writer is dead). This 
happens well before the file lease timeout. My understanding is that the file 
will be deleted and that if another process tries to create a file with the 
same name, this should succeed and not contain any of the contents of the 
original file. Is this correct?

If so, ++1
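
To make the scenario above concrete, here is a minimal sketch of the semantics
I am asking about. It is a toy in-memory model, not the HDFS API and not a
claim about how the DFS behaves today: the class ToyNamespace, its methods,
and the file name "/log" are all hypothetical. It assumes that a reader sees
only flushed bytes and that a delete removes the name immediately, regardless
of any lease.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory stand-in for a DFS namespace (hypothetical; not the HDFS API). */
final class ToyNamespace {

    /** One file: bytes made visible by flush(), plus bytes still buffered by the writer. */
    static final class ToyFile {
        final ByteArrayOutputStream flushed = new ByteArrayOutputStream();
        final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    }

    /** Writer handle bound to one file object rather than to a name. */
    static final class Writer {
        private final ToyFile file;

        Writer(ToyFile file) {
            this.file = file;
        }

        /** Buffers data; readers cannot see it yet. */
        void write(String s) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            file.pending.write(b, 0, b.length);
        }

        /** Moves everything buffered so far into the reader-visible part. */
        void flush() {
            byte[] b = file.pending.toByteArray();
            file.flushed.write(b, 0, b.length);
            file.pending.reset();
        }
    }

    private final Map<String, ToyFile> files = new HashMap<>();

    /** Creates a new, empty file; fails if the name is already taken. */
    Writer create(String name) {
        if (files.containsKey(name)) {
            throw new IllegalStateException("file exists: " + name);
        }
        ToyFile f = new ToyFile();
        files.put(name, f);
        return new Writer(f);
    }

    /** Returns only the flushed contents of the named file. */
    String read(String name) {
        ToyFile f = files.get(name);
        if (f == null) {
            throw new IllegalStateException("no such file: " + name);
        }
        return new String(f.flushed.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Removes the name at once; assumed here not to wait for any lease timeout. */
    boolean delete(String name) {
        return files.remove(name) != null;
    }

    /** Runs the scenario; returns {what the recovering reader saw, contents of the new file}. */
    static String[] scenario() {
        ToyNamespace ns = new ToyNamespace();

        Writer w = ns.create("/log");
        w.write("edit-1;");
        w.flush();
        w.write("edit-2;");          // never flushed: the writer dies here

        String recovered = ns.read("/log");  // second process reads up to the last flush
        ns.delete("/log");                   // and deletes the file right away

        ns.create("/log");                   // a third process re-creates the same name
        String fresh = ns.read("/log");

        return new String[] { recovered, fresh };
    }
}
```

In this model, ToyNamespace.scenario() returns the flushed prefix as the
first element and an empty string as the second, which is the outcome I am
hoping the design guarantees.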


> Append to files in HDFS
> -----------------------
>
>                 Key: HADOOP-1700
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1700
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: stack
>         Attachments: Appends-1.xhtml, Appends.doc, Appends.htm
>
>
> Request for being able to append to files in HDFS has been raised a couple of 
> times on the list of late.   For one example, see 
> http://www.nabble.com/HDFS%2C-appending-writes-status-tf3848237.html#a10916193.
>   Other mail describes folks' workarounds because this feature is lacking: 
> e.g. http://www.nabble.com/Loading-data-into-HDFS-tf4200003.html#a12039480 
> (Later on this thread, Jim Kellerman re-raises the HBase need of this 
> feature).  HADOOP-337 'DFS files should be appendable' makes mention of file 
> append but it was opened early in the life of HDFS when the focus was more on 
> implementing the basics rather than adding new features.  Interest fizzled.  
> Because HADOOP-337 is also a bit of a grab-bag -- it includes truncation and 
> being able to concurrently read/write -- rather than try and breathe new life 
> into HADOOP-337, instead, here is a new issue focused on file append.  
> Ultimately, being able to do as the Google GFS paper describes -- having 
> multiple concurrent clients making 'Atomic Record Append' to a single file 
> would be sweet but at least for a first cut at this feature, IMO, a single 
> client appending to a single HDFS file letting the application manage the 
> access would be sufficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
