[ https://issues.apache.org/jira/browse/HADOOP-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525463 ]

Sameer Paranjpye commented on HADOOP-1700:
------------------------------------------

> I think you're intending that datanodes do persist the block timestamp

Yes, that is what I was intending. Sorry if that wasn't clear in my 
description. Timestamps would be persisted on the Datanodes but not on the 
Namenode. Having the DFSClient generate the timestamps would open us up to 
issues with unsynchronized clocks, though the probability of error is very 
low. Having the Namenode generate the timestamps would be more robust, but it 
might mean more Namenode transactions, depending on how frequently a timestamp 
is requested. It would also still expose us to problems if the clock on the 
Namenode machine is turned back for some reason. The most robust approach 
would be the Namenode generating monotonically increasing revisions.
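To make the last option concrete, here is a minimal sketch of what a monotonic revision generator on the Namenode could look like. This is purely illustrative, not actual Namenode code: the class name is hypothetical, and persisting the counter across restarts (e.g. via the edit log) is elided.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a monotonically increasing revision counter,
// immune to wall-clock adjustments on the Namenode machine.
// Persistence of the counter across restarts (e.g. to the edit log)
// is assumed but not shown here.
class BlockRevisionGenerator {
    private final AtomicLong lastRevision;

    // Seed with the highest revision recovered from persistent state.
    BlockRevisionGenerator(long persistedRevision) {
        this.lastRevision = new AtomicLong(persistedRevision);
    }

    // Returns a revision strictly greater than any previously issued,
    // even under concurrent callers.
    long nextRevision() {
        return lastRevision.incrementAndGet();
    }
}
```

Because the counter never depends on the system clock, revisions stay strictly ordered even if the clock is turned back.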

The DFSClient could send the timestamp with each flushed buffer, but it's not 
clear that we need the timestamps to be that fine-grained. A new timestamp 
could be generated each time a block is opened for modification (and when an 
error occurs during an append). This would require no change in the number of 
Namenode transactions, but the corner cases where a client dies in the middle 
of a write could be trickier to handle.

> As for new opportunities for corruption, I simply meant that having multiple 
> versions of a block increases the chances of getting the wrong version

Fair point. Block revisions will mean new code that needs extensive testing 
and debugging, and they will likely expose hidden assumptions in the existing 
code. We should get a lot of eyes on the code we write for this issue. Given 
the number of watchers on this issue, that shouldn't be a problem ;)




> Append to files in HDFS
> -----------------------
>
>                 Key: HADOOP-1700
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1700
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: stack
>
> Request for being able to append to files in HDFS has been raised a couple of 
> times on the list of late.   For one example, see 
> http://www.nabble.com/HDFS%2C-appending-writes-status-tf3848237.html#a10916193.
>   Other mail describes folks' workarounds because this feature is lacking: 
> e.g. http://www.nabble.com/Loading-data-into-HDFS-tf4200003.html#a12039480 
> (Later on this thread, Jim Kellerman re-raises the HBase need of this 
> feature).  HADOOP-337 'DFS files should be appendable' makes mention of file 
> append but it was opened early in the life of HDFS when the focus was more on 
> implementing the basics rather than adding new features.  Interest fizzled.  
> Because HADOOP-337 is also a bit of a grab-bag -- it includes truncation and 
> being able to concurrently read/write -- rather than try and breathe new life 
> into HADOOP-337, instead, here is a new issue focused on file append.  
> Ultimately, being able to do as the google GFS paper describes -- having 
> multiple concurrent clients making 'Atomic Record Append' to a single file 
> would be sweet but at least for a first cut at this feature, IMO, a single 
> client appending to a single HDFS file letting the application manage the 
> access would be sufficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.