[
https://issues.apache.org/jira/browse/HADOOP-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525463
]
Sameer Paranjpye commented on HADOOP-1700:
------------------------------------------
> I think you're intending that datanodes do persist the block timestamp
Yes, that is what I was intending. Sorry if that wasn't clear in my
description. Timestamps would be persisted on the Datanodes but not on the
Namenode. Having the DFSClient generate the timestamps would open us up to
issues with unsynchronized clocks, though the probability of error is very low.
Having the Namenode generate the timestamps would be more robust, but it might
mean more Namenode transactions depending on how frequently a timestamp is
requested. It still exposes us to problems if the clock on the Namenode machine
is turned back for some reason. The most robust option would be for the
Namenode to generate monotonically increasing revisions.
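Not anything from a patch -- just a sketch of what "monotonically increasing revisions" could look like on the Namenode side. The class and method names here are hypothetical; the point is that combining a counter with the clock never goes backwards, even if the wall clock is turned back:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a Namenode-side generator of revision numbers
// that is strictly increasing regardless of clock resets.
class RevisionGenerator {
    private final AtomicLong lastRevision = new AtomicLong(0);

    // Take the wall clock when it is ahead, otherwise just bump the
    // previous revision by one, so the sequence never goes backwards.
    long nextRevision() {
        long now = System.currentTimeMillis();
        return lastRevision.updateAndGet(prev -> Math.max(prev + 1, now));
    }
}
```

In the common case this behaves like a timestamp (useful for debugging), but a turned-back clock only degrades it to a plain counter rather than producing stale-looking revisions.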
The DFSClient could send the timestamp with each flushed buffer, but it's not
clear that we need the timestamps to be so fine-grained. A new timestamp could
be generated every time a block is accessed for modification (and when an error
occurs during an append). This would require no change in the number of
Namenode transactions, but the corner cases when a client dies in the middle of
a write could be trickier to handle.
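To make the coarse-grained alternative concrete, here is a hypothetical sketch (not existing HDFS code; the names are made up) of the bookkeeping it implies: a stamp is recorded only when a block is opened for modification or after a failed append, and replicas carrying an older stamp are considered stale:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-block stamps: the stamp changes only when a
// block is opened for modification, not on every flushed buffer, so the
// number of Namenode transactions stays the same.
class BlockStampTable {
    private final Map<Long, Long> stampByBlockId = new HashMap<>();

    // Called when a block is opened for append (and again after an error
    // during an append): record the new stamp issued for this modification.
    void openForModification(long blockId, long newStamp) {
        stampByBlockId.put(blockId, newStamp);
    }

    // A replica whose stamp is older than the current one was written by a
    // client that died (or was partitioned) mid-write, and is stale.
    boolean isStale(long blockId, long replicaStamp) {
        Long current = stampByBlockId.get(blockId);
        return current != null && replicaStamp < current;
    }
}
```

The tricky corner case mentioned above shows up here: if the client dies between `openForModification` and the replicas actually receiving the new stamp, all replicas look stale and some recovery rule is needed to pick a winner.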
> As for new opportunities for corruption, I simply meant that having multiple
> versions of a block increases the chances of getting the wrong version
Fair point. Block revisions will mean new code that needs extensive testing and
debugging and will likely expose hidden assumptions in the code. We should get
a lot of eyes on code that we write for this issue. Given the number of
watchers on this issue, it shouldn't be a problem ;)
> Append to files in HDFS
> -----------------------
>
> Key: HADOOP-1700
> URL: https://issues.apache.org/jira/browse/HADOOP-1700
> Project: Hadoop
> Issue Type: New Feature
> Components: dfs
> Reporter: stack
>
> Request for being able to append to files in HDFS has been raised a couple of
> times on the list of late. For one example, see
> http://www.nabble.com/HDFS%2C-appending-writes-status-tf3848237.html#a10916193.
> Other mail describes folks' workarounds because this feature is lacking:
> e.g. http://www.nabble.com/Loading-data-into-HDFS-tf4200003.html#a12039480
> (Later on this thread, Jim Kellerman re-raises the HBase need of this
> feature). HADOOP-337 'DFS files should be appendable' makes mention of file
> append but it was opened early in the life of HDFS when the focus was more on
> implementing the basics rather than adding new features. Interest fizzled.
> Because HADOOP-337 is also a bit of a grab-bag -- it includes truncation and
> being able to concurrently read/write -- rather than try to breathe new life
> into HADOOP-337, here is a new issue focused on file append.
> Ultimately, being able to do as the Google GFS paper describes -- having
> multiple concurrent clients making 'Atomic Record Append' to a single file --
> would be sweet, but at least for a first cut at this feature, IMO, a single
> client appending to a single HDFS file, letting the application manage the
> access, would be sufficient.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.