[ https://issues.apache.org/jira/browse/HADOOP-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547003 ]
dhruba borthakur commented on HADOOP-1700:
------------------------------------------

Thanks Ruyue for your comments.

0. Regarding your comment about using a sequential block-id instead of a random block-id: how do we upgrade existing clusters? An existing cluster can have a huge number of already-allocated blocks.

1. Your comment about Point 1 makes sense. I will update the document.

2. Regarding Point 2: a file typically has only a few blocks, somewhere between 2 and 10. I like your proposal to optimize transaction logging, especially when the number of blocks in a file is huge. Is it possible to consider it as an enhancement and implement it after implementing HDFS Appends? Won't the system be simpler if we avoid this optimization at first, and then work on it once HDFS Append is committed?

> Append to files in HDFS
> -----------------------
>
>                 Key: HADOOP-1700
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1700
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: stack
>         Attachments: Appends.doc, Appends.doc, Appends.html
>
>
> Request for being able to append to files in HDFS has been raised a couple of times on the list of late. For one example, see http://www.nabble.com/HDFS%2C-appending-writes-status-tf3848237.html#a10916193. Other mail describes folks' workarounds because this feature is lacking: e.g. http://www.nabble.com/Loading-data-into-HDFS-tf4200003.html#a12039480 (later on this thread, Jim Kellerman re-raises the HBase need for this feature). HADOOP-337 'DFS files should be appendable' makes mention of file append, but it was opened early in the life of HDFS when the focus was more on implementing the basics rather than adding new features. Interest fizzled.
> Because HADOOP-337 is also a bit of a grab-bag -- it includes truncation and being able to concurrently read/write -- rather than try to breathe new life into HADOOP-337, here is a new issue focused on file append. Ultimately, being able to do as the Google GFS paper describes -- having multiple concurrent clients making 'Atomic Record Append' to a single file -- would be sweet, but at least for a first cut at this feature, IMO, a single client appending to a single HDFS file, letting the application manage the access, would be sufficient.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
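The upgrade concern raised in point 0 above -- that new sequential block-ids could collide with the random ids an existing cluster has already handed out -- can be sketched roughly as follows. This is a hypothetical illustration, not actual HDFS code; the BlockIdAllocator class, its allocate() method, and the in-memory legacy-id set are all invented for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: a sequential allocator that must skip over
// block-ids already taken by randomly assigned legacy blocks.
public class BlockIdAllocator {
    private final Set<Long> legacyIds; // ids already allocated randomly
    private long nextId;               // sequential counter for new blocks

    public BlockIdAllocator(Set<Long> legacyIds, long startId) {
        this.legacyIds = legacyIds;
        this.nextId = startId;
    }

    // Return the next sequential id, skipping any id held by a legacy block.
    public synchronized long allocate() {
        while (legacyIds.contains(nextId)) {
            nextId++;
        }
        return nextId++;
    }

    public static void main(String[] args) {
        Set<Long> legacy = new HashSet<>();
        legacy.add(2L);
        legacy.add(3L);
        BlockIdAllocator alloc = new BlockIdAllocator(legacy, 1L);
        System.out.println(alloc.allocate()); // 1
        System.out.println(alloc.allocate()); // 4 (2 and 3 are taken)
        System.out.println(alloc.allocate()); // 5
    }
}
```

Even this toy version shows the cost: the allocator needs a membership check against every legacy id, which for a cluster with a huge number of already-allocated blocks is exactly the upgrade burden the comment is asking about.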