[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155816#comment-14155816 ]
Colin Patrick McCabe edited comment on HDFS-3107 at 10/2/14 1:12 AM:
---------------------------------------------------------------------

To summarize all of the above: [~jingzhao], [~sureshms], and I have made the point that this feature is still incomplete until it supports snapshots. Since snapshot support may involve fundamentally changing the design (e.g. copying the final partial block file versus using block recovery), we need to figure it out before merging to trunk or branch-2.

As far as I can see, there are two options here. We could roll snapshot support into this patch, or start a feature branch with the above commit and then do snapshot support (and whatever else is needed) in that feature branch. I'm fine with either option; I have absolutely no preference. I'm happy to review anything, and Jing has offered to help with snapshot support.

[~rvs]: I realize that getting truncate into a release is important to you. If people get this done by 2.6, I wouldn't oppose putting it in. But you will have to convince the release manager for 2.6 and propose it to the community. Since 2.6 is already a very big release, I think you will get pushback.

I also think you should evaluate writing length-delimited records instead of using truncate. Truncate is an operation that can fail, and relying on truncate to clean up mistakes will always be more fragile than writing in a format that can ignore torn records automatically. If a client gets an error from DFSOutputStream#write because it has dropped off the network, truncate is also going to fail for the same reason. And then you have a torn record.
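To illustrate the alternative suggested above, here is a minimal sketch of a length-delimited, checksummed record format. This is not HDFS or patch code; the FramedRecords class name and the framing layout (int length prefix, CRC32, payload) are assumptions chosen for illustration. The point is that a reader which validates the checksum can silently discard a torn record at the tail of a file, so no truncate is needed to recover from a failed write:

{code:java}
import java.io.*;
import java.util.zip.CRC32;

/**
 * Sketch of a length-delimited record format.
 * Each record is framed as: [int length][long crc32][payload bytes].
 * A torn record at the tail of the stream (truncated header, short
 * payload, or checksum mismatch) is detected and ignored on read.
 */
public class FramedRecords {

    /** Append one record: length prefix, CRC32 of the payload, payload. */
    public static void writeRecord(DataOutputStream out, byte[] payload)
            throws IOException {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        out.writeInt(payload.length);
        out.writeLong(crc.getValue());
        out.write(payload);
    }

    /**
     * Read the next record, or return null at a clean EOF or a torn tail.
     * A torn tail is expected after a writer crash and is simply
     * discarded rather than repaired with truncate.
     */
    public static byte[] readRecord(DataInputStream in) throws IOException {
        int length;
        long expectedCrc;
        try {
            length = in.readInt();
            expectedCrc = in.readLong();
        } catch (EOFException e) {
            return null;           // clean EOF or torn header: stop here
        }
        if (length < 0) {
            return null;           // corrupt length field: treat as torn
        }
        byte[] payload = new byte[length];
        try {
            in.readFully(payload);
        } catch (EOFException e) {
            return null;           // payload cut short: torn record
        }
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        if (crc.getValue() != expectedCrc) {
            return null;           // checksum mismatch: torn/corrupt record
        }
        return payload;
    }
}
{code}

This sketch assumes a torn record only ever occurs at the tail of the file. A production format would likely also embed periodic sync markers (as Hadoop's SequenceFile does) so a reader can resynchronize past corruption in the middle of a stream.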
> HDFS truncate
> -------------
>
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Lei Chang
>            Assignee: Plamen Jeliazkov
>         Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard POSIX operation), which is the reverse operation of append. This makes upper-layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS.