[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154121#comment-14154121 ]
Roman Shaposhnik commented on HDFS-3107:
----------------------------------------

FWIW I would like to add a few datapoints to what [~shv] has said:
# In its current form, this is an extremely useful, self-contained feature that gives vendors of solutions running on Hadoop a much easier time building products that run on HDFS.
# It is true that there is currently no immediate integration with snapshot functionality, but the way the current patch is implemented makes it extremely easy to expand the scope of the feature to snapshots. In other words, if the current implementation gets committed, it will NOT force a migration later: snapshot+truncate can be added in later releases of HDFS, and applications targeting truncate as it is implemented currently will continue to run unmodified.
# As HDFS-7056 indicates, it will take more time to come up with a design and implementation of the complementary functionality that would extend truncate to snapshotted files. It would be unfortunate to hold the current patch hostage, given that today it delivers much-needed functionality AND allows for a smooth migration once snapshot+truncate gets implemented.
# We all know that features sitting in a branch don't get exposed to commercial distributions and workloads as much as the ones hitting trunk do. That is, of course, exactly the right approach for features that are half-baked or not self-contained, but in this particular case committing the patch would benefit us all by giving customers access to the self-contained feature AND letting us start receiving feedback for the more extended functionality much earlier.

Hope this provides additional food for thought to reconsider this patch for inclusion. Also, FWIW, based on our testing, this feels like an extremely useful and important feature to get into Hadoop now and extend to cover snapshots later.

> HDFS truncate
> -------------
>
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Lei Chang
>            Assignee: Plamen Jeliazkov
>         Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch,
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf,
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf,
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the
> underlying storage when a transaction is aborted. Currently HDFS does not
> support truncate (a standard Posix operation) which is a reverse operation of
> append, which makes upper layer applications use ugly workarounds (such as
> keeping track of the discarded byte range per file in a separate metadata
> store, and periodically running a vacuum process to rewrite compacted files)
> to overcome this limitation of HDFS.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)