[jira] [Commented] (HDFS-3107) HDFS truncate

Roman Shaposhnik (JIRA) Tue, 30 Sep 2014 17:48:50 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154121#comment-14154121
 ]


Roman Shaposhnik commented on HDFS-3107:
----------------------------------------

FWIW I would like to provide a few additional datapoints to what [~shv] has 
said:
  # in its current form, this is an extremely useful self-contained feature 
that allows various vendors of solutions running on Hadoop to build products 
that have a much easier time running on HDFS.
  # it is true that currently there's no immediate integration with snapshot 
functionality, but the way the current patch is implemented makes it extremely 
easy to expand the scope of the feature to snapshots. In other words, if this 
current implementation gets committed it will NOT create a migration 
opportunity. Snapshot+truncate support can be added in later releases of HDFS, and 
applications targeting truncate as it is implemented currently will continue to 
run unmodified.
  # as HDFS-7056 indicated, it will take more time to come up with a design and 
implementation of the complementary functionality that would extend truncate to 
snapshotted files. It would be unfortunate if we had to hold the current patch 
hostage, even though today it delivers much-needed functionality AND it 
allows for smooth migration when snapshot+truncate gets implemented.
  # we all know that features sitting in a branch don't get exposed to 
commercial distributions and workloads as much as the ones hitting trunk do. 
This is, of course, the right approach for features that are half-baked or 
not self-contained, but it feels that in this particular case committing the 
patch would benefit us all by giving customers access to the self-contained 
feature AND letting us start receiving feedback on the more extended 
functionality much earlier.
 
Hope this provides additional food for thought to reconsider this patch for 
inclusion. Also, FWIW, based on our testing, this feels like an extremely 
useful and important feature to get into Hadoop now and extend to cover 
snapshots later.

> HDFS truncate
> -------------
>
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Lei Chang
>            Assignee: Plamen Jeliazkov
>         Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
