[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233904#comment-13233904 ]

Todd Lipcon commented on HDFS-3107:
-----------------------------------

IMO adding truncate() adds a bunch of non-trivial complexity. It's not so much 
because truncating a block is that hard -- but rather because it breaks a 
serious invariant we have elsewhere that blocks only get longer after they are 
created. This means that we have to revisit code all over HDFS -- in particular 
some of the trickiest bits around block synchronization -- to get this to work. 
It's not insurmountable, but I would like to know a lot more about the use case 
before commenting on the API/semantics.
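The invariant in question can be illustrated with a toy model (plain Python, not HDFS code; the recovery behavior is a simplified assumption for illustration): when block replicas only ever grow, any two replicas are prefix-related, so recovery can reconcile them by comparing lengths. Once truncate exists, two equal-length replicas can hold different bytes, and length alone no longer identifies the current state.

```python
# Toy model, not HDFS code: a block replica is just a byte string.
def is_prefix_related(a: bytes, b: bytes) -> bool:
    """True if one replica's bytes are a prefix of the other's."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    return longer.startswith(shorter)

# Append-only world: a replica that saw fewer appends is simply a
# prefix of one that saw more, so they can be reconciled by length.
assert is_prefix_related(b"abc", b"abcde")

# With truncate: replica A saw truncate-to-3 then an append of b"XY";
# a stale replica B missed the truncate. Same length, different
# contents -- length comparison can no longer pick a consistent state.
replica_a, replica_b = b"abcXY", b"abcde"
assert len(replica_a) == len(replica_b)
assert not is_prefix_related(replica_a, replica_b)
```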

Maybe you can open a JIRA or upload a design about your transactional HDFS 
feature, so we can understand the motivation better? Otherwise I'm more 
inclined to agree with Eli's suggestion to remove append entirely (please 
continue that discussion on-list, though).

{quote}
After appends were enabled in HDFS, we have seen a lot of cases where a lot of 
(mainly text, or even compressed text) datasets were merged using appends.

This is where customers realize their mistake immediately after starting to 
append, and do a ctrl-c.
{quote}
I don't follow... we don't even expose append() via the shell. And if we did, 
would users actually be using "fs -append" to manually write new lines of data 
into their Hadoop systems??

                
> HDFS truncate
> -------------
>
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node, name-node
>            Reporter: Lei Chang
>         Attachments: HDFS_truncate_semantics_Mar15.pdf
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. HDFS currently does not 
> support truncate (a standard POSIX operation), the reverse operation of 
> append, so upper-layer applications must resort to ugly workarounds (such 
> as keeping track of the discarded byte ranges per file in a separate 
> metadata store, and periodically running a vacuum process to rewrite 
> compacted files) to overcome this limitation of HDFS.
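The workaround described above can be sketched as follows (all names are invented for illustration, not HDFS or any real transactional-layer API): an append-only file plus an external record of discarded byte ranges, with reads skipping dead ranges and a vacuum pass that rewrites the file compacted.

```python
# Illustrative sketch of the truncate workaround: since the file can
# only grow, aborted bytes are remembered as "dead" ranges in separate
# metadata and reclaimed later by a vacuum rewrite.
class AppendOnlyFile:
    """Models a file that can only grow, like an HDFS file."""
    def __init__(self):
        self._data = bytearray()   # stands in for the HDFS file
        self._dead = []            # discarded (start, end) ranges; kept
                                   # in a separate metadata store in practice
        self._txn_start = None

    def begin(self):
        self._txn_start = len(self._data)

    def append(self, chunk: bytes):
        self._data += chunk        # append is the only mutation allowed

    def commit(self):
        self._txn_start = None

    def abort(self):
        # No truncate available: record the aborted bytes as dead.
        self._dead.append((self._txn_start, len(self._data)))
        self._txn_start = None

    def read(self) -> bytes:
        # Reads must skip every dead range.
        out, pos = bytearray(), 0
        for start, end in sorted(self._dead):
            out += self._data[pos:start]
            pos = end
        out += self._data[pos:]
        return bytes(out)

    def vacuum(self):
        # Periodic compaction: rewrite without dead ranges.
        self._data = bytearray(self.read())
        self._dead = []
```

With real truncate, abort() would be a single metadata-cheap call instead of this bookkeeping plus a full rewrite.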

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
