[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941851#comment-14941851 ] Constantine Peresypkin commented on HDFS-3107: -- Already done: https://issues.apache.org/jira/browse/HDFS-9164 > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 2.7.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941846#comment-14941846 ] Konstantin Shvachko commented on HDFS-3107: --- Sorry, got distracted. It would be good to create a new jira for truncate support in nfs. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 2.7.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908412#comment-14908412 ] Constantine Peresypkin commented on HDFS-3107: -- OK, great success after patching write cache. Where do I post the code? > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 2.7.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14803234#comment-14803234 ] Constantine Peresypkin commented on HDFS-3107: -- Hmm, fast patching nfs did not work. Intermittently fails on "lease already acquired". Seems like nfs gateway holds leases to all files it opened in some sort of cache. Very strange. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 2.7.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791779#comment-14791779 ] Konstantin Shvachko commented on HDFS-3107: --- I would say absolutely go for it. As that was one of the motivations for truncate to support nfs and fuse apis. Let's check with the authors what is the state of nfs with respect to truncate. [~brandonli] could you please elaborate. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 2.7.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790796#comment-14790796 ] Constantine Peresypkin commented on HDFS-3107: -- What about truncate support in nfs3 proxy/gateway? Setattr with Size still fails, should I fix it? > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 2.7.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518741#comment-14518741 ] Neeta Garimella commented on HDFS-3107: --- Thanks Yi. I will get the latest. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 2.7.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518648#comment-14518648 ] Yi Liu commented on HDFS-3107: -- Hi Neeta, please checkout the latest trunk or if you get Hadoop 2.7 version, you will find the truncate API exposed via FileSystem there. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 2.7.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518572#comment-14518572 ] Neeta Garimella commented on HDFS-3107: --- Could someone comment on why truncate is not exposed via FileSystem Class. Are we expecting applications using this will be calling DFSClient interface directly? > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 2.7.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308494#comment-14308494 ] Tsz Wo Nicholas Sze commented on HDFS-3107: --- We should also have a truncate test with cached data to that client won't read beyond truncated length from the cached data. BTW, Section 5.2.9 Snapshot operation during truncate in the design doc is out dated. It is still talking about creating a transient file. We should update it. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 2.7.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308071#comment-14308071 ] Konstantin Shvachko commented on HDFS-3107: --- Nicholas, you are right we don't have those tests. - For truncate with HA as I mentioned in HDFS-7738 we can incorporate truncate to TestHAAppend. LMK if you want to add it to your patch, otherwise I'll ad it to HDFS-7740. - Created HDFS-7740 for adding truncate test with DNs restarts. Described two scenarios there. Feel free to add other scenarios you have in mind there. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 2.7.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306296#comment-14306296 ] Tsz Wo Nicholas Sze commented on HDFS-3107: --- [~shv], how is the remaining work? Hope that we still have progress after merging to branch-2. BTW, it seems that we do not have truncate tests with HA nor datanode restart. Is it true? > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 2.7.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292447#comment-14292447 ] Tsz Wo Nicholas Sze commented on HDFS-3107: --- Thanks a lot for all the great works! > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 2.7.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291559#comment-14291559 ] Yi Liu commented on HDFS-3107: -- Thanks [~shv], I find another two things: * DistributedFileSystem#truncate should resolve symlinks * Expose truncate API via FileContext I open two new JIRAs HDFS-7677, HADOOP-11510, and will fix them there. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 2.7.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290864#comment-14290864 ] Tsz Wo Nicholas Sze commented on HDFS-3107: --- Sure. I think it is fine to merge. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 3.0.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290863#comment-14290863 ] Tsz Wo Nicholas Sze commented on HDFS-3107: --- Sure. I think it is fine to merge. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 3.0.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290345#comment-14290345 ] Konstantin Shvachko commented on HDFS-3107: --- Yes HDFS-7659 is ready. Documentation needs to be updates. I mentioned it in my email on the dev list some days ago. Other things to do are adding truncate to DFSIO and SLive. I don't think we should wait for them to merge. My main concern is that it increases the work for developers. Branches being substancially diverged means that it is harder to merge new code into branch-2, which is not related to truncate. Also it will be easier to implement TestCLI for example or the documentation update and then merge it into both branches at once. In the end it is not that we have a release planned next week. Would it be ok with you if we commit HDFS-7659, fix TestFileTruncate as I proposed there, and then merge? > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 3.0.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290095#comment-14290095 ] Tsz Wo Nicholas Sze commented on HDFS-3107: --- BTW, have we updated user documentation for the truncate CLI change? > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 3.0.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290072#comment-14290072 ] Tsz Wo Nicholas Sze commented on HDFS-3107: --- There is a list of unresolved JIRAs. Let's discuss it. - HDFS-7341 Add initial snapshot support based on pipeline recovery Is it still relevant? - HDFS-7058 Tests for truncate CLI Let's finish it before merging since CLI is user facing. Is anyone working on it? - HDFS-7655/HDFS-7656 Expose truncate API for Web HDFS/httpfs It seems that we should not wait for them before merging. Agree? - HDFS-7659 We should check the new length of truncate can't be a negative value Look like that this is going to be committed soon. - HDFS-7665 Add definition of truncate preconditions/postconditions to filesystem specification This is simple a documentation change. Let's finish it before merging? As a summary, how about finishing HDFS-7058, HDFS-7659 and HDFS-7665 before merging it to branch-2? > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 3.0.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288936#comment-14288936 ] Konstantin Shvachko commented on HDFS-3107: --- Hey guys, Plamen's ports to branch-2 look good to me. I'll schedule time during next weekend to commit these and related jiras. This is almost two weeks from the trunk commit. LMK if there are any issues. Looks like we are making good progress. I think we've reached the stage when further truncate related changes can be done both on trunk and branch-2 after the merge. Nichloas, are you ok doing HDFS-7058, which has only TestCLI remaining, after the merge? > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 3.0.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287050#comment-14287050 ] Yi Liu commented on HDFS-3107: -- I just noticed we have not exposed truncate API for Web HDFS and HDFS http fs, I created two JIRAs HDFS-7655, HDFS-7656 for them. Of course, they will *not* block the merging of this feature to branch-2. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 3.0.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282995#comment-14282995 ] Konstantin Shvachko commented on HDFS-3107: --- This is tracked under HDFS-7611. Working on it. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 3.0.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282983#comment-14282983 ] Tsz Wo Nicholas Sze commented on HDFS-3107: --- BTW, the truncate unit test is failing occasionally; see [build #9264|https://builds.apache.org/job/PreCommit-HDFS-Build/9264//testReport/], [build #9257|https://builds.apache.org/job/PreCommit-HDFS-Build/9257//testReport/], [build #9255|https://builds.apache.org/job/PreCommit-HDFS-Build/9255//testReport/] and [build #9244|https://builds.apache.org/job/PreCommit-HDFS-Build/9244/testReport/]. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 3.0.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282775#comment-14282775 ] Tsz Wo Nicholas Sze commented on HDFS-3107: --- It is great to merge this to branch-2. How about finishing the tests (HDFS-7058) first? I am also going to test it this week. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 3.0.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282267#comment-14282267 ] Yi Liu commented on HDFS-3107: -- I have done some functionality tests in my cluster, and do some code review from my point of view (I'm not very good at snapshot part, so just did some basic review for that part :)). +1 (non-binding) for merging into branch-2, it's good to me, except some small nits, and I filed them in HDFS-7634 and HDFS-7638. Certainly we can file follow ups if I find some new issues. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 3.0.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280170#comment-14280170 ] Yi Liu commented on HDFS-3107: -- [~shv], this is a good feature and nice work. Agree with Colin that we should give it a work or two before merging to branch-2, then others can have enough test. Personally I will get some time to verify the functionality and do some review next week. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 3.0.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279695#comment-14279695 ] Uma Maheswara Rao G commented on HDFS-3107: --- Thanks Plamen, Konstantin for the great efforts on this feature. I did not get chance to review this fully. +1 for merging this feature into branch-2 in few days. By that time others can test and raise the follow ups if there are any issue. I will look into it with my review and file followups if any things I could notice. Thanks > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 3.0.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279692#comment-14279692 ] Uma Maheswara Rao G commented on HDFS-3107: --- Thanks Plamen, Konstantin for the great efforts on this feature. I did not get chance to review this fully. +1 for merging this feature into branch-2 in few days. By that time others can test and raise the follow ups if there are any issue. I will look into it with my review and file followups if any things I could notice. Thanks > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 3.0.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277547#comment-14277547 ] Konstantin Boudnik commented on HDFS-3107: -- I think it is a reasonable expectation to do the merge in a few days or a week. Most importantly, the merge might require certain changes resulting from the conflicts resolution - so it'd be some dev. effort anyway. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 3.0.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277534#comment-14277534 ] Colin Patrick McCabe commented on HDFS-3107: I can't speak for anyone else on this thread, but I personally think we should give it a week or two and then merge to branch-2. I hope anyone with unresolved concerns will post follow-ups by then. In the future we should probably do this kind of work in a branch. It would have sped up the work, because we could have committed things quickly to the branch. git has made branches easier than ever. The requirement to get three +1s in a branch might seem onerous, but a feature of this size needs several eyes on it anyway, so the review requirement ended up being about the same (or even more). > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 3.0.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275400#comment-14275400 ] Hudson commented on HDFS-3107: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2023 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2023/]) HDFS-3107. Introduce truncate. Contributed by Plamen Jeliazkov. (shv: rev 7e9358feb326d48b8c4f00249e7af5023cebd2e2) * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientNamenodeProtocol.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/metrics/NameNodeMetrics.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/HdfsServerConstants.java * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockRecoveryCommand.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRetryCacheWithHA.java * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSInotifyEventInputStream.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoUnderConstruction.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileWithSnapshotFeature.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeRetryCache.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 3.0.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275329#comment-14275329 ] Hudson commented on HDFS-3107: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #73 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/73/]) HDFS-3107. Introduce truncate. Contributed by Plamen Jeliazkov. (shv: rev 7e9358feb326d48b8c4f00249e7af5023cebd2e2) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRetryCacheWithHA.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSInotifyEventInputStream.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockRecoveryCommand.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileWithSnapshotFeature.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoUnderConstruction.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/HdfsServerConstants.java * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientNamenodeProtocol.proto * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeRetryCache.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/metrics/NameNodeMetrics.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 3.0.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as >
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275272#comment-14275272 ] Hudson commented on HDFS-3107: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2004 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2004/]) HDFS-3107. Introduce truncate. Contributed by Plamen Jeliazkov. (shv: rev 7e9358feb326d48b8c4f00249e7af5023cebd2e2) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientNamenodeProtocol.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSInotifyEventInputStream.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRetryCacheWithHA.java * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/HdfsServerConstants.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeRetryCache.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoUnderConstruction.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileWithSnapshotFeature.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/metrics/NameNodeMetrics.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockRecoveryCommand.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 3.0.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of t
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275254#comment-14275254 ] Hudson commented on HDFS-3107: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #69 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/69/]) HDFS-3107. Introduce truncate. Contributed by Plamen Jeliazkov. (shv: rev 7e9358feb326d48b8c4f00249e7af5023cebd2e2) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoUnderConstruction.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSInotifyEventInputStream.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileWithSnapshotFeature.java * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientNamenodeProtocol.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockRecoveryCommand.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/metrics/NameNodeMetrics.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeRetryCache.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRetryCacheWithHA.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/HdfsServerConstants.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 3.0.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping tr
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275106#comment-14275106 ] Hudson commented on HDFS-3107: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #806 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/806/]) HDFS-3107. Introduce truncate. Contributed by Plamen Jeliazkov. (shv: rev 7e9358feb326d48b8c4f00249e7af5023cebd2e2) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRetryCacheWithHA.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoUnderConstruction.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/metrics/NameNodeMetrics.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientNamenodeProtocol.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockRecoveryCommand.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeRetryCache.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSInotifyEventInputStream.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileWithSnapshotFeature.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/HdfsServerConstants.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 3.0.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275088#comment-14275088 ] Hudson commented on HDFS-3107: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #72 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/72/]) HDFS-3107. Introduce truncate. Contributed by Plamen Jeliazkov. (shv: rev 7e9358feb326d48b8c4f00249e7af5023cebd2e2) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/HdfsServerConstants.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSInotifyEventInputStream.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockRecoveryCommand.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileWithSnapshotFeature.java * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeRetryCache.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoUnderConstruction.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRetryCacheWithHA.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/metrics/NameNodeMetrics.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientNamenodeProtocol.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 3.0.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping tr
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274950#comment-14274950 ] Hudson commented on HDFS-3107: -- FAILURE: Integrated in Hadoop-trunk-Commit #6853 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6853/]) HDFS-3107. Introduce truncate. Contributed by Plamen Jeliazkov. (shv: rev 7e9358feb326d48b8c4f00249e7af5023cebd2e2) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileWithSnapshotFeature.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSInotifyEventInputStream.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeRetryCache.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoUnderConstruction.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockRecoveryCommand.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRetryCacheWithHA.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/HdfsServerConstants.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientNamenodeProtocol.proto * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/metrics/NameNodeMetrics.java > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 3.0.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273144#comment-14273144 ] Konstantin Shvachko commented on HDFS-3107: --- [~wheat9] do you have comments or suggestions on the patch? Or do you have anybody in particular in mind to return from vacation? I mean we cannot wait for everybody as the Hadoop community is pretty big - there is always somebody on vacation. I don't think we are rushing here. The patch is sitting here from September. People expressed desire to have the snapshot support implemented before committing the feature, which is now also done under HDFS-7056. And went through a series of good review rounds by Jing, Collin, Cos, which improved the implementation substantially. Byron independently tested the feature on a cluster with and without snapshots. So let's plan to finally commit both issues eod tomorrow Monday. If there are no more comments. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259795#comment-14259795 ] Konstantin Boudnik commented on HDFS-3107: -- It's not rush, but having the patch sitting on the JIRA with very low comments activity - and none of them are really touching on the design or the integration of the feature to the rest of the HDFS - just increases the maintenance cost: the patch maintainer just keeps rebasing the patch over the current trunk. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259747#comment-14259747 ] Haohui Mai commented on HDFS-3107: -- I don't see any reasons to rush to commit the patch. It is a big feature and it makes sense to ensure all feedbacks are properly address. I think it is worth holding off until people are back from the vacations. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259705#comment-14259705 ] Konstantin Boudnik commented on HDFS-3107: -- I think it is ready for commit. Any other comments from any of the reviewers? > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253840#comment-14253840 ] Colin Patrick McCabe commented on HDFS-3107: bq. Now, I am not posting any patches into this JIRA nor HDFS-7056. You are confusing me with someone else. Sorry, [~cos], for some reason I thought you had posted the latest patch on this JIRA. But now I see it was [~shv]. Anyway, I will review HDFS-7056 in a sec > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252940#comment-14252940 ] Konstantin Boudnik commented on HDFS-3107: -- [~cmccabe], I have repeated +1 on the latest version of the patch which was a simple rebase on top of latest changes in the trunk. Now, I am not posting any patches into this JIRA nor HDFS-7056. You are confusing me with someone else. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252934#comment-14252934 ] Colin Patrick McCabe commented on HDFS-3107: [~cos], can you answer my question about what you are +1 on? I find it confusing that you are continuing to post patches to this JIRA as well as to HDFS-7056. Based on the earlier comments, I thought we were going to move this code into subtasks. We also discussed giving patches revision numbers on hdfs-dev@, (and your last comment on that thread was "Oh well. Revision numbers is the way to go then for now"), but still I see all these patches have the same name. thanks > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252922#comment-14252922 ] Colin Patrick McCabe commented on HDFS-3107: I'll take a look tomorrow. Was busy on HDFS-7443, which is a blocker for upgrades. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252766#comment-14252766 ] Jing Zhao commented on HDFS-3107: - Yeah, I will review the latest patch in HDFS-7056 today and tomorrow. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252760#comment-14252760 ] Konstantin Shvachko commented on HDFS-3107: --- Colin, please do review. For your convenience here is a [direct link|https://issues.apache.org/jira/browse/HDFS-7056?focusedCommentId=14211912&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14211912] to Plamen's comment, addressing your review comments. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252591#comment-14252591 ] Colin Patrick McCabe commented on HDFS-3107: Hi [~shv], I see that you have submitted a patch on this JIRA, and also one on HDFS-7056. Can you clarify which one you are +1 on, and proposing to commit? [~jingzhao], can you clarify if the latest revision on HDFS-7056 addresses your comments? I will look through it and see if mine were addressed, and put that comment on HDFS-7056. Thanks, all. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252476#comment-14252476 ] Konstantin Boudnik commented on HDFS-3107: -- The diff between the two seems to be quite small, yet I guess it requires a formal review again. Hence +1 again. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252451#comment-14252451 ] Konstantin Boudnik commented on HDFS-3107: -- I actually quite like it. I think over the last few iterations the patch was polished enough and the test coverage is quite decent. +1 > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252354#comment-14252354 ] Konstantin Shvachko commented on HDFS-3107: --- Looks like HDFS-7506 broke this again. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198755#comment-14198755 ] Colin Patrick McCabe commented on HDFS-3107: Thanks, I will take a look at HDFS-7056. I suppose this means we can mark HDFS-7341 as a duplicate. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197027#comment-14197027 ] Plamen Jeliazkov commented on HDFS-3107: I checked FindBugs since Jenkins messed up the patch build. I xml-diff compared the XMLs generated with and without the combined patch and found no new FindBugs were introduced. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196716#comment-14196716 ] Plamen Jeliazkov commented on HDFS-3107: I will attach it to HDFS-7056 since it has the design doc attached to it and is assigned to me. Thanks [~cmccabe]. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196619#comment-14196619 ] Colin Patrick McCabe commented on HDFS-3107: bq. There is the option to treat the combined patch HDFS-3107-&-7056 as the first patch, which accounts for upgrade and rollback functionality as well as snapshot support, demonstrated in unit test. That's fine with me. It can go into trunk directly if it doesn't break rollback + snapshots. bq. I am not objecting to do work on a branch but I am unsure it is necessary given the combined patch seems to meet the support requirements asked for this work. I suggested a branch since I thought it would let us commit things quicker. But I don't think it's necessary if you can do things without breaking trunk. It is going to be no more than 3-4 patches anyway as I understand. Whatever is easiest for you guys. Just one request: Can you post the combined patch on a subtask rather than this JIRA? I think having patches on this umbrella jira is very confusing. If you're going to combine the patches, post the combined patch on either HDFS-7341 or HDFS-7056 please. Thanks. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196336#comment-14196336 ] Plamen Jeliazkov commented on HDFS-3107: [~cmccabe], At the time [~shv] talked about my new patch there was nothing posted yet in HDFS-7056 minus Konstantin's design doc. We only uploaded even newer patches yesterday around noon. Please be careful not to confuse [~shv] and [~cos]. The snapshot support patch (for HDFS-7056) was not ready yet when [~cos] made his comment. We don't have to commit HDFS-3107 on its own. There is the option to treat the combined patch HDFS-3107-&-7056 as the first patch, which accounts for upgrade and rollback functionality as well as snapshot support, demonstrated in unit test. This should address your comment: "My reasoning is that if the first patch breaks rollback, it's tough to see it getting into trunk." I am not objecting to do work on a branch but I am unsure it is necessary given the combined patch seems to meet the support requirements asked for this work. I'll investigate the FindBugs. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195607#comment-14195607 ] Colin Patrick McCabe commented on HDFS-3107: Hi Konstantin, Yes, I was aware of HDFS-7056 and had commented about it earlier. However, when you announced on this JIRA: "So, but the new patch from Plamen Jeliazkov already have snapshot support," the patch that I checked was {{HDFS-3107.patch}}, not a patch on a different JIRA. After all, [~zero45] is the assignee on this JIRA as well as HDFS-7056, so "the new patch from Plamen Jeliazkov" could refer to either patch. It seems logical to assume that what is being discussed on a particular JIRA is the patch attached to that JIRA, unless specified otherwise. I think this highlights one confusing thing: that HDFS-3107 is now an umbrella JIRA as well as a JIRA with a (non-rollup) patch. This creates confusion in people's minds because questions like "is HDFS-3107 done?" become ambiguous. It could be interpreted as either "is the patch on HDFS-3107 done?" or "is the feature discussed in HDFS-3107 done?" To remove this confusion, I created the HDFS-7341 subtask to implement the part we've been discussing here. That is, the pipeline-recovery based solution which the other subtasks build on top of. I would like to create the HDFS-3107 branch as soon as possible. I would have already created it, but as per my comment above, I want to make sure you are not objecting. The quicker we can get this stuff into the branch, the quicker we can get this feature polished and merged into trunk. As I commented earlier, it's not up to me to determine if something gets into 2.6 or not. It's up to the release manager (currently [~acmurthy]) and the community to vote. Since 2.6 is so far along (it should have been released weeks ago, if the original schedule had been followed), I doubt that most people will welcome a big new feature getting added at this stage. I don't think branch versus since commit matters in this regard-- a big new feature is a big new feature. It's not going to "sneak in the back door"-- nor should it, given the problems we've had in the past (like with HDFS append). We have time to do a thorough review. In the meantime, distributions that are already shipping a variant of append can continue to do so, knowing that eventually the feature has a path to mainline. Let me know if you have any objections to creating the branch, otherwise I'll do it tomorrow. Thanks > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195395#comment-14195395 ] Hadoop QA commented on HDFS-3107: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679004/HDFS-3107-HDFS-7056-combined.patch against trunk revision 35d353e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to cause Findbugs (version 2.0.3) to fail. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8630//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8630//console This message is automatically generated. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194975#comment-14194975 ] Konstantin Boudnik commented on HDFS-3107: -- bq. Would love to see it in a feature branch [~rvs] how a feature branch helps to solve your issue at hands? I thought you guys want to have it in the next release, right? > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored, editsStored, > editsStored.xml, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194946#comment-14194946 ] Konstantin Shvachko commented on HDFS-3107: --- Submitted two patches separetly. One here and the follow up patch in HDFS-7056. We will post a combined patch here for jenkins once the edits are regenerated. > I wasn't aware of HDFS-7056. I am glad you are now. It was created almots two months ago and was mentioned in this jira 8 times, including one of your own comments. Colin, it is not a problem by itself when people catch up on what they missed. But I am confused about your objections. You seem to be putting pressure on the development of this feature, but doing it without having full information at hand. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194846#comment-14194846 ] Colin Patrick McCabe commented on HDFS-3107: Great. If there are no objections, later today I will create a feature branch for this and commit the patch on this JIRA to it. (Since it has my +1 for committing to feature branch and we've discussed it a bunch). I'm looking forward to seeing the other patches which add snapshot support and shell support. Thanks, everyone. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192739#comment-14192739 ] Roman Shaposhnik commented on HDFS-3107: Would love to see it in a feature branch. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192430#comment-14192430 ] Colin Patrick McCabe commented on HDFS-3107: Thanks for the clarification. I wasn't aware of HDFS-7056. The design there looks reasonable. I would prefer to see this work in a feature branch. My reasoning is that if the first patch breaks rollback, it's tough to see it getting into trunk. What do others think? > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191142#comment-14191142 ] Konstantin Shvachko commented on HDFS-3107: --- Let me try to clarify. # The latest attached patch (Oct 17) does not support snapshots. We are targeting to post a patch for truncate with snapshots under HDFS-7056. We are in the testing stage there. # I did comment on rollbacks [just yesterday|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14189216&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14189216]. Copy replica on truncate solves the issue. Also addressed in the comming patch under HDFS-7056. # The solution [you described in your comment|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14190706&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14190706] is exactly the one we are implementing here. Or I do not understand how it differs from the design document. ??Can we put numbers on patch files? I find it difficult to keep track of them when they all have the same file name.?? ~Colin I usually sort attachments by date. That way you do need to track any file names or numbers. Actually was asking many times to change the default for sorting the attachements, but never got a solution for that. Sould have been so much easier to just look at them in the order of submission, same as comments.~ > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190942#comment-14190942 ] Colin Patrick McCabe commented on HDFS-3107: bq. So, but the new patch from Plamen Jeliazkov already have snapshot support Am I looking at the wrong version? The patch posted on 17 Oct does not seem to have snapshot support. The relevant code is here: {code} 2242 boolean truncateInternal(String src, long newLength, 2243 String clientName, String clientMachine, 2244 long mtime, FSPermissionChecker pc) 2245 throws IOException, UnresolvedLinkException { 2246assert hasWriteLock(); 2247if (isPermissionEnabled) { 2248 checkPathAccess(pc, src, FsAction.WRITE); 2249} 2250INodesInPath iip = dir.getINodesInPath4Write(src, true); 2251final int latestSnapshotId = iip.getLatestSnapshotId(); 2252INodeFile inodeFile = INodeFile.valueOf(iip.getLastINode(), src); 2253// Data will be lost after truncate occurs so it cannot support snapshots. 2254if(inodeFile.isInLatestSnapshot(latestSnapshotId)) 2255 throw new HadoopIllegalArgumentException( 2256 "Cannot truncate file with snapshot."); {code} I think it's misleading to refer to throwing an exception when there are snapshots present as "support." But whatever we call it, my patch doesn't throw an exception here-- it always allows truncation even when the admin is using snapshots. I was trying to be helpful and point out an approach that would let you respond to the comments here. For example, [~sureshms] 's comment here: https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14155537&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14155537 Or [~szetszwo]'s comment about the need to support rollback. You need to address the comments that people have made. P.S. Can we put numbers on patch files? I find it difficult to keep track of them when they all have the same file name. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190850#comment-14190850 ] Konstantin Boudnik commented on HDFS-3107: -- So, but the new patch from [~zero45] already have snapshot support. Hence I'll repeat: what's the point of having an alternative version of the original patch? > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190785#comment-14190785 ] Colin Patrick McCabe commented on HDFS-3107: The point is that the original implementation doesn't support snapshots and rollback. It needs to support those two things before it can be committed to trunk. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190724#comment-14190724 ] Konstantin Boudnik commented on HDFS-3107: -- Sorry, perhaps I missing something, but what's the point of the alternative implementation of the original patch? As far as I see [~zero45]'s version doesn't have any shortcomings as per the current design. What are you trying to achieve exactly? > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190706#comment-14190706 ] Colin Patrick McCabe commented on HDFS-3107: bq. And it needs to be atomic e.g. not involving 5 RPC calls, otherwise recovery would be a nightmare. The block-copying solution could be done with 1 RPC. The NN receives the truncate RPC. It marks the file as open-for-write, and creates a new block ID for the last block. Then it returns to the client. Later, when the DNs which have the last block check in with the NN via heartbeats, the NN sends them a (new) command that leads them to duplicate the block file with a new ID. Then when the NN receives word that these blocks have been duplicated, it closes the file and truncate is complete. This is pretty similar to the original patch, except the original patch had the NN send a command that led the DNs to truncate the block on-disk (destroying the part of it after the truncation point). Just like in the original patch, the client may need to call {{isFileClosed}} a few times to wait for the truncation to finish. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189216#comment-14189216 ] Konstantin Shvachko commented on HDFS-3107: --- Yes, good point, Nicholas. During upgrade old blocks are hard linked and the hard links are stored in a separate directory, which allows the blocks themselves be deleted while the file system is updated. So when we roll back the deleted blocks are still available via those hard links. With truncate the same is applied to the blocks that were deleted. For the blocks that are truncated to a smaller length we will need to do copy-on-truncate recovery, same as we do for snapshots. That is if upgrade is in progress NN will schedule copy-on-truncate wether snapshots are present or not. The bottom line is that implementing copy-on-truncate is needed both for snapshots and upgrades. I'll make a note to update the design. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189063#comment-14189063 ] Konstantin Boudnik commented on HDFS-3107: -- bq. I posted it as a demonstration. I think to make it more robust we would want And it needs to be atomic e.g. not involving 5 RPC calls, otherwise recovery would be a nightmare. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1416#comment-1416 ] Colin Patrick McCabe commented on HDFS-3107: Good point, Nicholas. Note that the patch I posted does handle rollback correctly, since it never modifies any existing block files. I posted it as a demonstration. I think to make it more robust we would want to avoid having the client write out the last block and concat it, and instead have some other mechanism for duplicating + shortening the final block of the file-- possibly a new DN command similar to COPY_BLOCK, but taking a length argument. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187705#comment-14187705 ] Tsz Wo Nicholas Sze commented on HDFS-3107: --- In a HDFS upgrade, how to roll back a truncated file? The current design seems not covering it. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178044#comment-14178044 ] Plamen Jeliazkov commented on HDFS-3107: [~srivas], There is no plan to grow the file by padding it with zeroes as general-purpose truncate does. Both [~shv] and [~lei_chang] mentioned this in their design docs, I believe. [~cmccabe], While the copying the last block up to its truncate point and doing a delete/concat is definitely a simpler overall approach, the full truncate implementation has the benefit of being a single NameNode RPC call that can both truncate in-place and copy-on-truncate, preserving the original last block and moving the 'copy&truncate' work to the DataNodes themselves (as opposed to having to pass data through the network / client). I am not intending to debate either implementation -- I like both personally; just wanted to explain as briefly as I could why Konstantin and I are taking our approach. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177900#comment-14177900 ] M. C. Srivas commented on HDFS-3107: Note that a general-purpose truncate can be used to also *increase* the size of the file. Used very often, for example, to implement a database and growing the file if it isn't large enough. Are you planning to implement truncate to behave so too? > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175835#comment-14175835 ] Todd Lipcon commented on HDFS-3107: --- bq. Dhruba, Indeed. Lack of concurrent writes to a single HDFS file means that there will be only a single outstanding transaction against a file (unless the concurrency is implemented at a higher level.) A database can consist of multiple files, though, and one can have multiple outstanding transactions against the database (one per file.) In either case, rollback is achieved by truncating the file to position prior to beginning of transaction. I don't know the architecture of this particular database, but it seems to me that this is not the method that most databases use to issue an abort. Instead, an abort record is logged. Abort is a rare operation, and optimizing it by truncating the log doesn't seem worth the complexity to me. Do you have some reference which explains why the truncate is advantageous for a database log vs logging a new abort record? Typical database logging architectures based on ARIES couldn't really work this way, since aborting a transaction may involve rolling back changes which have already been persisted to the on-disk state (ie applying UNDO records). Truncation can only work in a REDO-only logging scheme (what ARIES calls "no-steal" iirc). > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161021#comment-14161021 ] dhruba borthakur commented on HDFS-3107: Thanks for the clarification Milind. I was just making sure that i understand the limitations of such a database the uses the HDFS truncate feature. Given this fact, it is unlikely that HBase can use it (in future) to support transactions. Thanks anyways. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160511#comment-14160511 ] Milind Bhandarkar commented on HDFS-3107: - Dhruba, Indeed. Lack of concurrent writes to a single HDFS file means that there will be only a single outstanding transaction against a file (unless the concurrency is implemented at a higher level.) A database can consist of multiple files, though, and one can have multiple outstanding transactions against the database (one per file.) In either case, rollback is achieved by truncating the file to position prior to beginning of transaction. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159278#comment-14159278 ] dhruba borthakur commented on HDFS-3107: Thanks KonstantinS. I get it now. Just to make it clear, my questions/comments are neither to support this patch nor to reject it. I just want to understand if the 'proposed truncate' feature can be used by databases like HBase. KonstantinS/Milind: If the database depends on the truncate feature of HDFS to rollback a transaction, won't it mean that there can be only one outstanding transaction per hdfs-file? Won't it be better for the database to support multiple outstanding transactions in parallel and have the ability to rollback (if needed) only one without having to rollback the others? > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158373#comment-14158373 ] Konstantin Shvachko commented on HDFS-3107: --- Dhruba, this is not my story to tell. The use case was introduced earlier by Milind and he discussed it with Hairong based on the comments above. So you may have a local reference there. I personally see it useful for transaction management tools / systems, which use HDFS as a reliable storage for their journal. When processing begin-transaction op the transaction manager (TM) writes this op into the journal and remembers the offset (Offset1) in the journal file where it was written. Then it processes operations in the transaction and writes them into the journal file. If commit-transaction happens TM just keeps writing to the journal. If TM gets and abort-transaction op, then TM truncates the journal file to Offset1 and starts appending to it for further transactions. That way if TM is restarted it sees only committed transactions. Hope this makes sense. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156265#comment-14156265 ] dhruba borthakur commented on HDFS-3107: Hi KonstantinS, I have been following this jira, mostly as a passive observer. Can you pl explain me the use-case for truncate? You might have already explained this earlier, but if you could again elaborate the reason why you need truncate. I would appreciate it a lot. from your comments, it feels that you have a database layer on top of hdfs and the database is using an hdfs file as the transaction log. But I am not able to understand the rest of the story. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155966#comment-14155966 ] Konstantin Shvachko commented on HDFS-3107: --- ??Even with truncate, snapshot creation does not require moving data. The data movement happens at truncate time. Is it correct??? Good point, Nicholas. You are right the data moves on truncate in option 3 or op append in option 2. I'll rephrase this. [~jingzhao] thanks for offering your help. I'll reach out offline to check on your availability. [~sureshms] I thought you would notice that I am not arguing about the integration with snapshots or postponing it, or exposing it to the users until done. I am arguing that the faster way is to commit this patch since it is ready and get into snapshots integration. Setting aside commercial distributions and superiority of features. It used to be a normal development path to split a new feature into parts and commit them to trunk as long as the parts are consistent and the feature is small, since nobody is running trunk in production. I wonder what happened with that practice. Truncate has only 3 subtasks. It is nowhere near 136 tasks that went into HDFS-2802, which was done on its branch. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155816#comment-14155816 ] Colin Patrick McCabe commented on HDFS-3107: So to summarize all this above, myself, [~jingzhao], and [~sureshms] have made the point that this feature is still incomplete until it supports snapshots. Since snapshot support may involve fundamentally changing the design (e.g. copying the final partial block file versus using block recovery), we need to figure it out before merging to trunk or branch-2. As far as I can see, there are two options here. We could roll the snapshots support into this patch, or start a feature branch with the above commit... and then do snapshots support (and whatever else is needed) in that feature branch. I'm fine with either option, I have absolutely no preference. I'm happy to review anything and Jing has offered to help with snapshots support. [~rvs]: I realize that getting truncate into a release is important to you. If people get this done by 2.6, I wouldn't oppose putting it in. But you will have to convince the release manager for 2.6, and propose it to the community. Since 2.6 is already a very big release, I think you will get pushback. I also think you should evaluate writing length-delimited records instead of using truncate. Truncate is an operation that can fail, and relying on truncate to clean up mistakes will always be more fragile than writing in a format that can ignore torn records automatically. Think about it... if a write error occurs because of a disk I/O error, isn't it likely that truncate will also get an error when trying to shrink that block file? And for write errors that result from clients dropping off the network, truncate will also be impossible because of the lack of connectivity. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155663#comment-14155663 ] Colin Patrick McCabe commented on HDFS-3107: bq. I think the approach that one new feature is treated should be equally used for another. I am looking at HDFS-6994 and see subtasks getting committed despite the fact that they were rejected in the parent. Am I missing some subtle differences between two features? Subtasks were not "rejected in the parent." [~aw] made a comment that he doesn't like the name of the library, and he doesn't like the fact that it uses boost. We agreed to consider these points once the code is in the branch since they can easily be changed later. It's clear that nobody likes the current name, and it will almost certainly be renamed before it gets merged. That's the point of a feature branch-- the code and the design are flexible while they're in a feature branch. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155561#comment-14155561 ] Konstantin Boudnik commented on HDFS-3107: -- I think the approach that one new feature is treated should be equally used for another. I am looking at HDFS-6994 and see subtasks getting committed despite the fact that they were rejected in the parent. Am I missing some subtle differences between two features? > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155546#comment-14155546 ] Konstantin Boudnik commented on HDFS-3107: -- Sorry, what I meant so say is that yes, you're right but the actual use of the feature is questionable - sounded better in my head ;) > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155539#comment-14155539 ] Konstantin Boudnik commented on HDFS-3107: -- bq. Most users of commercial distros are using snapshots This is a hearsay as far as I know > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155537#comment-14155537 ] Suresh Srinivas commented on HDFS-3107: --- bq. In fact, if I were to rank the importance of snapshots vs. truncate to the majority of our users, I'd say truncate functionality is way more important. But the snapshots feature is already in the older releases (available since 2.1.0). All [~cmccabe] is saying is, it is an existing feature and is currently used by many deployments. So debating a feature that already exists vs. a new feature is not useful. I agree with [~jingzhao]'s comment. We should address the interaction of this feature with Snapshots. Having a feature that fails in unpredictable ways when a file has been snapshotted vs. not is confusing. Postponing that will also result in an API that changes its behavior. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155514#comment-14155514 ] Roman Shaposhnik commented on HDFS-3107: bq. Most users of commercial distros are using snapshots, so even if we pulled this into 2.6 and then into the commercial releases based on it, the feature still wouldn't get tested. And anyway it's not going to make 2.6, so let's not get an artificial sense of urgency here. We have time to get the design right and test it. I think we can only speak on behalf of the companies that we represent. Making a generic statement that starts with "Most users of commercial distros" is just bad form. With that in mind, I can assure you that regardless of the use cases you see around CDH, we (PHD) see it differently. In fact, if I were to rank the importance of snapshots vs. truncate to the majority of our users, I'd say truncate functionality is way more important. Now, this is not to say that we should be basing decisions affecting an open source project on the immediate needs of *any* commercial distribution. This is just pointing out that the statement above is false. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155499#comment-14155499 ] Colin Patrick McCabe commented on HDFS-3107: bq. \[The boolean\] is an optimization for the case when truncate happens on the block boundary. Clients will save one RPC call in this particular case. From NameNode perspective returning the boolean does not require any extra processing. Fair enough. Can we change it to return an enum of { {{TRUNCATE_IN_PROGRESS}}, {{TRUNCATE_COMPLETED}} }? A lot of developers just don't read documentation and will assume "it returned false, that means it failed." bq. \[concurrent readers discussion\] It's true that concurrent unlink has the same problem with making readers get mysterious IOExceptions. I don't consider this a good behavior to copy, though (it's more like a bug). Perhaps in a follow-up change, we could add some faculty for clients to ask the NN for information about whether the file has been unlinked or truncated after an unrecoverable IOException happened during a read? That's probably better as a follow-up, though. bq. as HDFS-7056 indicated it will take more time to come up with design and implementation of the complimentary functionality that would extend truncate to snapshotted files Clearly, we're still at the stage where the design isn't complete. This is exactly what feature branches are for. bq. in its current form, this is an extremely useful self-contained feature that allows various vendors of solutions running on Hadoop to build products having having much easier time running on HDFS we all know that features sitting in a branch don't get exposed to commercial distributions and workloads as much as the ones hitting trunk do. This is, of course, a totally right approach to features that are half-baked or not self-contained, but it feels that in this particular case committing the patch would benefit us all by giving customers access to the self-contained feature AND start receiving feedback for the more extended functionality much earlier. Most users of commercial distros are using snapshots, so even if we pulled this into 2.6 and then into the commercial releases based on it, the feature still wouldn't get tested. And anyway it's not going to make 2.6, so let's not get an artificial sense of urgency here. We have time to get the design right and test it. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155485#comment-14155485 ] Jing Zhao commented on HDFS-3107: - Actually I agree with [~cmccabe] that it's better to finish HDFS-7056 before committing the current patch into trunk, especially considering the patch contains a DistributedFileSystem API already. Thus a feature branch makes sense to me. I have not thought through the details about HDFS-7056 yet. But it does not seem to be a very hard issue. I will also be happy to help there. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155445#comment-14155445 ] Tsz Wo Nicholas Sze commented on HDFS-3107: --- Checked the design doc. It looks quite good. Just a minor comment below: {quote} Current implementation of snapshots does not require moving any data. Both approaches 2 and 3 mean that this property will be lost. Some data will need to be moved for the snapshots to work with truncates and appends. {quote} Even with truncate, snapshot creation does not require moving data. The data movement happens at truncate time. Is it correct? > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155367#comment-14155367 ] Konstantin Shvachko commented on HDFS-3107: --- [~cmccabe] feels like some clarification is needed from you here, so that we could avoid misunderstanding like in the other jira. # I assume that your [veto of Sept 17 conditional on the design document|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14137882&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14137882] had been addressed. Now with subsequent corrections in line with your suggestions. # [Earlier this week you stated|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14152094&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14152094] ??we should hold off on committing to trunk until we figure out the snapshot story.?? Given the design in place, the subtask HDFS-7056 opened, and the general acceptance of the approach by other contributors, the snapshot story seem to be clear. So if your statement is a veto could you please clarify the reason(s)? If not please say so, so that people could proceed with the subtasks? I hope you can reply by tomorrow. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154121#comment-14154121 ] Roman Shaposhnik commented on HDFS-3107: FWIW I would like to provide a few additional datapoints to what [~shv] has said: # in its current form, this is an extremely useful self-contained feature that allows various vendors of solutions running on Hadoop to build products having having much easier time running on HDFS. # it is true that currently there's not immediate integration with snapshot functionality, but the way the current patch is implemented makes it extremely easy to expand the scope of the feature to snapshots. In other words, if this current implementation gets committed it will NOT create a migration opportunity. The snapshot+truncate can be added into later releases of HDFS and applications targeting truncate as it is implemented currently will continue to run unmodified. # as HDFS-7056 indicated it will take more time to come up with design and implementation of the complimentary functionality that would extend truncate to snapshotted files. It feels unfortunate if we had to hold the current patch hostage, even though today it delivers a very much needed functionality AND it allows for smooth migration for when snapshot+truncate gets implemented. # we all know that features sitting in a branch don't get exposed to commercial distributions and workloads as much as the ones hitting trunk do. This is, of course, a totally right approach to features that are half-baked or not self-contained, but it feels that in this particular case committing the patch would benefit us all by giving customers access to the self-contained feature AND start receiving feedback for the more extended functionality much earlier. Hope this provides additional food for thought to reconsider this patch for inclusion. Also, FWIW, based on our testing, this feels like an extremely useful and important feature to get into Hadoop now and extend to cover snapshots later. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152875#comment-14152875 ] Konstantin Shvachko commented on HDFS-3107: --- Thanks Dhruba and Colin for your reviews of the design document. Colin, I'll incorporate your suggestions. But it looks that you got everything right from the current edition. ??{{boolean truncate(Path src, long newLength)}}. do we really need the boolean here??? * This is an optimization for the case when truncate happens on the block boundary. Clients will save one RPC call in this particular case. >From NameNode perspective returning the boolean does not require any extra >processing. ??DFSInputStream#locatedBlocks will continue to have the block information it had prior to truncation.?? * Don't we have the same behaviour with deletes. Somebody can delete a file on the NameNode, but readers will keep reading old blocks until they are deleted. Truncate doesn't add anything new in that regard. ??I don't think we should commit anything to trunk until we figure out how this integrates with snapshots.?? * You should have seen HDFS-7056 subtask. Mentioning it again to reassure there is no intension to avoid the snapshot issue. * People agreed above that [they are OK|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14129351&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14129351] implementing snapshot integration in a separate jira. * We also [agreed not to port it to branch 2|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14148406&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14148406] until this is completed. * And there was a [request to commit|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14150308&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14150308] this sooner rather than later. * Besides, it seems from your comments that you yourself are in favour of option 3 for snapshots from the design. So the question arise what is not clear in the truncate-snapshot story and why you object committing anything to trunk? > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152094#comment-14152094 ] Colin Patrick McCabe commented on HDFS-3107: Thanks for waiting. I'm checking out the design doc. {code} In proposed approach truncate is performed only on a closed file. If the file is opened for write an attempt to truncate fails. {code} Just a style change, but maybe "Truncate cannot be performed on a file which is currently open for writing" would be clearer. {code} Conceptually, truncate removes all full blocks of the file and then starts a recovery process for the last block if it is not fully truncated. The truncate recovery is similar to standard HDFS lease recovery procedure. That is, NameNode sends a DatanodeCommand to one of the DataNodes containing block replicas. The primary DataNode synchronizes the new length among the replicas, and then confirms it to the NameNode by sending commitBlockSynchronization() message, which completes the truncate. Until the truncate recovery is complete the file is assigned a lease, which revokes the ability for other clients to modify that file. {code} I think a diagram might help here. The impression I'm getting is that we have some "truncation point" like this: {code} truncation point | V +-+-+-+-+-+-+ | A | B | C | D | E | F | +-+-+-+-+-+-+ {code} In this case, blocks E and F would be invalidated by the NameNode, and block recovery would begin on block D? "Conceptually, truncate removes all full blocks of the file" seems to suggest we're removing all blocks, so it might be nice to rewrite this as "Truncate removes all full blocks after the truncation point." {code} Full blocks if any are deleted instantaneously. And if there is nothing more to truncate NameNode returns success to the client. {code} They're invalidated instantly, but not deleted instantly, right? Clients may still be reading from them on the various datanodes. {code} public boolean truncate(Path src, long newLength) throws IOException; Truncate file src to the specified newLength. Returns: - true if the file have been truncated to the desired newLength and is immediately available to be reused for write operations such as append, or - false if a background process of adjusting the length of the last block has been started, and clients should wait for it to complete before they can proceed with further file updates. {code} Hmm, do we really need the boolean here? It seems like the client could simply try to reopen the file until it no longer got an {{RecoveryInProgressException.}} (or lease exception, as the case may be.) The client will have to do this anyway most of the time, since most truncates don't fall on even block boundaries. {code} It should be noted that applications that cache data may still see old bytes of the file stored in the cache. It is advised for such applications to incorporate techniques, which would retire cache when the data is truncated. {code} One issue that I see here is that {{DFSInputStream}} users will continue to see the old, longer length for a long time potentially. {{DFSInputStream#locatedBlocks}} will continue to have the block information it had prior to truncation. And eventually, whenever they try to read from that longer length, they'll get read failures since the blocks will actually be unlinked. These will look like IOExceptions to the user. I don't know if there's a good way around this problem with the design proposed here. bq. \[truncate with snapshots\] I don't think we should commit anything to trunk until we figure out how this integrates with snapshots. It just impacts the design too much. When you start seriously thinking about snapshots, integrating this with block recovery (by adding {{BEING_TRUNCATED}}, etc.) does not look like a very good option. A better option would be simply to copy the partial block and have the snapshotted version reference the old block, and the new version reference the (shorter) copy. That corresponds to your approach #3, right? truncate is presumably a rare operation and doing the truncation in-place for non-snapshotted files is an optimization we could do later. The copy approach is also nice for {{DFSInputStream}}, since readers can continue reading from the old (longer) copy until the readers close. If we truncated that copy directly, this would not work. We could commit this to a branch, but I think we should hold off on committing to trunk until we figure out the snapshot story. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Compon
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151051#comment-14151051 ] dhruba borthakur commented on HDFS-3107: The design-doc looks cool. And thanks for waiting to commit, Konstantin. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150830#comment-14150830 ] Konstantin Shvachko commented on HDFS-3107: --- Hi Dhruba, good to hear from you. The design was actually [outlined on Sept 5|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14123590&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14123590]. The idea is rather simple. We just formalized it in a document per Colin's request last week. More reviews is better. So if people need time lets wait until end of day Monday Sept 29. Hope Roman is OK with that? > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150789#comment-14150789 ] dhruba borthakur commented on HDFS-3107: hi konstantin, is it possible to wait for a few more days before committing? from the comments, it feels that aaron might want to do a detailed review of the design. Especially since the design doc was added only 4 days ago and the 'truncate' feature is not trivial. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150769#comment-14150769 ] Konstantin Boudnik commented on HDFS-3107: -- Don't miss the {{hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java}} that needs to be explicitely added to the commit as it's a new file. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > editsStored > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)