subject:"\[jira\] \[Commented\] \(HDFS\-3107\) HDFS truncate"

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-10-02 Thread Constantine Peresypkin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941851#comment-14941851
 ] 

Constantine Peresypkin commented on HDFS-3107:
--

Already done:
https://issues.apache.org/jira/browse/HDFS-9164

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 2.7.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-10-02 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941846#comment-14941846
 ] 

Konstantin Shvachko commented on HDFS-3107:
---

Sorry, got distracted. It would be good to create a new jira for truncate 
support in nfs.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 2.7.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-09-25 Thread Constantine Peresypkin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908412#comment-14908412
 ] 

Constantine Peresypkin commented on HDFS-3107:
--

OK, great success after patching write cache.
Where do I post the code?

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 2.7.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-09-17 Thread Constantine Peresypkin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14803234#comment-14803234
 ] 

Constantine Peresypkin commented on HDFS-3107:
--

Hmm, fast patching nfs did not work. Intermittently fails on "lease already 
acquired".
Seems like nfs gateway holds leases to all files it opened in some sort of 
cache.
Very strange.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 2.7.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-09-17 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791779#comment-14791779
 ] 

Konstantin Shvachko commented on HDFS-3107:
---

I would say absolutely go for it. As that was one of the motivations for 
truncate to support nfs and fuse apis.
Let's check with the authors what is the state of nfs with respect to truncate.
[~brandonli] could you please elaborate.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 2.7.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-09-16 Thread Constantine Peresypkin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790796#comment-14790796
 ] 

Constantine Peresypkin commented on HDFS-3107:
--

What about truncate support in nfs3 proxy/gateway?
Setattr with Size still fails, should I fix it?

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 2.7.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-04-28 Thread Neeta Garimella (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518741#comment-14518741
 ] 

Neeta Garimella commented on HDFS-3107:
---

Thanks Yi. I will get the latest.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 2.7.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-04-28 Thread Yi Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518648#comment-14518648
 ] 

Yi Liu commented on HDFS-3107:
--

Hi Neeta, please checkout the latest trunk or if you get Hadoop 2.7 version, 
you will find the truncate API exposed via FileSystem there.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 2.7.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-04-28 Thread Neeta Garimella (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518572#comment-14518572
 ] 

Neeta Garimella commented on HDFS-3107:
---

Could someone comment on why truncate is not exposed via FileSystem Class. Are 
we expecting applications using this will be calling DFSClient interface 
directly?  

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 2.7.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-02-05 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308494#comment-14308494
 ] 

Tsz Wo Nicholas Sze commented on HDFS-3107:
---

We should also have a truncate test with cached data to that client won't read 
beyond truncated length from the cached data.

BTW, Section 5.2.9 Snapshot operation during truncate in the design doc is out 
dated.  It is still talking about creating a transient file.  We should update 
it.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 2.7.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-02-05 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308071#comment-14308071
 ] 

Konstantin Shvachko commented on HDFS-3107:
---

Nicholas, you are right we don't have those tests.
- For truncate with HA as I mentioned in HDFS-7738 we can incorporate truncate 
to TestHAAppend. LMK if you want to add it to your patch, otherwise I'll ad it 
to HDFS-7740.
- Created HDFS-7740 for adding truncate test with DNs restarts. Described two 
scenarios there. Feel free to add other scenarios you have in mind there.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 2.7.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-02-04 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306296#comment-14306296
 ] 

Tsz Wo Nicholas Sze commented on HDFS-3107:
---

[~shv], how is the remaining work?  Hope that we still have progress after 
merging to branch-2.

BTW, it seems that we do not have truncate tests with HA nor datanode restart.  
Is it true?

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 2.7.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-26 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292447#comment-14292447
 ] 

Tsz Wo Nicholas Sze commented on HDFS-3107:
---

Thanks a lot for all the great works!

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 2.7.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-26 Thread Yi Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291559#comment-14291559
 ] 

Yi Liu commented on HDFS-3107:
--

Thanks [~shv], I find another two things:
* DistributedFileSystem#truncate should resolve symlinks
* Expose truncate API via FileContext

I open two new JIRAs HDFS-7677, HADOOP-11510, and will fix them there.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 2.7.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-24 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290864#comment-14290864
 ] 

Tsz Wo Nicholas Sze commented on HDFS-3107:
---

Sure.  I think it is fine to merge.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-24 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290863#comment-14290863
 ] 

Tsz Wo Nicholas Sze commented on HDFS-3107:
---

Sure.  I think it is fine to merge.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-23 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290345#comment-14290345
 ] 

Konstantin Shvachko commented on HDFS-3107:
---

Yes HDFS-7659 is ready.
Documentation needs to be updates. I mentioned it in my email on the dev list 
some days ago. Other things to do are adding truncate to DFSIO and SLive.
I don't think we should wait  for them to merge. My main concern is that it 
increases the work for developers. Branches being substancially diverged means 
that it is harder to merge new code into branch-2, which is not related to 
truncate.
Also it will be easier to implement TestCLI for example or the documentation 
update and then merge it into both branches at once.
In the end it is not that we have a release planned next week.

Would it be ok with you if we commit HDFS-7659, fix TestFileTruncate as I 
proposed there, and then merge?

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-23 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290095#comment-14290095
 ] 

Tsz Wo Nicholas Sze commented on HDFS-3107:
---

BTW, have we updated user documentation for the truncate CLI change?

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-23 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290072#comment-14290072
 ] 

Tsz Wo Nicholas Sze commented on HDFS-3107:
---

There is a list of unresolved JIRAs.  Let's discuss it.
- HDFS-7341 Add initial snapshot support based on pipeline recovery
Is it still relevant?
- HDFS-7058 Tests for truncate CLI
Let's finish it before merging since CLI is user facing.  Is anyone working on 
it?
- HDFS-7655/HDFS-7656 Expose truncate API for Web HDFS/httpfs
It seems that we should not wait for them before merging.  Agree?
- HDFS-7659 We should check the new length of truncate can't be a negative value
Look like that this is going to be committed soon.
- HDFS-7665 Add definition of truncate preconditions/postconditions to 
filesystem specification
This is simple a documentation change.  Let's finish it before merging?

As a summary, how about finishing HDFS-7058, HDFS-7659 and HDFS-7665 before 
merging it to branch-2?


> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-22 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288936#comment-14288936
 ] 

Konstantin Shvachko commented on HDFS-3107:
---

Hey guys, Plamen's ports to branch-2 look good to me.
I'll schedule time during next weekend to commit these and related jiras. This 
is almost two weeks from the trunk commit.
LMK if there are any issues. Looks like we are making good progress.
I think we've reached the stage when further truncate related changes can be 
done both on trunk and branch-2 after the merge.
Nichloas, are you ok doing HDFS-7058, which has only TestCLI remaining, after 
the merge?

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-21 Thread Yi Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287050#comment-14287050
 ] 

Yi Liu commented on HDFS-3107:
--

I just noticed we have not exposed truncate API for Web HDFS and HDFS http fs, 
I created two JIRAs HDFS-7655, HDFS-7656 for them. 
Of course, they will *not* block the merging of this feature to branch-2. 

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-19 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282995#comment-14282995
 ] 

Konstantin Shvachko commented on HDFS-3107:
---

This is tracked under HDFS-7611. Working on it.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-19 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282983#comment-14282983
 ] 

Tsz Wo Nicholas Sze commented on HDFS-3107:
---

BTW, the truncate unit test is failing occasionally; see [build 
#9264|https://builds.apache.org/job/PreCommit-HDFS-Build/9264//testReport/], 
[build 
#9257|https://builds.apache.org/job/PreCommit-HDFS-Build/9257//testReport/], 
[build 
#9255|https://builds.apache.org/job/PreCommit-HDFS-Build/9255//testReport/] and 
[build 
#9244|https://builds.apache.org/job/PreCommit-HDFS-Build/9244/testReport/].

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-19 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282775#comment-14282775
 ] 

Tsz Wo Nicholas Sze commented on HDFS-3107:
---

It is great to merge this to branch-2.  How about finishing the tests 
(HDFS-7058) first?  I am also going to test it this week.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-19 Thread Yi Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282267#comment-14282267
 ] 

Yi Liu commented on HDFS-3107:
--

I have done some functionality tests in my cluster, and do some code review 
from my point of view (I'm not very good at snapshot part, so just did some 
basic review for that part :)). 

+1 (non-binding) for merging into branch-2, it's good to me, except some small 
nits, and I filed them in HDFS-7634 and HDFS-7638. Certainly we can file follow 
ups if I find some new issues.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-16 Thread Yi Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280170#comment-14280170
 ] 

Yi Liu commented on HDFS-3107:
--

[~shv], this is a good feature and nice work. Agree with Colin that we should 
give it a work or two before merging to branch-2, then others can have enough 
test.
Personally I will get some time to verify the functionality and do some review 
next week.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-15 Thread Uma Maheswara Rao G (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279695#comment-14279695
 ] 

Uma Maheswara Rao G commented on HDFS-3107:
---

Thanks Plamen, Konstantin for the great efforts on this feature. I did not get 
chance to review this fully. 
 +1 for merging this feature into branch-2 in few days. By that time others can 
test and raise the follow ups if there are any issue. I will look into it with 
my review and file followups if any things I could notice. Thanks

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-15 Thread Uma Maheswara Rao G (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279692#comment-14279692
 ] 

Uma Maheswara Rao G commented on HDFS-3107:
---

Thanks Plamen, Konstantin for the great efforts on this feature. I did not get 
chance to review this fully. 
 +1 for merging this feature into branch-2 in few days. By that time others can 
test and raise the follow ups if there are any issue. I will look into it with 
my review and file followups if any things I could notice. Thanks

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-14 Thread Konstantin Boudnik (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277547#comment-14277547
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

I  think it is a reasonable expectation to do the merge in a few days or a 
week. Most importantly, the merge might require certain changes resulting from 
the conflicts resolution - so it'd be some dev. effort anyway.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-14 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277534#comment-14277534
 ] 

Colin Patrick McCabe commented on HDFS-3107:


I can't speak for anyone else on this thread, but I personally think we should 
give it a week or two and then merge to branch-2.  I hope anyone with 
unresolved concerns will post follow-ups by then.

In the future we should probably do this kind of work in a branch.  It would 
have sped up the work, because we could have committed things quickly to the 
branch.  git has made branches easier than ever.  The requirement to get three 
+1s in a branch might seem onerous, but a feature of this size needs several 
eyes on it anyway, so the review requirement ended up being about the same (or 
even more).

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-13 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275400#comment-14275400
 ] 

Hudson commented on HDFS-3107:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2023 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2023/])
HDFS-3107. Introduce truncate. Contributed by Plamen Jeliazkov. (shv: rev 
7e9358feb326d48b8c4f00249e7af5023cebd2e2)
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientNamenodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/metrics/NameNodeMetrics.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/HdfsServerConstants.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockRecoveryCommand.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRetryCacheWithHA.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSInotifyEventInputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoUnderConstruction.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileWithSnapshotFeature.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeRetryCache.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java


> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-13 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275329#comment-14275329
 ] 

Hudson commented on HDFS-3107:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #73 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/73/])
HDFS-3107. Introduce truncate. Contributed by Plamen Jeliazkov. (shv: rev 
7e9358feb326d48b8c4f00249e7af5023cebd2e2)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRetryCacheWithHA.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSInotifyEventInputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockRecoveryCommand.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileWithSnapshotFeature.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoUnderConstruction.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/HdfsServerConstants.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientNamenodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeRetryCache.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/metrics/NameNodeMetrics.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java


> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
>

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-13 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275272#comment-14275272
 ] 

Hudson commented on HDFS-3107:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2004 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2004/])
HDFS-3107. Introduce truncate. Contributed by Plamen Jeliazkov. (shv: rev 
7e9358feb326d48b8c4f00249e7af5023cebd2e2)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientNamenodeProtocol.proto
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSInotifyEventInputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRetryCacheWithHA.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/HdfsServerConstants.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeRetryCache.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoUnderConstruction.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileWithSnapshotFeature.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/metrics/NameNodeMetrics.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockRecoveryCommand.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java


> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of t

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-13 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275254#comment-14275254
 ] 

Hudson commented on HDFS-3107:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #69 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/69/])
HDFS-3107. Introduce truncate. Contributed by Plamen Jeliazkov. (shv: rev 
7e9358feb326d48b8c4f00249e7af5023cebd2e2)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoUnderConstruction.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSInotifyEventInputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileWithSnapshotFeature.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientNamenodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockRecoveryCommand.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/metrics/NameNodeMetrics.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeRetryCache.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRetryCacheWithHA.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/HdfsServerConstants.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto


> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping tr

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-13 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275106#comment-14275106
 ] 

Hudson commented on HDFS-3107:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #806 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/806/])
HDFS-3107. Introduce truncate. Contributed by Plamen Jeliazkov. (shv: rev 
7e9358feb326d48b8c4f00249e7af5023cebd2e2)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRetryCacheWithHA.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoUnderConstruction.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/metrics/NameNodeMetrics.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientNamenodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockRecoveryCommand.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeRetryCache.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSInotifyEventInputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileWithSnapshotFeature.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/HdfsServerConstants.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored


> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-13 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275088#comment-14275088
 ] 

Hudson commented on HDFS-3107:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #72 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/72/])
HDFS-3107. Introduce truncate. Contributed by Plamen Jeliazkov. (shv: rev 
7e9358feb326d48b8c4f00249e7af5023cebd2e2)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/HdfsServerConstants.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSInotifyEventInputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockRecoveryCommand.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileWithSnapshotFeature.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeRetryCache.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoUnderConstruction.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRetryCacheWithHA.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/metrics/NameNodeMetrics.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientNamenodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java


> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping tr

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-13 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274950#comment-14274950
 ] 

Hudson commented on HDFS-3107:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6853 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6853/])
HDFS-3107. Introduce truncate. Contributed by Plamen Jeliazkov. (shv: rev 
7e9358feb326d48b8c4f00249e7af5023cebd2e2)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileWithSnapshotFeature.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSInotifyEventInputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeRetryCache.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoUnderConstruction.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockRecoveryCommand.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRetryCacheWithHA.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/HdfsServerConstants.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientNamenodeProtocol.proto
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/metrics/NameNodeMetrics.java


> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track

[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-11 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273144#comment-14273144
 ] 

Konstantin Shvachko commented on HDFS-3107:
---

[~wheat9] do you have comments or suggestions on the patch? Or do you have 
anybody in particular in mind to return from vacation? I mean we cannot wait 
for everybody as the Hadoop community is pretty big - there is always somebody 
on vacation.
I don't think we are rushing here. The patch is sitting here from September. 
People expressed desire to have the snapshot support implemented before 
committing the feature, which is now also done under HDFS-7056. And went 
through a series of good review rounds by Jing, Collin, Cos, which improved the 
implementation substantially. Byron independently tested the feature on a 
cluster with and without snapshots.

So let's plan to finally commit both issues eod tomorrow Monday. If there are 
no more comments.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-12-28 Thread Konstantin Boudnik (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259795#comment-14259795
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

It's not rush, but having the patch sitting on the JIRA with very low comments 
activity - and none of them are really touching on the design or the 
integration of the feature to the rest of the HDFS - just increases the 
maintenance cost: the patch maintainer just keeps rebasing the patch over the 
current trunk.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-12-28 Thread Haohui Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259747#comment-14259747
 ] 

Haohui Mai commented on HDFS-3107:
--

I don't see any reasons to rush to commit the patch. It is a big feature and it 
makes sense to ensure all feedbacks are properly address.

I think it is worth holding off until people are back from the vacations.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-12-28 Thread Konstantin Boudnik (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259705#comment-14259705
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

I think it is ready for commit.
Any other comments from any of the reviewers?

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-12-19 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253840#comment-14253840
 ] 

Colin Patrick McCabe commented on HDFS-3107:


bq. Now, I am not posting any patches into this JIRA nor HDFS-7056. You are 
confusing me with someone else.

Sorry, [~cos], for some reason I thought you had posted the latest patch on 
this JIRA.  But now I see it was [~shv].

Anyway, I will review HDFS-7056 in a sec

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-12-18 Thread Konstantin Boudnik (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252940#comment-14252940
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

[~cmccabe], I have repeated +1 on the latest version of the patch which was a 
simple rebase on top of latest changes in the trunk.
Now, I am not posting any patches into this JIRA nor HDFS-7056. You are 
confusing me with someone else.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-12-18 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252934#comment-14252934
 ] 

Colin Patrick McCabe commented on HDFS-3107:


[~cos], can you answer my question about what you are +1 on?  I find it 
confusing that you are continuing to post patches to this JIRA as well as to 
HDFS-7056.  Based on the earlier comments, I thought we were going to move this 
code into subtasks.  We also discussed giving patches revision numbers on 
hdfs-dev@, (and your last comment on that thread was "Oh well. Revision numbers 
is the way to go then for now"), but still I see all these patches have the 
same name.

thanks

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-12-18 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252922#comment-14252922
 ] 

Colin Patrick McCabe commented on HDFS-3107:


I'll take a look tomorrow.  Was busy on HDFS-7443, which is a blocker for 
upgrades.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-12-18 Thread Jing Zhao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252766#comment-14252766
 ] 

Jing Zhao commented on HDFS-3107:
-

Yeah, I will review the latest patch in HDFS-7056 today and tomorrow.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-12-18 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252760#comment-14252760
 ] 

Konstantin Shvachko commented on HDFS-3107:
---

Colin, please do review. For your convenience here is a [direct 
link|https://issues.apache.org/jira/browse/HDFS-7056?focusedCommentId=14211912&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14211912]
 to Plamen's comment, addressing your review comments.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-12-18 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252591#comment-14252591
 ] 

Colin Patrick McCabe commented on HDFS-3107:


Hi [~shv], I see that you have submitted a patch on this JIRA, and also one on 
HDFS-7056.  Can you clarify which one you are +1 on, and proposing to commit?  
[~jingzhao], can you clarify if the latest revision on HDFS-7056 addresses your 
comments?  I will look through it and see if mine were addressed, and put that 
comment on HDFS-7056.  Thanks, all.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-12-18 Thread Konstantin Boudnik (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252476#comment-14252476
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

The diff between the two seems to be quite small, yet I guess it requires a 
formal review again. Hence +1 again.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-12-18 Thread Konstantin Boudnik (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252451#comment-14252451
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

I actually quite like it. I think over the last few iterations the patch was 
polished enough and the test coverage is quite decent. 

+1

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-12-18 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252354#comment-14252354
 ] 

Konstantin Shvachko commented on HDFS-3107:
---

Looks like HDFS-7506 broke this again.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-11-05 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198755#comment-14198755
 ] 

Colin Patrick McCabe commented on HDFS-3107:


Thanks, I will take a look at HDFS-7056.  I suppose this means we can mark 
HDFS-7341 as a duplicate.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-11-04 Thread Plamen Jeliazkov (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197027#comment-14197027
 ] 

Plamen Jeliazkov commented on HDFS-3107:


I checked FindBugs since Jenkins messed up the patch build.

I xml-diff compared the XMLs generated with and without the combined patch and 
found no new FindBugs were introduced.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-11-04 Thread Plamen Jeliazkov (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196716#comment-14196716
 ] 

Plamen Jeliazkov commented on HDFS-3107:


I will attach it to HDFS-7056 since it has the design doc attached to it and is 
assigned to me. Thanks [~cmccabe].

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-11-04 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196619#comment-14196619
 ] 

Colin Patrick McCabe commented on HDFS-3107:


bq. There is the option to treat the combined patch HDFS-3107-&-7056 as the 
first patch, which accounts for upgrade and rollback functionality as well as 
snapshot support, demonstrated in unit test.

That's fine with me.  It can go into trunk directly if it doesn't break 
rollback + snapshots.

bq. I am not objecting to do work on a branch but I am unsure it is necessary 
given the combined patch seems to meet the support requirements asked for this 
work.

I suggested a branch since I thought it would let us commit things quicker.  
But I don't think it's necessary if you can do things without breaking trunk.  
It is going to be no more than 3-4 patches anyway as I understand.  Whatever is 
easiest for you guys.

Just one request: Can you post the combined patch on a subtask rather than this 
JIRA?  I think having patches on this umbrella jira is very confusing.  If 
you're going to combine the patches, post the combined patch on either 
HDFS-7341 or HDFS-7056 please.  Thanks.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-11-04 Thread Plamen Jeliazkov (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196336#comment-14196336
 ] 

Plamen Jeliazkov commented on HDFS-3107:


[~cmccabe],

At the time [~shv] talked about my new patch there was nothing posted yet in 
HDFS-7056 minus Konstantin's design doc.
We only uploaded even newer patches yesterday around noon.

Please be careful not to confuse [~shv] and [~cos].
The snapshot support patch (for HDFS-7056) was not ready yet when [~cos] made 
his comment.

We don't have to commit HDFS-3107 on its own. 
There is the option to treat the combined patch HDFS-3107-&-7056 as the first 
patch, which accounts for upgrade and rollback functionality as well as 
snapshot support, demonstrated in unit test.
This should address your comment: "My reasoning is that if the first patch 
breaks rollback, it's tough to see it getting into trunk."
I am not objecting to do work on a branch but I am unsure it is necessary given 
the combined patch seems to meet the support requirements asked for this work.

I'll investigate the FindBugs.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-11-03 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195607#comment-14195607
 ] 

Colin Patrick McCabe commented on HDFS-3107:


Hi Konstantin,

Yes, I was aware of HDFS-7056 and had commented about it earlier.  However, 
when you announced on this JIRA: "So, but the new patch from Plamen Jeliazkov 
already have snapshot support," the patch that I checked was 
{{HDFS-3107.patch}}, not a patch on a different JIRA.  After all, [~zero45] is 
the assignee on this JIRA as well as HDFS-7056, so "the new patch from Plamen 
Jeliazkov" could refer to either patch.  It seems logical to assume that what 
is being discussed on a particular JIRA is the patch attached to that JIRA, 
unless specified otherwise.

I think this highlights one confusing thing: that HDFS-3107 is now an umbrella 
JIRA as well as a JIRA with a (non-rollup) patch.  This creates confusion in 
people's minds because questions like "is HDFS-3107 done?" become ambiguous.  
It could be interpreted as either "is the patch on HDFS-3107 done?" or "is the 
feature discussed in HDFS-3107 done?"  To remove this confusion, I created the 
HDFS-7341 subtask to implement the part we've been discussing here.  That is, 
the pipeline-recovery based solution which the other subtasks build on top of.

I would like to create the HDFS-3107 branch as soon as possible.  I would have 
already created it, but as per my comment above, I want to make sure you are 
not objecting.  The quicker we can get this stuff into the branch, the quicker 
we can get this feature polished and merged into trunk.

As I commented earlier, it's not up to me to determine if something gets into 
2.6 or not.  It's up to the release manager (currently [~acmurthy]) and the 
community to vote.  Since 2.6 is so far along (it should have been released 
weeks ago, if the original schedule had been followed), I doubt that most 
people will welcome a big new feature getting added at this stage.  I don't 
think branch versus since commit matters in this regard-- a big new feature is 
a big new feature.  It's not going to "sneak in the back door"-- nor should it, 
given the problems we've had in the past (like with HDFS append).  We have time 
to do a thorough review.  In the meantime, distributions that are already 
shipping a variant of append can continue to do so, knowing that eventually the 
feature has a path to mainline.

Let me know if you have any objections to creating the branch, otherwise I'll 
do it tomorrow.  Thanks

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-11-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195395#comment-14195395
 ] 

Hadoop QA commented on HDFS-3107:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12679004/HDFS-3107-HDFS-7056-combined.patch
  against trunk revision 35d353e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to cause Findbugs 
(version 2.0.3) to fail.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8630//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8630//console

This message is automatically generated.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-11-03 Thread Konstantin Boudnik (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194975#comment-14194975
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

bq. Would love to see it in a feature branch
[~rvs] how a feature branch helps to solve your issue at hands? I thought you 
guys want to have it in the next release, right?

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored, editsStored, 
> editsStored.xml, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-11-03 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194946#comment-14194946
 ] 

Konstantin Shvachko commented on HDFS-3107:
---

Submitted two patches separetly. One here and the follow up patch in HDFS-7056. 
We will post a combined patch here for jenkins once the edits are regenerated.

>  I wasn't aware of HDFS-7056.

I am glad you are now. It was created almots two months ago and was mentioned 
in this jira 8 times, including one of your own comments.
Colin, it is not a problem by itself when people catch up on what they missed. 
But I am confused about your objections. You seem to be putting pressure on the 
development of this feature, but doing it without having full information at 
hand.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-11-03 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194846#comment-14194846
 ] 

Colin Patrick McCabe commented on HDFS-3107:


Great.  If there are no objections, later today I will create a feature branch 
for this and commit the patch on this JIRA to it.  (Since it has my +1 for 
committing to feature branch and we've discussed it a bunch).  I'm looking 
forward to seeing the other patches which add snapshot support and shell 
support.  Thanks, everyone.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-31 Thread Roman Shaposhnik (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192739#comment-14192739
 ] 

Roman Shaposhnik commented on HDFS-3107:


Would love to see it in a feature branch.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-31 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192430#comment-14192430
 ] 

Colin Patrick McCabe commented on HDFS-3107:


Thanks for the clarification.  I wasn't aware of HDFS-7056.  The design there 
looks reasonable.

I would prefer to see this work in a feature branch.  My reasoning is that if 
the first patch breaks rollback, it's tough to see it getting into trunk.  What 
do others think?

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-30 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191142#comment-14191142
 ] 

Konstantin Shvachko commented on HDFS-3107:
---

Let me try to clarify.
# The latest attached patch (Oct 17) does not support snapshots. We are 
targeting to post a patch for truncate with snapshots under HDFS-7056. We are 
in the testing stage there.
# I did comment on rollbacks [just 
yesterday|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14189216&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14189216].
 Copy replica on truncate solves the issue. Also addressed in the comming patch 
under HDFS-7056.
# The solution [you described in your 
comment|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14190706&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14190706]
 is exactly the one we are implementing here. Or I do not understand how it 
differs from the design document.

??Can we put numbers on patch files? I find it difficult to keep track of them 
when they all have the same file name.??
~Colin I usually sort attachments by date. That way you do need to track any 
file names or numbers. Actually was asking many times to change the default for 
sorting the attachements, but never got a solution for that. Sould have been so 
much easier to just look at them in the order of submission, same as comments.~

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-30 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190942#comment-14190942
 ] 

Colin Patrick McCabe commented on HDFS-3107:


bq. So, but the new patch from Plamen Jeliazkov already have snapshot support

Am I looking at the wrong version?  The patch posted on 17 Oct does not seem to 
have snapshot support.  The relevant code is here:

{code}
2242  boolean truncateInternal(String src, long newLength,
2243   String clientName, String clientMachine,
2244   long mtime, FSPermissionChecker pc)
2245  throws IOException, UnresolvedLinkException {
2246assert hasWriteLock();
2247if (isPermissionEnabled) {
2248  checkPathAccess(pc, src, FsAction.WRITE);
2249}
2250INodesInPath iip = dir.getINodesInPath4Write(src, true);
2251final int latestSnapshotId = iip.getLatestSnapshotId();
2252INodeFile inodeFile = INodeFile.valueOf(iip.getLastINode(), src);
2253// Data will be lost after truncate occurs so it cannot support 
snapshots.
2254if(inodeFile.isInLatestSnapshot(latestSnapshotId))
2255  throw new HadoopIllegalArgumentException(
2256  "Cannot truncate file with snapshot.");
{code}

I think it's misleading to refer to throwing an exception when there are 
snapshots present as "support."  But whatever we call it, my patch doesn't 
throw an exception here-- it always allows truncation even when the admin is 
using snapshots.

I was trying to be helpful and point out an approach that would let you respond 
to the comments here.  For example, [~sureshms] 's comment here:
https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14155537&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14155537
  Or [~szetszwo]'s comment about the need to support rollback.  You need to 
address the comments that people have made.

P.S. Can we put numbers on patch files?  I find it difficult to keep track of 
them when they all have the same file name.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-30 Thread Konstantin Boudnik (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190850#comment-14190850
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

So, but the new patch from [~zero45] already have snapshot support. Hence I'll 
repeat: what's the point of having an alternative version of the original patch?

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-30 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190785#comment-14190785
 ] 

Colin Patrick McCabe commented on HDFS-3107:


The point is that the original implementation doesn't support snapshots and 
rollback.  It needs to support those two things before it can be committed to 
trunk.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-30 Thread Konstantin Boudnik (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190724#comment-14190724
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

Sorry, perhaps I missing something, but what's the point of the alternative 
implementation of the original patch? As far as I see [~zero45]'s version 
doesn't have any shortcomings as per the current design. What are you trying to 
achieve exactly?

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-30 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190706#comment-14190706
 ] 

Colin Patrick McCabe commented on HDFS-3107:


bq. And it needs to be atomic e.g. not involving 5 RPC calls, otherwise 
recovery would be a nightmare.

The block-copying solution could be done with 1 RPC.  The NN receives the 
truncate RPC.  It marks the file as open-for-write, and creates a new block ID 
for the last block.  Then it returns to the client.  Later, when the DNs which 
have the last block check in with the NN via heartbeats, the NN sends them a 
(new) command that leads them to duplicate the block file with a new ID.  Then 
when the NN receives word that these blocks have been duplicated, it closes the 
file and truncate is complete.

This is pretty similar to the original patch, except the original patch had the 
NN send a command that led the DNs to truncate the block on-disk (destroying 
the part of it after the truncation point).  Just like in the original patch, 
the client may need to call {{isFileClosed}} a few times to wait for the 
truncation to finish.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-29 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189216#comment-14189216
 ] 

Konstantin Shvachko commented on HDFS-3107:
---

Yes, good point, Nicholas.
During upgrade old blocks are hard linked and the hard links are stored in a 
separate directory, which allows the blocks themselves be deleted while the 
file system is updated. So when we roll back the deleted blocks are still 
available via those hard links.
With truncate the same is applied to the blocks that were deleted. For the 
blocks that are truncated to a smaller length we will need to do 
copy-on-truncate recovery, same as we do for snapshots. That is if upgrade is 
in progress NN will schedule copy-on-truncate wether snapshots are present or 
not.
The bottom line is that implementing copy-on-truncate is needed both for 
snapshots and upgrades. I'll make a note to update the design.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-29 Thread Konstantin Boudnik (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189063#comment-14189063
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

bq. I posted it as a demonstration. I think to make it more robust we would want
And it needs to be atomic e.g. not involving 5 RPC calls, otherwise recovery 
would be a nightmare. 

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-29 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1416#comment-1416
 ] 

Colin Patrick McCabe commented on HDFS-3107:


Good point, Nicholas.

Note that the patch I posted does handle rollback correctly, since it never 
modifies any existing block files.

I posted it as a demonstration.  I think to make it more robust we would want 
to avoid having the client write out the last block and concat it, and instead 
have some other mechanism for duplicating + shortening the final block of the 
file-- possibly a new DN command similar to COPY_BLOCK, but taking a length 
argument.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-28 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187705#comment-14187705
 ] 

Tsz Wo Nicholas Sze commented on HDFS-3107:
---

In a HDFS upgrade, how to roll back a truncated file?  The current design seems 
not covering it.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-20 Thread Plamen Jeliazkov (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178044#comment-14178044
 ] 

Plamen Jeliazkov commented on HDFS-3107:


[~srivas], 
There is no plan to grow the file by padding it with zeroes as general-purpose 
truncate does. Both [~shv] and [~lei_chang] mentioned this in their design 
docs, I believe.

[~cmccabe],
While the copying the last block up to its truncate point and doing a 
delete/concat is definitely a simpler overall approach, the full truncate 
implementation has the benefit of being a single NameNode RPC call that can 
both truncate in-place and copy-on-truncate, preserving the original last block 
and moving the 'copy&truncate' work to the DataNodes themselves (as opposed to 
having to pass data through the network / client). I am not intending to debate 
either implementation -- I like both personally; just wanted to explain as 
briefly as I could why Konstantin and I are taking our approach.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-20 Thread M. C. Srivas (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177900#comment-14177900
 ] 

M. C. Srivas commented on HDFS-3107:


Note that a general-purpose truncate can be used to also *increase* the size of 
the file.  Used very often, for example, to implement a database and growing 
the file if it isn't large enough.  Are you planning to implement truncate to 
behave so too?


> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-17 Thread Todd Lipcon (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175835#comment-14175835
 ] 

Todd Lipcon commented on HDFS-3107:
---

bq. Dhruba, Indeed. Lack of concurrent writes to a single HDFS file means that 
there will be only a single outstanding transaction against a file (unless the 
concurrency is implemented at a higher level.) A database can consist of 
multiple files, though, and one can have multiple outstanding transactions 
against the database (one per file.) In either case, rollback is achieved by 
truncating the file to position prior to beginning of transaction.

I don't know the architecture of this particular database, but it seems to me 
that this is not the method that most databases use to issue an abort. Instead, 
an abort record is logged. Abort is a rare operation, and optimizing it by 
truncating the log doesn't seem worth the complexity to me.

Do you have some reference which explains why the truncate is advantageous for 
a database log vs logging a new abort record? Typical database logging 
architectures based on ARIES couldn't really work this way, since aborting a 
transaction may involve rolling back changes which have already been persisted 
to the on-disk state (ie applying UNDO records). Truncation can only work in a 
REDO-only logging scheme (what ARIES calls "no-steal" iirc).

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-06 Thread dhruba borthakur (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161021#comment-14161021
 ] 

dhruba borthakur commented on HDFS-3107:


Thanks for the clarification Milind. I was just making sure that i understand 
the limitations of such a database the uses the HDFS truncate feature. Given 
this fact, it is unlikely that HBase can use it (in future) to support 
transactions. 

Thanks anyways.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-06 Thread Milind Bhandarkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160511#comment-14160511
 ] 

Milind Bhandarkar commented on HDFS-3107:
-

Dhruba, Indeed. Lack of concurrent writes to a single HDFS file means that 
there will be only a single outstanding transaction against a file (unless the 
concurrency is implemented at a higher level.) A database can consist of 
multiple files, though, and one can have multiple outstanding transactions 
against the database (one per file.) In either case, rollback is achieved by 
truncating the file to position prior to beginning of transaction.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-04 Thread dhruba borthakur (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159278#comment-14159278
 ] 

dhruba borthakur commented on HDFS-3107:


Thanks KonstantinS. I get it now.

Just to make it clear, my questions/comments are neither to support this patch 
nor to reject it. I just want to understand if the 'proposed truncate' feature 
can be used by databases like HBase.

KonstantinS/Milind: If the database depends on the truncate feature of HDFS to 
rollback a transaction, won't it mean that there can be only one outstanding 
transaction per hdfs-file? Won't it be better for the database to support 
multiple outstanding transactions in parallel and have the ability to rollback 
(if needed) only one without having to rollback the others?



> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-03 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158373#comment-14158373
 ] 

Konstantin Shvachko commented on HDFS-3107:
---

Dhruba, this is not my story to tell. The use case was introduced earlier by 
Milind and he discussed it with Hairong based on the comments above. So you may 
have a local reference there.
I personally see it useful for transaction management tools / systems, which 
use HDFS as a reliable storage for their journal.
When processing begin-transaction op the transaction manager (TM) writes this 
op into the journal and remembers the offset (Offset1) in the journal file 
where it was written. Then it processes operations in the transaction and 
writes them into the journal file. If commit-transaction happens TM just keeps 
writing to the journal. If TM gets and abort-transaction op, then TM truncates 
the journal file to Offset1 and starts appending to it for further 
transactions. That way if TM is restarted it sees only committed transactions.
Hope this makes sense.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-02 Thread dhruba borthakur (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156265#comment-14156265
 ] 

dhruba borthakur commented on HDFS-3107:


Hi KonstantinS, I have been following this jira, mostly as a passive observer. 

Can you pl explain me the use-case for truncate? You might have already 
explained this earlier, but if you could again elaborate the reason why you 
need truncate. I would appreciate it a lot.

from your comments, it feels that you have a database layer on top of hdfs and 
the database is using an hdfs file as the transaction log.  But I am not able 
to understand the rest of the story.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-01 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155966#comment-14155966
 ] 

Konstantin Shvachko commented on HDFS-3107:
---

??Even with truncate, snapshot creation does not require moving data. The data 
movement happens at truncate time. Is it correct???

Good point, Nicholas. You are right the data moves on truncate in option 3 or 
op append in option 2. I'll rephrase this.

[~jingzhao] thanks for offering your help. I'll reach out offline to check on 
your availability.

[~sureshms] I thought you would notice that I am not arguing about the 
integration with snapshots or postponing it, or exposing it to the users until 
done.
I am arguing that the faster way is to commit this patch since it is ready and 
get into snapshots integration.

Setting aside commercial distributions and superiority of features.
It used to be a normal development path to split a new feature into parts and 
commit them to trunk as long as the parts are consistent and the feature is 
small, since nobody is running trunk in production. I wonder what happened with 
that practice. Truncate has only 3 subtasks. It is nowhere near 136 tasks that 
went into HDFS-2802, which was done on its branch.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-01 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155816#comment-14155816
 ] 

Colin Patrick McCabe commented on HDFS-3107:


So to summarize all this above, myself, [~jingzhao], and [~sureshms] have made 
the point that this feature is still incomplete until it supports snapshots.  
Since snapshot support may involve fundamentally changing the design (e.g. 
copying the final partial block file versus using block recovery), we need to 
figure it out before merging to trunk or branch-2.

As far as I can see, there are two options here.  We could roll the snapshots 
support into this patch, or start a feature branch with the above commit... and 
then do snapshots support (and whatever else is needed) in that feature branch. 
 I'm fine with either option, I have absolutely no preference.  I'm happy to 
review anything and Jing has offered to help with snapshots support.

[~rvs]: I realize that getting truncate into a release is important to you.  If 
people get this done by 2.6, I wouldn't oppose putting it in.  But you will 
have to convince the release manager for 2.6, and propose it to the community.  
Since 2.6 is already a very big release, I think you will get pushback.

I also think you should evaluate writing length-delimited records instead of 
using truncate.  Truncate is an operation that can fail, and relying on 
truncate to clean up mistakes will always be more fragile than writing in a 
format that can ignore torn records automatically.  Think about it... if a 
write error occurs because of a disk I/O error, isn't it likely that truncate 
will also get an error when trying to shrink that block file?  And for write 
errors that result from clients dropping off the network, truncate will also be 
impossible because of the lack of connectivity.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-01 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155663#comment-14155663
 ] 

Colin Patrick McCabe commented on HDFS-3107:


bq. I think the approach that one new feature is treated should be equally used 
for another. I am looking at HDFS-6994 and see subtasks getting committed 
despite the fact that they were rejected in the parent. Am I missing some 
subtle differences between two features?

Subtasks were not "rejected in the parent."  [~aw] made a comment that he 
doesn't like the name of the library, and he doesn't like the fact that it uses 
boost.  We agreed to consider these points once the code is in the branch since 
they can easily be changed later.  It's clear that nobody likes the current 
name, and it will almost certainly be renamed before it gets merged.  That's 
the point of a feature branch-- the code and the design are flexible while 
they're in a feature branch.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-01 Thread Konstantin Boudnik (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155561#comment-14155561
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

I think the approach that one new feature is treated should be equally used for 
another. I am looking at HDFS-6994 and see subtasks getting committed despite 
the fact that they were rejected in the parent. Am I missing some subtle 
differences between two features?

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-01 Thread Konstantin Boudnik (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155546#comment-14155546
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

Sorry, what I meant so say is that yes, you're right but the actual use of the 
feature is questionable - sounded better in my head ;)

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-01 Thread Konstantin Boudnik (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155539#comment-14155539
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

bq. Most users of commercial distros are using snapshots
This is a hearsay as far as I know

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-01 Thread Suresh Srinivas (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155537#comment-14155537
 ] 

Suresh Srinivas commented on HDFS-3107:
---

bq. In fact, if I were to rank the importance of snapshots vs. truncate to the 
majority of our users, I'd say truncate functionality is way more important.
But the snapshots feature is already in the older releases (available since 
2.1.0). All [~cmccabe] is saying is, it is an existing feature and is currently 
used by many deployments. So debating a feature that already exists vs. a new 
feature is not useful.

I agree with [~jingzhao]'s comment. We should address the interaction of this 
feature with Snapshots. Having a feature that fails in unpredictable ways when 
a file has been snapshotted vs. not is confusing. Postponing that will also 
result in an API that changes its behavior.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-01 Thread Roman Shaposhnik (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155514#comment-14155514
 ] 

Roman Shaposhnik commented on HDFS-3107:


bq. Most users of commercial distros are using snapshots, so even if we pulled 
this into 2.6 and then into the commercial releases based on it, the feature 
still wouldn't get tested. And anyway it's not going to make 2.6, so let's not 
get an artificial sense of urgency here. We have time to get the design right 
and test it.

I think we can only speak on behalf of the companies that we represent. Making 
a generic statement that starts with "Most users of commercial distros" is just 
bad form. With that in mind, I can assure you that regardless of the use cases 
you see around CDH, we (PHD) see it differently. In fact, if I were to rank the 
importance of snapshots vs. truncate to the majority of our users, I'd say 
truncate functionality is way more important.

Now, this is not to say that we should be basing decisions affecting an open 
source project on the immediate needs of *any* commercial distribution.  This 
is just pointing out that the statement above is false.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-01 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155499#comment-14155499
 ] 

Colin Patrick McCabe commented on HDFS-3107:


bq. \[The boolean\] is an optimization for the case when truncate happens on 
the block boundary. Clients will save one RPC call in this particular case. 
From NameNode perspective returning the boolean does not require any extra 
processing.

Fair enough.  Can we change it to return an enum of { {{TRUNCATE_IN_PROGRESS}}, 
{{TRUNCATE_COMPLETED}} }?  A lot of developers just don't read documentation 
and will assume "it returned false, that means it failed."

bq. \[concurrent readers discussion\]

It's true that concurrent unlink has the same problem with making readers get 
mysterious IOExceptions.  I don't consider this a good behavior to copy, though 
(it's more like a bug).  Perhaps in a follow-up change, we could add some 
faculty for clients to ask the NN for information about whether the file has 
been unlinked or truncated after an unrecoverable IOException happened during a 
read?  That's probably better as a follow-up, though.

bq. as HDFS-7056 indicated it will take more time to come up with design and 
implementation of the complimentary functionality that would extend truncate to 
snapshotted files

Clearly, we're still at the stage where the design isn't complete.  This is 
exactly what feature branches are for.

bq. in its current form, this is an extremely useful self-contained feature 
that allows various vendors of solutions running on Hadoop to build products 
having having much easier time running on HDFS we all know that features 
sitting in a branch don't get exposed to commercial distributions and workloads 
as much as the ones hitting trunk do. This is, of course, a totally right 
approach to features that are half-baked or not self-contained, but it feels 
that in this particular case committing the patch would benefit us all by 
giving customers access to the self-contained feature AND start receiving 
feedback for the more extended functionality much earlier.

Most users of commercial distros are using snapshots, so even if we pulled this 
into 2.6 and then into the commercial releases based on it, the feature still 
wouldn't get tested.  And anyway it's not going to make 2.6, so let's not get 
an artificial sense of urgency here.  We have time to get the design right and 
test it.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-01 Thread Jing Zhao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155485#comment-14155485
 ] 

Jing Zhao commented on HDFS-3107:
-

Actually I agree with [~cmccabe] that it's better to finish HDFS-7056 before 
committing the current patch into trunk, especially considering the patch 
contains a DistributedFileSystem API already. Thus a feature branch makes sense 
to me.

I have not thought through the details about HDFS-7056 yet. But it does not 
seem to be a very hard issue. I will also be happy to help there.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-01 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155445#comment-14155445
 ] 

Tsz Wo Nicholas Sze commented on HDFS-3107:
---

Checked the design doc.  It looks quite good.  Just a minor comment below:

{quote}
Current implementation of snapshots does not require moving any data. Both 
approaches 2 and 3 mean that this property will be lost. Some data will need to 
be moved for the snapshots to work with truncates and appends.
{quote}
Even with truncate, snapshot creation does not require moving data.  The data 
movement happens at truncate time.  Is it correct?

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-01 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155367#comment-14155367
 ] 

Konstantin Shvachko commented on HDFS-3107:
---

[~cmccabe] feels like some clarification is needed from you here, so that we 
could avoid misunderstanding like in the other jira.

# I assume that your [veto of Sept 17 conditional on the design 
document|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14137882&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14137882]
 had been addressed. Now with subsequent corrections in line with your 
suggestions.
# [Earlier this week you 
stated|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14152094&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14152094]
??we should hold off on committing to trunk until we figure out the snapshot 
story.??
Given the design in place, the subtask HDFS-7056 opened, and the general 
acceptance of the approach by other contributors, the snapshot story seem to be 
clear.

So if your statement is a veto could you please clarify the reason(s)? If not 
please say so, so that people could proceed with the subtasks? I hope you can 
reply by tomorrow.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-09-30 Thread Roman Shaposhnik (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154121#comment-14154121
 ] 

Roman Shaposhnik commented on HDFS-3107:


FWIW I would like to provide a few additional datapoints to what [~shv] has 
said:
   # in its current form, this is an extremely useful self-contained feature 
that allows various vendors of solutions running on Hadoop to build products 
having having much easier time running on HDFS.
  # it is true that currently there's not immediate integration with snapshot 
functionality, but the way the current patch is implemented makes it extremely 
easy to expand the scope of the feature to snapshots. In other words, if this 
current implementation gets committed it will NOT create a migration 
opportunity. The snapshot+truncate can be added into later releases of HDFS and 
applications targeting truncate as it is implemented currently will continue to 
run unmodified.
  # as HDFS-7056 indicated it will take more time to come up with design and 
implementation of the complimentary functionality that would extend truncate to 
snapshotted files. It feels unfortunate if we had to hold the current patch 
hostage, even though today it delivers a very much needed functionality AND it 
allows for smooth migration for when snapshot+truncate gets implemented.
  # we all know that features sitting in a branch don't get exposed to 
commercial distributions and workloads as much as the ones hitting trunk do. 
This is, of course, a totally right approach to features that are half-baked or 
not self-contained, but it feels that in this particular case committing the 
patch would benefit us all by giving customers access to the self-contained 
feature AND start receiving feedback for the more extended functionality much 
earlier.
 
Hope this provides additional food for thought to reconsider this patch for 
inclusion. Also, FWIW, based on our testing, this feels like an extremely 
useful and important feature to get into Hadoop now and extend to cover 
snapshots later.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-09-29 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152875#comment-14152875
 ] 

Konstantin Shvachko commented on HDFS-3107:
---

Thanks Dhruba and Colin for your reviews of the design document.
Colin, I'll incorporate your suggestions. But it looks that you got everything 
right from the current edition.

??{{boolean truncate(Path src, long newLength)}}. do we really need the boolean 
here???
* This is an optimization for the case when truncate happens on the block 
boundary. Clients will save one RPC call in this particular case.
>From NameNode perspective returning the boolean does not require any extra 
>processing.

??DFSInputStream#locatedBlocks will continue to have the block information it 
had prior to truncation.??
* Don't we have the same behaviour with deletes.
Somebody can delete a file on the NameNode, but readers will keep reading old 
blocks until they are deleted.
Truncate doesn't add anything new in that regard.

??I don't think we should commit anything to trunk until we figure out how this 
integrates with snapshots.??
* You should have seen HDFS-7056 subtask. Mentioning it again to reassure there 
is no intension to avoid the snapshot issue.
* People agreed above that [they are 
OK|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14129351&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14129351]
 implementing snapshot integration in a separate jira.
* We also [agreed not to port it to branch 
2|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14148406&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14148406]
 until this is completed.
* And there was a [request to 
commit|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14150308&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14150308]
 this sooner rather than later.
* Besides, it seems from your comments that you yourself are in favour of 
option 3 for snapshots from the design.

So the question arise what is not clear in the truncate-snapshot story and why 
you object committing anything to trunk?

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-09-29 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152094#comment-14152094
 ] 

Colin Patrick McCabe commented on HDFS-3107:


Thanks for waiting.  I'm checking out the design doc.

{code}
In proposed approach truncate is performed only on a closed file. If the file 
is opened for write an 
attempt to truncate fails.
{code}

Just a style change, but maybe "Truncate cannot be performed on a file which is 
currently open for writing" would be clearer.

{code}
Conceptually, truncate removes all full blocks of the file and then starts a 
recovery process for the 
last block if it is not fully truncated. The truncate recovery is similar to 
standard HDFS lease recovery
procedure. That is, NameNode sends a DatanodeCommand to one of the DataNodes 
containing block 
replicas. The primary DataNode synchronizes the new length among the replicas, 
and then confirms it to 
the NameNode by sending commitBlockSynchronization() message, which completes 
the 
truncate. Until the truncate recovery is complete the file is assigned a lease, 
which revokes the ability for 
other clients to modify that file.
{code}

I think a diagram might help here.  The impression I'm getting is that we have 
some "truncation point" like this:
{code}
   truncation point
|
V 
+-+-+-+-+-+-+
| A   | B   | C   | D   | E   | F   |
+-+-+-+-+-+-+
{code}
In this case, blocks E and F would be invalidated by the NameNode, and block 
recovery would begin on block D?

"Conceptually, truncate removes all full blocks of the file" seems to suggest 
we're removing all blocks, so it might be nice to rewrite this as "Truncate 
removes all full blocks after the truncation point."

{code}
 Full blocks if any are deleted instantaneously. And if there is nothing more 
to truncate NameNode returns success to the client.
{code}

They're invalidated instantly, but not deleted instantly, right?  Clients may 
still be reading from them on the various datanodes.

{code}
public boolean truncate(Path src, long newLength)
throws IOException;
Truncate file src to the specified newLength.
Returns:
- true if the file have been truncated to the desired newLength and is 
immediately available to 
be reused for write operations such as append, or
- false if a background process of adjusting the length of the last block has 
been started, and 
clients should wait for it to complete before they can proceed with further 
file updates.
{code}

Hmm, do we really need the boolean here?  It seems like the client could simply 
try to reopen the file until it no longer got an 
{{RecoveryInProgressException.}} (or lease exception, as the case may be.)  The 
client will have to do this anyway most of the time, since most truncates don't 
fall on even block boundaries.

{code}
 It should be noted that applications that cache data may still see old bytes 
of the file stored 
in the cache. It is advised for such applications to incorporate techniques, 
which would retire cache 
when the data is truncated.
{code}

One issue that I see here is that {{DFSInputStream}} users will continue to see 
the old, longer length for a long time potentially.  
{{DFSInputStream#locatedBlocks}} will continue to have the block information it 
had prior to truncation.  And eventually, whenever they try to read from that 
longer length, they'll get read failures since the blocks will actually be 
unlinked.  These will look like IOExceptions to the user.  I don't know if 
there's a good way around this problem with the design proposed here.

bq. \[truncate with snapshots\]

I don't think we should commit anything to trunk until we figure out how this 
integrates with snapshots.  It just impacts the design too much.  When you 
start seriously thinking about snapshots, integrating this with block recovery 
(by adding {{BEING_TRUNCATED}}, etc.) does not look like a very good option.  A 
better option would be simply to copy the partial block and have the 
snapshotted version reference the old block, and the new version reference the 
(shorter) copy.  That corresponds to your approach #3, right?  truncate is 
presumably a rare operation and doing the truncation in-place for 
non-snapshotted files is an optimization we could do later.

The copy approach is also nice for {{DFSInputStream}}, since readers can 
continue reading from the old (longer) copy until the readers close.  If we 
truncated that copy directly, this would not work.

We could commit this to a branch, but I think we should hold off on committing 
to trunk until we figure out the snapshot story.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Compon

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-09-28 Thread dhruba borthakur (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151051#comment-14151051
 ] 

dhruba borthakur commented on HDFS-3107:


The design-doc looks cool. And thanks for waiting to commit, Konstantin.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-09-27 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150830#comment-14150830
 ] 

Konstantin Shvachko commented on HDFS-3107:
---

Hi Dhruba, good to hear from you.
The design was actually [outlined on Sept 
5|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14123590&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14123590].
 The idea is rather simple. We just formalized it in a document per Colin's 
request last week. More reviews is better. So if people need time lets wait 
until end of day Monday Sept 29.
Hope Roman is OK with that?

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-09-27 Thread dhruba borthakur (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150789#comment-14150789
 ] 

dhruba borthakur commented on HDFS-3107:


hi konstantin, is it possible to wait for a few more days before committing? 
from the comments, it feels that aaron might want to do a detailed review of 
the design. Especially since the design doc was added only 4 days ago and the 
'truncate' feature is not trivial.



> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-3107) HDFS truncate

2014-09-27 Thread Konstantin Boudnik (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150769#comment-14150769
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

Don't miss the 
{{hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java}}
 that needs to be explicitely added to the commit as it's a new file.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

1 2 >

1 - 100 of 174 matches

Mail list logo