[jira] [Updated] (HDFS-10763) Open files can leak permanently due to inconsistent lease update

2017-08-31 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-10763:
--
Fix Version/s: 2.8.0
               2.9.0

> Open files can leak permanently due to inconsistent lease update
> 
>
> Key: HDFS-10763
> URL: https://issues.apache.org/jira/browse/HDFS-10763
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.3, 2.6.4
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Critical
> Fix For: 2.8.0, 2.9.0, 2.6.5, 2.7.4, 3.0.0-alpha1
>
> Attachments: HDFS-10763.br27.patch, 
> HDFS-10763.branch-2.7.supplement.patch, HDFS-10763.branch-2.7.v2.patch, 
> HDFS-10763.patch
>
>
> This can happen during {{commitBlockSynchronization()}} or when a client gives
> up on closing a file after retries.
> In {{finalizeINodeFileUnderConstruction()}}, the lease is removed first and
> then the inode is turned into the closed state. But if any block is not in
> COMPLETE state, {{INodeFile#assertAllBlocksComplete()}} will throw an
> exception. As a result, the lease is removed from the lease manager, but not
> from the inode.
> Since the lease manager does not have a lease for the file, no lease recovery
> will happen for this file. Moreover, this broken state is persisted and
> reconstructed through saving and loading of the fsimage. Since no replication
> is scheduled for the blocks of the file, this can cause data loss and can
> also block decommissioning of datanodes.
> The lease cannot be manually recovered either. It fails with
> {noformat}
> ...AlreadyBeingCreatedException): Failed to RECOVER_LEASE /xyz/xyz for user1 on 0.0.0.1 because the file is under construction but no leases found.
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2950)
> ...
> {noformat}
> When a client retries {{close()}}, the same inconsistent state is created,
> but the retry can succeed the next time since {{checkLease()}} only looks at
> the inode, not the lease manager, in this case. The close behavior is
> different if HDFS-8999 is activated by setting
> {{dfs.namenode.file.close.num-committed-allowed}} to 1 (unlikely) or 2
> (never).
> In principle, the under-construction feature of an inode and the lease in the
> lease manager should never go out of sync. The fix involves two parts.
> 1) Prevent inconsistent lease updates. We can achieve this by calling
> {{removeLease()}} after checking the block state.
> 2) Avoid reconstructing inconsistent lease states from an fsimage. 1) alone
> does not correct the existing inconsistencies that survive through fsimages.
> This can be done at fsimage loading time by making sure a corresponding
> lease exists for each inode that has the under-construction feature.
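
The two parts of the fix described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyInode, ToyLeaseManager, and the three static methods are hypothetical stand-ins invented for illustration, not the actual HDFS classes or the committed patch. It shows why removing the lease before the block-state check leaves the inode and the lease manager out of sync, how checking first avoids that, and how a load-time pass could restore missing leases.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: every type and method here is a hypothetical stand-in,
// not the real HDFS INodeFile, LeaseManager, or FSNamesystem code.
class LeaseOrderingSketch {

    /** Stand-in for an inode: an id, an under-construction flag, per-block COMPLETE flags. */
    static final class ToyInode {
        final long id;
        boolean underConstruction = true;
        final List<Boolean> blockComplete = new ArrayList<>();

        ToyInode(long id, boolean... blocks) {
            this.id = id;
            for (boolean b : blocks) {
                blockComplete.add(b);
            }
        }

        /** Throws if any block is not COMPLETE, mirroring the check named in the description. */
        void assertAllBlocksComplete() {
            for (int i = 0; i < blockComplete.size(); i++) {
                if (!blockComplete.get(i)) {
                    throw new IllegalStateException(
                        "block " + i + " of inode " + id + " is not COMPLETE");
                }
            }
        }
    }

    /** Stand-in for the lease manager: the set of inode ids that currently hold a lease. */
    static final class ToyLeaseManager {
        private final Set<Long> leased = new HashSet<>();

        void addLease(long inodeId) { leased.add(inodeId); }
        void removeLease(long inodeId) { leased.remove(inodeId); }
        boolean hasLease(long inodeId) { return leased.contains(inodeId); }
    }

    /** Problematic order: the lease is gone before the check can fail. */
    static void finalizeLeaseFirst(ToyInode inode, ToyLeaseManager leases) {
        leases.removeLease(inode.id);
        inode.assertAllBlocksComplete(); // may throw after the lease was removed
        inode.underConstruction = false;
    }

    /** Part 1 of the fix: check the block state first, then remove the lease. */
    static void finalizeCheckFirst(ToyInode inode, ToyLeaseManager leases) {
        inode.assertAllBlocksComplete(); // throws before any state is changed
        leases.removeLease(inode.id);
        inode.underConstruction = false;
    }

    /** Part 2 of the fix: after loading, give every under-construction inode a lease. */
    static int restoreMissingLeases(Iterable<ToyInode> inodes, ToyLeaseManager leases) {
        int restored = 0;
        for (ToyInode inode : inodes) {
            if (inode.underConstruction && !leases.hasLease(inode.id)) {
                leases.addLease(inode.id);
                restored++;
            }
        }
        return restored;
    }

    public static void main(String[] args) {
        ToyLeaseManager leases = new ToyLeaseManager();
        ToyInode file = new ToyInode(1L, true, false); // last block not COMPLETE
        leases.addLease(file.id);
        try {
            finalizeLeaseFirst(file, leases);
        } catch (IllegalStateException e) {
            System.out.println("underConstruction=" + file.underConstruction
                + " hasLease=" + leases.hasLease(file.id));
        }
        List<ToyInode> loaded = new ArrayList<>();
        loaded.add(file);
        System.out.println("restored=" + restoreMissingLeases(loaded, leases));
    }
}
```

In this model, the lease-first order leaves the file under construction with no lease, which is the leaked state the description reports. The check-first order changes nothing when the check fails, and the restore pass repairs state that was already persisted in the broken form.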



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10763) Open files can leak permanently due to inconsistent lease update

2016-11-21 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-10763:
---
Fix Version/s: (was: 3.0.0-alpha2)
               3.0.0-alpha1

> Open files can leak permanently due to inconsistent lease update
> 
>
> Key: HDFS-10763
> URL: https://issues.apache.org/jira/browse/HDFS-10763
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.3, 2.6.4
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Critical
> Fix For: 2.6.5, 2.7.4, 3.0.0-alpha1
>
> Attachments: HDFS-10763.br27.patch, 
> HDFS-10763.branch-2.7.supplement.patch, HDFS-10763.branch-2.7.v2.patch, 
> HDFS-10763.patch
>
>






[jira] [Updated] (HDFS-10763) Open files can leak permanently due to inconsistent lease update

2016-09-14 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HDFS-10763:
---
Fix Version/s: 2.6.5

Cherry-picked it to 2.6.5 (trivial).

> Open files can leak permanently due to inconsistent lease update
> 
>
> Key: HDFS-10763
> URL: https://issues.apache.org/jira/browse/HDFS-10763
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.3, 2.6.4
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Critical
> Fix For: 2.6.5, 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-10763.br27.patch, 
> HDFS-10763.branch-2.7.supplement.patch, HDFS-10763.branch-2.7.v2.patch, 
> HDFS-10763.patch
>
>






[jira] [Updated] (HDFS-10763) Open files can leak permanently due to inconsistent lease update

2016-08-18 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-10763:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed the patch to branch-2.7.

> Open files can leak permanently due to inconsistent lease update
> 
>
> Key: HDFS-10763
> URL: https://issues.apache.org/jira/browse/HDFS-10763
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.3, 2.6.4
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Critical
> Fix For: 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-10763.br27.patch, 
> HDFS-10763.branch-2.7.supplement.patch, HDFS-10763.branch-2.7.v2.patch, 
> HDFS-10763.patch
>
>






[jira] [Updated] (HDFS-10763) Open files can leak permanently due to inconsistent lease update

2016-08-18 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-10763:
--
Attachment: HDFS-10763.branch-2.7.v2.patch

Reverted the original commit from branch-2.7.
Attaching a new patch that includes everything and addresses the review comment.

> Open files can leak permanently due to inconsistent lease update
> 
>
> Key: HDFS-10763
> URL: https://issues.apache.org/jira/browse/HDFS-10763
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.3, 2.6.4
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Critical
> Fix For: 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-10763.br27.patch, 
> HDFS-10763.branch-2.7.supplement.patch, HDFS-10763.branch-2.7.v2.patch, 
> HDFS-10763.patch
>
>






[jira] [Updated] (HDFS-10763) Open files can leak permanently due to inconsistent lease update

2016-08-17 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-10763:
--
Status: Patch Available  (was: Reopened)

> Open files can leak permanently due to inconsistent lease update
> 
>
> Key: HDFS-10763
> URL: https://issues.apache.org/jira/browse/HDFS-10763
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.6.4, 2.7.3
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Critical
> Fix For: 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-10763.br27.patch, 
> HDFS-10763.branch-2.7.supplement.patch, HDFS-10763.patch
>
>






[jira] [Updated] (HDFS-10763) Open files can leak permanently due to inconsistent lease update

2016-08-17 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-10763:
--
Attachment: HDFS-10763.branch-2.7.supplement.patch

Attaching a supplemental patch for branch-2.7. This skips restoring the lease
for deleted files that are still under construction in a snapshot, just like
before. Again, this behavior did not change with the initial patch for trunk
through branch-2.8. It only affected branch-2.7, where the lease is path-based.
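
As a rough illustration of the skip described above, the following sketch selects which files would get a lease restored at load time. It is an assumption-laden toy, not the supplemental patch: the OpenFile class, its deletedButInSnapshot flag, and pathsNeedingLease() are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: OpenFile and pathsNeedingLease are hypothetical names,
// not part of the HDFS code base or the attached patches.
class SnapshotLeaseSkipSketch {

    /** Stand-in for a file record seen while loading an fsimage. */
    static final class OpenFile {
        final String path;
        final boolean underConstruction;
        final boolean deletedButInSnapshot; // deleted from the live tree, kept by a snapshot

        OpenFile(String path, boolean underConstruction, boolean deletedButInSnapshot) {
            this.path = path;
            this.underConstruction = underConstruction;
            this.deletedButInSnapshot = deletedButInSnapshot;
        }
    }

    /**
     * Returns the paths whose lease would be restored: under-construction files,
     * except those that only survive inside a snapshot.
     */
    static List<String> pathsNeedingLease(List<OpenFile> files) {
        List<String> result = new ArrayList<>();
        for (OpenFile f : files) {
            if (f.underConstruction && !f.deletedButInSnapshot) {
                result.add(f.path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<OpenFile> files = new ArrayList<>();
        files.add(new OpenFile("/live/open", true, false));
        files.add(new OpenFile("/deleted/open", true, true));
        files.add(new OpenFile("/live/closed", false, false));
        System.out.println(pathsNeedingLease(files));
    }
}
```

With a path-based lease, a deleted file has no live path to attach the lease to, which is one plausible reading of why only branch-2.7 needed this extra condition.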

> Open files can leak permanently due to inconsistent lease update
> 
>
> Key: HDFS-10763
> URL: https://issues.apache.org/jira/browse/HDFS-10763
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.3, 2.6.4
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Critical
> Fix For: 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-10763.br27.patch, 
> HDFS-10763.branch-2.7.supplement.patch, HDFS-10763.patch
>
>






[jira] [Updated] (HDFS-10763) Open files can leak permanently due to inconsistent lease update

2016-08-15 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-10763:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha2
               2.7.4
       Status: Resolved  (was: Patch Available)

Thanks for the review, Daryn. I've committed this to trunk through branch-2.7.
[~ctrezzo], do you want this in 2.6.?

> Open files can leak permanently due to inconsistent lease update
> 
>
> Key: HDFS-10763
> URL: https://issues.apache.org/jira/browse/HDFS-10763
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.3, 2.6.4
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Critical
> Fix For: 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-10763.br27.patch, HDFS-10763.patch
>
>






[jira] [Updated] (HDFS-10763) Open files can leak permanently due to inconsistent lease update

2016-08-15 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-10763:
--
Status: Patch Available  (was: Open)

> Open files can leak permanently due to inconsistent lease update
> 
>
> Key: HDFS-10763
> URL: https://issues.apache.org/jira/browse/HDFS-10763
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.6.4, 2.7.3
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Critical
> Attachments: HDFS-10763.br27.patch, HDFS-10763.patch
>
>






[jira] [Updated] (HDFS-10763) Open files can leak permanently due to inconsistent lease update

2016-08-15 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-10763:
--
Attachment: HDFS-10763.patch
            HDFS-10763.br27.patch

> Open files can leak permanently due to inconsistent lease update
> 
>
> Key: HDFS-10763
> URL: https://issues.apache.org/jira/browse/HDFS-10763
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.3, 2.6.4
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Critical
> Attachments: HDFS-10763.br27.patch, HDFS-10763.patch
>
>


