[jira] [Updated] (HDFS-11817) A faulty node can cause a lease leak and NPE on accessing data

2018-04-19 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-11817:
---
Attachment: HDFS-11817.branch-2.7.001.patch

> A faulty node can cause a lease leak and NPE on accessing data
> --
>
> Key: HDFS-11817
> URL: https://issues.apache.org/jira/browse/HDFS-11817
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2
>
> Attachments: HDFS-11817.branch-2.7.001.patch, 
> HDFS-11817.branch-2.patch, HDFS-11817.v2.branch-2.8.patch, 
> HDFS-11817.v2.branch-2.patch, HDFS-11817.v2.trunk.patch, 
> hdfs-11817_supplement.txt
>
>
> When the namenode performs a lease recovery for a failed write, the 
> {{commitBlockSynchronization()}} will fail, if none of the new target has 
> sent a received-IBR.  At this point, the data is inaccessible, as the 
> namenode will throw a {{NullPointerException}} upon {{getBlockLocations()}}.
> The lease recovery will be retried in about an hour by the namenode. If the 
> nodes are faulty (usually when there is only one new target), they may not 
> block report until this point. If this happens, lease recovery throws an 
> {{AlreadyBeingCreatedException}}, which causes LeaseManager to simply remove 
> the lease without  finalizing the inode.  
> This results in an inconsistent lease state. The inode stays 
> under-construction, but no more lease recovery is attempted. A manual lease 
> recovery is also not allowed. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11817) A faulty node can cause a lease leak and NPE on accessing data

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11817:
-
Fix Version/s: 2.9.0

> A faulty node can cause a lease leak and NPE on accessing data
> --
>
> Key: HDFS-11817
> URL: https://issues.apache.org/jira/browse/HDFS-11817
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2
>
> Attachments: HDFS-11817.branch-2.patch, hdfs-11817_supplement.txt, 
> HDFS-11817.v2.branch-2.8.patch, HDFS-11817.v2.branch-2.patch, 
> HDFS-11817.v2.trunk.patch
>
>
> When the namenode performs a lease recovery for a failed write, the 
> {{commitBlockSynchronization()}} will fail, if none of the new target has 
> sent a received-IBR.  At this point, the data is inaccessible, as the 
> namenode will throw a {{NullPointerException}} upon {{getBlockLocations()}}.
> The lease recovery will be retried in about an hour by the namenode. If the 
> nodes are faulty (usually when there is only one new target), they may not 
> block report until this point. If this happens, lease recovery throws an 
> {{AlreadyBeingCreatedException}}, which causes LeaseManager to simply remove 
> the lease without  finalizing the inode.  
> This results in an inconsistent lease state. The inode stays 
> under-construction, but no more lease recovery is attempted. A manual lease 
> recovery is also not allowed. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11817) A faulty node can cause a lease leak and NPE on accessing data

2017-05-25 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-11817:
--
Attachment: HDFS-11817.v2.branch-2.8.patch

Attaching what's committed to 2.8 as reference.  The cherry-pick from branch-2 
was clean, but had to update one method call in the test, since the containing 
class has been changed.

> A faulty node can cause a lease leak and NPE on accessing data
> --
>
> Key: HDFS-11817
> URL: https://issues.apache.org/jira/browse/HDFS-11817
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 3.0.0-alpha3, 2.8.2
>
> Attachments: HDFS-11817.branch-2.patch, hdfs-11817_supplement.txt, 
> HDFS-11817.v2.branch-2.8.patch, HDFS-11817.v2.branch-2.patch, 
> HDFS-11817.v2.trunk.patch
>
>
> When the namenode performs a lease recovery for a failed write, the 
> {{commitBlockSynchronization()}} will fail, if none of the new target has 
> sent a received-IBR.  At this point, the data is inaccessible, as the 
> namenode will throw a {{NullPointerException}} upon {{getBlockLocations()}}.
> The lease recovery will be retried in about an hour by the namenode. If the 
> nodes are faulty (usually when there is only one new target), they may not 
> block report until this point. If this happens, lease recovery throws an 
> {{AlreadyBeingCreatedException}}, which causes LeaseManager to simply remove 
> the lease without  finalizing the inode.  
> This results in an inconsistent lease state. The inode stays 
> under-construction, but no more lease recovery is attempted. A manual lease 
> recovery is also not allowed. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11817) A faulty node can cause a lease leak and NPE on accessing data

2017-05-25 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-11817:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.2
   3.0.0-alpha3
   Status: Resolved  (was: Patch Available)

Thanks for the review, Daryn.

> A faulty node can cause a lease leak and NPE on accessing data
> --
>
> Key: HDFS-11817
> URL: https://issues.apache.org/jira/browse/HDFS-11817
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 3.0.0-alpha3, 2.8.2
>
> Attachments: HDFS-11817.branch-2.patch, hdfs-11817_supplement.txt, 
> HDFS-11817.v2.branch-2.patch, HDFS-11817.v2.trunk.patch
>
>
> When the namenode performs a lease recovery for a failed write, the 
> {{commitBlockSynchronization()}} will fail, if none of the new target has 
> sent a received-IBR.  At this point, the data is inaccessible, as the 
> namenode will throw a {{NullPointerException}} upon {{getBlockLocations()}}.
> The lease recovery will be retried in about an hour by the namenode. If the 
> nodes are faulty (usually when there is only one new target), they may not 
> block report until this point. If this happens, lease recovery throws an 
> {{AlreadyBeingCreatedException}}, which causes LeaseManager to simply remove 
> the lease without  finalizing the inode.  
> This results in an inconsistent lease state. The inode stays 
> under-construction, but no more lease recovery is attempted. A manual lease 
> recovery is also not allowed. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11817) A faulty node can cause a lease leak and NPE on accessing data

2017-05-22 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-11817:
--
Attachment: HDFS-11817.v2.trunk.patch
HDFS-11817.v2.branch-2.patch

> A faulty node can cause a lease leak and NPE on accessing data
> --
>
> Key: HDFS-11817
> URL: https://issues.apache.org/jira/browse/HDFS-11817
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-11817.branch-2.patch, hdfs-11817_supplement.txt, 
> HDFS-11817.v2.branch-2.patch, HDFS-11817.v2.trunk.patch
>
>
> When the namenode performs a lease recovery for a failed write, the 
> {{commitBlockSynchronization()}} will fail, if none of the new target has 
> sent a received-IBR.  At this point, the data is inaccessible, as the 
> namenode will throw a {{NullPointerException}} upon {{getBlockLocations()}}.
> The lease recovery will be retried in about an hour by the namenode. If the 
> nodes are faulty (usually when there is only one new target), they may not 
> block report until this point. If this happens, lease recovery throws an 
> {{AlreadyBeingCreatedException}}, which causes LeaseManager to simply remove 
> the lease without  finalizing the inode.  
> This results in an inconsistent lease state. The inode stays 
> under-construction, but no more lease recovery is attempted. A manual lease 
> recovery is also not allowed. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11817) A faulty node can cause a lease leak and NPE on accessing data

2017-05-19 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-11817:
--
Status: Patch Available  (was: Open)

> A faulty node can cause a lease leak and NPE on accessing data
> --
>
> Key: HDFS-11817
> URL: https://issues.apache.org/jira/browse/HDFS-11817
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-11817.branch-2.patch, hdfs-11817_supplement.txt
>
>
> When the namenode performs a lease recovery for a failed write, the 
> {{commitBlockSynchronization()}} will fail, if none of the new target has 
> sent a received-IBR.  At this point, the data is inaccessible, as the 
> namenode will throw a {{NullPointerException}} upon {{getBlockLocations()}}.
> The lease recovery will be retried in about an hour by the namenode. If the 
> nodes are faulty (usually when there is only one new target), they may not 
> block report until this point. If this happens, lease recovery throws an 
> {{AlreadyBeingCreatedException}}, which causes LeaseManager to simply remove 
> the lease without  finalizing the inode.  
> This results in an inconsistent lease state. The inode stays 
> under-construction, but no more lease recovery is attempted. A manual lease 
> recovery is also not allowed. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11817) A faulty node can cause a lease leak and NPE on accessing data

2017-05-19 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-11817:
--
Attachment: HDFS-11817.branch-2.patch

I started the patch for branch-2.8 and branch-2.  The trunk version is not 
ready yet, but want to run the branch-2 version through precommit before the 
weekend.

> A faulty node can cause a lease leak and NPE on accessing data
> --
>
> Key: HDFS-11817
> URL: https://issues.apache.org/jira/browse/HDFS-11817
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-11817.branch-2.patch, hdfs-11817_supplement.txt
>
>
> When the namenode performs a lease recovery for a failed write, the 
> {{commitBlockSynchronization()}} will fail, if none of the new target has 
> sent a received-IBR.  At this point, the data is inaccessible, as the 
> namenode will throw a {{NullPointerException}} upon {{getBlockLocations()}}.
> The lease recovery will be retried in about an hour by the namenode. If the 
> nodes are faulty (usually when there is only one new target), they may not 
> block report until this point. If this happens, lease recovery throws an 
> {{AlreadyBeingCreatedException}}, which causes LeaseManager to simply remove 
> the lease without  finalizing the inode.  
> This results in an inconsistent lease state. The inode stays 
> under-construction, but no more lease recovery is attempted. A manual lease 
> recovery is also not allowed. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11817) A faulty node can cause a lease leak and NPE on accessing data

2017-05-18 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-11817:
--
Attachment: hdfs-11817_supplement.txt

Attaching supplemental information including stack traces.

> A faulty node can cause a lease leak and NPE on accessing data
> --
>
> Key: HDFS-11817
> URL: https://issues.apache.org/jira/browse/HDFS-11817
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: hdfs-11817_supplement.txt
>
>
> When the namenode performs a lease recovery for a failed write, the 
> {{commitBlockSynchronization()}} will fail, if none of the new target has 
> sent a received-IBR.  At this point, the data is inaccessible, as the 
> namenode will throw a {{NullPointerException}} upon {{getBlockLocations()}}.
> The lease recovery will be retried in about an hour by the namenode. If the 
> nodes are faulty (usually when there is only one new target), they may not 
> block report until this point. If this happens, lease recovery throws an 
> {{AlreadyBeingCreatedException}}, which causes LeaseManager to simply remove 
> the lease without  finalizing the inode.  
> This results in an inconsistent lease state. The inode stays 
> under-construction, but no more lease recovery is attempted. A manual lease 
> recovery is also not allowed. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org