[jira] [Commented] (HDFS-11945) Internal lease recovery may not be retried for a long time

2017-06-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044490#comment-16044490
 ] 

Kihwal Lee commented on HDFS-11945:
---

Thanks, [~liuml07]!

> Internal lease recovery may not be retried for a long time
> --
>
> Key: HDFS-11945
> URL: https://issues.apache.org/jira/browse/HDFS-11945
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Fix For: 3.0.0-alpha4, 2.8.2
>
> Attachments: HDFS-11945.branch-2.v2.patch, HDFS-11945.trunk.patch, 
> HDFS-11945.trunk.v2.patch
>
>
> A lease is assigned per client, which is identified by its holder ID or client
> ID, so renewing or expiring a lease affects all files being written by that
> client.
> When a client/writer dies without closing a file, its lease expires after one
> hour (hard limit) and the namenode tries to recover the lease. As part of the
> process, the namenode takes ownership of the lease and renews it. If the
> recovery does not finish successfully, the lease expires after another hour
> and the namenode tries to recover it again.
> However, if the file system has another lease expiring within that hour, the
> recovery attempt for that lease pushes forward the expiration of the lease
> held by the namenode. As a result, failed lease recoveries may not be retried
> for a long time. We have seen this go on for days.
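
The starvation described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the real LeaseManager: the class name, the holder string "NN_INTERNAL", and the minute-based clock are all hypothetical, and the only fact taken from the description is the one-hour hard limit and the rule that each recovery attempt renews the single lease held under the namenode's holder ID.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of per-holder lease expiry; NOT the real HDFS LeaseManager. */
final class SharedHolderLeaseSketch {
  static final long HARD_LIMIT_MIN = 60; // one-hour hard limit, per the description

  // One lease per holder ID: holder -> minute of its last renewal.
  private final Map<String, Long> lastRenewed = new HashMap<>();

  void renew(String holder, long nowMin) {
    lastRenewed.put(holder, nowMin);
  }

  long expiresAt(String holder) {
    return lastRenewed.get(holder) + HARD_LIMIT_MIN;
  }

  /**
   * The namenode starts one recovery every spacingMin minutes, always under the
   * same (hypothetical) internal holder ID. Returns the minute at which that
   * holder's lease finally expires, which is the earliest point at which the
   * first failed recovery could be retried.
   */
  static long earliestRetryMinute(int recoveries, long spacingMin) {
    SharedHolderLeaseSketch lm = new SharedHolderLeaseSketch();
    for (int i = 0; i < recoveries; i++) {
      // Every new recovery attempt renews the one lease shared by all
      // files the namenode has taken over.
      lm.renew("NN_INTERNAL", i * spacingMin);
    }
    return lm.expiresAt("NN_INTERNAL");
  }

  public static void main(String[] args) {
    System.out.println(earliestRetryMinute(1, 30));  // a lone recovery
    System.out.println(earliestRetryMinute(48, 30)); // a day of recoveries, 30 min apart
  }
}
```

In this model a lone recovery is retried at minute 60, while 48 recoveries spaced 30 minutes apart delay the retry of the first failure until minute 1470. As long as another lease expires within each hour, the retry keeps moving out.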



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11945) Internal lease recovery may not be retried for a long time

2017-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043523#comment-16043523
 ] 

Hudson commented on HDFS-11945:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11846 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11846/])
HDFS-11945. Internal lease recovery may not be retried for a long time. (liuml07: rev 1a33c9d58927186c2f219a5ecb5f1573801823ad)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLeaseRecovery2.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestLeaseManager.java





[jira] [Commented] (HDFS-11945) Internal lease recovery may not be retried for a long time

2017-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043281#comment-16043281
 ] 

Hadoop QA commented on HDFS-11945:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 15s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 35s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 75 unchanged - 2 fixed = 75 total (was 77) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 37s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m  1s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 19s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 91m 48s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestBootstrapStandbyWithQJM |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11945 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12872107/HDFS-11945.trunk.v2.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  |
| uname | Linux aba8a116f711 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a062374 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/19839/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/19839/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/19839/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HDFS-11945) Internal lease recovery may not be retried for a long time

2017-06-07 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041561#comment-16041561
 ] 

Mingliang Liu commented on HDFS-11945:
--

I'm +1 on the patch.

Minor comments:
# The {{internalLeaseHolder}} value should be joined with an underscore (_) instead of a space.
# The last test statement:
{code}
assertFalse(holder.equals(lm.getInternalLeaseHolder()));
{code}
Better to use:
{code}
assertNotEquals("some meaningful message", holder, lm.getInternalLeaseHolder());
{code}




[jira] [Commented] (HDFS-11945) Internal lease recovery may not be retried for a long time

2017-06-07 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041535#comment-16041535
 ] 

Kihwal Lee commented on HDFS-11945:
---

The failed tests all pass when I run them.
{noformat}
---
 T E S T S
---
Running org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 88.762 sec - in org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped
Running org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure010
Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 170.511 sec - in org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure010
Running org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080
Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 106.453 sec - in org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080

Results :

Tests run: 32, Failures: 0, Errors: 0, Skipped: 0
{noformat}




[jira] [Commented] (HDFS-11945) Internal lease recovery may not be retried for a long time

2017-06-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041475#comment-16041475
 ] 

Hadoop QA commented on HDFS-11945:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 15s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 33s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 75 unchanged - 2 fixed = 75 total (was 77) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 39s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 31s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 89m  2s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure010 |
|   | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11945 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12871868/HDFS-11945.trunk.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  |
| uname | Linux 3a0e209ce470 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 24181f5 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/19826/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/19826/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/19826/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HDFS-11945) Internal lease recovery may not be retried for a long time

2017-06-07 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16040927#comment-16040927
 ] 

Kihwal Lee commented on HDFS-11945:
---

We could change the namenode lease holder ID every hour. Normally, two IDs
would be active in the system only for a brief moment. More than two can be
active if there are failures. If the ID is suffixed with a timestamp or date
string, the log message for recovery will show how old the leases are.

The major cause of lease recovery failures is datanodes having problems during
block recovery. One interesting case is when the namenode throws "server too
busy" to datanodes. A {{commitBlockSynchronization()}} call can fail for this
reason and won't be retried.
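
A minimal sketch of the rotation idea, assuming a one-hour rotation interval and a UTC timestamp suffix. The class, the method names, the "NN_RECOVERY" prefix, and the suffix format are hypothetical and are not taken from any patch. The point is only that recoveries started in a later hour go under a new holder ID, so they no longer renew the lease that covers older failed recoveries.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper that rotates an internal lease holder ID; not HDFS code. */
final class RotatingHolderSketch {
  // Assumption: rotate once per hard limit (one hour).
  static final long ROTATE_MS = 60L * 60 * 1000;

  private static final DateTimeFormatter FMT =
      DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss").withZone(ZoneOffset.UTC);

  private final String prefix;
  private String holder;
  private long lastRotatedMs;

  RotatingHolderSketch(String prefix, long nowMs) {
    this.prefix = prefix;
    rotate(nowMs);
  }

  private void rotate(long nowMs) {
    // The timestamp suffix makes the age of a lease visible in log messages.
    holder = prefix + "_" + FMT.format(Instant.ofEpochMilli(nowMs));
    lastRotatedMs = nowMs;
  }

  /** Returns the holder ID to use for a recovery started at nowMs. */
  String holderFor(long nowMs) {
    if (nowMs - lastRotatedMs >= ROTATE_MS) {
      rotate(nowMs);
    }
    return holder;
  }

  public static void main(String[] args) {
    RotatingHolderSketch ids = new RotatingHolderSketch("NN_RECOVERY", 0L);
    System.out.println(ids.holderFor(0L));               // first ID
    System.out.println(ids.holderFor(30L * 60 * 1000));  // same hour, same ID
    System.out.println(ids.holderFor(60L * 60 * 1000));  // an hour later, new ID
  }
}
```

Because the lease under the old ID stops being renewed once the ID rotates, it expires within one hard-limit period and its failed recoveries become eligible for retry, regardless of how many newer leases keep expiring.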
