[ 
https://issues.apache.org/jira/browse/HDFS-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16267734#comment-16267734
 ] 

Kihwal Lee edited comment on HDFS-12754 at 11/27/17 10:55 PM:
--------------------------------------------------------------

I ran {{TestLeaseRecovery2}} *without* the patch and two cases failed: 
{{testHardLeaseRecoveryWithRenameAfterNameNodeRestart}} and 
{{testHardLeaseRecoveryAfterNameNodeRestart2}}. When run individually, both 
passed, so it must be some kind of interaction involving runtime ordering.

{noformat}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.hdfs.TestLeaseRecovery2
Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 68.971 sec <<< FAILURE! - in org.apache.hadoop.hdfs.TestLeaseRecovery2
testHardLeaseRecoveryAfterNameNodeRestart2(org.apache.hadoop.hdfs.TestLeaseRecovery2)  Time elapsed: 4.375 sec  <<< FAILURE!
java.lang.AssertionError: lease holder should now be the NN
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568)
        at org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520)
        at org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2(TestLeaseRecovery2.java:437)
testHardLeaseRecoveryWithRenameAfterNameNodeRestart(org.apache.hadoop.hdfs.TestLeaseRecovery2)  Time elapsed: 4.339 sec  <<< FAILURE!
java.lang.AssertionError: lease holder should now be the NN
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568)
        at org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520)
        at org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart(TestLeaseRecovery2.java:443)

Results :

Failed tests: 
  TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2:437->hardLeaseRecoveryRestartHelper:520->checkLease:568 lease holder should now be the NN
  TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart:443->hardLeaseRecoveryRestartHelper:520->checkLease:568 lease holder should now be the NN

Tests run: 7, Failures: 2, Errors: 0, Skipped: 0
{noformat}

With the patch applied, the same test run passed. So the failures seem 
unrelated to the patch and appear to be intermittent. We will need to harden 
the test, but that's outside the scope of this jira.
{noformat}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.hdfs.TestLeaseRecovery2
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 74.511 sec - in org.apache.hadoop.hdfs.TestLeaseRecovery2

Results :

Tests run: 7, Failures: 0, Errors: 0, Skipped: 0
{noformat}
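
As a rough illustration of what hardening the test could look like (not part of this patch, and assuming the failures come from asserting before lease recovery has had time to finish, which has not been confirmed here): instead of a one-shot assertion such as "lease holder should now be the NN", the check could poll until the condition holds or a timeout expires. The class and method names below are hypothetical and the lease check is replaced by a stand-in condition. Hadoop's {{GenericTestUtils}} offers a comparable {{waitFor}} helper that a real test would more likely use.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical helper: polls a condition instead of asserting it exactly once.
public class LeaseHolderPolling {

    // Re-evaluates 'check' every intervalMs until it returns true.
    // Throws TimeoutException if it is still false once timeoutMs has elapsed.
    static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (check.getAsBoolean()) {
                return;
            }
            if (System.nanoTime() >= deadline) {
                throw new TimeoutException(
                    "condition not met within " + timeoutMs + " ms");
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger polls = new AtomicInteger();
        // Stand-in for "the lease holder is now the NN": true on the third poll.
        waitFor(() -> polls.incrementAndGet() >= 3, 10, 1000);
        System.out.println("condition met after " + polls.get() + " polls");
    }
}
```

A timeout in such a loop still fails the test, but only after the condition has been given a bounded amount of time to become true, which makes the result less sensitive to the order in which the cases run.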



> Lease renewal can hit a deadlock 
> ---------------------------------
>
>                 Key: HDFS-12754
>                 URL: https://issues.apache.org/jira/browse/HDFS-12754
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.8.1
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>             Fix For: 3.0.0, 3.1.0, 2.10.0
>
>         Attachments: HDFS-12754-branch-2.patch, HDFS-12754.001.patch, 
> HDFS-12754.002.patch, HDFS-12754.003.patch, HDFS-12754.004.patch, 
> HDFS-12754.005.patch, HDFS-12754.006.patch, HDFS-12754.007.patch, 
> HDFS-12754.008.patch, HDFS-12754.009.patch
>
>
> The client and the lease renewer can deadlock during a close operation, since 
> closeFile() reaches back into DFSClient#removeFileBeingWritten. This can 
> happen if the client calls close while the renewer is renewing a lease.


