[jira] [Commented] (HDFS-9185) TestRecoverStripedFile is failing
[ https://issues.apache.org/jira/browse/HDFS-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940929#comment-14940929 ]

Rakesh R commented on HDFS-9185:
--------------------------------

Note: It looks like the test case failures are not related to the patch. The [TestRecoverStripedFile|https://builds.apache.org/job/PreCommit-HDFS-Build/12769/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/] case is consistently passing now.

> TestRecoverStripedFile is failing
> ---------------------------------
>
> Key: HDFS-9185
> URL: https://issues.apache.org/jira/browse/HDFS-9185
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: erasure-coding
> Reporter: Rakesh R
> Assignee: Rakesh R
> Priority: Critical
> Attachments: HDFS-9185-00.patch, HDFS-9185-01.patch
>
>
> Below is the message taken from build:
> {code}
> Error Message
> Time out waiting for EC block recovery.
> Stacktrace
> java.io.IOException: Time out waiting for EC block recovery.
>   at org.apache.hadoop.hdfs.TestRecoverStripedFile.waitForRecoveryFinished(TestRecoverStripedFile.java:383)
>   at org.apache.hadoop.hdfs.TestRecoverStripedFile.assertFileBlocksRecovery(TestRecoverStripedFile.java:283)
>   at org.apache.hadoop.hdfs.TestRecoverStripedFile.testRecoverAnyBlocks1(TestRecoverStripedFile.java:168)
> {code}
> Reference : https://builds.apache.org/job/PreCommit-HDFS-Build/12758

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HDFS-9185) TestRecoverStripedFile is failing
[ https://issues.apache.org/jira/browse/HDFS-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941492#comment-14941492 ]

Jing Zhao commented on HDFS-9185:
---------------------------------

The new patch looks good to me. All the failed tests passed on my local machine. +1. I will commit it shortly.
[jira] [Commented] (HDFS-9185) TestRecoverStripedFile is failing
[ https://issues.apache.org/jira/browse/HDFS-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939438#comment-14939438 ]

Uma Maheswara Rao G commented on HDFS-9185:
-------------------------------------------

Thank you, Rakesh, for reporting it. The changes look good to me. Let's wait for Jenkins to confirm that this test failure is fixed.
[jira] [Commented] (HDFS-9185) TestRecoverStripedFile is failing
[ https://issues.apache.org/jira/browse/HDFS-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939724#comment-14939724 ]

Hadoop QA commented on HDFS-9185:
---------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 20m 12s | Pre-patch trunk has 7 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 8m 2s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 54s | There were no new javadoc warning messages. |
| {color:red}-1{color} | release audit | 0m 15s | The applied patch generated 1 release audit warnings. |
| {color:red}-1{color} | checkstyle | 2m 12s | The applied patch generated 1 new checkstyle issues (total was 288, now 285). |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 23s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native | 3m 9s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 215m 47s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests | 0m 31s | Tests passed in hadoop-hdfs-client. |
| | | 267m 17s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
| | hadoop.hdfs.TestRollingUpgrade |
| | hadoop.hdfs.util.TestByteArrayManager |
| | hadoop.hdfs.server.blockmanagement.TestNodeCount |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12764550/HDFS-9185-00.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 5db371f |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/12760/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs-client.html |
| Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/12760/artifact/patchprocess/patchReleaseAuditProblems.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/12760/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12760/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| hadoop-hdfs-client test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12760/artifact/patchprocess/testrun_hadoop-hdfs-client.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12760/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12760/console |

This message was automatically generated.
[jira] [Commented] (HDFS-9185) TestRecoverStripedFile is failing
[ https://issues.apache.org/jira/browse/HDFS-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940084#comment-14940084 ]

Rakesh R commented on HDFS-9185:
--------------------------------

Note: The test case failures do not seem to be related to the patch. The release audit and checkstyle warnings are also unrelated.
[jira] [Commented] (HDFS-9185) TestRecoverStripedFile is failing
[ https://issues.apache.org/jira/browse/HDFS-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940200#comment-14940200 ]

Jing Zhao commented on HDFS-9185:
---------------------------------

Thanks for working on this, [~rakeshr]. The changes look good to me. One comment is about the log level change. Changing the log level from debug to warn may generate unnecessary exception traces for DFSStripedInputStream, since the failure can be covered by later decoding. So how about we change the log level for the unit test instead? We may need to add the following code to {{TestRecoverStripedBlocks}}:
{code}
static {
  GenericTestUtils.setLogLevel(DFSClient.LOG, Level.ALL);
}
{code}
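The idea in the comment above (leave the production log level alone and raise verbosity only from the test class) can be illustrated with a small standalone sketch. It is not the Hadoop code: it assumes {{java.util.logging}} in place of the logging framework and the {{GenericTestUtils}} helper mentioned in the comment, and {{SketchStripedReader}} and {{SketchRecoveryTest}} are hypothetical names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical production class: it reports a recoverable read failure
// at a fine (debug-like) level so normal runs stay quiet.
final class SketchStripedReader {
  static final Logger LOG = Logger.getLogger("sketch.StripedReader");

  /** True if this logger would pass a FINE record on to its handlers. */
  static boolean wouldLogRecoverableFailure() {
    return LOG.isLoggable(Level.FINE);
  }
}

// Hypothetical test class: the static initializer runs once, when the
// class is first initialized, and enables every level on that logger.
final class SketchRecoveryTest {
  static {
    SketchStripedReader.LOG.setLevel(Level.ALL);
  }

  /** Calling this initializes the class, so the block above has run. */
  static boolean detailVisible() {
    return SketchStripedReader.wouldLogRecoverableFailure();
  }

  public static void main(String[] args) {
    System.out.println("fine-level detail enabled: " + detailVisible());
  }
}
```

The production class keeps its quiet default, and only code that loads the test class sees the extra detail, which is the trade-off the comment argues for.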
[jira] [Commented] (HDFS-9185) TestRecoverStripedFile is failing
[ https://issues.apache.org/jira/browse/HDFS-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939402#comment-14939402 ]

Rakesh R commented on HDFS-9185:
--------------------------------

Following is my analysis:
# ErasureCodingWorker creates the {{RemoteBlockReader2}} with a null {{tracer}}. During the {{RemoteBlockReader2#read}} call it hits an NPE, which results in the failure. To fix this, how about passing {{datanode#getTracer()}} to the reader?
{code}
ErasureCodingWorker.java

return RemoteBlockReader2.newBlockReader(
    "dummy", block, blockToken, offsetInBlock,
    block.getNumBytes() - offsetInBlock, true,
    "", newConnectedPeer(block, dnAddr, blockToken, dnInfo), dnInfo,
    null, cachingStrategy, null);
{code}
{code}
RemoteBlockReader2.java

public synchronized int read(ByteBuffer buf) throws IOException {
  if (curDataSlice == null ||
      curDataSlice.remaining() == 0 && bytesNeededToFinish > 0) {
    TraceScope scope = tracer.newScope(
        "RemoteBlockReader2#readNextPacket(" + blockId + ")");
    try {
      readNextPacket();
    } finally {
      scope.close();
    }
  }
{code}
# The root cause is not visible in the log messages because StripedBlockUtil#getNextCompletedStripedRead() logs the exception at {{DEBUG}} level. IMHO the log level should be changed to {{INFO}} so that the failure reason is visible.
{code}
if (DFSClient.LOG.isDebugEnabled()) {
  DFSClient.LOG.debug("ExecutionException " + e);
}
{code}

I'll soon prepare a patch including these changes.
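The failure mode described in the analysis above (a reader built with a null tracer, whose exception surfaces only as an {{ExecutionException}} on a worker thread) can be reproduced in isolation. The following is a minimal sketch under stated assumptions, not Hadoop code: {{SketchTracer}}, {{SketchScope}}, {{SketchReader}}, and {{NullTracerSketch}} are hypothetical stand-ins for the tracer, trace scope, block reader, and striped-read caller.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a tracer; it hands out no-op scopes.
final class SketchTracer {
  SketchScope newScope(String description) {
    return new SketchScope();
  }
}

// Hypothetical stand-in for a trace scope.
final class SketchScope {
  void close() { }
}

// Hypothetical reader that, like the excerpt above, opens a scope on read.
final class SketchReader {
  private final SketchTracer tracer;

  SketchReader(SketchTracer tracer) {
    this.tracer = tracer;
  }

  int read() {
    // Throws NullPointerException when the reader was built with tracer == null.
    SketchScope scope = tracer.newScope("SketchReader#read");
    try {
      return 42; // pretend 42 bytes were read
    } finally {
      scope.close();
    }
  }
}

final class NullTracerSketch {
  /** Runs one read on a worker thread and reports how it ended. */
  static String runRead(final SketchTracer tracer) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Callable<Integer> task = () -> new SketchReader(tracer).read();
      Future<Integer> future = pool.submit(task);
      return "read " + future.get() + " bytes";
    } catch (ExecutionException e) {
      // The worker's exception is only reachable here, via getCause().
      // If this is logged at a level that is switched off, the caller
      // sees nothing but a later timeout.
      return "failed: " + e.getCause().getClass().getSimpleName();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "interrupted";
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(runRead(null));               // failed: NullPointerException
    System.out.println(runRead(new SketchTracer())); // read 42 bytes
  }
}
```

Passing a real tracer instance, as the analysis proposes, removes the exception in this sketch; surfacing the cause in the {{catch}} block is what makes such a failure diagnosable without a debugger.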
[jira] [Commented] (HDFS-9185) TestRecoverStripedFile is failing
[ https://issues.apache.org/jira/browse/HDFS-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940669#comment-14940669 ]

Rakesh R commented on HDFS-9185:
--------------------------------

Thank you, [~umamaheswararao] and [~jingzhao], for the review comments. I have attached another patch addressing the above comment. Kindly review it again.
[jira] [Commented] (HDFS-9185) TestRecoverStripedFile is failing
[ https://issues.apache.org/jira/browse/HDFS-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940766#comment-14940766 ]

Hadoop QA commented on HDFS-9185:
---------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 19m 33s | Pre-patch trunk has 7 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 55s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 3s | There were no new javadoc warning messages. |
| {color:red}-1{color} | release audit | 0m 15s | The applied patch generated 1 release audit warnings. |
| {color:red}-1{color} | checkstyle | 2m 9s | The applied patch generated 1 new checkstyle issues (total was 288, now 285). |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native | 3m 10s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 181m 4s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests | 0m 30s | Tests passed in hadoop-hdfs-client. |
| | | 232m 3s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.namenode.TestFSNamesystem |
| Timed out tests | org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics |
| | org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement |
| | org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache |
| | org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure010 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12764715/HDFS-9185-01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / fd026f5 |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/12769/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs-client.html |
| Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/12769/artifact/patchprocess/patchReleaseAuditProblems.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/12769/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12769/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| hadoop-hdfs-client test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12769/artifact/patchprocess/testrun_hadoop-hdfs-client.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12769/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12769/console |

This message was automatically generated.