[ https://issues.apache.org/jira/browse/HDFS-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939402#comment-14939402 ]
Rakesh R commented on HDFS-9185:
--------------------------------

Here is my analysis:
# {{ErasureCodingWorker}} creates the {{RemoteBlockReader2}} with a null {{tracer}}. When {{RemoteBlockReader2#read}} is called, it dereferences the null tracer, hits an NPE, and the recovery fails. To fix this, how about passing {{datanode#getTracer()}} to the reader?
{code}
ErasureCodingWorker.java

    return RemoteBlockReader2.newBlockReader(
        "dummy", block, blockToken, offsetInBlock,
        block.getNumBytes() - offsetInBlock, true,
        "", newConnectedPeer(block, dnAddr, blockToken, dnInfo), dnInfo,
        null, cachingStrategy, null);
{code}
{code}
RemoteBlockReader2.java

  public synchronized int read(ByteBuffer buf) throws IOException {
    if (curDataSlice == null ||
        curDataSlice.remaining() == 0 && bytesNeededToFinish > 0) {
      TraceScope scope = tracer.newScope(
          "RemoteBlockReader2#readNextPacket(" + blockId + ")");
      try {
        readNextPacket();
      } finally {
        scope.close();
      }
    }
{code}
# The root cause is not visible in the log messages because {{StripedBlockUtil#getNextCompletedStripedRead()}} logs the exception at {{DEBUG}} level. IMHO the log level should be changed to {{INFO}} so that the failure reason shows up in the logs.
{code}
      if (DFSClient.LOG.isDebugEnabled()) {
        DFSClient.LOG.debug("ExecutionException " + e);
      }
{code}

I'll prepare a patch with these changes soon.

> TestRecoverStripedFile is failing
> ---------------------------------
>
>                 Key: HDFS-9185
>                 URL: https://issues.apache.org/jira/browse/HDFS-9185
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: erasure-coding
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>            Priority: Critical
>
> Below is the message taken from the build:
> {code}
> Error Message
> Time out waiting for EC block recovery.
> Stacktrace
> java.io.IOException: Time out waiting for EC block recovery.
>     at org.apache.hadoop.hdfs.TestRecoverStripedFile.waitForRecoveryFinished(TestRecoverStripedFile.java:383)
>     at org.apache.hadoop.hdfs.TestRecoverStripedFile.assertFileBlocksRecovery(TestRecoverStripedFile.java:283)
>     at org.apache.hadoop.hdfs.TestRecoverStripedFile.testRecoverAnyBlocks1(TestRecoverStripedFile.java:168)
> {code}
> Reference : https://builds.apache.org/job/PreCommit-HDFS-Build/12758
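
For anyone who wants to see the null-tracer failure from point 1 of the analysis in isolation, here is a minimal standalone sketch. {{SimpleTracer}}, {{SimpleScope}}, and {{SketchReader}} are hypothetical stand-ins written for this illustration; they are not the real HTrace or HDFS classes. The only thing the sketch assumes is that the reader dereferences its tracer on read, as in the {{RemoteBlockReader2}} excerpt.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; NOT the real Hadoop/HTrace classes.
final class TracerSketch {

  /** Minimal tracer that records the descriptions of the scopes it opened. */
  static final class SimpleTracer {
    final List<String> opened = new ArrayList<>();

    SimpleScope newScope(String description) {
      opened.add(description);
      return new SimpleScope();
    }
  }

  /** Minimal scope whose close() does nothing. */
  static final class SimpleScope {
    void close() { }
  }

  /** Reader that dereferences its tracer on every read, like the excerpt. */
  static final class SketchReader {
    private final SimpleTracer tracer;
    private final long blockId;

    SketchReader(SimpleTracer tracer, long blockId) {
      this.tracer = tracer;
      this.blockId = blockId;
    }

    int read() {
      // Throws NullPointerException on this call when tracer is null.
      SimpleScope scope =
          tracer.newScope("SketchReader#readNextPacket(" + blockId + ")");
      try {
        return 1; // pretend one packet was read
      } finally {
        scope.close();
      }
    }
  }

  /** Returns true if read() fails with an NPE for the given tracer. */
  static boolean readHitsNpe(SimpleTracer tracer) {
    try {
      new SketchReader(tracer, 42L).read();
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    // Mirrors the current call site: a null tracer is handed to the reader.
    System.out.println("null tracer -> NPE: " + readHitsNpe(null));
    // Mirrors the proposed fix: the caller passes a real tracer instead.
    System.out.println("real tracer -> NPE: " + readHitsNpe(new SimpleTracer()));
  }
}
```

In the sketch the construction with a null tracer succeeds and the failure only appears on the first read, which is the same shape as the problem described in point 1: the call site looks harmless and the NPE surfaces later, inside the read path.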