[ 
https://issues.apache.org/jira/browse/HDFS-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939402#comment-14939402
 ] 

Rakesh R commented on HDFS-9185:
--------------------------------

Following is my analysis:

# ErasureCodingWorker creates the {{RemoteBlockReader2}} with a null 
{{tracer}}. The subsequent {{RemoteBlockReader2#read}} call then hits an 
NPE, which causes the failure. To fix this, how about passing 
{{datanode#getTracer()}} to the reader?
{code}
ErasureCodingWorker.java

        return RemoteBlockReader2.newBlockReader(
            "dummy", block, blockToken, offsetInBlock, 
            block.getNumBytes() - offsetInBlock, true,
            "", newConnectedPeer(block, dnAddr, blockToken, dnInfo), dnInfo,
            null, cachingStrategy, null);
{code}
{code}
RemoteBlockReader2.java

  public synchronized int read(ByteBuffer buf) throws IOException {
    if (curDataSlice == null ||
        curDataSlice.remaining() == 0 && bytesNeededToFinish > 0) {
      TraceScope scope = tracer.newScope(
          "RemoteBlockReader2#readNextPacket(" + blockId + ")");
      try {
        readNextPacket();
      } finally {
        scope.close();
      }
    }
{code}
# The root cause is not visible in the log messages because 
{{StripedBlockUtil#getNextCompletedStripedRead()}} logs the exception at 
{{DEBUG}} level. IMHO the log level should be raised to {{INFO}} so that the 
failure reason shows up in the logs.
{code}
      if (DFSClient.LOG.isDebugEnabled()) {
        DFSClient.LOG.debug("ExecutionException " + e);
      }
{code}
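
To illustrate point 1, here is a minimal self-contained sketch of the failure mode. It is not the HDFS code: {{NullTracerSketch}}, {{SimpleTracer}}, {{Scope}} and {{Reader}} are hypothetical stand-ins for the reader and tracer classes, used only to show that a reader which opens a trace scope unconditionally throws an NPE when it is constructed with a null tracer, and works once a real tracer is passed in.

```java
import java.util.ArrayList;
import java.util.List;

public class NullTracerSketch {

  /** Hypothetical stand-in for a trace scope (not the HTrace class). */
  static final class Scope implements AutoCloseable {
    private final List<String> events;
    private final String name;

    Scope(List<String> events, String name) {
      this.events = events;
      this.name = name;
      events.add("open " + name);
    }

    @Override
    public void close() {
      events.add("close " + name);
    }
  }

  /** Hypothetical stand-in for a tracer; it only records scope events. */
  static final class SimpleTracer {
    final List<String> events = new ArrayList<>();

    Scope newScope(String name) {
      return new Scope(events, name);
    }
  }

  /** Hypothetical reader that dereferences its tracer on every read. */
  static final class Reader {
    private final SimpleTracer tracer;

    Reader(SimpleTracer tracer) {
      this.tracer = tracer;
    }

    int read() {
      // Throws NullPointerException here if tracer is null.
      try (Scope scope = tracer.newScope("readNextPacket")) {
        return 1; // pretend one packet was read
      }
    }
  }

  /** Returns true if reading with the given tracer fails with an NPE. */
  static boolean readFails(SimpleTracer tracer) {
    try {
      new Reader(tracer).read();
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("null tracer fails: " + readFails(null));
    System.out.println("real tracer fails: " + readFails(new SimpleTracer()));
  }
}
```

In this sketch the fix is simply to hand the reader a non-null tracer at construction time, which is the same idea as passing {{datanode#getTracer()}} instead of {{null}} in ErasureCodingWorker.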

I'll soon prepare a patch including these changes.

> TestRecoverStripedFile is failing
> ---------------------------------
>
>                 Key: HDFS-9185
>                 URL: https://issues.apache.org/jira/browse/HDFS-9185
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: erasure-coding
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>            Priority: Critical
>
> Below is the message taken from build:
> {code}
> Error Message
> Time out waiting for EC block recovery.
> Stacktrace
> java.io.IOException: Time out waiting for EC block recovery.
>       at org.apache.hadoop.hdfs.TestRecoverStripedFile.waitForRecoveryFinished(TestRecoverStripedFile.java:383)
>       at org.apache.hadoop.hdfs.TestRecoverStripedFile.assertFileBlocksRecovery(TestRecoverStripedFile.java:283)
>       at org.apache.hadoop.hdfs.TestRecoverStripedFile.testRecoverAnyBlocks1(TestRecoverStripedFile.java:168)
> {code}
> Reference : https://builds.apache.org/job/PreCommit-HDFS-Build/12758



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
