[ https://issues.apache.org/jira/browse/HDFS-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367012#comment-15367012 ]

Yongjun Zhang commented on HDFS-10512:
--------------------------------------

Thanks [~jojochuang] and [~linyiqun] for working on the issue and [~ajisakaa] 
for the review.

I looked at the patch, and have one question:

The patch changed the call from
{code}
        datanode.reportBadBlocks(new ExtendedBlock(bpid, corruptBlock));
{code}
to
{code}
        datanode.reportBadBlocks(new ExtendedBlock(bpid, memBlockInfo),
            memBlockInfo.getVolume());
{code}

where the second parameter of the {{ExtendedBlock}} constructor was changed from 
{{corruptBlock}} to {{memBlockInfo}}. As we know, the block sizes recorded in 
{{corruptBlock}} and {{memBlockInfo}} can differ, per the following code:

{code}
      // Compare block size
      if (memBlockInfo.getNumBytes() != memFile.length()) {
        // Update the length based on the block file
        corruptBlock = new Block(memBlockInfo);
        LOG.warn("Updating size of block " + blockId + " from "
            + memBlockInfo.getNumBytes() + " to " + memFile.length());
        memBlockInfo.setNumBytes(memFile.length());
      }
{code}

When reporting the bad block, do we intend to report the new length or the old 
length back to the NN? (The old code reported the old length; the patch reports 
the new length.)

This might not be a real issue; I just want to point it out. Is the change 
intentional? I guess passing either {{corruptBlock}} or {{memBlockInfo}} as the 
second parameter of the {{ExtendedBlock}} constructor is fine (see the sketch 
below).
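
To illustrate the alternative (just a sketch, assuming the new two-argument 
{{reportBadBlocks}} overload that the patch calls), keeping the old length would 
simply leave {{corruptBlock}} in the constructor:
{code}
        // Sketch only: report the previously recorded (old) length while
        // still passing the volume the scanner already knows about.
        datanode.reportBadBlocks(new ExtendedBlock(bpid, corruptBlock),
            memBlockInfo.getVolume());
{code}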

Would you guys please comment?

Other than that, the patch looks good to me.

Thanks.



> VolumeScanner may terminate due to NPE in DataNode.reportBadBlocks
> ------------------------------------------------------------------
>
>                 Key: HDFS-10512
>                 URL: https://issues.apache.org/jira/browse/HDFS-10512
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Wei-Chiu Chuang
>            Assignee: Yiqun Lin
>         Attachments: HDFS-10512.001.patch, HDFS-10512.002.patch, 
> HDFS-10512.004.patch, HDFS-10512.005.patch
>
>
> VolumeScanner may terminate due to an unexpected NullPointerException thrown in 
> {{DataNode.reportBadBlocks()}}. This is different from HDFS-8850/HDFS-9190.
> I observed this bug in a production CDH 5.5.1 cluster, and the same bug still 
> persists in upstream trunk.
> {noformat}
> 2016-04-07 20:30:53,830 WARN 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad 
> BP-1800173197-10.204.68.5-1444425156296:blk_1170134484_96468685 on /dfs/dn
> 2016-04-07 20:30:53,831 ERROR 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting because of exception
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.reportBadBlocks(DataNode.java:1018)
>         at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner$ScanResultHandler.handle(VolumeScanner.java:287)
>         at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.scanBlock(VolumeScanner.java:443)
>         at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:547)
>         at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:621)
> 2016-04-07 20:30:53,832 INFO 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting.
> {noformat}
> I think the NPE comes from the volume variable in the following code snippet. 
> Somehow the volume scanner knows the volume, but the datanode cannot look up 
> the volume using the block.
> {code}
> public void reportBadBlocks(ExtendedBlock block) throws IOException{
>     BPOfferService bpos = getBPOSForBlock(block);
>     // getVolume() can return null here, so the call below throws the NPE
>     FsVolumeSpi volume = getFSDataset().getVolume(block);
>     bpos.reportBadBlocks(
>         block, volume.getStorageID(), volume.getStorageType());
>   }
> {code}
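
For context on the fix direction, here is a minimal sketch of what the patched 
path presumably looks like; the patch's call site passes 
{{memBlockInfo.getVolume()}}, so an overload taking the volume is assumed (this 
is a sketch, not the actual patch):
{code}
  // Sketch only: let the caller (the VolumeScanner) supply the volume it
  // already holds, instead of looking it up from the block and risking null.
  public void reportBadBlocks(ExtendedBlock block, FsVolumeSpi volume)
      throws IOException {
    BPOfferService bpos = getBPOSForBlock(block);
    bpos.reportBadBlocks(
        block, volume.getStorageID(), volume.getStorageType());
  }
{code}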


