[ https://issues.apache.org/jira/browse/HDFS-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-5412:
--------------------------------

    Description: 
The directory scanner periodically compiles a report of differences between the 
datanode's on-disk and in-memory state.

The code to generate the reports is in {{DirectoryScanner#scan}} and 
{{DirectoryScanner#getDiskReport}}. It looks like the volume field in 
{{ScanInfo}} is not correctly initialized while compiling the diffs. This was 
not an issue before, but now we depend on the volume information being present. 
The bug triggers the following NPE during a scan if a block is present in the 
datanode's in-memory block map but missing on disk:

{code}
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkAndUpdate(FsDatasetImpl.java:1404)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:416)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:365)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:695)
{code}
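
The failure mode described above can be sketched in isolation. This is only an illustration, not the actual HDFS code or the attached patch: {{ScanInfoSketch}}, its nested {{Volume}} and {{ScanInfo}} classes, and the {{missingOnDisk}} and {{storageIdOf}} helpers are hypothetical stand-ins. The assumption is that a diff record for a block found only in memory should carry the volume of the in-memory replica, since there is no on-disk file to take it from.

```java
import java.util.Objects;

class ScanInfoSketch {
    /** Hypothetical stand-in for a datanode storage volume. */
    static final class Volume {
        final String storageId;

        Volume(String storageId) {
            this.storageId = storageId;
        }
    }

    /** Hypothetical stand-in for one entry in the scanner's diff report. */
    static final class ScanInfo {
        final long blockId;
        final Volume volume; // leaving this null reproduces the failure mode

        ScanInfo(long blockId, Volume volume) {
            this.blockId = blockId;
            this.volume = volume;
        }
    }

    /**
     * Builds the diff record for a block that is in memory but missing on
     * disk. Assumption: the volume is copied from the in-memory replica and
     * a missing volume is rejected here, at construction time.
     */
    static ScanInfo missingOnDisk(long blockId, Volume memoryReplicaVolume) {
        return new ScanInfo(blockId,
            Objects.requireNonNull(memoryReplicaVolume, "volume"));
    }

    /** A consumer that depends on the volume; throws NPE if it is null. */
    static String storageIdOf(ScanInfo info) {
        return info.volume.storageId;
    }

    public static void main(String[] args) {
        ScanInfo good = missingOnDisk(42L, new Volume("DS-1"));
        System.out.println(storageIdOf(good));

        ScanInfo bad = new ScanInfo(42L, null); // volume never initialized
        try {
            storageIdOf(bad);
        } catch (NullPointerException e) {
            System.out.println("NPE: volume was not initialized");
        }
    }
}
```

Rejecting a null volume where the record is built moves the failure to the place where the field should have been set, instead of a later consumer such as {{checkAndUpdate}}.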

Another NPE is exposed in {{BlockManager#reportDiff}}:

{code}
java.lang.NullPointerException: null
        at java.util.TreeMap.getEntry(TreeMap.java:324)
        at java.util.TreeMap.remove(TreeMap.java:580)
        at java.util.TreeSet.remove(TreeSet.java:259)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1836)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1709)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1637)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:978)
        at org.apache.hadoop.
{code}
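
The top frames of this trace are plain JDK behavior: a {{TreeSet}} that uses natural ordering throws an NPE when asked to remove a null element. The description does not say which value was null inside {{reportDiff}}, so the sketch below only demonstrates the JDK behavior and one possible guard. {{TreeSetNullSketch}} and its helper methods are hypothetical names and are not taken from {{BlockManager}}.

```java
import java.util.TreeSet;

class TreeSetNullSketch {
    /** Returns true if removing null from the set throws an NPE. */
    static boolean removeThrowsOnNull(TreeSet<String> set) {
        try {
            set.remove(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    /** Hypothetical guard: skip the removal when the key is null. */
    static boolean removeIfPresent(TreeSet<String> set, String key) {
        return key != null && set.remove(key);
    }

    public static void main(String[] args) {
        TreeSet<String> set = new TreeSet<>();
        set.add("blk_1");

        System.out.println(removeThrowsOnNull(set));       // natural ordering rejects null
        System.out.println(removeIfPresent(set, null));    // guarded, no exception
        System.out.println(removeIfPresent(set, "blk_1")); // normal removal
    }
}
```

Whether a guard like this or a fix at the source of the null value is appropriate depends on why the value is null in the first place.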

  was:
The directory scanner periodically compiles a report of differences between the 
datanode's on-disk and in-memory state.

The code to generate the reports is in {{DirectoryScanner#scan}} and 
{{DirectoryScanner#getDiskReport}}. It looks like the volume field in 
{{ScanInfo}} is not correctly initialized while compiling the diffs. This was 
not an issue before but now we depend on the volume information being present. 
The bug triggers the following NPE during a scan if a block is present in the 
Datanode's in-memory block map but missing on disk:

{code}
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkAndUpdate(FsDatasetImpl.java:1404)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:416)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:365)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:695)
{code}


> Fix NPEs in BlockManager and DirectoryScanner
> ---------------------------------------------
>
>                 Key: HDFS-5412
>                 URL: https://issues.apache.org/jira/browse/HDFS-5412
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>         Attachments: h5412.02.patch, h5412.03.patch
>
>
> The directory scanner periodically compiles a report of differences between 
> the datanode's on-disk and in-memory state.
> The code to generate the reports is in {{DirectoryScanner#scan}} and 
> {{DirectoryScanner#getDiskReport}}. It looks like the volume field in 
> {{ScanInfo}} is not correctly initialized while compiling the diffs. This was 
> not an issue before, but now we depend on the volume information being 
> present. The bug triggers the following NPE during a scan if a block is 
> present in the datanode's in-memory block map but missing on disk:
> {code}
> java.lang.NullPointerException
>       at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkAndUpdate(FsDatasetImpl.java:1404)
>       at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:416)
>       at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:365)
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>       at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>       at java.lang.Thread.run(Thread.java:695)
> {code}
> Another NPE is exposed in {{BlockManager#reportDiff}}:
> {code}
> java.lang.NullPointerException: null
>         at java.util.TreeMap.getEntry(TreeMap.java:324)
>         at java.util.TreeMap.remove(TreeMap.java:580)
>         at java.util.TreeSet.remove(TreeSet.java:259)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1836)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1709)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1637)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:978)
>         at org.apache.hadoop.
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)