[ https://issues.apache.org/jira/browse/HDFS-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elmer J Fudd updated HDFS-17590:
--------------------------------
    Description: 
While reading blocks of data using `DFSInputStream` in `createBlockReader`, an 
`IOException` originating from `getBlockAt()` that triggers a retry iteration 
results in a `NullPointerException` when `dnInfo` is passed to 
`addToLocalDeadNodes` in the catch block.

This is the relevant portion of the call stack from our logs (from 3.4.0; we 
still experience this with builds leading up to 3.4.1, as recently as late June):
{noformat}
...
java.lang.NullPointerException
    at java.base/java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
    at java.base/java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
    at org.apache.hadoop.hdfs.DFSInputStream.addToLocalDeadNodes(DFSInputStream.java:184)
    at org.apache.hadoop.hdfs.DFSStripedInputStream.createBlockReader(DFSStripedInputStream.java:279)
    at org.apache.hadoop.hdfs.StripeReader.readChunk(StripeReader.java:304)
    at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:335)
    at org.apache.hadoop.hdfs.DFSStripedInputStream.fetchBlockByteRange(DFSStripedInputStream.java:504)
    at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1472)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1436)
    at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:124)
    at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:119)
...{noformat}
What we observe is that `getBlockAt()` throws an `IOException` 
[here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L478]:
{code:java}
//check offset 
if (offset < 0 || offset >= getFileLength()) { 
  throw new IOException("offset < 0 || offset >= getFileLength(), offset=" 
      + offset 
      + ", locatedBlocks=" + locatedBlocks); 
} 
{code}
This exception is eventually caught in `createBlockReader`. The catch block 
attempts to handle the error and, as part of that handling, invokes 
`addToLocalDeadNodes`. However, the `dnInfo` object passed to this method is 
actually `null`, as it was never fully initialized 
([here|https://github.com/apache/hadoop/blob/4525c7e35ea22d7a6350b8af10eb8d2ff68376e7/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSStripedInputStream.java#L247]), 
which results in a `NullPointerException`.
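The failure shape is easy to reproduce outside HDFS: `ConcurrentHashMap` 
rejects `null` keys, so putting an unassigned node reference into the 
dead-node map is enough to trigger exactly this `NullPointerException`. A 
minimal standalone sketch (hypothetical names, not HDFS code):
{code:java}
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;

// Minimal standalone sketch (not HDFS code): an exception thrown before the
// node reference is assigned leaves it null, and ConcurrentHashMap rejects
// null keys with a NullPointerException, matching the trace above.
public class DeadNodeNpeDemo {
  private static final ConcurrentHashMap<String, String> deadNodes =
      new ConcurrentHashMap<>();

  public static void main(String[] args) {
    String dnInfo = null; // stand-in for the unresolved datanode
    try {
      // stand-in for refreshLocatedBlock()/getBlockAt() failing before
      // dnInfo is ever assigned
      throw new IOException("offset < 0 || offset >= getFileLength()");
    } catch (IOException e) {
      // stand-in for addToLocalDeadNodes(dnInfo): null key -> NPE here
      deadNodes.put(dnInfo, "dead");
    }
  }
}
{code}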

To sum up, this is the failure path according to the logs:
 # `IOException` is thrown in `getBlockAt` 
([code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L479])
 # The exception propagates to `getBlockGroupAt` 
([code|https://github.com/apache/hadoop/blob/4525c7e35ea22d7a6350b8af10eb8d2ff68376e7/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSStripedInputStream.java#L476])
 # It further propagates to `refreshLocatedBlock` 
([code|https://github.com/apache/hadoop/blob/4525c7e35ea22d7a6350b8af10eb8d2ff68376e7/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSStripedInputStream.java#L459])
 # `IOException` caught in `createBlockReader` 
([code|https://github.com/apache/hadoop/blob/4525c7e35ea22d7a6350b8af10eb8d2ff68376e7/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSStripedInputStream.java#L247])
 # Error handling in the catch block of `createBlockReader` invokes 
`addToLocalDeadNodes` 
([code|https://github.com/apache/hadoop/blob/4525c7e35ea22d7a6350b8af10eb8d2ff68376e7/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSStripedInputStream.java#L281])
 # Execution throws a `NullPointerException`, since `dnInfo` is still `null`

A simple fix, such as a `null` check before calling `addToLocalDeadNodes` (and 
a corresponding adjustment of the log messages in the `catch` block where 
`dnInfo` is dereferenced), should solve the issue.
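A minimal sketch of such a guard, applied to the standalone demo above 
(illustrative only, not a patch against the actual HDFS source):
{code:java}
    String dnInfo = null; // as in the demo above: never assigned
    try {
      throw new IOException("offset < 0 || offset >= getFileLength()");
    } catch (IOException e) {
      // proposed guard: only mark a datanode dead if one was resolved
      if (dnInfo != null) {
        deadNodes.put(dnInfo, "dead");
      } else {
        // log without dereferencing dnInfo
        System.err.println("I/O error before a datanode was selected: " + e);
      }
    }
{code}
In the real `createBlockReader`, the same guard would also need to cover the 
log messages that dereference `dnInfo`, as noted above.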



> `NullPointerException` triggered in `createBlockReader` during retry iteration
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-17590
>                 URL: https://issues.apache.org/jira/browse/HDFS-17590
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.4.0
>            Reporter: Elmer J Fudd
>            Priority: Critical
>


