[jira] [Commented] (HDFS-17332) DFSInputStream: avoid logging stacktrace until when we really need to fail a read request with a MissingBlockException

ASF GitHub Bot (Jira) Fri, 12 Jan 2024 14:05:05 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-17332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806216#comment-17806216
 ]


ASF GitHub Bot commented on HDFS-17332:
---------------------------------------

li-leyang commented on code in PR #6446:
URL: https://github.com/apache/hadoop/pull/6446#discussion_r1450967849


##########
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java:
##########
@@ -1494,19 +1533,39 @@ private int pread(long position, ByteBuffer buffer)
       try {
         if (dfsClient.isHedgedReadsEnabled() && !blk.isStriped()) {
           hedgedFetchBlockByteRange(blk, targetStart,
-              targetEnd, buffer, corruptedBlocks);
+              targetEnd, buffer, corruptedBlocks, ioExceptionMap);
         } else {
           fetchBlockByteRange(blk, targetStart, targetEnd,
-              buffer, corruptedBlocks);
+              buffer, corruptedBlocks, ioExceptionMap);
         }
-      } finally {
+      } catch (IOException e) {
+        // When we reach here, it means we fail to fetch the current block 
from all available
+        // datanodes. Send IOExceptions in ioExceptionMap to the log and 
rethrow the exception to
+        // fail this request.
+        String msg = String.format("Failed to read from all available 
datanodes for block %s "

Review Comment:
   This error handling is duplicated in multiple places. Can we abstract it in 
a common function?





> DFSInputStream: avoid logging stacktrace until when we really need to fail a 
> read request with a MissingBlockException
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17332
>                 URL: https://issues.apache.org/jira/browse/HDFS-17332
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Xing Lin
>            Assignee: Xing Lin
>            Priority: Minor
>              Labels: pull-request-available
>
> In DFSInputStream#actualGetFromOneDataNode(), it would send the exception 
> stacktrace to the dfsClient.LOG whenever we fail on a DN. However, in most 
> cases, the read request will be served successfully by reading from the next 
> available DN. The existence of exception stacktrace in the log has caused 
> multiple hadoop users at Linkedin to consider this WARN message as the 
> RC/fatal error for their jobs.  We would like to improve the log message and 
> avoid sending the stacktrace to dfsClient.LOG when a read succeeds. The 
> stackTrace when reading reach DN is sent to the log only when we really need 
> to fail a read request (when chooseDataNode()/refetchLocations() throws a 
> BlockMissingException). 
>  
> Example stack trace
> {code:java}
> [12]<stderr>:23/11/30 23:01:33 WARN hdfs.DFSClient: Connection failure: 
> Failed to connect to 10.150.91.13/10.150.91.13:71 for file 
> /XXXX/part-yyyy-95b9909c-zzz-c000.avro for block 
> BP-364971551-DatanodeIP-1448516588954:blk_zzzz_129864739321:java.net.SocketTimeoutException:
>  60000 millis timeout while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/ip:40492 
> remote=datanodeIP:71] [12]<stderr>:java.net.SocketTimeoutException: 60000 
> millis timeout while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/localIp:40492 
> remote=datanodeIP:71] [12]<stderr>: at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) 
> [12]<stderr>: at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) 
> [12]<stderr>: at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) 
> [12]<stderr>: at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118) 
> [12]<stderr>: at java.io.FilterInputStream.read(FilterInputStream.java:83) 
> [12]<stderr>: at 
> org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:458)
>  [12]<stderr>: at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderRemote2.newBlockReader(BlockReaderRemote2.java:412)
>  [12]<stderr>: at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:864)
>  [12]<stderr>: at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:753)
>  [12]<stderr>: at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:387)
>  [12]<stderr>: at 
> org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:736) 
> [12]<stderr>: at 
> org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1268)
>  [12]<stderr>: at 
> org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1216)
>  [12]<stderr>: at 
> org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1608) 
> [12]<stderr>: at 
> org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1568) 
> [12]<stderr>: at 
> org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:93) 
> [12]<stderr>: at 
> hdfs_metrics_shade.org.apache.hadoop.fs.InstrumentedFSDataInputStream$InstrumentedFilterInputStream.lambda$read$0(InstrumentedFSDataInputStream.java:108)
>  [12]<stderr>: at 
> com.linkedin.hadoop.metrics.fs.PerformanceTrackingFSDataInputStream.process(PerformanceTrackingFSDataInputStream.java:39)
>  [12]<stderr>: at 
> hdfs_metrics_shade.org.apache.hadoop.fs.InstrumentedFSDataInputStream$InstrumentedFilterInputStream.read(InstrumentedFSDataInputStream.java:108)
>  [12]<stderr>: at 
> org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:93) 
> [12]<stderr>: at 
> org.apache.hadoop.fs.RetryingInputStream.lambda$read$2(RetryingInputStream.java:153)
>  [12]<stderr>: at 
> org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36) 
> [12]<stderr>: at 
> org.apache.hadoop.fs.RetryingInputStream.read(RetryingInputStream.java:149) 
> [12]<stderr>: at 
> org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:93){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-17332) DFSInputStream: avoid logging stacktrace until when we really need to fail a read request with a MissingBlockException

Reply via email to