[ https://issues.apache.org/jira/browse/HDFS-11974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053440#comment-16053440 ]
Yongjun Zhang commented on HDFS-11974:
--------------------------------------

A further thought. The reported exception was thrown here:
{code}
    if (finishedReceiving && received != advertisedSize) {
      // only throw this exception if we think we read all of it on our end
      // -- otherwise a client-side IOException would be masked by this
      // exception that makes it look like a server-side problem!
      deleteTmpFiles(localPaths);
      throw new IOException("File " + url + " received length " + received +
          " is not of the advertised size " + advertisedSize +
          ". Fsimage name: " + fsImageName + " lastReceived: " + num);
    }
{code}
where {{finishedReceiving}} must be true. It is only set to true after the following loop finishes:
{code}
      byte[] buf = new byte[IO_FILE_BUFFER_SIZE];
      while (num > 0) {
        num = stream.read(buf);
        if (num > 0) {
          received += num;
          for (FileOutputStream fos : outputStreams) {
            fos.write(buf, 0, num);
          }
          if (throttler != null) {
            throttler.throttle(num);
          }
        }
      }
      finishedReceiving = true;
{code}
It's puzzling: if a socket timeout exception occurs, it should be thrown inside the above loop, and {{finishedReceiving}} should never have been set to true. Conversely, if {{finishedReceiving}} is true, then presumably no exception was thrown in the loop.

> Fsimage transfer failed due to socket timeout, but logs doesn't show that
> -------------------------------------------------------------------------
>
>                 Key: HDFS-11974
>                 URL: https://issues.apache.org/jira/browse/HDFS-11974
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>
> The idea of HDFS-11914 is to add more diagnostic information to understand
> what happened when we saw
> {code}
> WARN org.apache.hadoop.security.UserGroupInformation:
> PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException:
> File http://x.y.z:50070/imagetransfer?getimage=1&txid=latest received length
> xyz is not of the advertised size abc.
> {code}
> After further study, I realized that the above exception is thrown in the
> {{finally}} block of the {{TransferFsImage#receiveFile}} method, so other
> exceptions thrown in the main code, such as SocketTimeoutException, are not
> reported.
> We should include information about the exceptions thrown in the main code
> when throwing an exception in the {{finally}} block.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
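
To make the reasoning about {{finishedReceiving}} and the {{finally}} block concrete, here is a minimal, self-contained sketch. It is not the actual {{TransferFsImage#receiveFile}} code: the class {{FinallyMaskingDemo}}, the helper {{timingOutStream}}, the {{guarded}} flag, and the use of {{Throwable#addSuppressed}} are all assumptions made for illustration. It shows that with the guard in place a timeout thrown inside the loop propagates unchanged, while an unguarded size check in {{finally}} would replace it. Attaching the original failure as a suppressed exception is one possible way to keep that information, not necessarily what the eventual patch does.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketTimeoutException;

public class FinallyMaskingDemo {

  /** Hypothetical stream: serves the given bytes, then times out instead of reaching EOF. */
  static InputStream timingOutStream(final byte[] data) {
    return new InputStream() {
      private int pos = 0;

      @Override
      public int read() throws IOException {
        if (pos < data.length) {
          return data[pos++] & 0xff;
        }
        throw new SocketTimeoutException("Read timed out");
      }
    };
  }

  /**
   * Simplified stand-in for the receive loop quoted in the comment.
   * guarded=true mirrors the quoted check (sizes are compared only once the loop finished).
   * guarded=false is an assumption for illustration: the size check always runs.
   */
  static long receive(InputStream stream, long advertisedSize, boolean guarded)
      throws IOException {
    long received = 0;
    int num = 1;
    boolean finishedReceiving = false;
    IOException mainFailure = null;
    try {
      byte[] buf = new byte[4096];
      while (num > 0) {
        num = stream.read(buf);
        if (num > 0) {
          received += num;
        }
      }
      finishedReceiving = true;
    } catch (IOException e) {
      mainFailure = e; // remember it so the finally block can report it
      throw e;
    } finally {
      if ((finishedReceiving || !guarded) && received != advertisedSize) {
        IOException mismatch = new IOException("received length " + received
            + " is not of the advertised size " + advertisedSize);
        if (mainFailure != null) {
          mismatch.addSuppressed(mainFailure); // keep the original cause visible
        }
        throw mismatch; // replaces any exception that was already propagating
      }
    }
    return received;
  }

  /** Runs receive() and summarizes the result as a short string. */
  static String outcome(InputStream stream, long advertisedSize, boolean guarded) {
    try {
      return "ok:" + receive(stream, advertisedSize, guarded);
    } catch (IOException e) {
      return e.getClass().getSimpleName() + "/suppressed=" + e.getSuppressed().length;
    }
  }

  public static void main(String[] args) {
    byte[] three = {1, 2, 3};
    // Guarded, timeout in the loop: the SocketTimeoutException itself propagates.
    System.out.println(outcome(timingOutStream(three), 10, true));
    // Guarded, clean EOF after too few bytes: the size-mismatch IOException is thrown.
    System.out.println(outcome(new ByteArrayInputStream(three), 10, true));
    // Unguarded, timeout in the loop: the mismatch replaces the timeout,
    // which survives only as a suppressed exception.
    System.out.println(outcome(timingOutStream(three), 10, false));
  }
}
```

In this sketch the size-mismatch message can only appear without a recorded main-code failure when {{stream.read}} returned a non-positive value and the loop ended normally, which matches the puzzle described in the comment.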