[ 
https://issues.apache.org/jira/browse/HDFS-11974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053440#comment-16053440
 ] 

Yongjun Zhang commented on HDFS-11974:
--------------------------------------

A further thought.

The reported exception was thrown here:
{code}
      if (finishedReceiving && received != advertisedSize) {
        // only throw this exception if we think we read all of it on our end
        // -- otherwise a client-side IOException would be masked by this
        // exception that makes it look like a server-side problem!
        deleteTmpFiles(localPaths);
        throw new IOException("File " + url + " received length " + received +
            " is not of the advertised size " + advertisedSize +
            ". Fsimage name: " + fsImageName + " lastReceived: " + num);
      }
{code}
where {{finishedReceiving}} is true. It is only set to true after the following loop finishes without throwing:
{code}
      byte[] buf = new byte[IO_FILE_BUFFER_SIZE];
      while (num > 0) {
        num = stream.read(buf);
        if (num > 0) {
          received += num;
          for (FileOutputStream fos : outputStreams) {
            fos.write(buf, 0, num);
          }
          if (throttler != null) {
            throttler.throttle(num);
          }
        }
      }
      finishedReceiving = true;
{code}

This is puzzling: if a socket timeout exception had occurred, it should have 
been thrown inside the above loop, and {{finishedReceiving}} would not have 
been set to true. Conversely, if {{finishedReceiving}} is true, then presumably 
no exception was thrown in the loop.
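
To make the control flow concrete, here is a minimal, self-contained Java sketch. It is not the actual {{TransferFsImage#receiveFile}} code: the class {{ReceiveSketch}}, the helper {{describe}}, the buffer size, and the shortened message are all hypothetical, and it assumes the size check sits in a {{finally}} block as the issue description says. It suggests that the size-mismatch exception is reached when the stream simply ends early without throwing, whereas an exception from {{read}} leaves {{finishedReceiving}} false and propagates unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketTimeoutException;

// Hypothetical, simplified stand-in for the receive loop quoted above.
class ReceiveSketch {

    static long receive(InputStream stream, long advertisedSize) throws IOException {
        long received = 0;
        int num = 1;
        boolean finishedReceiving = false;
        try {
            byte[] buf = new byte[4096]; // assumed size, for illustration only
            while (num > 0) {
                num = stream.read(buf);  // may throw, e.g. on a socket timeout
                if (num > 0) {
                    received += num;
                }
            }
            finishedReceiving = true;    // reached only if the loop did not throw
        } finally {
            // Same guard as in the quoted code: skipped when the loop threw.
            if (finishedReceiving && received != advertisedSize) {
                throw new IOException("received length " + received
                    + " is not of the advertised size " + advertisedSize);
            }
        }
        return received;
    }

    // Runs receive() and reports which exception, if any, reached the caller.
    static String describe(InputStream stream, long advertisedSize) {
        try {
            return "ok: " + receive(stream, advertisedSize);
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Case 1: the stream ends after 10 bytes although 20 were advertised.
        // read() returns -1, no exception is thrown, and the size check fires.
        System.out.println(describe(new ByteArrayInputStream(new byte[10]), 20));

        // Case 2: the very first read times out. finishedReceiving stays false,
        // so the caller sees the timeout rather than the size-mismatch exception.
        InputStream timingOut = new InputStream() {
            @Override
            public int read() throws IOException {
                throw new SocketTimeoutException("Read timed out");
            }
        };
        System.out.println(describe(timingOut, 20));
    }
}
```

If this sketch reflects the real control flow, a premature end of stream (the sender closing the connection, for example) would be one way to see the size-mismatch message with {{finishedReceiving}} set to true. Whether that is what happened in the reported case is not established here.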






> Fsimage transfer failed due to socket timeout, but logs doesn't show that
> -------------------------------------------------------------------------
>
>                 Key: HDFS-11974
>                 URL: https://issues.apache.org/jira/browse/HDFS-11974
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>
> The idea of HDFS-11914 is to add more diagnostic information to understand 
> what happened when we saw:
> {code}
> WARN org.apache.hadoop.security.UserGroupInformation: 
> PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: 
> File http://x.y.z:50070/imagetransfer?getimage=1&txid=latest received length 
> xyz is not of the advertised size abc.
> {code}
> After further study, I realized that the above exception is thrown in the 
> {{finally}} block of the {{TransferFsImage#receiveFile}} method, so other 
> exceptions thrown in the main code, such as SocketTimeoutException, are not 
> reported. We should include information about the exceptions thrown in the 
> main code when throwing an exception in the {{finally}} block.


