[ https://issues.apache.org/jira/browse/HDFS-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330452#comment-14330452 ]

Chris Nauroth commented on HDFS-7009:
-------------------------------------

[~szetszwo], thank you for taking a look.

bq. Is there any reason that the input stream ends right after reading the 
totalLen or just a coincidence?

Good question.  Ultimately, this was just a coincidence of a DataNode trying to 
register during a poorly timed NameNode restart.  Both Ming and I have observed 
slightly different versions of this problem.  HDFS-7714 fixed the problem I saw 
by handling {{EOFException}} during registration, but we still need Ming's 
patch here to cover the slightly different problem he saw.

There are 4 separate cases to consider:

# DataNode connects to NameNode and sends registration request.  NameNode shuts 
down and terminates socket connection before writing any RPC response bytes.  
At the DataNode, the RPC client observes this as an {{EOFException}} thrown 
from the {{DataInputStream#readInt}} call.  With HDFS-7714, this case is 
handled correctly.
# DataNode connects to NameNode.  NameNode sends response length and starts 
sending a response header, but it shuts down and terminates the socket 
connection before writing the complete response header.  The contract of 
{{parseDelimitedFrom}} states that unexpected EOF part-way through parsing will 
propagate an {{EOFException}} to the caller.  At the DataNode, the RPC client 
observes the {{EOFException}} and therefore HDFS-7714 handles this case 
correctly too.
# DataNode connects to NameNode.  NameNode sends response length and complete 
response header, and then starts writing the response body, but shuts down and 
terminates the socket connection before writing the complete response body.  At 
the DataNode, the RPC client observes {{EOFException}} while trying to read the 
response body bytes, and therefore HDFS-7714 handles this case correctly too.
# DataNode connects to NameNode.  NameNode sends only the response length, and 
then shuts down and terminates the socket connection before sending anything 
else.  The contract of {{parseDelimitedFrom}} states that if the stream is 
already positioned at EOF, then the return value is {{null}}.  At the DataNode, 
the current RPC client code turns this into a generic "Response is null" 
{{IOException}} (the exact exception shown in the stack trace below).  That 
doesn't give the DataNode enough information to know whether it's safe to 
reattempt registration, so even with HDFS-7714 this is still a registration 
failure.  (A small standalone demo of this behavior follows the list.)
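
The difference between cases 2-3 and case 4 is easy to reproduce outside of 
Hadoop.  Below is a small standalone demo (it only needs protobuf-java on the 
classpath); {{FileDescriptorProto}} stands in for the real RPC response header 
and byte arrays stand in for the socket, so this illustrates the protobuf 
behavior only, not the actual IPC client code:

{code}
import com.google.protobuf.DescriptorProtos.FileDescriptorProto;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Standalone illustration; the class name is mine, not from the Hadoop code. */
public class ParseDelimitedEofDemo {
  public static void main(String[] args) throws Exception {
    // Write one delimited message the way a server would: a varint length
    // followed by the message bytes.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    FileDescriptorProto.newBuilder().setName("response header").build()
        .writeDelimitedTo(out);
    byte[] full = out.toByteArray();

    // Case 4 analog: the stream is already at EOF, so parseDelimitedFrom
    // returns null without throwing anything.
    FileDescriptorProto atEof = FileDescriptorProto.parseDelimitedFrom(
        new ByteArrayInputStream(new byte[0]));
    System.out.println("already at EOF -> " + atEof);  // atEof is null

    // Cases 2-3 analog: the stream ends part-way through the message, so the
    // parse fails with an exception instead of returning null.
    byte[] truncated = Arrays.copyOf(full, full.length / 2);
    try {
      FileDescriptorProto.parseDelimitedFrom(
          new ByteArrayInputStream(truncated));
    } catch (Exception e) {
      System.out.println("truncated mid-message -> " + e);
    }
  }
}
{code}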

Here is the documentation for {{parseDelimitedFrom}}:

https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/AbstractParser#parseDelimitedFrom(java.io.InputStream)

It's probably a documentation bug that they say the return value is {{false}}.  
Here is the actual protobuf code from 
{{AbstractParser#parsePartialDelimitedFrom}} where we see it checking the 
stream for EOF and returning {{null}} before attempting to parse:

{code}
  public MessageType parsePartialDelimitedFrom(
      InputStream input,
      ExtensionRegistryLite extensionRegistry)
      throws InvalidProtocolBufferException {
    int size;
    try {
      int firstByte = input.read();
      if (firstByte == -1) {
        return null;
      }
      size = CodedInputStream.readRawVarint32(firstByte, input);
    } catch (IOException e) {
      throw new InvalidProtocolBufferException(e.getMessage());
    }
    InputStream limitedInput = new LimitedInputStream(input, size);
    return parsePartialFrom(limitedInput, extensionRegistry);
  }
{code}

To summarize, HDFS-7714 is sufficient to handle cases 1-3, but we still need 
Ming's patch here for correct handling of case 4.  I also think it's correct 
behavior for all RPC clients, not just the specific case of DataNode 
registration.
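
For case 4 specifically, the shape of the fix is to turn that {{null}} return 
into something the existing EOF handling can recognize.  The helper below is 
only a sketch of that idea (the class and method names are mine, not taken 
from Ming's patch): a {{null}} from {{parseDelimitedFrom}} means the connection 
died at a clean message boundary, so surfacing it as {{EOFException}} lets the 
registration handling from HDFS-7714 treat it as safe to retry instead of 
fatal.

{code}
import com.google.protobuf.Parser;

import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical helper, not the actual patch: read one delimited protobuf
 * message and report "stream already at EOF" as an EOFException instead of
 * letting it surface as a generic "Response is null" IOException.
 */
final class DelimitedProtoReader {
  private DelimitedProtoReader() {}

  static <T> T readDelimited(Parser<T> parser, InputStream in)
      throws IOException {
    T message = parser.parseDelimitedFrom(in);
    if (message == null) {
      // The peer closed the connection before sending any header bytes
      // (case 4 above).  EOFException tells the caller it is safe to retry.
      throw new EOFException(
          "Connection closed before the RPC response header was received");
    }
    return message;
  }
}
{code}

A caller would pass the generated message's {{Parser}} (the generated 
{{PARSER}} field on the response header class) and could then funnel case 4 
through the same retry path as cases 1-3.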

> Active NN and standby NN have different live nodes
> --------------------------------------------------
>
>                 Key: HDFS-7009
>                 URL: https://issues.apache.org/jira/browse/HDFS-7009
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HDFS-7009-2.patch, HDFS-7009-3.patch, HDFS-7009-4.patch, 
> HDFS-7009.patch
>
>
> To follow up on https://issues.apache.org/jira/browse/HDFS-6478: in most 
> cases, given that the DN sends heartbeats and block reports to the NN 
> regularly, the failure of a single RPC call isn't a big deal.
> However, there are cases where the DN fails to register with the NN during 
> the initial handshake due to exceptions not covered by the RPC client's 
> connection retry. When this happens, the DN won't talk to that NN until the 
> DN restarts.
> {noformat}
> BPServiceActor
>   public void run() {
>     LOG.info(this + " starting to offer service");
>     try {
>       // init stuff
>       try {
>         // setup storage
>         connectToNNAndHandshake();
>       } catch (IOException ioe) {
>         // Initial handshake, storage recovery or registration failed
>         // End BPOfferService thread
>         LOG.fatal("Initialization failed for block pool " + this, ioe);
>         return;
>       }
>       initialized = true; // bp is initialized;
>       
>       while (shouldRun()) {
>         try {
>           offerService();
>         } catch (Exception ex) {
>           LOG.error("Exception in BPOfferService for " + this, ex);
>           sleepAndLogInterrupts(5000, "offering service");
>         }
>       }
> ...
> {noformat}
> Here is an example of the call stack.
> {noformat}
> java.io.IOException: Failed on local exception: java.io.IOException: Response 
> is null.; Host Details : local host is: "xxx"; destination host is: 
> "yyy":8030;
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1239)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>         at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>         at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Response is null.
>         at 
> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:949)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844)
> {noformat}
> This creates a discrepancy between the active NN and the standby NN in terms 
> of live nodes.
>  
> Here is a possible scenario of missing blocks after failover.
> 1. DN A and B set up handshakes with the active NN, but not with the standby 
> NN.
> 2. A block is replicated to DN A, B and C.
> 3. From the standby NN's point of view, A and B are dead nodes, so the block 
> is under-replicated.
> 4. DN C goes down.
> 5. Before the active NN detects that DN C is down, a failover occurs.
> 6. The new active NN considers the block missing, even though there are two 
> replicas on DN A and B.


