[ 
https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894081#comment-13894081
 ] 

Hadoop QA commented on HDFS-4858:
---------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12626217/HDFS-4858.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
                        Please justify why no new tests are needed for this 
patch.
                        Also please list what manual steps were performed to 
verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6061//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6061//console

This message is automatically generated.

> HDFS DataNode to NameNode RPC should timeout
> --------------------------------------------
>
>                 Key: HDFS-4858
>                 URL: https://issues.apache.org/jira/browse/HDFS-4858
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.0.5-alpha
>         Environment: Redhat/CentOS 6.4 64 bit Linux
>            Reporter: Jagane Sundar
>            Assignee: Konstantin Boudnik
>            Priority: Minor
>             Fix For: 3.0.0, 2.3.0
>
>         Attachments: HDFS-4858.patch, HDFS-4858.patch
>
>
> The DataNode is configured with ipc.client.ping false and ipc.ping.interval 
> 14000. This configuration means that the IPC Client (DataNode, in this case) 
> should timeout in 14000 seconds if the Standby NameNode does not respond to a 
> sendHeartbeat.
> What we observe is this: If the Standby NameNode happens to reboot for any 
> reason, the DataNodes that are heartbeating to this Standby get stuck forever 
> while trying to sendHeartbeat. See Stack trace included below. When the 
> Standby NameNode comes back up, we find that the DataNode never re-registers 
> with the Standby NameNode. Thereafter failover completely fails.
> The desired behavior is that the DataNode's sendHeartbeat should timeout in 
> 14 seconds, and keep retrying till the Standby NameNode comes back up. When 
> it does, the DataNode should reconnect, re-register, and offer service.
> Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the 
> method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to 
> create the DatanodeProtocolPB object.
> Stack trace of thread stuck in the DataNode after the Standby NN has rebooted:
> Thread 25 (DataNode: [file:///opt/hadoop/data]  heartbeating to 
> vmhost6-vm1/10.10.10.151:8020):
>   State: WAITING
>   Blocked count: 23843
>   Waited count: 45676
>   Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5
>   Stack:
>     java.lang.Object.wait(Native Method)
>     java.lang.Object.wait(Object.java:485)
>     org.apache.hadoop.ipc.Client.call(Client.java:1220)
>     
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>     sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
>     sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>     
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     java.lang.reflect.Method.invoke(Method.java:597)
>     
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>     
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>     sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
>     
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
>     
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445)
>     
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
>     
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
>     java.lang.Thread.run(Thread.java:662)
> DataNode RPC to Standby NameNode never times out. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to