[ 
https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TanYuxin updated HDFS-12749:
----------------------------
    Description: 
Now our cluster have 7000+ DN, files num 180+ million, block num 180+ million.
After SNN restart,DN will call BPServiceActor#reRegister method to register. 
But register RPC will get a IOException since NN is busy for deal with Block 
Report.  The exception is caught at BPServiceActor#processCommand.
Next is the caught IOException:
{code:java}
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing datanode 
Command
java.io.IOException: Failed on local exception: java.io.IOException: 
java.net.SocketTimeoutException: 60000 millis timeout while w
aiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/10.14.110.33:24562 
remote=namenode.host.03/10.14.27.17:8040]; Host Details : local host is: 
"datanode-2220/10.14.110.33"; destination host is: "namenode.host.03":8040;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
        at org.apache.hadoop.ipc.Client.call(Client.java:1474)
        at org.apache.hadoop.ipc.Client.call(Client.java:1407)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source)
        at 
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126)
        at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793)
        at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926)
        at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604)
        at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898)
        at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711)
        at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864)
        at java.lang.Thread.run(Thread.java:745)
{code}



  was:After SNN restart,DN will call BPServiceActor#reRegister method to 
register. But when DN registers to NN, SNN will return 


> DN may not send block report to NN after NN restart
> ---------------------------------------------------
>
>                 Key: HDFS-12749
>                 URL: https://issues.apache.org/jira/browse/HDFS-12749
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: TanYuxin
>
> Now our cluster have 7000+ DN, files num 180+ million, block num 180+ million.
> After SNN restart,DN will call BPServiceActor#reRegister method to register. 
> But register RPC will get a IOException since NN is busy for deal with Block 
> Report.  The exception is caught at BPServiceActor#processCommand.
> Next is the caught IOException:
> {code:java}
> WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing 
> datanode Command
> java.io.IOException: Failed on local exception: java.io.IOException: 
> java.net.SocketTimeoutException: 60000 millis timeout while w
> aiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/10.14.110.33:24562 
> remote=namenode.host.03/10.14.27.17:8040]; Host Details : local host is: 
> "datanode-2220/10.14.110.33"; destination host is: "namenode.host.03":8040;
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1474)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1407)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>         at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864)
>         at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to