[ https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
TanYuxin updated HDFS-12749:
----------------------------
    Description: 
Our cluster currently has 7000+ DNs, 180+ million files, and 180+ million blocks. After the SNN restarts, each DN calls the BPServiceActor#reRegister method to register. However, the register RPC fails with an IOException because the NN is busy processing block reports. The exception is caught in BPServiceActor#processCommand.

The caught IOException is:

{code:java}
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing datanode Command
java.io.IOException: Failed on local exception: java.io.IOException: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.14.110.33:24562 remote=namenode.host.03/10.14.27.17:8040]; Host Details : local host is: "datanode-2220/10.14.110.33"; destination host is: "namenode.host.03":8040;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
        at org.apache.hadoop.ipc.Client.call(Client.java:1474)
        at org.apache.hadoop.ipc.Client.call(Client.java:1407)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864)
        at java.lang.Thread.run(Thread.java:745)
{code}

  was: After SNN
restart, DN will call BPServiceActor#reRegister method to register. But when DN registers to NN, SNN will return

> DN may not send block report to NN after NN restart
> ---------------------------------------------------
>
>                 Key: HDFS-12749
>                 URL: https://issues.apache.org/jira/browse/HDFS-12749
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: TanYuxin
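
The failure path described in this issue can be illustrated with a small, self-contained simulation. This is only a sketch, not Hadoop code: `ReRegisterSketch`, `NameNodeStub`, and `ActorSketch` are hypothetical names, and the assumption that the block report is sent only after a successful registration is taken from the issue title rather than from the actual BPServiceActor source.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ReRegisterSketch {

    /** Hypothetical stand-in for the NameNode side of the register RPC. */
    interface NameNodeStub {
        void register() throws IOException;
    }

    /** Hypothetical stand-in for a DataNode service actor. */
    static class ActorSketch {
        final List<String> log = new ArrayList<>();
        boolean registered = false;
        boolean blockReportSent = false;
        private final NameNodeStub nn;

        ActorSketch(NameNodeStub nn) {
            this.nn = nn;
        }

        /** Registers with the NameNode, then (by assumption) sends the block report. */
        void reRegister() throws IOException {
            nn.register();          // may throw, e.g. on an RPC timeout
            registered = true;
            blockReportSent = true; // never reached if register() throws
        }

        /** Catches the IOException and only logs it, as the description reports. */
        void processCommand() {
            try {
                reRegister();
            } catch (IOException e) {
                log.add("WARN Error processing datanode Command: " + e.getMessage());
            }
        }
    }

    public static void main(String[] args) {
        // A busy NameNode is simulated by a stub whose register call always times out.
        ActorSketch actor = new ActorSketch(() -> {
            throw new IOException("60000 millis timeout");
        });
        actor.processCommand();
        System.out.println("registered=" + actor.registered
                + ", blockReportSent=" + actor.blockReportSent
                + ", warnings=" + actor.log.size());
    }
}
```

In this sketch the timeout is swallowed by the command handler, so nothing after the failed register call runs and nothing retries it. That matches the symptom in the issue title, under the assumptions stated above.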