[ https://issues.apache.org/jira/browse/HDFS-16097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lei w updated HDFS-16097:
-
Description:
A DataNode that receives IPC requests right after a quick restart throws an NPE.
This is because, when the DN is restarted, the BlockPool is first registered with
blockPoolManager and only then is fsdataset initialized. If an IPC request
arrives after the BlockPool has been registered with blockPoolManager but before
fsdataset is initialized, the DataNode throws an NPE, because the request handler
calls methods provided by fsdataset. The server-side stack trace is as follows:
{code:java}
java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.datanode.DataNode.initReplicaRecovery(DataNode.java:3468)
at org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolServerSideTranslatorPB.initReplicaRecovery(InterDatanodeProtocolServerSideTranslatorPB.java:55)
at org.apache.hadoop.hdfs.protocol.proto.InterDatanodeProtocolProtos$InterDatanodeProtocolService$2.callBlockingMethod(InterDatanodeProtocolProtos.java:3105)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:916)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
{code}
The client-side stack trace is as follows:
{code:java}
WARN org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to recover block (block=BP-###:blk_###, datanode=DatanodeInfoWithStorage[,null,null])
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.datanode.DataNode.initReplicaRecovery(DataNode.java:3468)
at org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolServerSideTranslatorPB.initReplicaRecovery(InterDatanodeProtocolServerSideTranslatorPB.java:55)
at org.apache.hadoop.hdfs.protocol.proto.InterDatanodeProtocolProtos$InterDatanodeProtocolService$2.callBlockingMethod(InterDatanodeProtocolProtos.java:3105)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:916)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2873)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1511)
at org.apache.hadoop.ipc.Client.call(Client.java:1457)
at org.apache.hadoop.ipc.Client.call(Client.java:1367)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy26.initReplicaRecovery(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolTranslatorPB.initReplicaRecovery(InterDatanodeProtocolTranslatorPB.java:83)
at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker.callInitReplicaRecovery(BlockRecoveryWorker.java:571)
at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker.access$400(BlockRecoveryWorker.java:57)
at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.recover(BlockRecoveryWorker.java:142)
at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1.run(BlockRecoveryWorker.java:610)
at java.lang.Thread.run(Thread.java:748)
{code}
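The ordering problem described above can be illustrated with a small standalone sketch. It is a toy model only, not Hadoop code and not necessarily the fix adopted for this issue: the class and method names (StartupRaceSketch, Node, Dataset, guardedInitReplicaRecovery, and so on) are hypothetical. It assumes the simplest possible guard, namely checking that the dataset reference is set before using it and reporting "not ready" as an IOException instead of letting the NPE escape to the caller.

```java
import java.io.IOException;

/** Toy model of the startup race. All names are hypothetical, not Hadoop's API. */
class StartupRaceSketch {

    /** Stand-in for the dataset object that is initialized late during startup. */
    static class Dataset {
        String initReplicaRecovery(String blockId) {
            return "recovering " + blockId;
        }
    }

    /** Stand-in for a node that can receive requests before its dataset is set. */
    static class Node {
        private volatile Dataset dataset; // stays null until initDataset() runs

        void initDataset() {
            dataset = new Dataset();
        }

        /** Unguarded handler: dereferences the field directly, so an early call throws NPE. */
        String unguardedInitReplicaRecovery(String blockId) {
            return dataset.initReplicaRecovery(blockId);
        }

        /** Guarded handler: turns "not initialized yet" into a checked exception. */
        String guardedInitReplicaRecovery(String blockId) throws IOException {
            Dataset d = dataset; // read once so the check and the use see the same value
            if (d == null) {
                throw new IOException("dataset is not initialized yet; retry later");
            }
            return d.initReplicaRecovery(blockId);
        }
    }

    public static void main(String[] args) throws IOException {
        Node node = new Node();

        // A request arrives before initialization: the unguarded path fails with NPE.
        try {
            node.unguardedInitReplicaRecovery("blk_1");
        } catch (NullPointerException e) {
            System.out.println("unguarded call failed with NullPointerException");
        }

        // The guarded path fails with a descriptive, checked exception instead.
        try {
            node.guardedInitReplicaRecovery("blk_1");
        } catch (IOException e) {
            System.out.println("guarded call failed: " + e.getMessage());
        }

        // Once initialization has finished, the same request succeeds.
        node.initDataset();
        System.out.println(node.guardedInitReplicaRecovery("blk_1"));
    }
}
```

A guard like this does not remove the window between registration and initialization; it only makes the failure explicit to the remote caller. Reordering startup so that the dataset is ready before requests are served would be an alternative design.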
was:
A DataNode that receives IPC requests right after a quick restart throws an NPE.
This is because, when the DN is restarted, the BlockPool is first registered with
blockPoolManager and only then is fsdataset initialized. If an IPC request
arrives after the BlockPool has been registered with blockPoolManager but before
fsdataset is initialized, the DataNode throws an NPE, because the request handler
calls methods provided by fsdataset. The server-side stack trace is as follows:
{code:java}
java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.datanode.DataNode.initReplicaRecovery(DataNode.java:3468)
at org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolServerSideTranslatorPB.initReplicaRecovery(InterDatanodeProtocolServerSideTranslatorPB.java:55)
at org.apache.hadoop.hdfs.protocol.proto.InterDatanodeProtocolProtos$InterDatanodeProtocolService$2.callBlockingMethod(InterDatanodeProtocolProtos.java:3105)
at