[ https://issues.apache.org/jira/browse/HDFS-17157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18029934#comment-18029934 ]
ASF GitHub Bot commented on HDFS-17157:
---------------------------------------
github-actions[bot] commented on PR #5954:
URL: https://github.com/apache/hadoop/pull/5954#issuecomment-3404078067
We're closing this stale PR because it has been open for 100 days with no
activity. This isn't a judgement on the merit of the PR in any way. It's just a
way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working
on it, please feel free to re-open it and ask for a committer to remove the
stale tag and review again.
Thanks all for your contribution.
> Transient network failure in lease recovery could be mitigated to ensure
> better consistency
> -------------------------------------------------------------------------------------------
>
> Key: HDFS-17157
> URL: https://issues.apache.org/jira/browse/HDFS-17157
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 2.0.0-alpha, 3.3.6
> Reporter: Haoze Wu
> Priority: Major
> Labels: pull-request-available
>
> This case is related to HDFS-12070.
> In HDFS-12070, we saw how a faulty drive at one datanode could lead to
> permanent block recovery failure and leave the file open indefinitely. With
> that patch, instead of failing the whole lease recovery process when the
> second stage of block recovery fails at a single datanode, lease recovery
> fails only when the second stage fails at all of the datanodes (see the
> sketch after the snippet below).
> Below is the code snippet for the second stage of block recovery, in
> BlockRecoveryWorker#syncBlock:
> {code:java}
>     ...
>     final List<BlockRecord> successList = new ArrayList<>();
>     for (BlockRecord r : participatingList) {
>       try {
>         r.updateReplicaUnderRecovery(bpid, recoveryId, blockId,
>             newBlock.getNumBytes());
>         successList.add(r);
>       } catch (IOException e) {
>     ...{code}
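> After this loop, lease recovery is aborted only when no datanode succeeded. A
> minimal sketch of that check (paraphrasing the logic around successList
> rather than quoting the exact Hadoop source):
> {code:java}
>     // Sketch only: recovery fails only when the second stage failed at every
>     // participating datanode, i.e. successList is empty.
>     if (successList.isEmpty()) {
>       throw new IOException("Cannot recover " + block + ", none of the "
>           + participatingList.size() + " datanodes succeeded in"
>           + " updateReplicaUnderRecovery");
>     }
>     // Otherwise syncBlock continues and commits the synchronization using
>     // only the datanodes in successList.
>     ...{code}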
> However, because of a transient network failure, the updateReplicaUnderRecovery
> RPC initiated from the primary datanode to another datanode could fail with an
> EOFException, while the other side either does not process the RPC at all or
> throws an IOException when reading from the socket:
> {code:java}
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:824)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:788)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1495)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy29.updateReplicaUnderRecovery(Unknown Source)
>   at org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolTranslatorPB.updateReplicaUnderRecovery(InterDatanodeProtocolTranslatorPB.java:112)
>   at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$BlockRecord.updateReplicaUnderRecovery(BlockRecoveryWorker.java:88)
>   at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$BlockRecord.access$700(BlockRecoveryWorker.java:71)
>   at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.syncBlock(BlockRecoveryWorker.java:300)
>   at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.recover(BlockRecoveryWorker.java:188)
>   at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1.run(BlockRecoveryWorker.java:606)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.EOFException
>   at java.io.DataInputStream.readInt(DataInputStream.java:392)
>   at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1796)
>   at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1165)
>   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1061)
> {code}
> Then, if there is any other datanode at which the second stage of block
> recovery succeeds, the lease recovery succeeds and closes the file. However,
> the last block was never synced to the datanode that failed, and this
> inconsistency could potentially last for a very long time.
> To fix the issue, I propose adding a configurable retry of the
> updateReplicaUnderRecovery RPC so that transient network failures can be
> mitigated, as sketched below.
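> A minimal sketch of the proposed retry, reusing the loop quoted above; the
> configuration key dfs.datanode.block.recovery.update.rur.retries is
> hypothetical, not an existing Hadoop setting:
> {code:java}
>     // Sketch only: retry updateReplicaUnderRecovery a configurable number of
>     // times before treating the datanode as failed. "conf" is assumed to be
>     // the datanode Configuration available to the recovery worker.
>     final int maxRetries =
>         conf.getInt("dfs.datanode.block.recovery.update.rur.retries", 3);
>     for (BlockRecord r : participatingList) {
>       IOException lastFailure = null;
>       for (int attempt = 0; attempt <= maxRetries; attempt++) {
>         try {
>           r.updateReplicaUnderRecovery(bpid, recoveryId, blockId,
>               newBlock.getNumBytes());
>           successList.add(r);
>           lastFailure = null;
>           break;                     // success, stop retrying this datanode
>         } catch (IOException e) {
>           lastFailure = e;           // possibly transient, retry
>         }
>       }
>       if (lastFailure != null) {
>         // Retries exhausted: the datanode is simply left out of successList,
>         // falling through to the existing failure handling.
>         LOG.warn("updateReplicaUnderRecovery failed after " + (maxRetries + 1)
>             + " attempts for " + r, lastFailure);
>       }
>     }
> {code}
> In a real patch the retry count (and perhaps a small backoff between attempts)
> would presumably come from the datanode configuration rather than a raw key
> lookup, but the behavioral change is the same: a single transient RPC failure
> no longer excludes a datanode from successList.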
> Any comments and suggestions would be appreciated.
>