[jira] [Updated] (HDFS-17430) RecoveringBlock will skip no live replicas when get block recovery command.

2024-03-18 Thread Haiyang Hu (Jira)


 [ https://issues.apache.org/jira/browse/HDFS-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haiyang Hu updated HDFS-17430:
--
Description: 
RecoveringBlock should skip replicas that are not live when the NameNode builds the block recovery command.

*Issue:*
Currently, the following scenario can cause BlockRecoveryWorker on the DataNode to fail, leaving the file unclosed for a long time.

*t1.* Block blk_xxx_xxx has two replicas [dn1, dn2]; the dn1 machine has shut down and is in dead state, while dn2 is live.
*t2.* Block recovery starts.
Related logs:

{code:java}
2024-03-13 21:58:00.651 WARN hdfs.StateChange   DIR* 
NameSystem.internalReleaseLease: File /xxx/file has not been closed. Lease 
recovery is in progress. RecoveryId = 28577373754 for block blk_xxx_xxx
{code}

*t3.* dn2 is chosen as the primary DataNode for block recovery.
At this time dn1 is marked as stale (it is actually dead), so recoveryLocations has size 1. According to the following logic, both dn1 and dn2 are then chosen to participate in block recovery.

DatanodeManager#getBlockRecoveryCommand
{code:java}
// Skip stale nodes during recovery
final List<DatanodeStorageInfo> recoveryLocations =
    new ArrayList<>(storages.length);
final List<Integer> storageIdx = new ArrayList<>(storages.length);
for (int i = 0; i < storages.length; ++i) {
  if (!storages[i].getDatanodeDescriptor().isStale(staleInterval)) {
    recoveryLocations.add(storages[i]);
    storageIdx.add(i);
  }
}
...
// If we only get 1 replica after eliminating stale nodes, choose all
// replicas for recovery and let the primary data node handle failures.
DatanodeInfo[] recoveryInfos;
if (recoveryLocations.size() > 1) {
  if (recoveryLocations.size() != storages.length) {
    LOG.info("Skipped stale nodes for recovery : "
        + (storages.length - recoveryLocations.size()));
  }
  recoveryInfos = DatanodeStorageInfo.toDatanodeInfos(recoveryLocations);
} else {
  // If too many replicas are stale, then choose all replicas to
  // participate in block recovery.
  recoveryInfos = DatanodeStorageInfo.toDatanodeInfos(storages);
}
{code}


{code:java}
2024-03-13 21:58:01,425 INFO  datanode.DataNode 
(BlockRecoveryWorker.java:logRecoverBlock(563))
[org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] -
BlockRecoveryWorker: NameNode at xxx:8040 calls 
recoverBlock(BP-xxx:blk_xxx_xxx, 
targets=[DatanodeInfoWithStorage[dn1:50010,null,null], 
DatanodeInfoWithStorage[dn2:50010,null,null]], newGenerationStamp=28577373754, 
newBlock=null, isStriped=false)
{code}
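
To make the selection step concrete, here is a minimal, self-contained sketch (hypothetical {{Replica}} type and {{chooseRecoveryTargets}} method, not HDFS code) that reproduces the decision above: with one of the two replicas stale, the fallback branch returns the full replica set, so the dead dn1 is still handed to dn2 as a recovery target.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustration only: mirrors the fallback in DatanodeManager#getBlockRecoveryCommand.
public class RecoverySelectionDemo {
  record Replica(String name, boolean stale) {}

  static List<Replica> chooseRecoveryTargets(List<Replica> storages) {
    List<Replica> recoveryLocations = new ArrayList<>();
    for (Replica r : storages) {
      if (!r.stale()) {
        recoveryLocations.add(r);   // skip stale nodes
      }
    }
    // If at most one replica is left after removing stale nodes, fall back to
    // the full set, stale (possibly dead) replicas included.
    return recoveryLocations.size() > 1 ? recoveryLocations : storages;
  }

  public static void main(String[] args) {
    List<Replica> storages = List.of(
        new Replica("dn1", true),    // stale: the machine is actually dead
        new Replica("dn2", false));  // live
    // Prints both dn1 and dn2: the dead dn1 is still selected as a recovery participant.
    System.out.println(chooseRecoveryTargets(storages));
  }
}
{code}

A possible improvement direction (a sketch of intent, not necessarily the committed patch) is to also exclude replicas whose DataNode is no longer alive before falling back to the full replica set.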

*t4.* When dn2 executes BlockRecoveryWorker#recover, it calls initReplicaRecovery on dn1. However, since the dn1 machine is down at this time, the call takes a very long time to time out: by default, the IPC client retries establishing the connection 45 times.
Related logs:

{code:java}
2024-03-13 21:59:31,518 INFO  ipc.Client 
(Client.java:handleConnectionTimeout(904)) 
[org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] - 
Retrying connect to server: dn1:8010. Already tried 0 time(s); maxRetries=45
...
2024-03-13 23:05:35,295 INFO  ipc.Client 
(Client.java:handleConnectionTimeout(904)) 
[org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] - 
Retrying connect to server: dn2:8010. Already tried 44 time(s); maxRetries=45

2024-03-13 23:07:05,392 WARN  protocol.InterDatanodeProtocol 
(BlockRecoveryWorker.java:recover(170)) 
[org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] -
Failed to recover block (block=BP-xxx:blk_xxx_xxx, 
datanode=DatanodeInfoWithStorage[dn1:50010,null,null]) 
org.apache.hadoop.net.ConnectTimeoutException:
Call From dn2 to dn1:8010 failed on socket timeout exception: 
org.apache.hadoop.net.ConnectTimeoutException: 9 millis timeout while 
waiting for channel to be ready for connect.ch : 
java.nio.channels.SocketChannel[connection-pending remote=dn:8010]; For more 
details see:  http://wiki.apache.org/hadoop/SocketTimeout
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:931)
  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:866)
  at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1583)
  at org.apache.hadoop.ipc.Client.call(Client.java:1511)
  at org.apache.hadoop.ipc.Client.call(Client.java:1402)
  at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:268)
  at org.apach
{code}
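
The length of this stall is bounded by the IPC client connection settings. Below is a minimal sketch of how to read the relevant knobs (the keys are standard Hadoop IPC settings; the defaults shown are the documented stock values, the class name is hypothetical, and the values actually in effect on a cluster may differ, as the retry interval in the log above suggests):

{code:java}
import org.apache.hadoop.conf.Configuration;

// Rough estimate only: each connect attempt to the dead dn1 times out after
// ipc.client.connect.timeout ms and is retried up to
// ipc.client.connect.max.retries.on.timeouts times (maxRetries=45 in the log above).
public class RecoveryStallEstimate {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    long connectTimeoutMs = conf.getLong("ipc.client.connect.timeout", 20000L);
    long maxRetries = conf.getLong("ipc.client.connect.max.retries.on.timeouts", 45L);
    System.out.println("Worst-case stall per dead recovery target ~ "
        + (connectTimeoutMs * maxRetries) / 1000 + " seconds");
  }
}
{code}

With the stock defaults (20 s per attempt, 45 retries) that is roughly 15 minutes; with a larger configured connect timeout, as the span between the first and last retry lines above suggests, a single dead replica can block the BlockRecoveryWorker for over an hour.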

[jira] [Updated] (HDFS-17430) RecoveringBlock will skip no live replicas when get block recovery command.

2024-03-18 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HDFS-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDFS-17430:
--
Labels: pull-request-available  (was: )

> RecoveringBlock will skip no live replicas when get block recovery command.
> ---
>
> Key: HDFS-17430
> URL: https://issues.apache.org/jira/browse/HDFS-17430
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>