[jira] [Resolved] (HDFS-16074) Remove an expensive debug string concatenation

2021-06-16 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16074.

Fix Version/s: 3.3.2
   3.2.3
   3.4.0
   Resolution: Fixed

Thanks a lot for the review

> Remove an expensive debug string concatenation
> --
>
> Key: HDFS-16074
> URL: https://issues.apache.org/jira/browse/HDFS-16074
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: Screen Shot 2021-06-16 at 2.32.29 PM.png, Screen Shot 
> 2021-06-17 at 10.32.21 AM.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Running a YCSB load query, we found that we do an expensive string 
> concatenation on the write path in DFSOutputStream.writeChunkPrepare(). 
> Nearly 25% of HDFS client write CPU time is spent here. It is unnecessary 
> because this is only a debug message, so let's remove it.
> {code}
> if (currentPacket == null) {
>   currentPacket = createPacket(packetSize, chunksPerPacket, getStreamer()
>       .getBytesCurBlock(), getStreamer().getAndIncCurrentSeqno(), false);
>   DFSClient.LOG.debug("WriteChunk allocating new packet seqno={},"
>           + " src={}, packetSize={}, chunksPerPacket={}, bytesCurBlock={}",
>       currentPacket.getSeqno(), src, packetSize, chunksPerPacket,
>       getStreamer().getBytesCurBlock() + ", " + this); // <-- here
> }
> {code}
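
For context on why that trailing concatenation is costly, here is a minimal, self-contained SLF4J sketch (illustrative only, not the committed patch): the first call builds the string argument eagerly even when DEBUG is disabled, while the second defers all formatting behind the log-level check.

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DebugLogCostDemo {
  private static final Logger LOG = LoggerFactory.getLogger(DebugLogCostDemo.class);

  public static void main(String[] args) {
    long bytesCurBlock = 4096;
    Object stream = new Object();

    // Costly: the argument expression concatenates strings before debug() is
    // even entered, so the work happens regardless of the log level.
    LOG.debug("bytesCurBlock={}", bytesCurBlock + ", " + stream);

    // Cheap: separate placeholders; nothing is formatted unless DEBUG is enabled.
    LOG.debug("bytesCurBlock={}, stream={}", bytesCurBlock, stream);
  }
}
{code}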



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet

2021-06-16 Thread Hui Fei (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364652#comment-17364652
 ] 

Hui Fei commented on HDFS-13671:


Merged to trunk.

[~huanghaibin] [~aajisaka] [~sodonnell] [~kihwal] [~xyao] Thank you all!

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> --
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Yiqun Lin
>Assignee: Haibin Huang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, 
> image-2021-06-10-19-28-58-359.png
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 
> tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current NameNode deletion logic there are mainly two steps:
> * Collect the INodes and all blocks to be deleted, then delete the INodes.
> * Remove the blocks chunk by chunk in a loop.
> The first step should actually be the more expensive operation and take more 
> time. However, we always see the NN hang during the remove-block operation.
> Looking into this: the {{FoldedTreeSet}} structure was introduced to get 
> better performance when processing FBRs/IBRs. But compared with the earlier 
> implementation of the remove-block logic, {{FoldedTreeSet}} is slower, since 
> it takes additional time to rebalance tree nodes. When a large number of 
> blocks is removed/deleted, this looks bad.
> For the get-type operations in {{DatanodeStorageInfo}}, we only provide 
> {{getBlockIterator}} to return a block iterator, and no other get operation 
> for a specific block. Do we still need to use {{FoldedTreeSet}} in 
> {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits gets, not 
> updates. Maybe we can revert this to the earlier implementation.
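
As a rough illustration of the rebalancing cost mentioned above (a toy JDK comparison, not Hadoop code and not {{FoldedTreeSet}} itself), removing a large number of elements from a balanced tree pays an O(log n) lookup plus rebalancing cost per element, while a hash-based structure removes in roughly constant time:

{code:java}
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class BulkRemoveDemo {
  public static void main(String[] args) {
    final int n = 2_000_000;
    Set<Long> tree = new TreeSet<>();
    Set<Long> hash = new HashSet<>();
    for (long i = 0; i < n; i++) {
      tree.add(i);
      hash.add(i);
    }

    long t0 = System.nanoTime();
    for (long i = 0; i < n; i++) {
      tree.remove(i);              // O(log n) + red-black rebalancing per removal
    }
    long t1 = System.nanoTime();
    for (long i = 0; i < n; i++) {
      hash.remove(i);              // amortised O(1) per removal
    }
    long t2 = System.nanoTime();

    System.out.printf("TreeSet removes: %d ms, HashSet removes: %d ms%n",
        (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
  }
}
{code}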



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet

2021-06-16 Thread Hui Fei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Fei updated HDFS-13671:
---
Fix Version/s: 3.4.
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> --
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Yiqun Lin
>Assignee: Haibin Huang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.
>
> Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, 
> image-2021-06-10-19-28-58-359.png
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 
> tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current NameNode deletion logic there are mainly two steps:
> * Collect the INodes and all blocks to be deleted, then delete the INodes.
> * Remove the blocks chunk by chunk in a loop.
> The first step should actually be the more expensive operation and take more 
> time. However, we always see the NN hang during the remove-block operation.
> Looking into this: the {{FoldedTreeSet}} structure was introduced to get 
> better performance when processing FBRs/IBRs. But compared with the earlier 
> implementation of the remove-block logic, {{FoldedTreeSet}} is slower, since 
> it takes additional time to rebalance tree nodes. When a large number of 
> blocks is removed/deleted, this looks bad.
> For the get-type operations in {{DatanodeStorageInfo}}, we only provide 
> {{getBlockIterator}} to return a block iterator, and no other get operation 
> for a specific block. Do we still need to use {{FoldedTreeSet}} in 
> {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits gets, not 
> updates. Maybe we can revert this to the earlier implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet

2021-06-16 Thread Hui Fei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Fei updated HDFS-13671:
---
Fix Version/s: (was: 3.4.)
   3.4.0

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> --
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Yiqun Lin
>Assignee: Haibin Huang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, 
> image-2021-06-10-19-28-58-359.png
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 
> tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current NameNode deletion logic there are mainly two steps:
> * Collect the INodes and all blocks to be deleted, then delete the INodes.
> * Remove the blocks chunk by chunk in a loop.
> The first step should actually be the more expensive operation and take more 
> time. However, we always see the NN hang during the remove-block operation.
> Looking into this: the {{FoldedTreeSet}} structure was introduced to get 
> better performance when processing FBRs/IBRs. But compared with the earlier 
> implementation of the remove-block logic, {{FoldedTreeSet}} is slower, since 
> it takes additional time to rebalance tree nodes. When a large number of 
> blocks is removed/deleted, this looks bad.
> For the get-type operations in {{DatanodeStorageInfo}}, we only provide 
> {{getBlockIterator}} to return a block iterator, and no other get operation 
> for a specific block. Do we still need to use {{FoldedTreeSet}} in 
> {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits gets, not 
> updates. Maybe we can revert this to the earlier implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16074) Remove an expensive debug string concatenation

2021-06-16 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364643#comment-17364643
 ] 

Wei-Chiu Chuang commented on HDFS-16074:


 !Screen Shot 2021-06-17 at 10.32.21 AM.png! 

Previously, writeChunkPrepare() accounted for 1.7% out of the 6.2% of CPU spent 
inside the HDFS client.
After the change, it accounts for 0.7% out of the 4.4% spent inside the HDFS 
client.
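In other words, assuming both figures are fractions of the whole profile, 
writeChunkPrepare() drops from roughly 1.7/6.2 ≈ 27% of the client's CPU to 
0.7/4.4 ≈ 16%, while the client's overall CPU share falls from 6.2% to 4.4%.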

> Remove an expensive debug string concatenation
> --
>
> Key: HDFS-16074
> URL: https://issues.apache.org/jira/browse/HDFS-16074
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Shot 2021-06-16 at 2.32.29 PM.png, Screen Shot 
> 2021-06-17 at 10.32.21 AM.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Running a YCSB load query, we found that we do an expensive string 
> concatenation on the write path in DFSOutputStream.writeChunkPrepare(). 
> Nearly 25% of HDFS client write CPU time is spent here. It is unnecessary 
> because this is only a debug message, so let's remove it.
> {code}
> if (currentPacket == null) {
>   currentPacket = createPacket(packetSize, chunksPerPacket, getStreamer()
>       .getBytesCurBlock(), getStreamer().getAndIncCurrentSeqno(), false);
>   DFSClient.LOG.debug("WriteChunk allocating new packet seqno={},"
>           + " src={}, packetSize={}, chunksPerPacket={}, bytesCurBlock={}",
>       currentPacket.getSeqno(), src, packetSize, chunksPerPacket,
>       getStreamer().getBytesCurBlock() + ", " + this); // <-- here
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16074) Remove an expensive debug string concatenation

2021-06-16 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-16074:
---
Attachment: Screen Shot 2021-06-17 at 10.32.21 AM.png

> Remove an expensive debug string concatenation
> --
>
> Key: HDFS-16074
> URL: https://issues.apache.org/jira/browse/HDFS-16074
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Shot 2021-06-16 at 2.32.29 PM.png, Screen Shot 
> 2021-06-17 at 10.32.21 AM.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Running a YCSB load query, we found that we do an expensive string 
> concatenation on the write path in DFSOutputStream.writeChunkPrepare(). 
> Nearly 25% of HDFS client write CPU time is spent here. It is unnecessary 
> because this is only a debug message, so let's remove it.
> {code}
> if (currentPacket == null) {
>   currentPacket = createPacket(packetSize, chunksPerPacket, getStreamer()
>       .getBytesCurBlock(), getStreamer().getAndIncCurrentSeqno(), false);
>   DFSClient.LOG.debug("WriteChunk allocating new packet seqno={},"
>           + " src={}, packetSize={}, chunksPerPacket={}, bytesCurBlock={}",
>       currentPacket.getSeqno(), src, packetSize, chunksPerPacket,
>       getStreamer().getBytesCurBlock() + ", " + this); // <-- here
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15150) Introduce read write lock to Datanode

2021-06-16 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364514#comment-17364514
 ] 

Stephen O'Donnell commented on HDFS-15150:
--

+1 on the 003 patch. Your plan on the further back ports sounds good to me. 

> Introduce read write lock to Datanode
> -
>
> Key: HDFS-15150
> URL: https://issues.apache.org/jira/browse/HDFS-15150
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-15150-branch-2.10.001.patch, 
> HDFS-15150-branch-2.10.002.patch, HDFS-15150-branch-2.10.003.patch, 
> HDFS-15150.001.patch, HDFS-15150.002.patch, HDFS-15150.003.patch
>
>
> HDFS-9668 pointed out the issues around the DN lock being a point of 
> contention some time ago, but that Jira went in a direction of creating a new 
> FSDataset implementation which is very risky, and activity on the Jira has 
> stalled for a few years now. Edit: Looks like HDFS-9668 eventually went in a 
> similar direction to what I was thinking, so I will review that Jira in more 
> detail to see if this one is necessary.
> I feel there could be significant gains by moving to a ReentrantReadWrite 
> lock within the DN. The current implementation is simply a ReentrantLock so 
> any locker blocks all others.
> One place I think a read lock would benefit us significantly is when the DN 
> is serving a lot of small blocks and there are jobs which perform a lot of 
> reads. Starting to read any block right now takes the lock, but if we 
> moved this to a read lock, many reads could proceed at the same time.
> The first conservative step would be to change the current lock and then 
> make all accesses to it obtain the write lock. That way we keep the 
> current behaviour, and then we can selectively move some lock accesses to the 
> read lock in separate Jiras.
> I would appreciate any thoughts on this, and also if anyone has attempted it 
> before and found any blockers.
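
A minimal sketch of that conservative first step (illustrative only, using a standalone class rather than the real FsDatasetImpl): swap in a ReentrantReadWriteLock but have every existing caller take the write lock, preserving today's mutual exclusion, then migrate individual call sites to the read lock in follow-up Jiras.

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DatasetLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

  public void doGuardedWork(Runnable work) {
    lock.writeLock().lock();      // initially every caller still takes the write lock
    try {
      work.run();
    } finally {
      lock.writeLock().unlock();
    }
  }

  public void doReadOnlyWork(Runnable work) {
    lock.readLock().lock();       // candidate for later migration: concurrent readers
    try {
      work.run();
    } finally {
      lock.readLock().unlock();
    }
  }
}
{code}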



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16074) Remove an expensive debug string concatenation

2021-06-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-16074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364432#comment-17364432
 ] 

Íñigo Goiri commented on HDFS-16074:


It looks like a no-brainer to do this but can you attach a screenshot of the 
stack after the change?
I'm curious about how much we actually save.

> Remove an expensive debug string concatenation
> --
>
> Key: HDFS-16074
> URL: https://issues.apache.org/jira/browse/HDFS-16074
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Shot 2021-06-16 at 2.32.29 PM.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Running a YCSB load query, we found that we do an expensive string 
> concatenation on the write path in DFSOutputStream.writeChunkPrepare(). 
> Nearly 25% of HDFS client write CPU time is spent here. It is unnecessary 
> because this is only a debug message, so let's remove it.
> {code}
> if (currentPacket == null) {
>   currentPacket = createPacket(packetSize, chunksPerPacket, getStreamer()
>       .getBytesCurBlock(), getStreamer().getAndIncCurrentSeqno(), false);
>   DFSClient.LOG.debug("WriteChunk allocating new packet seqno={},"
>           + " src={}, packetSize={}, chunksPerPacket={}, bytesCurBlock={}",
>       currentPacket.getSeqno(), src, packetSize, chunksPerPacket,
>       getStreamer().getBytesCurBlock() + ", " + this); // <-- here
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15618) Improve datanode shutdown latency

2021-06-16 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15618:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

I've committed this to branch-2.10.

> Improve datanode shutdown latency
> -
>
> Key: HDFS-15618
> URL: https://issues.apache.org/jira/browse/HDFS-15618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 3.4.0, 2.10.2, 3.2.3, 3.3.1, 3.2.2
>
> Attachments: HDFS-15618-branch-2.10.001.patch, 
> HDFS-15618-branch-2.10.002.patch, HDFS-15618-branch-2.10.003.patch, 
> HDFS-15618-branch-3.3.004.patch, HDFS-15618.001.patch, HDFS-15618.002.patch, 
> HDFS-15618.003.patch, HDFS-15618.004.patch
>
>
> Datanode shutdown currently has very high latency: the block scanner waits up 
> to 5 minutes to join each VolumeScanner thread.
> Since the scanners are daemon threads and do not alter block content, it is 
> safe to skip these waits when shutting down the Datanode.
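
A minimal sketch of the idea (illustrative, with hypothetical names; not the actual BlockScanner/VolumeScanner code): because the scanner threads are daemons, shutdown can interrupt them and join with a short, bounded timeout instead of waiting up to five minutes per thread.

{code:java}
import java.util.List;

public final class ScannerShutdownSketch {
  /** Interrupt all scanner threads, then wait at most joinTimeoutMillis for each. */
  public static void stopScanners(List<Thread> volumeScanners, long joinTimeoutMillis)
      throws InterruptedException {
    for (Thread t : volumeScanners) {
      t.interrupt();
    }
    for (Thread t : volumeScanners) {
      // Daemon threads that only read block data: safe to proceed even if still alive.
      t.join(joinTimeoutMillis);
    }
  }
}
{code}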



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15618) Improve datanode shutdown latency

2021-06-16 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15618:
--
Fix Version/s: 2.10.2

> Improve datanode shutdown latency
> -
>
> Key: HDFS-15618
> URL: https://issues.apache.org/jira/browse/HDFS-15618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0, 2.10.2, 3.2.3
>
> Attachments: HDFS-15618-branch-2.10.001.patch, 
> HDFS-15618-branch-2.10.002.patch, HDFS-15618-branch-2.10.003.patch, 
> HDFS-15618-branch-3.3.004.patch, HDFS-15618.001.patch, HDFS-15618.002.patch, 
> HDFS-15618.003.patch, HDFS-15618.004.patch
>
>
> Datanode shutdown currently has very high latency: the block scanner waits up 
> to 5 minutes to join each VolumeScanner thread.
> Since the scanners are daemon threads and do not alter block content, it is 
> safe to skip these waits when shutting down the Datanode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15150) Introduce read write lock to Datanode

2021-06-16 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364370#comment-17364370
 ] 

Ahmed Hussein commented on HDFS-15150:
--

After committing [^HDFS-15150-branch-2.10.003.patch] to branch-2.10, I will 
work on creating a single PR that backports the following three Jiras:
 * HDFS-15160 : ReplicaMap, Disk Balancer, Directory Scanner and various 
FsDatasetImpl methods should use datanode readlock
 * HDFS-15457 : TestFsDatasetImpl fails intermittently (caused by HDFS-15160)
 * HDFS-15818 : Fix TestFsDatasetImpl.testReadLockCanBeDisabledByConfig (caused 
by HDFS-15160)

CC: [~daryn], [~kihwal]

> Introduce read write lock to Datanode
> -
>
> Key: HDFS-15150
> URL: https://issues.apache.org/jira/browse/HDFS-15150
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-15150-branch-2.10.001.patch, 
> HDFS-15150-branch-2.10.002.patch, HDFS-15150-branch-2.10.003.patch, 
> HDFS-15150.001.patch, HDFS-15150.002.patch, HDFS-15150.003.patch
>
>
> HDFS-9668 pointed out the issues around the DN lock being a point of 
> contention some time ago, but that Jira went in a direction of creating a new 
> FSDataset implementation which is very risky, and activity on the Jira has 
> stalled for a few years now. Edit: Looks like HDFS-9668 eventually went in a 
> similar direction to what I was thinking, so I will review that Jira in more 
> detail to see if this one is necessary.
> I feel there could be significant gains by moving to a ReentrantReadWrite 
> lock within the DN. The current implementation is simply a ReentrantLock so 
> any locker blocks all others.
> One place I think a read lock would benefit us significantly is when the DN 
> is serving a lot of small blocks and there are jobs which perform a lot of 
> reads. Starting to read any block right now takes the lock, but if we 
> moved this to a read lock, many reads could proceed at the same time.
> The first conservative step would be to change the current lock and then 
> make all accesses to it obtain the write lock. That way we keep the 
> current behaviour, and then we can selectively move some lock accesses to the 
> read lock in separate Jiras.
> I would appreciate any thoughts on this, and also if anyone has attempted it 
> before and found any blockers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15618) Improve datanode shutdown latency

2021-06-16 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364369#comment-17364369
 ] 

Kihwal Lee commented on HDFS-15618:
---

+1. The 2.10 patch looks good. Thanks, Ahmed, for working on the port.

> Improve datanode shutdown latency
> -
>
> Key: HDFS-15618
> URL: https://issues.apache.org/jira/browse/HDFS-15618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3
>
> Attachments: HDFS-15618-branch-2.10.001.patch, 
> HDFS-15618-branch-2.10.002.patch, HDFS-15618-branch-2.10.003.patch, 
> HDFS-15618-branch-3.3.004.patch, HDFS-15618.001.patch, HDFS-15618.002.patch, 
> HDFS-15618.003.patch, HDFS-15618.004.patch
>
>
> Datanode shutdown currently has very high latency: the block scanner waits up 
> to 5 minutes to join each VolumeScanner thread.
> Since the scanners are daemon threads and do not alter block content, it is 
> safe to skip these waits when shutting down the Datanode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16072) TestBlockRecovery fails consistently on Branch-2.10

2021-06-16 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364344#comment-17364344
 ] 

Ahmed Hussein commented on HDFS-16072:
--

{quote}Apologies Ahmed Hussein, I should have added stacktrace of flaky 
failures on HDFS-15940{quote}
Thanks [~vjasani] for the update. I appreciate it.
No worries at all.

Since we are not 100% positive that both branches had the same errors, I will 
keep this jira open.
I will try to take a look at the refactoring done in HDFS-15940 and see whether 
it can be backported to 2.10.

> TestBlockRecovery fails consistently on Branch-2.10
> ---
>
> Key: HDFS-16072
> URL: https://issues.apache.org/jira/browse/HDFS-16072
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, test
>Affects Versions: 2.10.1
>Reporter: Ahmed Hussein
>Priority: Major
>
> {{TestBlockRecovery}} fails on branch-2.10 consistently.
> I found that the failures have been reported in the Qbt reports since at 
> least March 2021.
> {code:bash}
> [ERROR] Tests run: 20, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 
> 21.422 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery
> [ERROR] 
> testRaceBetweenReplicaRecoveryAndFinalizeBlock(org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery)
>   Time elapsed: 2.814 s  <<< ERROR!
> java.io.IOException: com.google.protobuf.ServiceException: 
> java.lang.IllegalThreadStateException
>   at 
> org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDatanodeReport(ClientNamenodeProtocolTranslatorPB.java:656)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:607)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy28.getDatanodeReport(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.datanodeReport(DFSClient.java:2132)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.waitActive(MiniDFSCluster.java:2699)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.waitActive(MiniDFSCluster.java:2743)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1723)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:905)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:517)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:476)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery.testRaceBetweenReplicaRecoveryAndFinalizeBlock(TestBlockRecovery.java:694)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:607)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: com.google.protobuf.ServiceException: 
> java.lang.IllegalThreadStateException
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:244)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>   at com.sun.proxy.$Proxy27.getDatanodeReport(Unkno

[jira] [Commented] (HDFS-15963) Unreleased volume references cause an infinite loop

2021-06-16 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364299#comment-17364299
 ] 

Kihwal Lee commented on HDFS-15963:
---

I've looked at heap dumps and confirmed the analysis by [~zhangshuyan].
One failed volume's reference was closed (2^30), but the count never went down 
to 0. As long as this volume is at the head of {{volumesBeingRemoved}}, 
additional volume failures cannot be handled, because the handler threads are 
all stuck looping forever, waiting for this volume to clear.

> Unreleased volume references cause an infinite loop
> ---
>
> Key: HDFS-15963
> URL: https://issues.apache.org/jira/browse/HDFS-15963
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 2.10.2
>
> Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, 
> HDFS-15963.003.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> When BlockSender throws an exception because the meta-data cannot be found, 
> the volume reference obtained by the thread is not released, which causes the 
> thread trying to remove the volume to wait and fall into an infinite loop.
> {code:java}
> boolean checkVolumesRemoved() {
>   Iterator it = volumesBeingRemoved.iterator();
>   while (it.hasNext()) {
> FsVolumeImpl volume = it.next();
> if (!volume.checkClosed()) {
>   return false;
> }
> it.remove();
>   }
>   return true;
> }
> boolean checkClosed() {
>   // always be true.
>   if (this.reference.getReferenceCount() > 0) {
> FsDatasetImpl.LOG.debug("The reference count for {} is {}, wait to be 0.",
> this, reference.getReferenceCount());
> return false;
>   }
>   return true;
> }
> {code}
> At the same time, because the thread has been holding checkDirsLock when 
> removing the volume, other threads trying to acquire the same lock will be 
> permanently blocked.
> Similar problems also occur in RamDiskAsyncLazyPersistService and 
> FsDatasetAsyncDiskService.
> This patch releases the three previously unreleased volume references.
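
A minimal, self-contained sketch of the fix pattern described above (hypothetical types, not the actual BlockSender code): acquire the volume reference, then guarantee its release with try/finally, so an exception thrown while the block is being set up (e.g. missing meta-data) cannot leak the reference and leave the count stuck above zero.

{code:java}
import java.io.Closeable;
import java.io.IOException;

public class VolumeReferenceSketch {
  /** Stand-in for the real volume reference; released by close(). */
  interface VolumeReference extends Closeable { }

  static void serveBlock(VolumeReference ref) throws IOException {
    try {
      // ... build the sender and stream the block; may throw if meta-data is missing ...
    } finally {
      ref.close();   // always release the reference, even when the try block throws
    }
  }
}
{code}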



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16073) Remove redundant RPC requests for getFileLinkInfo in ClientNamenodeProtocolTranslatorPB

2021-06-16 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364197#comment-17364197
 ] 

Ayush Saxena commented on HDFS-16073:
-

Committed to trunk, branch-3.3 and 3.2.
Thanx [~lei w] for the contribution!!!

> Remove redundant RPC requests for getFileLinkInfo in 
> ClientNamenodeProtocolTranslatorPB
> ---
>
> Key: HDFS-16073
> URL: https://issues.apache.org/jira/browse/HDFS-16073
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16073.001.patch, HDFS-16073.patch
>
>
> Remove redundant RPC requests for getFileLinkInfo in 
> ClientNamenodeProtocolTranslatorPB. The original logic is as follows:
> {code:java}
> @Override
> public HdfsFileStatus getFileLinkInfo(String src) throws IOException {
>   GetFileLinkInfoRequestProto req = GetFileLinkInfoRequestProto.newBuilder()
>   .setSrc(src).build();
>   try {
> GetFileLinkInfoResponseProto result = rpcProxy.getFileLinkInfo(null, 
> req);// First getFileLinkInfo RPC request
> return result.hasFs() ?
> PBHelperClient.convert(rpcProxy.getFileLinkInfo(null, req).getFs()) 
> :// Repeated getFileLinkInfo RPC request
> null;
>   } catch (ServiceException e) {
> throw ProtobufHelper.getRemoteException(e);
>   }
> }
> {code}
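
For reference, a sketch of the deduplicated version, reusing the response that was already received instead of issuing a second RPC (the committed patch may differ in minor details):

{code:java}
@Override
public HdfsFileStatus getFileLinkInfo(String src) throws IOException {
  GetFileLinkInfoRequestProto req = GetFileLinkInfoRequestProto.newBuilder()
      .setSrc(src).build();
  try {
    // Single getFileLinkInfo RPC request; reuse the response below.
    GetFileLinkInfoResponseProto result = rpcProxy.getFileLinkInfo(null, req);
    return result.hasFs() ? PBHelperClient.convert(result.getFs()) : null;
  } catch (ServiceException e) {
    throw ProtobufHelper.getRemoteException(e);
  }
}
{code}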



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16073) Remove redundant RPC requests for getFileLinkInfo in ClientNamenodeProtocolTranslatorPB

2021-06-16 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-16073:

Fix Version/s: 3.3.2
   3.2.3
   3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Remove redundant RPC requests for getFileLinkInfo in 
> ClientNamenodeProtocolTranslatorPB
> ---
>
> Key: HDFS-16073
> URL: https://issues.apache.org/jira/browse/HDFS-16073
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: HDFS-16073.001.patch, HDFS-16073.patch
>
>
> Remove redundant RPC requests for getFileLinkInfo in 
> ClientNamenodeProtocolTranslatorPB. The original logic is as follows:
> {code:java}
> @Override
> public HdfsFileStatus getFileLinkInfo(String src) throws IOException {
>   GetFileLinkInfoRequestProto req = GetFileLinkInfoRequestProto.newBuilder()
>   .setSrc(src).build();
>   try {
> GetFileLinkInfoResponseProto result = rpcProxy.getFileLinkInfo(null, 
> req);// First getFileLinkInfo RPC request
> return result.hasFs() ?
> PBHelperClient.convert(rpcProxy.getFileLinkInfo(null, req).getFs()) 
> :// Repeated getFileLinkInfo RPC request
> null;
>   } catch (ServiceException e) {
> throw ProtobufHelper.getRemoteException(e);
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-16073) Remove redundant RPC requests for getFileLinkInfo in ClientNamenodeProtocolTranslatorPB

2021-06-16 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HDFS-16073:
---

Assignee: lei w

> Remove redundant RPC requests for getFileLinkInfo in 
> ClientNamenodeProtocolTranslatorPB
> ---
>
> Key: HDFS-16073
> URL: https://issues.apache.org/jira/browse/HDFS-16073
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16073.001.patch, HDFS-16073.patch
>
>
> Remove redundant RPC requests for getFileLinkInfo in 
> ClientNamenodeProtocolTranslatorPB. The original logic is as follows:
> {code:java}
> @Override
> public HdfsFileStatus getFileLinkInfo(String src) throws IOException {
>   GetFileLinkInfoRequestProto req = GetFileLinkInfoRequestProto.newBuilder()
>   .setSrc(src).build();
>   try {
> GetFileLinkInfoResponseProto result = rpcProxy.getFileLinkInfo(null, 
> req);// First getFileLinkInfo RPC request
> return result.hasFs() ?
> PBHelperClient.convert(rpcProxy.getFileLinkInfo(null, req).getFs()) 
> :// Repeated getFileLinkInfo RPC request
> null;
>   } catch (ServiceException e) {
> throw ProtobufHelper.getRemoteException(e);
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16073) Remove redundant RPC requests for getFileLinkInfo in ClientNamenodeProtocolTranslatorPB

2021-06-16 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364194#comment-17364194
 ] 

Ayush Saxena commented on HDFS-16073:
-

v001 LGTM +1

> Remove redundant RPC requests for getFileLinkInfo in 
> ClientNamenodeProtocolTranslatorPB
> ---
>
> Key: HDFS-16073
> URL: https://issues.apache.org/jira/browse/HDFS-16073
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: lei w
>Priority: Minor
> Attachments: HDFS-16073.001.patch, HDFS-16073.patch
>
>
> Remove redundant RPC requests for getFileLinkInfo in 
> ClientNamenodeProtocolTranslatorPB. The original logic is as follows:
> {code:java}
> @Override
> public HdfsFileStatus getFileLinkInfo(String src) throws IOException {
>   GetFileLinkInfoRequestProto req = GetFileLinkInfoRequestProto.newBuilder()
>   .setSrc(src).build();
>   try {
> GetFileLinkInfoResponseProto result = rpcProxy.getFileLinkInfo(null, 
> req);// First getFileLinkInfo RPC request
> return result.hasFs() ?
> PBHelperClient.convert(rpcProxy.getFileLinkInfo(null, req).getFs()) 
> :// Repeated getFileLinkInfo RPC request
> null;
>   } catch (ServiceException e) {
> throw ProtobufHelper.getRemoteException(e);
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16072) TestBlockRecovery fails consistently on Branch-2.10

2021-06-16 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364169#comment-17364169
 ] 

Viraj Jasani commented on HDFS-16072:
-

Apologies [~ahussein], I should have added the stacktrace of the flaky failures 
on HDFS-15940. IIRC, I have seen IllegalThreadStateException in a couple of 
flaky tests I was looking into, and this is most likely one of them. In any 
case, I believe we should backport the test refactoring from HDFS-15940. That 
refactor was a major relief AFAIK, and I haven't seen a test failure since that 
patch.

> TestBlockRecovery fails consistently on Branch-2.10
> ---
>
> Key: HDFS-16072
> URL: https://issues.apache.org/jira/browse/HDFS-16072
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, test
>Affects Versions: 2.10.1
>Reporter: Ahmed Hussein
>Priority: Major
>
> {{TestBlockRecovery}} fails on branch-2.10 consistently.
> I found that the failures have been reported in the Qbt reports since at 
> least March 2021.
> {code:bash}
> [ERROR] Tests run: 20, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 
> 21.422 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery
> [ERROR] 
> testRaceBetweenReplicaRecoveryAndFinalizeBlock(org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery)
>   Time elapsed: 2.814 s  <<< ERROR!
> java.io.IOException: com.google.protobuf.ServiceException: 
> java.lang.IllegalThreadStateException
>   at 
> org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDatanodeReport(ClientNamenodeProtocolTranslatorPB.java:656)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:607)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy28.getDatanodeReport(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.datanodeReport(DFSClient.java:2132)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.waitActive(MiniDFSCluster.java:2699)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.waitActive(MiniDFSCluster.java:2743)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1723)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:905)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:517)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:476)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery.testRaceBetweenReplicaRecoveryAndFinalizeBlock(TestBlockRecovery.java:694)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:607)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: com.google.protobuf.ServiceException: 
> java.lang.IllegalThreadStateException
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:244)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>   at com.sun.proxy.$Proxy27.getDatanodeReport(Unkno

[jira] [Work logged] (HDFS-16074) Remove an expensive debug string concatenation

2021-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16074?focusedWorklogId=611801&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611801
 ]

ASF GitHub Bot logged work on HDFS-16074:
-

Author: ASF GitHub Bot
Created on: 16/Jun/21 08:21
Start Date: 16/Jun/21 08:21
Worklog Time Spent: 10m 
  Work Description: jojochuang commented on pull request #3107:
URL: https://github.com/apache/hadoop/pull/3107#issuecomment-862157811


   Wow. thanks a lot everyone for jumping on this one :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 611801)
Time Spent: 40m  (was: 0.5h)

> Remove an expensive debug string concatenation
> --
>
> Key: HDFS-16074
> URL: https://issues.apache.org/jira/browse/HDFS-16074
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Shot 2021-06-16 at 2.32.29 PM.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Running a YCSB load query, we found that we do an expensive string 
> concatenation on the write path in DFSOutputStream.writeChunkPrepare(). 
> Nearly 25% of HDFS client write CPU time is spent here. It is unnecessary 
> because this is only a debug message, so let's remove it.
> {code}
> if (currentPacket == null) {
>   currentPacket = createPacket(packetSize, chunksPerPacket, getStreamer()
>       .getBytesCurBlock(), getStreamer().getAndIncCurrentSeqno(), false);
>   DFSClient.LOG.debug("WriteChunk allocating new packet seqno={},"
>           + " src={}, packetSize={}, chunksPerPacket={}, bytesCurBlock={}",
>       currentPacket.getSeqno(), src, packetSize, chunksPerPacket,
>       getStreamer().getBytesCurBlock() + ", " + this); // <-- here
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16073) Remove redundant RPC requests for getFileLinkInfo in ClientNamenodeProtocolTranslatorPB

2021-06-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364132#comment-17364132
 ] 

Hadoop QA commented on HDFS-16073:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
24s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to 
include any new or modified tests. Please justify why no new tests are needed 
for this patch. Also please list what manual steps were performed to verify 
this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
47s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 49s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 20m 
44s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  2m 
43s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
52s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  5s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {col

[jira] [Work logged] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet

2021-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13671?focusedWorklogId=611800&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611800
 ]

ASF GitHub Bot logged work on HDFS-13671:
-

Author: ASF GitHub Bot
Created on: 16/Jun/21 08:10
Start Date: 16/Jun/21 08:10
Worklog Time Spent: 10m 
  Work Description: AlphaGouGe commented on pull request #3065:
URL: https://github.com/apache/hadoop/pull/3065#issuecomment-862150204


   Thanks @kihwal @xiaoyuyao @sodonnel for the review, I have updated the PR, 
please take a look.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 611800)
Time Spent: 5h 50m  (was: 5h 40m)

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> --
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Yiqun Lin
>Assignee: Haibin Huang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, 
> image-2021-06-10-19-28-58-359.png
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 
> tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current deletion logic in NameNode, there are mainly two steps:
> * Collect INodes and all blocks to be deleted, then delete INodes.
> * Remove blocks  chunk by chunk in a loop.
> Actually, the first step should be the more expensive operation and should 
> take more time. However, we now always see the NN hang during the 
> remove-block operation. 
> Looking into this: we introduced a new structure, {{FoldedTreeSet}}, to get 
> better performance when processing FBRs/IBRs. But compared with the earlier 
> implementation of the remove-block logic, {{FoldedTreeSet}} is slower, since 
> it takes additional time to rebalance tree nodes. When there are a large 
> number of blocks to remove/delete, it looks bad.
> For the get-type operations in {{DatanodeStorageInfo}}, we only provide 
> {{getBlockIterator}} to return a block iterator, and no other get operation 
> for a specified block. Do we still need to use {{FoldedTreeSet}} in 
> {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits Get, not 
> Update. Maybe we can revert this to the earlier implementation.
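
To make the trade-off above concrete, here is a minimal, self-contained sketch 
contrasting the two removal strategies. It is not the actual 
DatanodeStorageInfo/FoldedTreeSet code; the class names (TreeBasedStorage, 
LinkedStorage) and the simplified structures are illustrative only. The 
tree-based variant pays a keyed lookup plus rebalancing on every remove, while 
the linked variant, conceptually like the earlier triplet-based implementation, 
unlinks a block in O(1) because each block keeps a back-reference to its list 
node.

{code:java}
// Illustrative sketch only -- not the real Hadoop classes.
import java.util.TreeMap;

class Block {
  final long id;
  Node node;                 // back-reference used by LinkedStorage only
  Block(long id) { this.id = id; }
}

class Node {
  Block block;
  Node prev, next;
}

// Stand-in for the FoldedTreeSet approach: every remove is a keyed lookup
// in a balanced tree, followed by rebalancing.
class TreeBasedStorage {
  private final TreeMap<Long, Block> blocks = new TreeMap<>();
  void add(Block b) { blocks.put(b.id, b); }
  void remove(Block b) { blocks.remove(b.id); }   // O(log n) + rebalancing
}

// Stand-in for the earlier linked/triplet approach: the block itself knows
// its list node, so removal is a constant-time unlink with no rebalancing.
class LinkedStorage {
  private final Node head = new Node();           // sentinel
  void add(Block b) {
    Node n = new Node();
    n.block = b;
    b.node = n;
    n.next = head.next;
    n.prev = head;
    if (head.next != null) {
      head.next.prev = n;
    }
    head.next = n;
  }
  void remove(Block b) {                          // O(1) unlink
    Node n = b.node;
    n.prev.next = n.next;
    if (n.next != null) {
      n.next.prev = n.prev;
    }
    b.node = null;
  }
}
{code}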



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-

[jira] [Work logged] (HDFS-16074) Remove an expensive debug string concatenation

2021-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16074?focusedWorklogId=611797&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611797
 ]

ASF GitHub Bot logged work on HDFS-16074:
-

Author: ASF GitHub Bot
Created on: 16/Jun/21 08:08
Start Date: 16/Jun/21 08:08
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3107:
URL: https://github.com/apache/hadoop/pull/3107#issuecomment-862148691


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 56s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 44s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m  1s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 50s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 31s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 56s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 43s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 34s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  18m  6s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 51s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  0s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   1m  0s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 45s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 45s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m  0s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 36s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 28s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  15m 16s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 18s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 33s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  83m 54s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3107/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3107 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux c1a69f2c4c91 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 
17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / c8ba1a525edb66f21ce8849f21399b9e166c8e6e |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3107/1/testReport/ |
   | Max. process+thread count | 552 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs-client U: 
hadoop-hdfs-project/hadoop-hdfs-client |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3107/1/console |
  

[jira] [Work logged] (HDFS-16070) DataTransfer block storm when datanode's io is busy.

2021-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16070?focusedWorklogId=611798&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611798
 ]

ASF GitHub Bot logged work on HDFS-16070:
-

Author: ASF GitHub Bot
Created on: 16/Jun/21 08:08
Start Date: 16/Jun/21 08:08
Worklog Time Spent: 10m 
  Work Description: zhengchenyu commented on pull request #3105:
URL: https://github.com/apache/hadoop/pull/3105#issuecomment-862149211


   @goiri Thanks for your help. I have fixed it per your advice and rebased to 
commit f46f2a8dc72e60580206c98c73e775631175706f.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 611798)
Time Spent: 3h 20m  (was: 3h 10m)

> DataTransfer block storm when datanode's io is busy.
> 
>
> Key: HDFS-16070
> URL: https://issues.apache.org/jira/browse/HDFS-16070
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.3.0, 3.2.1
>Reporter: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> When I sped up decommissioning, I found that some datanodes' I/O was busy; 
> the hosts' load was very high, and tens of thousands of data transfer 
> threads were running. 
> Then I found logs like the ones below.
> {code}
> # DataTransfer start log
> 2021-06-08 13:42:37,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(10.201.4.49:9866, 
> datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
>  Starting thread to transfer 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
> 10.201.7.52:9866
> 2021-06-08 13:52:36,345 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(10.201.4.49:9866, 
> datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
>  Starting thread to transfer 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
> 10.201.7.31:9866
> 2021-06-08 14:02:37,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(10.201.4.49:9866, 
> datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
>  Starting thread to transfer 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
> 10.201.16.50:9866
> # DataTransfer done log
> 2021-06-08 13:54:08,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
> (numBytes=7457424) to /10.201.7.52:9866
> 2021-06-08 14:10:47,170 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
> (numBytes=7457424) to /10.201.16.50:9866
> {code}
> You can see that the previous DataTransfer thread finished at 13:54:08, but 
> the next DataTransfer for the same block had already started at 13:52:36. 
> If a DataTransfer is not done within 10 min (pending timeout + check 
> interval), the next DataTransfer for the same block will be started. Then 
> the disk and network become heavy.
> Note: decommissioning EC blocks triggers this problem easily, because every 
> EC internal block is unique. 
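
A minimal sketch of the guard this issue leads to. The PR discussed here adds 
a transferringBlock set that is checked in DataNode.transferBlock() and 
cleaned up in a finally block in DataTransfer.run(); the sketch below only 
illustrates that pattern, and the names InFlightTransfers/startTransfer are 
made up for the example. The idea: before spawning a new DataTransfer thread, 
check whether a transfer for the same block is already running, and always 
remove the marker in a finally block so a failed transfer does not block 
later retries.

{code:java}
// Illustrative sketch only -- not the real DataNode code.
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class InFlightTransfers {
  private final Set<String> transferring = ConcurrentHashMap.newKeySet();

  /** Starts a transfer thread unless one is already running for this block. */
  boolean startTransfer(String blockId, Runnable transfer) {
    if (!transferring.add(blockId)) {
      return false;                           // duplicate request: skip it
    }
    Thread t = new Thread(() -> {
      try {
        transfer.run();                       // the actual block transfer
      } finally {
        transferring.remove(blockId);         // always release, even on failure
      }
    });
    t.start();
    return true;
  }
}
{code}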



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet

2021-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13671?focusedWorklogId=611796&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611796
 ]

ASF GitHub Bot logged work on HDFS-13671:
-

Author: ASF GitHub Bot
Created on: 16/Jun/21 08:08
Start Date: 16/Jun/21 08:08
Worklog Time Spent: 10m 
  Work Description: AlphaGouGe commented on a change in pull request #3065:
URL: https://github.com/apache/hadoop/pull/3065#discussion_r652451949



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
##
@@ -3111,106 +3042,127 @@ void processFirstBlockReport(
 }
   }
 
-  private void reportDiffSorted(DatanodeStorageInfo storageInfo,
-  Iterable newReport,
+  private void reportDiff(DatanodeStorageInfo storageInfo,
+  BlockListAsLongs newReport,
   Collection toAdd, // add to DatanodeDescriptor
   Collection toRemove,   // remove from DatanodeDescriptor
   Collection toInvalidate,   // should be removed from DN
   Collection toCorrupt, // add to corrupt replicas list
   Collection toUC) { // add to under-construction list
 
-// The blocks must be sorted and the storagenodes blocks must be sorted
-Iterator storageBlocksIterator = storageInfo.getBlockIterator();
+// place a delimiter in the list which separates blocks
+// that have been reported from those that have not
 DatanodeDescriptor dn = storageInfo.getDatanodeDescriptor();
-BlockInfo storageBlock = null;
-
-for (BlockReportReplica replica : newReport) {
-
-  long replicaID = replica.getBlockId();
-  if (BlockIdManager.isStripedBlockID(replicaID)
-  && (!hasNonEcBlockUsingStripedID ||
-  !blocksMap.containsBlock(replica))) {
-replicaID = BlockIdManager.convertToStripedID(replicaID);
-  }
-
-  ReplicaState reportedState = replica.getState();
-
-  LOG.debug("Reported block {} on {} size {} replicaState = {}",
-  replica, dn, replica.getNumBytes(), reportedState);
-
-  if (shouldPostponeBlocksFromFuture
-  && isGenStampInFuture(replica)) {
-queueReportedBlock(storageInfo, replica, reportedState,
-   QUEUE_REASON_FUTURE_GENSTAMP);
-continue;
-  }
-
-  if (storageBlock == null && storageBlocksIterator.hasNext()) {
-storageBlock = storageBlocksIterator.next();
-  }
-
-  do {
-int cmp;
-if (storageBlock == null ||
-(cmp = Long.compare(replicaID, storageBlock.getBlockId())) < 0) {
-  // Check if block is available in NN but not yet on this storage
-  BlockInfo nnBlock = blocksMap.getStoredBlock(new Block(replicaID));
-  if (nnBlock != null) {
-reportDiffSortedInner(storageInfo, replica, reportedState,
-  nnBlock, toAdd, toCorrupt, toUC);
-  } else {
-// Replica not found anywhere so it should be invalidated
-toInvalidate.add(new Block(replica));
-  }
-  break;
-} else if (cmp == 0) {
-  // Replica matched current storageblock
-  reportDiffSortedInner(storageInfo, replica, reportedState,
-storageBlock, toAdd, toCorrupt, toUC);
-  storageBlock = null;
-} else {
-  // replica has higher ID than storedBlock
-  // Remove all stored blocks with IDs lower than replica
-  do {
-toRemove.add(storageBlock);
-storageBlock = storageBlocksIterator.hasNext()
-   ? storageBlocksIterator.next() : null;
-  } while (storageBlock != null &&
-   Long.compare(replicaID, storageBlock.getBlockId()) > 0);
+Block delimiterBlock = new Block();
+BlockInfo delimiter = new BlockInfoContiguous(delimiterBlock,
+(short) 1);
+AddBlockResult result = storageInfo.addBlock(delimiter, delimiterBlock);
+assert result == AddBlockResult.ADDED
+: "Delimiting block cannot be present in the node";
+int headIndex = 0; //currently the delimiter is in the head of the list
+int curIndex;
+
+if (newReport == null) {
+  newReport = BlockListAsLongs.EMPTY;
+}
+// scan the report and process newly reported blocks
+for (BlockReportReplica iblk : newReport) {
+  ReplicaState iState = iblk.getState();
+  LOG.debug("Reported block {} on {} size {} replicaState = {}", iblk, dn,
+  iblk.getNumBytes(), iState);
+  BlockInfo storedBlock = processReportedBlock(storageInfo,
+  iblk, iState, toAdd, toInvalidate, toCorrupt, toUC);
+
+  // move block to the head of the list
+  if (storedBlock != null) {
+curIndex = storedBlock.findStorageInfo(storageInfo);
+if (curIndex >= 0) {
+  headIndex =
+  st
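
The diff above is cut off by the archive. The delimiter technique it restores 
works roughly as sketched below. This is a simplified illustration, not the 
actual BlockManager.reportDiff code: the real implementation moves intrusive 
list nodes in O(1), while this sketch uses a plain LinkedList for brevity. 
Reported blocks are moved in front of a sentinel, so whatever remains behind 
the sentinel after the scan was not mentioned in the report and can be 
scheduled for removal.

{code:java}
// Illustrative sketch only -- block ids are plain Longs here.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class ReportDiffSketch {
  static List<Long> unreportedBlocks(LinkedList<Long> storedBlocks,
                                     List<Long> report) {
    final Long delimiter = Long.MIN_VALUE;   // sentinel, assumed never a real id
    storedBlocks.addFirst(delimiter);
    for (Long reported : report) {
      if (storedBlocks.remove(reported)) {   // block already known on this storage:
        storedBlocks.addFirst(reported);     // move it in front of the delimiter
      }
      // else: newly reported block, handled elsewhere (toAdd/toInvalidate/...)
    }
    // everything still behind the delimiter was not reported
    List<Long> toRemove = new ArrayList<>();
    Iterator<Long> it = storedBlocks.iterator();
    boolean pastDelimiter = false;
    while (it.hasNext()) {
      Long b = it.next();
      if (b.equals(delimiter)) {
        pastDelimiter = true;
        it.remove();                         // drop the sentinel again
      } else if (pastDelimiter) {
        toRemove.add(b);
      }
    }
    return toRemove;
  }
}
{code}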

[jira] [Work logged] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet

2021-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13671?focusedWorklogId=611795&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611795
 ]

ASF GitHub Bot logged work on HDFS-13671:
-

Author: ASF GitHub Bot
Created on: 16/Jun/21 08:06
Start Date: 16/Jun/21 08:06
Worklog Time Spent: 10m 
  Work Description: AlphaGouGe commented on a change in pull request #3065:
URL: https://github.com/apache/hadoop/pull/3065#discussion_r652450652



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
##
@@ -3220,21 +3173,28 @@ private void reportDiffSortedInner(
 // comes from the IBR / FBR and hence what we should use to compare
 // against the memory state.
 // See HDFS-6289 and HDFS-15422 for more context.
-queueReportedBlock(storageInfo, replica, reportedState,
+queueReportedBlock(storageInfo, storedBlock, reportedState,

Review comment:
   Sorry for missing the change from HDFS-15422; I have changed it back.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 611795)
Time Spent: 5.5h  (was: 5h 20m)

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> --
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Yiqun Lin
>Assignee: Haibin Huang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, 
> image-2021-06-10-19-28-58-359.png
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 
> tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current deletion logic in NameNode, there are mainly two steps:
> * Collect INodes and all blocks to be deleted, then delete INodes.
> * Remove blocks  chunk by chunk in a loop.
> Actually, the first step should be the more expensive operation and should 
> take more time. However, we now always see the NN hang during the 
> remove-block operation. 
> Looking into this: we introduced a new structure, {{FoldedTreeSet}}, to get 
> better performance when processing FBRs/IBRs. But compared with the earlier 
> implementation of the remove-block logic, {{FoldedTreeSet}} is slower, since 
> it takes additional time to rebalance tree nodes.

[jira] [Work logged] (HDFS-16070) DataTransfer block storm when datanode's io is busy.

2021-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16070?focusedWorklogId=611792&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611792
 ]

ASF GitHub Bot logged work on HDFS-16070:
-

Author: ASF GitHub Bot
Created on: 16/Jun/21 07:59
Start Date: 16/Jun/21 07:59
Worklog Time Spent: 10m 
  Work Description: zhengchenyu commented on a change in pull request #3105:
URL: https://github.com/apache/hadoop/pull/3105#discussion_r652445254



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
##
@@ -2646,6 +2655,7 @@ public void run() {
   } catch (Throwable t) {
 LOG.error("Failed to transfer block {}", b, t);
   } finally {
+transferringBlock.remove(b);

Review comment:
   I don't quite follow your advice: transferringBlock.add and 
transferringBlock.remove can only be called from DataTransfer.run, so I think 
it won't leave garbage.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 611792)
Time Spent: 3h 10m  (was: 3h)

> DataTransfer block storm when datanode's io is busy.
> 
>
> Key: HDFS-16070
> URL: https://issues.apache.org/jira/browse/HDFS-16070
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.3.0, 3.2.1
>Reporter: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> When I sped up decommissioning, I found that some datanodes' I/O was busy; 
> the hosts' load was very high, and tens of thousands of data transfer 
> threads were running. 
> Then I found logs like the ones below.
> {code}
> # DataTransfer start log
> 2021-06-08 13:42:37,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(10.201.4.49:9866, 
> datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
>  Starting thread to transfer 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
> 10.201.7.52:9866
> 2021-06-08 13:52:36,345 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(10.201.4.49:9866, 
> datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
>  Starting thread to transfer 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
> 10.201.7.31:9866
> 2021-06-08 14:02:37,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(10.201.4.49:9866, 
> datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
>  Starting thread to transfer 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
> 10.201.16.50:9866
> # DataTransfer done log
> 2021-06-08 13:54:08,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
> (numBytes=7457424) to /10.201.7.52:9866
> 2021-06-08 14:10:47,170 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
> (numBytes=7457424) to /10.201.16.50:9866
> {code}
> You can see that the previous DataTransfer thread finished at 13:54:08, but 
> the next DataTransfer for the same block had already started at 13:52:36. 
> If a DataTransfer is not done within 10 min (pending timeout + check 
> interval), the next DataTransfer for the same block will be started. Then 
> the disk and network become heavy.
> Note: decommissioning EC blocks triggers this problem easily, because every 
> EC internal block is unique.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16065) RBF: Add metrics to record Router's operations

2021-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16065?focusedWorklogId=611788&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611788
 ]

ASF GitHub Bot logged work on HDFS-16065:
-

Author: ASF GitHub Bot
Created on: 16/Jun/21 07:47
Start Date: 16/Jun/21 07:47
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3100:
URL: https://github.com/apache/hadoop/pull/3100#issuecomment-862134661


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 53s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  30m 58s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 43s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 28s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 53s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 15s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  15m  7s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  15m 27s |  |  Used diff version of patch file. 
Binary files and potentially other changes not applied. Please rebase and 
squash commits if necessary.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 32s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 21s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3100/5/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 569 new + 0 
unchanged - 0 fixed = 569 total (was 0)  |
   | +1 :green_heart: |  mvnsite  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 33s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 53s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  14m 52s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  27m 12s |  |  hadoop-hdfs-rbf in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 33s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 102m  1s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3100/5/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3100 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux c3b0a7e76131 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 
17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 674d510e7f4ad0f7c74ed0254df608931f061d26 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Result

[jira] [Work logged] (HDFS-16070) DataTransfer block storm when datanode's io is busy.

2021-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16070?focusedWorklogId=611782&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611782
 ]

ASF GitHub Bot logged work on HDFS-16070:
-

Author: ASF GitHub Bot
Created on: 16/Jun/21 07:39
Start Date: 16/Jun/21 07:39
Worklog Time Spent: 10m 
  Work Description: zhengchenyu commented on a change in pull request #3105:
URL: https://github.com/apache/hadoop/pull/3105#discussion_r652427364



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
##
@@ -2394,16 +2396,22 @@ void transferBlock(ExtendedBlock block, DatanodeInfo[] 
xferTargets,
 
 int numTargets = xferTargets.length;
 if (numTargets > 0) {
-  final String xferTargetsString =
-  StringUtils.join(" ", Arrays.asList(xferTargets));
-  LOG.info("{} Starting thread to transfer {} to {}", bpReg, block,
-  xferTargetsString);
+  if (transferringBlock.contains(block)) {
+LOG.warn(

Review comment:
   As I said, when a datanode's I/O is busy, a DataTransfer can be stuck for a 
long time and the pendingReconstructBlock will time out. Then a duplicated 
DataTransfer thread for the same block starts, and that duplicated thread 
makes the I/O even busier. In fact, we found our datanode's I/O stayed busy 
for a long time until we restarted the datanode. So I think we should avoid 
the duplicated DataTransfer.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 611782)
Time Spent: 3h  (was: 2h 50m)

> DataTransfer block storm when datanode's io is busy.
> 
>
> Key: HDFS-16070
> URL: https://issues.apache.org/jira/browse/HDFS-16070
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.3.0, 3.2.1
>Reporter: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> When I sped up decommissioning, I found that some datanodes' I/O was busy; 
> the hosts' load was very high, and tens of thousands of data transfer 
> threads were running. 
> Then I found logs like the ones below.
> {code}
> # DataTransfer start log
> 2021-06-08 13:42:37,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(10.201.4.49:9866, 
> datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
>  Starting thread to transfer 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
> 10.201.7.52:9866
> 2021-06-08 13:52:36,345 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(10.201.4.49:9866, 
> datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
>  Starting thread to transfer 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
> 10.201.7.31:9866
> 2021-06-08 14:02:37,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(10.201.4.49:9866, 
> datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
>  Starting thread to transfer 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
> 10.201.16.50:9866
> # DataTransfer done log
> 2021-06-08 13:54:08,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
> (numBytes=7457424) to /10.201.7.52:9866
> 2021-06-08 14:10:47,170 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
> (numBytes=7457424) to /10.201.16.50:9866
> {code}
> You can see that the previous DataTransfer thread finished at 13:54:08, but 
> the next DataTransfer for the same block had already started at 13:52:36. 
> If a DataTransfer is not done within 10 min (pending timeout + check 
> interval), the next DataTransfer for the same block will be started. Then 
> the disk and network become heavy.
> Note: decommissioning EC blocks triggers this problem easily, because every 
> EC internal block is unique.

[jira] [Work logged] (HDFS-16070) DataTransfer block storm when datanode's io is busy.

2021-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16070?focusedWorklogId=611779&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611779
 ]

ASF GitHub Bot logged work on HDFS-16070:
-

Author: ASF GitHub Bot
Created on: 16/Jun/21 07:36
Start Date: 16/Jun/21 07:36
Worklog Time Spent: 10m 
  Work Description: zhengchenyu commented on a change in pull request #3105:
URL: https://github.com/apache/hadoop/pull/3105#discussion_r652421980



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBusyIODataNode.java
##
@@ -0,0 +1,221 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdfs.server.datanode;
+
+import static org.mockito.Mockito.atLeastOnce;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.verify;
+
+import java.io.IOException;
+import java.lang.reflect.Field;
+import java.lang.reflect.Modifier;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Random;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.CommonConfigurationKeys;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DFSConfigKeys;
+import org.apache.hadoop.hdfs.HdfsConfiguration;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hdfs.protocol.ExtendedBlock;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockManager;
+import org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor;
+import org.apache.hadoop.hdfs.server.blockmanagement.NumberReplicas;
+import org.apache.hadoop.hdfs.server.namenode.FSNamesystem;
+import org.apache.hadoop.hdfs.server.namenode.INodeFile;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class TestBusyIODataNode {
+
+  public static final Logger LOG = LoggerFactory.getLogger(TestBusyIODataNode
+  .class);
+
+  private MiniDFSCluster cluster;
+  private Configuration conf;
+  private FSNamesystem fsn;
+  private BlockManager bm;
+
+  static final long SEED = 0xDEADBEEFL;
+  static final int BLOCK_SIZE = 8192;
+  private static final int HEARTBEAT_INTERVAL = 1;
+
+  private final Path dir = new Path("/" + this.getClass().getSimpleName());
+
+  @Before
+  public void setUp() throws Exception {
+conf = new HdfsConfiguration();
+conf.setTimeDuration(
+DFSConfigKeys.DFS_DATANODE_DISK_CHECK_MIN_GAP_KEY,
+0, TimeUnit.MILLISECONDS);
+conf.setInt(DFSConfigKeys.DFS_REPLICATION_KEY, 1);
+conf.setInt(
+DFSConfigKeys.DFS_NAMENODE_RECONSTRUCTION_PENDING_TIMEOUT_SEC_KEY,
+1);
+conf.setInt(DFSConfigKeys.DFS_NAMENODE_REDUNDANCY_INTERVAL_SECONDS_KEY, 1);
+conf.setInt(DFSConfigKeys.DFS_BLOCKREPORT_INTERVAL_MSEC_KEY, 1000);
+conf.setInt(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, HEARTBEAT_INTERVAL);
+cluster = new MiniDFSCluster.Builder(conf).numDataNodes(2).build();
+cluster.waitActive();
+fsn = cluster.getNamesystem();
+bm = fsn.getBlockManager();
+  }
+
+  @After
+  public void tearDown() throws Exception {
+if (cluster != null) {
+  cluster.shutdown();
+  cluster = null;
+}
+  }
+
+  static protected void writeFile(FileSystem fileSys, Path name, int repl)
+  throws IOException {
+writeFile(fileSys, name, repl, 2);
+  }
+
+  static protected void writeFile(FileSystem fileSys, Path name, int repl,
+  int numOfBlocks) throws IOException {
+writeFile(fileSys, name, repl, numOfBlocks, true);
+  }
+
+  static protected FSDataOutputStream writeFile(FileSystem fileSys, Path name,
+  int repl, int numOfBlocks, boolean completeFile)
+  throws IOException {
+// create and write a file that contains two blocks of data
+FSDataOutputStream stm = fileSys.create(name, true, fileSys.getConf()

[jira] [Work logged] (HDFS-16070) DataTransfer block storm when datanode's io is busy.

2021-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16070?focusedWorklogId=611778&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611778
 ]

ASF GitHub Bot logged work on HDFS-16070:
-

Author: ASF GitHub Bot
Created on: 16/Jun/21 07:35
Start Date: 16/Jun/21 07:35
Worklog Time Spent: 10m 
  Work Description: zhengchenyu commented on a change in pull request #3105:
URL: https://github.com/apache/hadoop/pull/3105#discussion_r652427364



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
##
@@ -2394,16 +2396,22 @@ void transferBlock(ExtendedBlock block, DatanodeInfo[] 
xferTargets,
 
 int numTargets = xferTargets.length;
 if (numTargets > 0) {
-  final String xferTargetsString =
-  StringUtils.join(" ", Arrays.asList(xferTargets));
-  LOG.info("{} Starting thread to transfer {} to {}", bpReg, block,
-  xferTargetsString);
+  if (transferringBlock.contains(block)) {
+LOG.warn(

Review comment:
   As I said, when a datanode's I/O is busy, a DataTransfer can be stuck for a 
long time and the pendingReconstructBlock will time out. Then a duplicated 
DataTransfer thread for the same block starts, and that duplicated thread 
makes the I/O even busier. So I think we should avoid the duplicated 
DataTransfer.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 611778)
Time Spent: 2h 40m  (was: 2.5h)

> DataTransfer block storm when datanode's io is busy.
> 
>
> Key: HDFS-16070
> URL: https://issues.apache.org/jira/browse/HDFS-16070
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.3.0, 3.2.1
>Reporter: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When I sped up decommissioning, I found that some datanodes' I/O was busy; 
> the hosts' load was very high, and tens of thousands of data transfer 
> threads were running. 
> Then I found logs like the ones below.
> {code}
> # DataTransfer start log
> 2021-06-08 13:42:37,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(10.201.4.49:9866, 
> datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
>  Starting thread to transfer 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
> 10.201.7.52:9866
> 2021-06-08 13:52:36,345 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(10.201.4.49:9866, 
> datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
>  Starting thread to transfer 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
> 10.201.7.31:9866
> 2021-06-08 14:02:37,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(10.201.4.49:9866, 
> datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
>  Starting thread to transfer 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
> 10.201.16.50:9866
> # DataTransfer done log
> 2021-06-08 13:54:08,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
> (numBytes=7457424) to /10.201.7.52:9866
> 2021-06-08 14:10:47,170 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
> (numBytes=7457424) to /10.201.16.50:9866
> {code}
> You can see that the previous DataTransfer thread finished at 13:54:08, but 
> the next DataTransfer for the same block had already started at 13:52:36. 
> If a DataTransfer is not done within 10 min (pending timeout + check 
> interval), the next DataTransfer for the same block will be started. Then 
> the disk and network become heavy.
> Note: decommissioning EC blocks triggers this problem easily, because every 
> EC internal block is unique. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

--

[jira] [Work logged] (HDFS-16070) DataTransfer block storm when datanode's io is busy.

2021-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16070?focusedWorklogId=611773&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611773
 ]

ASF GitHub Bot logged work on HDFS-16070:
-

Author: ASF GitHub Bot
Created on: 16/Jun/21 07:31
Start Date: 16/Jun/21 07:31
Worklog Time Spent: 10m 
  Work Description: zhengchenyu commented on a change in pull request #3105:
URL: https://github.com/apache/hadoop/pull/3105#discussion_r652424224



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBusyIODataNode.java
##
@@ -0,0 +1,221 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdfs.server.datanode;
+
+import static org.mockito.Mockito.atLeastOnce;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.verify;
+
+import java.io.IOException;
+import java.lang.reflect.Field;
+import java.lang.reflect.Modifier;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Random;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.CommonConfigurationKeys;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DFSConfigKeys;
+import org.apache.hadoop.hdfs.HdfsConfiguration;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hdfs.protocol.ExtendedBlock;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockManager;
+import org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor;
+import org.apache.hadoop.hdfs.server.blockmanagement.NumberReplicas;
+import org.apache.hadoop.hdfs.server.namenode.FSNamesystem;
+import org.apache.hadoop.hdfs.server.namenode.INodeFile;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class TestBusyIODataNode {
+
+  public static final Logger LOG = LoggerFactory.getLogger(TestBusyIODataNode
+  .class);
+
+  private MiniDFSCluster cluster;
+  private Configuration conf;
+  private FSNamesystem fsn;
+  private BlockManager bm;
+
+  static final long SEED = 0xDEADBEEFL;
+  static final int BLOCK_SIZE = 8192;
+  private static final int HEARTBEAT_INTERVAL = 1;
+
+  private final Path dir = new Path("/" + this.getClass().getSimpleName());
+
+  @Before
+  public void setUp() throws Exception {
+conf = new HdfsConfiguration();
+conf.setTimeDuration(
+DFSConfigKeys.DFS_DATANODE_DISK_CHECK_MIN_GAP_KEY,
+0, TimeUnit.MILLISECONDS);
+conf.setInt(DFSConfigKeys.DFS_REPLICATION_KEY, 1);
+conf.setInt(
+DFSConfigKeys.DFS_NAMENODE_RECONSTRUCTION_PENDING_TIMEOUT_SEC_KEY,
+1);
+conf.setInt(DFSConfigKeys.DFS_NAMENODE_REDUNDANCY_INTERVAL_SECONDS_KEY, 1);
+conf.setInt(DFSConfigKeys.DFS_BLOCKREPORT_INTERVAL_MSEC_KEY, 1000);
+conf.setInt(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, HEARTBEAT_INTERVAL);
+cluster = new MiniDFSCluster.Builder(conf).numDataNodes(2).build();
+cluster.waitActive();
+fsn = cluster.getNamesystem();
+bm = fsn.getBlockManager();
+  }
+
+  @After
+  public void tearDown() throws Exception {
+if (cluster != null) {
+  cluster.shutdown();
+  cluster = null;
+}
+  }
+
+  static protected void writeFile(FileSystem fileSys, Path name, int repl)
+  throws IOException {
+writeFile(fileSys, name, repl, 2);
+  }
+
+  static protected void writeFile(FileSystem fileSys, Path name, int repl,
+  int numOfBlocks) throws IOException {
+writeFile(fileSys, name, repl, numOfBlocks, true);
+  }
+
+  static protected FSDataOutputStream writeFile(FileSystem fileSys, Path name,
+  int repl, int numOfBlocks, boolean completeFile)
+  throws IOException {
+// create and write a file that contains two blocks of data
+FSDataOutputStream stm = fileSys.create(name, true, fileSys.getConf()

[jira] [Work logged] (HDFS-16070) DataTransfer block storm when datanode's io is busy.

2021-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16070?focusedWorklogId=611771&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611771
 ]

ASF GitHub Bot logged work on HDFS-16070:
-

Author: ASF GitHub Bot
Created on: 16/Jun/21 07:28
Start Date: 16/Jun/21 07:28
Worklog Time Spent: 10m 
  Work Description: zhengchenyu commented on a change in pull request #3105:
URL: https://github.com/apache/hadoop/pull/3105#discussion_r652421980



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBusyIODataNode.java
##
@@ -0,0 +1,221 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdfs.server.datanode;
+
+import static org.mockito.Mockito.atLeastOnce;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.verify;
+
+import java.io.IOException;
+import java.lang.reflect.Field;
+import java.lang.reflect.Modifier;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Random;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.CommonConfigurationKeys;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DFSConfigKeys;
+import org.apache.hadoop.hdfs.HdfsConfiguration;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hdfs.protocol.ExtendedBlock;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockManager;
+import org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor;
+import org.apache.hadoop.hdfs.server.blockmanagement.NumberReplicas;
+import org.apache.hadoop.hdfs.server.namenode.FSNamesystem;
+import org.apache.hadoop.hdfs.server.namenode.INodeFile;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class TestBusyIODataNode {
+
+  public static final Logger LOG = LoggerFactory.getLogger(TestBusyIODataNode
+  .class);
+
+  private MiniDFSCluster cluster;
+  private Configuration conf;
+  private FSNamesystem fsn;
+  private BlockManager bm;
+
+  static final long SEED = 0xDEADBEEFL;
+  static final int BLOCK_SIZE = 8192;
+  private static final int HEARTBEAT_INTERVAL = 1;
+
+  private final Path dir = new Path("/" + this.getClass().getSimpleName());
+
+  @Before
+  public void setUp() throws Exception {
+conf = new HdfsConfiguration();
+conf.setTimeDuration(
+DFSConfigKeys.DFS_DATANODE_DISK_CHECK_MIN_GAP_KEY,
+0, TimeUnit.MILLISECONDS);
+conf.setInt(DFSConfigKeys.DFS_REPLICATION_KEY, 1);
+conf.setInt(
+DFSConfigKeys.DFS_NAMENODE_RECONSTRUCTION_PENDING_TIMEOUT_SEC_KEY,
+1);
+conf.setInt(DFSConfigKeys.DFS_NAMENODE_REDUNDANCY_INTERVAL_SECONDS_KEY, 1);
+conf.setInt(DFSConfigKeys.DFS_BLOCKREPORT_INTERVAL_MSEC_KEY, 1000);
+conf.setInt(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, HEARTBEAT_INTERVAL);
+cluster = new MiniDFSCluster.Builder(conf).numDataNodes(2).build();
+cluster.waitActive();
+fsn = cluster.getNamesystem();
+bm = fsn.getBlockManager();
+  }
+
+  @After
+  public void tearDown() throws Exception {
+if (cluster != null) {
+  cluster.shutdown();
+  cluster = null;
+}
+  }
+
+  static protected void writeFile(FileSystem fileSys, Path name, int repl)
+  throws IOException {
+writeFile(fileSys, name, repl, 2);
+  }
+
+  static protected void writeFile(FileSystem fileSys, Path name, int repl,
+  int numOfBlocks) throws IOException {
+writeFile(fileSys, name, repl, numOfBlocks, true);
+  }
+
+  static protected FSDataOutputStream writeFile(FileSystem fileSys, Path name,
+  int repl, int numOfBlocks, boolean completeFile)
+  throws IOException {
+// create and write a file that contains two blocks of data
+FSDataOutputStream stm = fileSys.create(name, true, fileSys.getConf()

[jira] [Work logged] (HDFS-16070) DataTransfer block storm when datanode's io is busy.

2021-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16070?focusedWorklogId=611767&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611767
 ]

ASF GitHub Bot logged work on HDFS-16070:
-

Author: ASF GitHub Bot
Created on: 16/Jun/21 07:25
Start Date: 16/Jun/21 07:25
Worklog Time Spent: 10m 
  Work Description: zhengchenyu commented on a change in pull request #3105:
URL: https://github.com/apache/hadoop/pull/3105#discussion_r652420209



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBusyIODataNode.java
##
@@ -0,0 +1,221 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdfs.server.datanode;
+
+import static org.mockito.Mockito.atLeastOnce;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.verify;
+
+import java.io.IOException;
+import java.lang.reflect.Field;
+import java.lang.reflect.Modifier;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Random;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.CommonConfigurationKeys;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DFSConfigKeys;
+import org.apache.hadoop.hdfs.HdfsConfiguration;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hdfs.protocol.ExtendedBlock;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockManager;
+import org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor;
+import org.apache.hadoop.hdfs.server.blockmanagement.NumberReplicas;
+import org.apache.hadoop.hdfs.server.namenode.FSNamesystem;
+import org.apache.hadoop.hdfs.server.namenode.INodeFile;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class TestBusyIODataNode {
+
+  public static final Logger LOG = LoggerFactory.getLogger(TestBusyIODataNode
+  .class);
+
+  private MiniDFSCluster cluster;
+  private Configuration conf;
+  private FSNamesystem fsn;
+  private BlockManager bm;
+
+  static final long SEED = 0xDEADBEEFL;
+  static final int BLOCK_SIZE = 8192;
+  private static final int HEARTBEAT_INTERVAL = 1;
+
+  private final Path dir = new Path("/" + this.getClass().getSimpleName());
+
+  @Before
+  public void setUp() throws Exception {
+conf = new HdfsConfiguration();
+conf.setTimeDuration(
+DFSConfigKeys.DFS_DATANODE_DISK_CHECK_MIN_GAP_KEY,
+0, TimeUnit.MILLISECONDS);
+conf.setInt(DFSConfigKeys.DFS_REPLICATION_KEY, 1);
+conf.setInt(
+DFSConfigKeys.DFS_NAMENODE_RECONSTRUCTION_PENDING_TIMEOUT_SEC_KEY,
+1);
+conf.setInt(DFSConfigKeys.DFS_NAMENODE_REDUNDANCY_INTERVAL_SECONDS_KEY, 1);
+conf.setInt(DFSConfigKeys.DFS_BLOCKREPORT_INTERVAL_MSEC_KEY, 1000);
+conf.setInt(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, HEARTBEAT_INTERVAL);
+cluster = new MiniDFSCluster.Builder(conf).numDataNodes(2).build();
+cluster.waitActive();
+fsn = cluster.getNamesystem();
+bm = fsn.getBlockManager();
+  }
+
+  @After
+  public void tearDown() throws Exception {
+if (cluster != null) {
+  cluster.shutdown();
+  cluster = null;
+}
+  }
+
+  static protected void writeFile(FileSystem fileSys, Path name, int repl)
+  throws IOException {
+writeFile(fileSys, name, repl, 2);
+  }
+
+  static protected void writeFile(FileSystem fileSys, Path name, int repl,
+  int numOfBlocks) throws IOException {
+writeFile(fileSys, name, repl, numOfBlocks, true);
+  }
+
+  static protected FSDataOutputStream writeFile(FileSystem fileSys, Path name,
+  int repl, int numOfBlocks, boolean completeFile)
+  throws IOException {
+// create and write a file that contains two blocks of data
+FSDataOutputStream stm = fileSys.create(name, true, fileSys.getConf()

[jira] [Work logged] (HDFS-16070) DataTransfer block storm when datanode's io is busy.

2021-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16070?focusedWorklogId=611765&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611765
 ]

ASF GitHub Bot logged work on HDFS-16070:
-

Author: ASF GitHub Bot
Created on: 16/Jun/21 07:24
Start Date: 16/Jun/21 07:24
Worklog Time Spent: 10m 
  Work Description: zhengchenyu commented on a change in pull request #3105:
URL: https://github.com/apache/hadoop/pull/3105#discussion_r652419326



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBusyIODataNode.java
##
@@ -0,0 +1,221 @@

[jira] [Work logged] (HDFS-16061) DFSTestUtil.waitReplication can produce false positives

2021-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16061?focusedWorklogId=611759&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611759
 ]

ASF GitHub Bot logged work on HDFS-16061:
-

Author: ASF GitHub Bot
Created on: 16/Jun/21 07:09
Start Date: 16/Jun/21 07:09
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #3095:
URL: https://github.com/apache/hadoop/pull/3095#discussion_r652409047



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
##
@@ -206,6 +206,8 @@
   private static final String[] dirNames = {
     "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"
   };
+
+  private static final int WAIT_REPLICATION_ATTEMPTS = 40;

Review comment:
   Why have you pulled this up? Does that mean nobody apart from the 
`waitReplication` method is using it? We could have kept it there itself; the 
variable is private, so no one else can change it anyway.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 611759)
Time Spent: 40m  (was: 0.5h)

> DFSTestUtil.waitReplication can produce false positives
> --
>
> Key: HDFS-16061
> URL: https://issues.apache.org/jira/browse/HDFS-16061
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> While checking the intermittent failure in 
> TestBalancerRPCDelay#testBalancerRPCDelayQpsDefault described in HDFS-15146, 
> I found that the implementation of waitReplication is incorrect.
> In the last iteration, when {{correctReplFactor}} is {{false}}, the thread 
> sleeps for 1 second, then a {{TimeoutException}} is thrown without checking 
> whether the replication completed during that last second.
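
A self-contained sketch of the wait-loop shape described above may make the false 
positive easier to see. This is not the actual DFSTestUtil code; the class, 
constant, and helper names below are assumptions for illustration only.

{code}
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitLoopSketch {

  // Analogous to a 40-attempt, one-second-poll wait loop.
  private static final int ATTEMPTS = 40;
  private static final long SLEEP_MS = 1000L;

  // Flawed shape: check, then sleep. After the sleep of the final attempt the
  // loop exits and the method throws without looking at the condition again,
  // so a condition that became true during that last second is still reported
  // as a timeout.
  static void waitFlawed(BooleanSupplier done)
      throws TimeoutException, InterruptedException {
    boolean ok = false;
    for (int i = 0; i < ATTEMPTS && !ok; i++) {
      ok = done.getAsBoolean();
      if (!ok) {
        Thread.sleep(SLEEP_MS);
      }
    }
    if (!ok) {
      throw new TimeoutException("condition not met"); // possible false positive
    }
  }

  // One way to avoid it: re-check once more after the final sleep (or sleep
  // before checking), so the last polling interval still counts.
  static void waitFixed(BooleanSupplier done)
      throws TimeoutException, InterruptedException {
    for (int i = 0; i < ATTEMPTS; i++) {
      if (done.getAsBoolean()) {
        return;
      }
      Thread.sleep(SLEEP_MS);
    }
    if (!done.getAsBoolean()) {
      throw new TimeoutException("condition not met");
    }
  }
}
{code}

A caller polling, for example, a replication check through waitFixed is only 
declared timed out after a check that follows the final sleep, which is the 
behaviour the report above asks for.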



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org