[ https://issues.apache.org/jira/browse/HDFS-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018189#comment-17018189 ]

Stephen O'Donnell edited comment on HDFS-15131 at 1/17/20 5:01 PM:
-------------------------------------------------------------------

Tracking a 3rd occurrence of this issue, on yet another cluster. The two issues 
above were on different clusters too; all three were on CDH 5 versions.

Take this log line from the DN process, which had been running for about 5 months:

{code}
2020-01-17 08:32:37,613 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Took 37189ms to process 1 commands from NN
{code}

The DN was unable to process commands for 37 seconds. This generally indicates 
it is struggling to acquire the FsDatasetImpl lock. From jstacks captured at 
5 second intervals, we can see the following.

First the heartbeat thread is blocked waiting on the fsdatasetimpl lock:

{code}
"Thread-41" daemon prio=10 tid=0x0000000003c39000 nid=0x1c6fe1 waiting for 
monitor entry [0x00007f1a4b7b8000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:1990)
        - waiting to lock <0x0000000751682590> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:686)
        at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:632)
        at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:729)
        at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:539)
        at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:695)
        at java.lang.Thread.run(Thread.java:744)
{code}

It stays this way across all samples, covering the full 37 seconds. Several 
other threads are waiting on the lock; the current holder shows this trace:

{code}
"DataXceiver for client 
DFSClient_attempt_202001170833_0083_m_000005_0_984153468_57 at 
/10.9.65.183:36388 [Receiving block 
BP-1509854702-10.9.65.8-1392815592442:blk_3771075438_1102260174116]" daemon 
prio=10 tid=0x00007f1a681cc800 nid=0xb59d1 runnable [0x00007f1a46361000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hdfs.util.FoldedTreeSet.get(FoldedTreeSet.java:449)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:111)
        - locked <0x0000000751682590> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1387)
        - locked <0x0000000751682590> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:200)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:675)
        at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
        at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
        at java.lang.Thread.run(Thread.java:744)
{code}

Note that it is attempting to get a value from the FoldedTreeSet, which should 
be a very fast operation.
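To illustrate why a get against a tree structure is normally cheap, and how it can stop being cheap if the structure's shape degrades, here is a toy binary search tree. This is an illustration only, not FoldedTreeSet code: a balanced shape answers a lookup in roughly log2(n) comparisons, while a degenerate (list-shaped) tree needs up to n.

```java
// Toy illustration only -- NOT FoldedTreeSet code. A plain binary search
// tree whose shape depends on insertion order, showing how lookup cost
// ranges from ~log2(n) comparisons (balanced) to ~n (degenerate).
import java.util.Random;

public class TreeShapeDemo {
    static final class Node {
        final long key;
        Node left, right;
        Node(long key) { this.key = key; }
    }

    // Iterative insert: no rebalancing, so insertion order decides the shape.
    static Node insert(Node root, long key) {
        Node fresh = new Node(key);
        if (root == null) return fresh;
        Node cur = root;
        while (true) {
            if (key < cur.key) {
                if (cur.left == null) { cur.left = fresh; return root; }
                cur = cur.left;
            } else if (key > cur.key) {
                if (cur.right == null) { cur.right = fresh; return root; }
                cur = cur.right;
            } else {
                return root; // duplicate key, nothing to do
            }
        }
    }

    // Number of nodes visited to find a key -- a proxy for the cost of get().
    static int depthOf(Node node, long key) {
        int depth = 0;
        while (node != null) {
            depth++;
            if (key == node.key) return depth;
            node = (key < node.key) ? node.left : node.right;
        }
        return depth; // key absent: length of the search path taken
    }

    public static void main(String[] args) {
        int n = 20_000;

        // Random insertion order gives a roughly balanced tree.
        long[] keys = new long[n];
        for (int i = 0; i < n; i++) keys[i] = i;
        Random rnd = new Random(42);
        for (int i = n - 1; i > 0; i--) {          // Fisher-Yates shuffle
            int j = rnd.nextInt(i + 1);
            long t = keys[i]; keys[i] = keys[j]; keys[j] = t;
        }
        Node balanced = null;
        for (long k : keys) balanced = insert(balanced, k);

        // Sorted insertion order gives a fully degenerate, list-shaped tree.
        Node skewed = null;
        for (long k = 0; k < n; k++) skewed = insert(skewed, k);

        System.out.println("balanced lookup cost:   " + depthOf(balanced, n - 1));
        System.out.println("degenerate lookup cost: " + depthOf(skewed, n - 1));
    }
}
```

If FoldedTreeSet has degraded toward the second shape, a thread can sit inside get() long enough to be caught repeatedly by jstack while holding the dataset lock, which matches what we see in these samples.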

Looking at further samples, here is the call stack holding the lock at each 
sample (5 second intervals).

I am not sure why this first thread still shows as blocked, but perhaps it is 
just about to transition to running now that it has the lock:

{code}
"DataXceiver for client DFSClient_NONMAPREDUCE_-619388227_1 at 
/10.9.68.14:33965 [Sending block 
BP-1509854702-10.9.65.8-1392815592442:blk_2353271472_1100837778570]" daemon 
prio=10 tid=0x00007f1a68a1a000 nid=0xb5a96 waiting for monitor entry 
[0x00007f1a3c3c3000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFileNoExistsCheck(FsDatasetImpl.java:729)
        - locked <0x0000000751682590> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:741)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:411)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:537)
        at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:148)
        at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
        at java.lang.Thread.run(Thread.java:744)
{code}

Next sample:

{code}
"DataXceiver for client DFSClient_NONMAPREDUCE_-679188183_1 at 
/10.9.65.169:50917 [Receiving block 
BP-1509854702-10.9.65.8-1392815592442:blk_3771075929_1102260174609]" daemon 
prio=10 tid=0x00007f1a68af4800 nid=0xb5bcf runnable [0x00007f1a3b8be000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hdfs.util.FoldedTreeSet.get(FoldedTreeSet.java:449)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:111)
        - locked <0x0000000751682590> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1387)
        - locked <0x0000000751682590> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:200)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:675)
        at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
        at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
        at java.lang.Thread.run(Thread.java:744)
{code}

Next:

{code}
"PacketResponder: 
BP-1509854702-10.9.65.8-1392815592442:blk_3771076285_1102260174968, 
type=HAS_DOWNSTREAM_IN_PIPELINE" daemon prio=10 tid=0x00007f1a6d476000 
nid=0xb5d09 runnable [0x00007f1a34cf0000]
   java.lang.Thread.State: RUNNABLE
        at 
org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:472)
        at org.apache.hadoop.hdfs.util.FoldedTreeSet.add(FoldedTreeSet.java:565)
        at 
org.apache.hadoop.hdfs.util.FoldedTreeSet.addOrReplace(FoldedTreeSet.java:527)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.add(ReplicaMap.java:133)
        - locked <0x0000000751682590> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeReplica(FsDatasetImpl.java:1725)
        - locked <0x0000000751682590> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeBlock(FsDatasetImpl.java:1686)
        - locked <0x0000000751682590> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.finalizeBlock(BlockReceiver.java:1427)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1384)
        at java.lang.Thread.run(Thread.java:744)
{code}


Next:

{code}
"DataXceiver for client DFSClient_NONMAPREDUCE_1840455641_1 at 
/10.9.65.169:51035 [Receiving block 
BP-1509854702-10.9.65.8-1392815592442:blk_3771076922_1102260175605]" daemon 
prio=10 tid=0x00007f1707244000 nid=0xb5dfc runnable [0x00007f1a38685000]
   java.lang.Thread.State: RUNNABLE
        at 
org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:472)
        at org.apache.hadoop.hdfs.util.FoldedTreeSet.add(FoldedTreeSet.java:565)
        at 
org.apache.hadoop.hdfs.util.FoldedTreeSet.addOrReplace(FoldedTreeSet.java:527)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.add(ReplicaMap.java:133)
        - locked <0x0000000751682590> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1427)
        - locked <0x0000000751682590> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:200)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:675)
        at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
        at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
        at java.lang.Thread.run(Thread.java:744)
{code}

Next:

{code}
"Thread-41" daemon prio=10 tid=0x0000000003c39000 nid=0x1c6fe1 runnable 
[0x00007f1a4b7b8000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hdfs.util.FoldedTreeSet.get(FoldedTreeSet.java:449)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:111)
        - locked <0x0000000751682590> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:89)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:1991)
        - locked <0x0000000751682590> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:686)
        at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:632)
        at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:729)
        at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:539)
        at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:695)
        at java.lang.Thread.run(Thread.java:744)
{code}

Finally:

{code}
"DataXceiver for client DFSClient_NONMAPREDUCE_-619388227_1 at 
/10.9.68.14:35518 [Sending block 
BP-1509854702-10.9.65.8-1392815592442:blk_3738093208_1102227094469]" daemon 
prio=10 tid=0x00007f1a683cf800 nid=0xb61d1 runnable [0x00007f1a36060000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hdfs.util.FoldedTreeSet.get(FoldedTreeSet.java:449)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:111)
        - locked <0x0000000751682590> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:89)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getVolume(FsDatasetImpl.java:177)
        - locked <0x0000000751682590> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getVolume(FsDatasetImpl.java:127)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:283)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:537)
        at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:148)
        at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
        at java.lang.Thread.run(Thread.java:744)
{code}

The notable point here is that in most samples an access to the FoldedTreeSet 
is holding the lock, with that access at the top of the call stack. On a 
healthy datanode we almost never catch a get from the tree set in a jstack 
sample, as it is a very fast operation.

Again, grepping all the stacks, I can see the line "at 
org.apache.hadoop.hdfs.util.FoldedTreeSet.get(FoldedTreeSet.java:449)" 
mentioned very frequently, which does not happen when the DNs are healthy.
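That frequency check can be scripted against a directory of jstack samples. A rough sketch (the directory name and default frame string below are assumptions, adjust to your capture layout):

```java
// Count how often a given stack frame appears across a directory of jstack
// sample files. The directory name and frame string below are examples only.
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FrameCounter {
    // Returns the total number of lines containing 'frame' across all
    // regular files directly inside 'dir'.
    public static long countFrame(Path dir, String frame) throws IOException {
        long hits = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path p : files) {
                if (!Files.isRegularFile(p)) continue;
                for (String line : Files.readAllLines(p)) {
                    if (line.contains(frame)) hits++;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path samples = Paths.get(args.length > 0 ? args[0] : "jstack-samples");
        String frame =
            "org.apache.hadoop.hdfs.util.FoldedTreeSet.get(FoldedTreeSet.java:449)";
        if (!Files.isDirectory(samples)) {
            System.out.println("no sample directory at " + samples);
            return;
        }
        System.out.println(frame + " appears "
            + countFrame(samples, frame) + " times");
    }
}
```

On a healthy DN the count should be near zero; on the affected DNs the frame shows up in a large fraction of samples.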



> FoldedTreeSet appears to degrade over time
> ------------------------------------------
>
>                 Key: HDFS-15131
>                 URL: https://issues.apache.org/jira/browse/HDFS-15131
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>    Affects Versions: 3.3.0
>            Reporter: Stephen O'Donnell
>            Priority: Major
>
> We have seen some occurrences of the Namenode getting very slow on delete 
> operations, to the point where IBRs get blocked frequently and files fail to 
> close. On one cluster in particular, after about 4 weeks of uptime, the 
> Namenode started responding very poorly. Restarting it corrected the problem 
> for another 4 weeks. 
> In that example, jstacks in the namenode always pointed to slow operations 
> around an HDFS delete call performing an operation on the FoldedTreeSet 
> structure. Every captured sample pointed at an operation on the folded tree 
> set:
> {code}
> "IPC Server handler 573 on 8020" #663 daemon prio=5 os_prio=0 
> tid=0x00007fe6a4087800 nid=0x97a6 runnable [0x00007fe67bdfd000]
>    java.lang.Thread.State: RUNNABLE
>       at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:879)
>       at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>       at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:263)
>       at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>       at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>       at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3676)
>       at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3507)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4158)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4132)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4069)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4053)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:845)
>       at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:308)
>       at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:603)
>       at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
> {code}
> The observation in this case was that the namenode worked fine after a 
> restart, then at some point after about 4 weeks of uptime this problem 
> started happening and persisted until the namenode was restarted. The 
> problem then did not return for about another 4 weeks.
> On a completely different cluster and version, I recently came across a 
> problem where files were again failing to close (last block does not have 
> sufficient number of replicas) and the datanodes were logging a lot of 
> messages like the following:
> {code}
> 2019-11-27 09:00:49,678 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 21540ms to process 1 commands from NN
> {code}
> These messages had a range of durations and were fairly frequent. Focusing on 
> the longer messages at around 20 seconds and checking a few different 
> occurrences, jstacks showed a very similar problem to the NN issue, in that 
> the FoldedTreeSet seems to be at the root of the problem.
> The heartbeat thread is waiting on a lock:
> {code}
> "Thread-26" #243 daemon prio=5 os_prio=0 tid=0x00007f575225a000 nid=0x1d8bae 
> waiting for monitor entry [0x00007f571ed76000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2004)
>     - waiting to lock <0x00000006b79baf00> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:699)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:645)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:730)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:539)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:696)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> What seemed to be blocking it is:
> {code}
> "DataXceiver for client DFSClient_NONMAPREDUCE_672634602_52 at 
> /10.71.224.51:37620 [Sending block 
> BP-551684957-10.71.224.13-1505987946791:blk_1683565334_609907893]" #142018893 
> daemon prio=5 os_prio=0 tid=0x00007f573cd25000 nid=0x1912a runnable 
> [0x00007f570acea000]
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.hadoop.hdfs.util.FoldedTreeSet.get(FoldedTreeSet.java:449)
>     at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:111)
>     - locked <0x00000006b79baf00> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>     at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getFile(FsDatasetImpl.java:2199)
>     at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.validateBlockFile(FsDatasetImpl.java:1954)
>     - locked <0x00000006b79baf00> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>     at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:709)
>     at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:702)
>     at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getMetaDataInputStream(FsDatasetImpl.java:220)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:301)
>     at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:537)
>     at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:148)
>     at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
>     at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> Out of curiosity, we restarted 3 of the DNs; in the 24 hours since the 
> restart, no messages like the above have been logged, although they were 
> being logged at a reasonable frequency before the restart.
> I have quite a few jstacks captured at 5 second intervals for some of the DNs 
> on this cluster, and many samples have a runnable thread with this line at 
> the top of the stack, which seems too frequent for a get operation:
> {code}
> at org.apache.hadoop.hdfs.util.FoldedTreeSet.get(FoldedTreeSet.java:449)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:111)
> - locked <0x00000006b79baf00> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> {code}
> After the restart, the jstacks never showed this line in any samples for at 
> least the next week while we were monitoring the cluster.
> Another interesting point is that the jstack always shows line 449. This is 
> the method it is running, with that line marked below:
> {code}
>   public E get(Object obj, Comparator<?> cmp) {
>     Objects.requireNonNull(obj);
>     Node<E> node = root;
>     while (node != null) {
>       E[] entries = node.entries;
>       int leftIndex = node.leftIndex;
>       int result = compare(obj, entries[leftIndex], cmp);
>       if (result < 0) {
>         node = node.left;
>       } else if (result == 0) {
>         return entries[leftIndex];
>       } else {
>         int rightIndex = node.rightIndex;
>         if (leftIndex != rightIndex) {
>           result = compare(obj, entries[rightIndex], cmp);
>         }
>         if (result == 0) {
>           return entries[rightIndex];
>         } else if (result > 0) {
>           node = node.right;
>         } else {
>           int low = leftIndex + 1;
>           int high = rightIndex - 1;
>           while (low <= high) {
>             int mid = (low + high) >>> 1;
>             result = compare(obj, entries[mid], cmp);
>             if (result > 0) {
>               low = mid + 1;
>             } else if (result < 0) {
>               high = mid - 1;
>             } else {
>               return entries[mid];
>             }
>           }
>           return null;
>         }
>       }
>     } // *** This is line 449, which the jstack always has at the top of the stack.
>     return null;
>   }
> {code}
> I cannot explain why that line is always the "running line", unless it is 
> somehow related to safepoints in the JVM (the JVM can only sample a thread's 
> stack at a safepoint, and for a hot loop the safepoint poll sits on the 
> loop's back edge, which would attribute every sample to the loop's closing 
> line).
> Based on these two examples and a few other similar issues I have encountered 
> in the namenode, I believe we have an issue with FoldedTreeSet where it 
> somehow degrades over time, to the point that a simple get operation takes a 
> relatively long time. It does not seem to be able to recover from this 
> without a process restart.
> HDFS-13671 talks about reverting FoldedTreeSet (at least in the Namenode) for 
> performance reasons. However, I wonder if the problems people have seen with 
> it are related to this performance degradation over time rather than its 
> initial performance?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
