[ https://issues.apache.org/jira/browse/HDFS-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018189#comment-17018189 ]
Stephen O'Donnell edited comment on HDFS-15131 at 1/17/20 5:01 PM:
-------------------------------------------------------------------
Tracking a 3rd occurrence of this issue, on yet another cluster. The two issues above were on different clusters too, all 3 on CDH 5 versions. Taking this log from the DN process, which had been running for about 5 months:

{code}
2020-01-17 08:32:37,613 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Took 37189ms to process 1 commands from NN
{code}

The DN was unable to process commands for 37 seconds. This generally indicates the DN is struggling to acquire the FsDatasetImpl lock. With jstacks captured at 5-second intervals, we can see the following. First, the heartbeat thread is blocked waiting on the FsDatasetImpl lock:

{code}
"Thread-41" daemon prio=10 tid=0x0000000003c39000 nid=0x1c6fe1 waiting for monitor entry [0x00007f1a4b7b8000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:1990)
	- waiting to lock <0x0000000751682590> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:686)
	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:632)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:729)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:539)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:695)
	at java.lang.Thread.run(Thread.java:744)
{code}

It stays this way over all samples covering the 37 seconds.
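For illustration, the pattern above — the heartbeat thread's synchronized call blocking behind another thread that holds the dataset monitor for a long time — can be reproduced with a minimal sketch. All class and method names here are invented for the demo; this is not the actual DataNode code, just the same lock shape.

```java
// Minimal sketch of the contention: a worker holds the dataset monitor
// for a long time (standing in for a degraded FoldedTreeSet.get), so the
// heartbeat thread's synchronized call blocks, like Thread-41 above.
public class LockStarvationDemo {
    private final Object datasetLock = new Object(); // stands in for the FsDatasetImpl monitor

    void slowGetUnderLock(long holdMillis) throws InterruptedException {
        synchronized (datasetLock) {   // holder: e.g. a slow lookup under the lock
            Thread.sleep(holdMillis);
        }
    }

    long heartbeatMillis() throws InterruptedException {
        long start = System.nanoTime();
        synchronized (datasetLock) {   // waiter: e.g. FsDatasetImpl.invalidate
            // process NN commands here
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Returns how long the heartbeat thread was blocked behind the worker.
    public static long blockedMillis(long holdMillis) throws Exception {
        LockStarvationDemo d = new LockStarvationDemo();
        Thread worker = new Thread(() -> {
            try { d.slowGetUnderLock(holdMillis); } catch (InterruptedException ignored) {}
        });
        worker.start();
        Thread.sleep(50);              // let the worker take the lock first
        long waited = d.heartbeatMillis(); // blocks until the worker releases the monitor
        worker.join();
        return waited;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("heartbeat waited ~" + blockedMillis(300) + "ms");
    }
}
```

A jstack taken while `blockedMillis` runs would show the heartbeat thread as BLOCKED waiting for the monitor, exactly the signature captured above.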
We can see several other threads waiting on the lock, but the holder shows this trace:

{code}
"DataXceiver for client DFSClient_attempt_202001170833_0083_m_000005_0_984153468_57 at /10.9.65.183:36388 [Receiving block BP-1509854702-10.9.65.8-1392815592442:blk_3771075438_1102260174116]" daemon prio=10 tid=0x00007f1a681cc800 nid=0xb59d1 runnable [0x00007f1a46361000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.hadoop.hdfs.util.FoldedTreeSet.get(FoldedTreeSet.java:449)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:111)
	- locked <0x0000000751682590> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1387)
	- locked <0x0000000751682590> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:200)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:675)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
	at java.lang.Thread.run(Thread.java:744)
{code}

Note it is attempting to get a value from the FoldedTreeSet, which should be a very fast operation.
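A rough way to quantify how often a frame is caught across jstack samples (the grep-style check applied to these dumps) is to count matching stack lines per sample. This is an illustrative utility, not part of Hadoop; a frame that is genuinely fast should almost never be caught by a 5-second sampler.

```java
import java.util.List;

// Counts how many stack lines across a set of jstack samples contain a
// given frame, e.g. "FoldedTreeSet.get(FoldedTreeSet.java:449)".
public class FrameFrequency {
    public static long count(List<String> stackLines, String frame) {
        return stackLines.stream().filter(l -> l.contains(frame)).count();
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "at org.apache.hadoop.hdfs.util.FoldedTreeSet.get(FoldedTreeSet.java:449)",
            "at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:111)",
            "at java.lang.Thread.run(Thread.java:744)");
        // On a degraded DN the FoldedTreeSet.get frame shows up in most samples;
        // on a healthy DN the count across many samples stays near zero.
        System.out.println(count(sample, "FoldedTreeSet.get(FoldedTreeSet.java:449)"));
    }
}
```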
Looking at further samples, here is the call stack holding the lock at each sample (5-second intervals). Not sure why this first thread still shows as blocked, but perhaps it is just about to transition to running now that it has the lock:

{code}
"DataXceiver for client DFSClient_NONMAPREDUCE_-619388227_1 at /10.9.68.14:33965 [Sending block BP-1509854702-10.9.65.8-1392815592442:blk_2353271472_1100837778570]" daemon prio=10 tid=0x00007f1a68a1a000 nid=0xb5a96 waiting for monitor entry [0x00007f1a3c3c3000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFileNoExistsCheck(FsDatasetImpl.java:729)
	- locked <0x0000000751682590> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:741)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:411)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:537)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:148)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
	at java.lang.Thread.run(Thread.java:744)
{code}

Next sample:

{code}
"DataXceiver for client DFSClient_NONMAPREDUCE_-679188183_1 at /10.9.65.169:50917 [Receiving block BP-1509854702-10.9.65.8-1392815592442:blk_3771075929_1102260174609]" daemon prio=10 tid=0x00007f1a68af4800 nid=0xb5bcf runnable [0x00007f1a3b8be000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.hadoop.hdfs.util.FoldedTreeSet.get(FoldedTreeSet.java:449)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:111)
	- locked <0x0000000751682590> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1387)
	- locked <0x0000000751682590> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:200)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:675)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
	at java.lang.Thread.run(Thread.java:744)
{code}

Next:

{code}
"PacketResponder: BP-1509854702-10.9.65.8-1392815592442:blk_3771076285_1102260174968, type=HAS_DOWNSTREAM_IN_PIPELINE" daemon prio=10 tid=0x00007f1a6d476000 nid=0xb5d09 runnable [0x00007f1a34cf0000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:472)
	at org.apache.hadoop.hdfs.util.FoldedTreeSet.add(FoldedTreeSet.java:565)
	at org.apache.hadoop.hdfs.util.FoldedTreeSet.addOrReplace(FoldedTreeSet.java:527)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.add(ReplicaMap.java:133)
	- locked <0x0000000751682590> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeReplica(FsDatasetImpl.java:1725)
	- locked <0x0000000751682590> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeBlock(FsDatasetImpl.java:1686)
	- locked <0x0000000751682590> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.finalizeBlock(BlockReceiver.java:1427)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1384)
	at java.lang.Thread.run(Thread.java:744)
{code}

Next:

{code}
"DataXceiver for client DFSClient_NONMAPREDUCE_1840455641_1 at /10.9.65.169:51035 [Receiving block BP-1509854702-10.9.65.8-1392815592442:blk_3771076922_1102260175605]" daemon prio=10 tid=0x00007f1707244000 nid=0xb5dfc runnable [0x00007f1a38685000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:472)
	at org.apache.hadoop.hdfs.util.FoldedTreeSet.add(FoldedTreeSet.java:565)
	at org.apache.hadoop.hdfs.util.FoldedTreeSet.addOrReplace(FoldedTreeSet.java:527)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.add(ReplicaMap.java:133)
	- locked <0x0000000751682590> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1427)
	- locked <0x0000000751682590> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:200)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:675)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
	at java.lang.Thread.run(Thread.java:744)
{code}

Next:

{code}
"Thread-41" daemon prio=10 tid=0x0000000003c39000 nid=0x1c6fe1 runnable [0x00007f1a4b7b8000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.hadoop.hdfs.util.FoldedTreeSet.get(FoldedTreeSet.java:449)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:111)
	- locked <0x0000000751682590> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:89)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:1991)
	- locked <0x0000000751682590> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:686)
	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:632)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:729)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:539)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:695)
	at java.lang.Thread.run(Thread.java:744)
{code}

Finally:

{code}
"DataXceiver for client DFSClient_NONMAPREDUCE_-619388227_1 at /10.9.68.14:35518 [Sending block BP-1509854702-10.9.65.8-1392815592442:blk_3738093208_1102227094469]" daemon prio=10 tid=0x00007f1a683cf800 nid=0xb61d1 runnable [0x00007f1a36060000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.hadoop.hdfs.util.FoldedTreeSet.get(FoldedTreeSet.java:449)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:111)
	- locked <0x0000000751682590> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:89)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getVolume(FsDatasetImpl.java:177)
	- locked <0x0000000751682590> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getVolume(FsDatasetImpl.java:127)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:283)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:537)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:148)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
	at java.lang.Thread.run(Thread.java:744)
{code}

The notable point here is that in most samples, an access to the FoldedTreeSet is holding the lock and is at the top of the call stack. On a healthy datanode, we almost never catch a get from the tree set in a jstack sample, as it is a very fast operation. Again, grepping all the stacks, I can see the line "at org.apache.hadoop.hdfs.util.FoldedTreeSet.get(FoldedTreeSet.java:449)" mentioned very frequently, which does not happen when the DNs are healthy.

> FoldedTreeSet appears to degrade over time
> ------------------------------------------
>
>         Key: HDFS-15131
>         URL: https://issues.apache.org/jira/browse/HDFS-15131
>     Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
> Affects Versions: 3.3.0
>    Reporter: Stephen O'Donnell
>    Priority: Major
>
> We have seen some occurrences of the Namenode getting very slow on delete operations, to the point where IBRs get blocked frequently and files fail to close. On one cluster in particular, after about 4 weeks of uptime, the Namenode started responding very poorly. Restarting it corrected the problem for another 4 weeks.
>
> In that example, jstacks in the namenode always pointed to slow operations around an HDFS delete call which was performing an operation on the FoldedTreeSet structure.
> The captured jstacks always pointed at an operation on the folded tree set each time they were sampled:
>
> {code}
> "IPC Server handler 573 on 8020" #663 daemon prio=5 os_prio=0 tid=0x00007fe6a4087800 nid=0x97a6 runnable [0x00007fe67bdfd000]
>    java.lang.Thread.State: RUNNABLE
> 	at org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:879)
> 	at org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:263)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3676)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3507)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4158)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4132)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4069)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4053)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:845)
> 	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:308)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:603)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> {code}
>
> The observation in this case was that the namenode worked fine after a restart, and then at some point after about 4 weeks of uptime this problem started happening and persisted until the namenode was restarted. The problem then did not return for about another 4 weeks.
>
> On a completely different cluster and version, I recently came across a problem where files were again failing to close (last block does not have a sufficient number of replicas) and the datanodes were logging a lot of messages like the following:
>
> {code}
> 2019-11-27 09:00:49,678 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Took 21540ms to process 1 commands from NN
> {code}
>
> These messages had a range of durations and were fairly frequent. Focusing on the longer messages at around 20 seconds and checking a few different occurrences, jstacks showed a very similar problem to the NN issue, in that the FoldedTreeSet seems to be at the root of the problem.
> The heartbeat thread is waiting on a lock:
>
> {code}
> "Thread-26" #243 daemon prio=5 os_prio=0 tid=0x00007f575225a000 nid=0x1d8bae waiting for monitor entry [0x00007f571ed76000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2004)
> 	- waiting to lock <0x00000006b79baf00> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> 	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:699)
> 	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:645)
> 	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:730)
> 	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:539)
> 	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:696)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
>
> What seemed to be blocking it is:
>
> {code}
> "DataXceiver for client DFSClient_NONMAPREDUCE_672634602_52 at /10.71.224.51:37620 [Sending block BP-551684957-10.71.224.13-1505987946791:blk_1683565334_609907893]" #142018893 daemon prio=5 os_prio=0 tid=0x00007f573cd25000 nid=0x1912a runnable [0x00007f570acea000]
>    java.lang.Thread.State: RUNNABLE
> 	at org.apache.hadoop.hdfs.util.FoldedTreeSet.get(FoldedTreeSet.java:449)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:111)
> 	- locked <0x00000006b79baf00> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getFile(FsDatasetImpl.java:2199)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.validateBlockFile(FsDatasetImpl.java:1954)
> 	- locked <0x00000006b79baf00> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:709)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:702)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getMetaDataInputStream(FsDatasetImpl.java:220)
> 	at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:301)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:537)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:148)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
>
> Out of curiosity, we restarted 3 of the DNs, and in the 24 hours since the restart no messages like the above were logged, although they were getting logged at a reasonable frequency before the restart.
>
> I have quite a few jstacks at 5-second frequency for some of the DNs on this cluster, and many samples have a runnable thread with this line at the top of the stack, which seems too frequent for a get operation:
>
> {code}
> 	at org.apache.hadoop.hdfs.util.FoldedTreeSet.get(FoldedTreeSet.java:449)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:111)
> 	- locked <0x00000006b79baf00> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> {code}
>
> After the restart, the jstacks never showed this line in any samples for at least the next week while we were monitoring the cluster.
>
> Another interesting point is that the jstack always shows line 449.
> This is the method it is running, with that line highlighted:
>
> {code}
> public E get(Object obj, Comparator<?> cmp) {
>   Objects.requireNonNull(obj);
>   Node<E> node = root;
>   while (node != null) {
>     E[] entries = node.entries;
>     int leftIndex = node.leftIndex;
>     int result = compare(obj, entries[leftIndex], cmp);
>     if (result < 0) {
>       node = node.left;
>     } else if (result == 0) {
>       return entries[leftIndex];
>     } else {
>       int rightIndex = node.rightIndex;
>       if (leftIndex != rightIndex) {
>         result = compare(obj, entries[rightIndex], cmp);
>       }
>       if (result == 0) {
>         return entries[rightIndex];
>       } else if (result > 0) {
>         node = node.right;
>       } else {
>         int low = leftIndex + 1;
>         int high = rightIndex - 1;
>         while (low <= high) {
>           int mid = (low + high) >>> 1;
>           result = compare(obj, entries[mid], cmp);
>           if (result > 0) {
>             low = mid + 1;
>           } else if (result < 0) {
>             high = mid - 1;
>           } else {
>             return entries[mid];
>           }
>         }
>         return null;
>       }
>     }
>   } // *** This is line 449, which the jstack always has at the top of the stack.
>   return null;
> }
> {code}
>
> I cannot explain why that line is always the "running line", unless it is somehow related to safepoints in the JVM.
>
> Based on these two examples and a few other similar issues I have encountered in the namenode, I believe we have an issue with FoldedTreeSet where it somehow degrades over time, to the point that a simple get operation takes a relatively long time. It does not seem to be able to recover from this without a process restart.
>
> HDFS-13671 talks about reverting FoldedTreeSet (at least in the Namenode) for performance reasons. However, I wonder if the problems people have seen with it are related to this performance degradation over time rather than the initial performance?
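To make the quoted method easier to follow, here is a self-contained sketch of the same search pattern over a single node's sorted entry array: check the leftmost and rightmost occupied slots, then binary search the slots in between. The leftIndex/rightIndex bounds and the inner binary search mirror the code above, but this simplified version has no tree links and is not the real FoldedTreeSet; all names are illustrative.

```java
import java.util.Comparator;

// Simplified single-node version of the search in FoldedTreeSet.get().
// In the real structure this repeats down a tree of such nodes, so a
// lookup is fast as long as each node holds a healthy run of entries.
public class FoldedNodeSearch {
    public static <E> E get(E[] entries, int leftIndex, int rightIndex,
                            E key, Comparator<E> cmp) {
        int result = cmp.compare(key, entries[leftIndex]);
        if (result < 0) return null;          // real code descends node.left here
        if (result == 0) return entries[leftIndex];
        if (leftIndex != rightIndex) {
            result = cmp.compare(key, entries[rightIndex]);
        }
        if (result == 0) return entries[rightIndex];
        if (result > 0) return null;          // real code descends node.right here
        int low = leftIndex + 1;
        int high = rightIndex - 1;
        while (low <= high) {                 // binary search the middle slots
            int mid = (low + high) >>> 1;
            result = cmp.compare(key, entries[mid]);
            if (result > 0) low = mid + 1;
            else if (result < 0) high = mid - 1;
            else return entries[mid];
        }
        return null;
    }

    public static void main(String[] args) {
        Integer[] entries = {10, 20, 30, 40, 50};
        System.out.println(get(entries, 0, 4, 30, Comparator.naturalOrder())); // 30
        System.out.println(get(entries, 0, 4, 35, Comparator.naturalOrder())); // null
    }
}
```

With full nodes this is O(log n) overall, which is why catching the frame in a sampler should be rare; the degradation described above suggests something about the structure's state, rather than the algorithm, changes over time.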
--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org