[ https://issues.apache.org/jira/browse/HDFS-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stephen O'Donnell resolved HDFS-15131.
--------------------------------------
Resolution: Won't Fix
HDFS-13671, which removes the FoldedTreeSet from the code base, is now committed;
hence I am closing this as no longer relevant.
> FoldedTreeSet appears to degrade over time
> ------------------------------------------
>
> Key: HDFS-15131
> URL: https://issues.apache.org/jira/browse/HDFS-15131
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, namenode
> Affects Versions: 3.3.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
>
> We have seen some occurrences of the Namenode getting very slow on delete
> operations, to the point where IBRs get blocked frequently and files fail to
> close. On one cluster in particular, after about 4 weeks of uptime, the
> Namenode started responding very poorly. Restarting it corrected the problem
> for another 4 weeks.
> In that example, jstacks in the namenode always pointed to slow operations
> around an HDFS delete call performing work on the FoldedTreeSet structure.
> Each time they were sampled, the captured jstacks showed an operation on the
> FoldedTreeSet:
> {code}
> "IPC Server handler 573 on 8020" #663 daemon prio=5 os_prio=0
> tid=0x00007fe6a4087800 nid=0x97a6 runnable [0x00007fe67bdfd000]
> java.lang.Thread.State: RUNNABLE
> at
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:879)
> at
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:263)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3676)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3507)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4158)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4132)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4069)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4053)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:845)
> at
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:308)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:603)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> {code}
> The observation in this case was that the namenode worked fine after a
> restart; then, at some point after about 4 weeks of uptime, this problem
> started happening and persisted until the namenode was restarted. The problem
> then did not return for about another 4 weeks.
> On a completely different cluster and version, I recently came across a
> problem where files were again failing to close (last block does not have
> sufficient number of replicas) and the datanodes were logging a lot of
> messages like the following:
> {code}
> 2019-11-27 09:00:49,678 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Took 21540ms to process 1 commands from NN
> {code}
> These messages had a range of durations and were fairly frequent. Focusing on
> the longer messages at around 20 seconds and checking a few different
> occurrences, jstacks showed a very similar problem to the NN issue, in that
> the FoldedTreeSet appeared to be at the root of the problem.
> The heartbeat thread is waiting on a lock:
> {code}
> "Thread-26" #243 daemon prio=5 os_prio=0 tid=0x00007f575225a000 nid=0x1d8bae
> waiting for monitor entry [0x00007f571ed76000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2004)
> - waiting to lock <0x00000006b79baf00> (a
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:699)
> at
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:645)
> at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:730)
> at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:539)
> at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:696)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> What seemed to be blocking it was:
> {code}
> "DataXceiver for client DFSClient_NONMAPREDUCE_672634602_52 at
> /10.71.224.51:37620 [Sending block
> BP-551684957-10.71.224.13-1505987946791:blk_1683565334_609907893]" #142018893
> daemon prio=5 os_prio=0 tid=0x00007f573cd25000 nid=0x1912a runnable
> [0x00007f570acea000]
> java.lang.Thread.State: RUNNABLE
> at org.apache.hadoop.hdfs.util.FoldedTreeSet.get(FoldedTreeSet.java:449)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:111)
> - locked <0x00000006b79baf00> (a
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getFile(FsDatasetImpl.java:2199)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.validateBlockFile(FsDatasetImpl.java:1954)
> - locked <0x00000006b79baf00> (a
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:709)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:702)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getMetaDataInputStream(FsDatasetImpl.java:220)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:301)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:537)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:148)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
> at java.lang.Thread.run(Thread.java:748)
> {code}
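> To make the contention pattern in the two stacks above concrete, here is a
> minimal, self-contained sketch (not the actual Hadoop code; the class and
> method names are hypothetical) of how a slow lookup performed while holding a
> shared monitor can stall heartbeat command processing:
> {code}
> // Illustrative only: both the read path and the command path synchronize on
> // the same dataset object, so one slow lookup delays command processing.
> public class DatasetLockSketch {
>   private final Object datasetLock = new Object(); // stands in for the FsDatasetImpl monitor
>
>   // DataXceiver-style path: holds the lock for the duration of the lookup.
>   String getReplica(long blockId) {
>     synchronized (datasetLock) {
>       return slowLookup(blockId); // slow when the underlying structure has degraded
>     }
>   }
>
>   // Heartbeat-style path: cannot proceed until the lookup above releases the
>   // lock, which is what surfaces as "Took 21540ms to process 1 commands from NN".
>   void invalidate(long[] blockIds) {
>     synchronized (datasetLock) {
>       // delete the replicas...
>     }
>   }
>
>   private String slowLookup(long blockId) {
>     // placeholder for the FoldedTreeSet.get() call the jstacks show as the hot frame
>     return "replica-" + blockId;
>   }
> }
> {code}
> The point is that a degraded FoldedTreeSet.get is not just slow in isolation:
> since it runs under the dataset lock (the same 0x00000006b79baf00 monitor in
> both stacks), it serializes the heartbeat thread and anything else that needs
> that lock.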
> Out of curiosity, we restarted 3 of the DNs; in the 24 hours since the
> restart, no messages like the above have been logged, whereas they were being
> logged fairly frequently before the restart.
> I have quite a few jstacks taken at 5 second intervals for some of the DNs on
> this cluster, and many samples have a runnable thread with this line at the
> top of the stack, which seems too frequent for a get operation:
> {code}
>         at org.apache.hadoop.hdfs.util.FoldedTreeSet.get(FoldedTreeSet.java:449)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:111)
>         - locked <0x00000006b79baf00> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> {code}
> After the restart, the jstacks never showed this line in any samples for at
> least the next week while we were monitoring the cluster.
> Another interesting point is that the jstack always shows line 449. This is
> the method being run, with that line highlighted:
> {code}
> public E get(Object obj, Comparator<?> cmp) {
>   Objects.requireNonNull(obj);
>   Node<E> node = root;
>   while (node != null) {
>     E[] entries = node.entries;
>     int leftIndex = node.leftIndex;
>     int result = compare(obj, entries[leftIndex], cmp);
>     if (result < 0) {
>       node = node.left;
>     } else if (result == 0) {
>       return entries[leftIndex];
>     } else {
>       int rightIndex = node.rightIndex;
>       if (leftIndex != rightIndex) {
>         result = compare(obj, entries[rightIndex], cmp);
>       }
>       if (result == 0) {
>         return entries[rightIndex];
>       } else if (result > 0) {
>         node = node.right;
>       } else {
>         int low = leftIndex + 1;
>         int high = rightIndex - 1;
>         while (low <= high) {
>           int mid = (low + high) >>> 1;
>           result = compare(obj, entries[mid], cmp);
>           if (result > 0) {
>             low = mid + 1;
>           } else if (result < 0) {
>             high = mid - 1;
>           } else {
>             return entries[mid];
>           }
>         }
>         return null;
>       }
>     }
>   } // *** This is line 449 which the jstack always has at the top of the stack.
>   return null;
> }
> {code}
> I cannot explain why that line is always the "running line", unless it is
> somehow related to safepoints in the JVM.
> Based on these two examples, and a few other similar issues I have
> encountered in the namenode, I believe we have an issue with FoldedTreeSet
> where it somehow degrades over time, to the point that a simple get operation
> takes a relatively long time. It does not seem to be able to recover from
> this without a process restart.
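> One way to test this hypothesis would be a long-running churn benchmark
> against FoldedTreeSet itself. The sketch below is only illustrative: it
> assumes FoldedTreeSet offers a Comparator constructor plus add/remove/get
> methods (which may need adjusting against the real class), and the
> working-set and batch sizes are arbitrary:
> {code}
> // Sketch: sustained add/remove churn on a FoldedTreeSet, timing get()
> // periodically to see whether lookups slow down as the structure ages.
> import org.apache.hadoop.hdfs.util.FoldedTreeSet;
> import java.util.Random;
>
> public class FoldedTreeSetChurn {
>   public static void main(String[] args) {
>     FoldedTreeSet<Long> set = new FoldedTreeSet<>(Long::compareTo);
>     Random rand = new Random(42);
>     long[] live = new long[1_000_000]; // arbitrary working-set size
>     for (int i = 0; i < live.length; i++) {
>       live[i] = rand.nextLong();
>       set.add(live[i]);
>     }
>     for (long round = 1; round <= 100_000; round++) {
>       // Replace a random slice of the working set each round (simulated churn).
>       for (int i = 0; i < 10_000; i++) {
>         int slot = rand.nextInt(live.length);
>         set.remove(live[slot]);
>         live[slot] = rand.nextLong();
>         set.add(live[slot]);
>       }
>       // Periodically time a batch of lookups.
>       if (round % 100 == 0) {
>         long start = System.nanoTime();
>         for (int i = 0; i < 100_000; i++) {
>           set.get(live[rand.nextInt(live.length)]);
>         }
>         System.out.printf("round %d: %d ms for 100k gets%n",
>             round, (System.nanoTime() - start) / 1_000_000);
>       }
>     }
>   }
> }
> {code}
> If get latency climbed steadily with churn and never recovered, that would
> match the behaviour seen on both the NN and the DNs.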
> HDFS-13671 talks about reverting FoldedTreeSet (at least in the Namenode) for
> performance reasons. However, I wonder whether the problems people have seen
> with it are related to this degradation over time rather than to the initial
> performance?