[ https://issues.apache.org/jira/browse/HDFS-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238735#comment-14238735 ]
Colin Patrick McCabe commented on HDFS-7489:
--------------------------------------------

It looks like HADOOP-10530 broke the HDFS precommit job. Hopefully it should be fixed now...

> Slow FsVolumeList checkDirs can hang datanodes
> ----------------------------------------------
>
>                 Key: HDFS-7489
>                 URL: https://issues.apache.org/jira/browse/HDFS-7489
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.5.0, 2.6.0
>            Reporter: Noah Lorang
>            Priority: Critical
>         Attachments: HDFS-7489-v1.patch, HDFS-7489-v2.patch, HDFS-7489-v2.patch.1
>
>
> Starting after upgrading to 2.5.0 (CDH 5.2.1), we started to see datanodes hanging their heartbeat and requests from clients. After some digging, I identified the culprit as being the checkDiskError() triggered by catching IOExceptions (in our case, SocketExceptions being triggered on one datanode by ReplicaAlreadyExistsExceptions on another datanode).
> Thread dumps reveal that the checkDiskErrors() thread is holding a lock on the FsVolumeList:
> {code}
> "Thread-409" daemon prio=10 tid=0x00007f4e50200800 nid=0x5b8e runnable [0x00007f4e2f855000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.list(Native Method)
>         at java.io.File.list(File.java:973)
>         at java.io.File.listFiles(File.java:1051)
>         at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:89)
>         at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
>         at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:257)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:210)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:180)
>         - locked <0x000000063b182ea0> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1396)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2832)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> Other things would then lock the FsDatasetImpl while waiting for the FsVolumeList, e.g.:
> {code}
> "DataXceiver for client at /10.10.0.52:46643 [Receiving block BP-1573746465-127.0.1.1-1352244533715:blk_1073770670_1099996962574]" daemon prio=10 tid=0x00007f4e55561000 nid=0x406d waiting for monitor entry [0x00007f4e3106d000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.getNextVolume(FsVolumeList.java:64)
>         - waiting to lock <0x000000063b182ea0> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:927)
>         - locked <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:101)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:167)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:604)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> That lock on the FsDatasetImpl then causes other threads to block:
> {code}
> "Thread-127" daemon prio=10 tid=0x00007f4e4c67d800 nid=0x2e02 waiting for monitor entry [0x00007f4e33390000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:228)
>         - waiting to lock <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>         at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyBlock(BlockPoolSliceScanner.java:436)
>         at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyFirstBlock(BlockPoolSliceScanner.java:523)
>         at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:684)
>         at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:650)
>         at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:101)
> "DataNode: [[[DISK]file:/data/0/dfs/dn/, [DISK]file:/data/1/dfs/dn/, [DISK]file:/data/2/dfs/dn/, [DISK]file:/data/3/dfs/dn/, [DISK]file:/data/4/dfs/dn/, [DISK]file:/data/5/dfs/dn/, [DISK]file:/data/6/dfs/dn/, [DISK]file:/data/7/dfs/dn/, [DISK]file:/data/8/dfs/dn/, [DISK]file:/data/9/dfs/dn/, [DISK]file:/data/10/dfs/dn/, [DISK]file:/data/11/dfs/dn/, [DISK]file:/data/12/dfs/dn/, [DISK]file:/data/13/dfs/dn/, [DISK]file:/data/14/dfs/dn/, [DISK]file:/data/15/dfs/dn/, [DISK]file:/data/16/dfs/dn/, [DISK]file:/data/17/dfs/dn/, [DISK]file:/data/18/dfs/dn/, [DISK]file:/data/19/dfs/dn/, [DISK]file:/data/20/dfs/dn/, [DISK]file:/data/21/dfs/dn/, [DISK]file:/data/22/dfs/dn/, [DISK]file:/data/23/dfs/dn/, [DISK]file:/data/24/dfs/dn/, [DISK]file:/data/25/dfs/dn/, [DISK]file:/data/26/dfs/dn/, [DISK]file:/data/27/dfs/dn/, [DISK]file:/data/28/dfs/dn/, [DISK]file:/data/29/dfs/dn/, [DISK]file:/data/30/dfs/dn/, [DISK]file:/data/31/dfs/dn/, [DISK]file:/data/32/dfs/dn/, [DISK]file:/data/33/dfs/dn/, [DISK]file:/data/34/dfs/dn/, [DISK]file:/data/35/dfs/dn/]] heartbeating to bigdata-01.sc-chi-int.37signals.com/10.10.0.211:8020" daemon prio=10 tid=0x00007f4e553a5800 nid=0x2d66 waiting for monitor entry [0x00007f4e361be000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:116)
>         - waiting to lock <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:132)
>         - locked <0x000000063b182d80> (a java.lang.Object)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:572)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:677)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:847)
>         at java.lang.Thread.run(Thread.java:662)
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1948416574_103 at /10.10.0.169:44229 [Receiving block BP-1573746465-127.0.1.1-1352244533715:blk_1073776794_1099996963072]" daemon prio=10 tid=0x00007f4e55431000 nid=0x4ab8 waiting for monitor entry [0x00007f4e2a448000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:782)
>         - waiting to lock <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:101)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:171)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:604)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> This occurs, I believe, because FsVolumeList#checkDirs() locks the entire FsVolumeList to prevent multiple simultaneous checkDirs() calls. On a slow system under high IO or with many disks, the checkDirs() call can empirically take dozens of seconds or longer.
> I think this can be improved by holding a separate mutex limited to checkDirs (much like datanode.checkDiskErrors() does), and only locking the full FsVolumeList when the volume list actually needs to be updated because disks were removed. I'll attach a patch that does this.
> We're running this patch in production and it's working as expected -- a lock is held locally instead of on the entire FsVolumeList while checkDirs() is running:
> {code}
> "Thread-614" daemon prio=10 tid=0x000000004037b000 nid=0x7331 runnable [0x00007f4d45391000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.createDirectory(Native Method)
>         at java.io.File.mkdir(File.java:1157)
>         at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsCheck(DiskChecker.java:67)
>         at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:104)
>         at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:88)
>         at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
>         at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:257)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:210)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:182)
>         - locked <0x000000063b24f540> (a java.lang.Object)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1396)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2832)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> Feedback would be appreciated!
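For readers following along, here is a minimal sketch of the separate-mutex idea described in the quoted report: the slow directory scan is serialized on a private checkDirs mutex while readers keep using an immutable snapshot of the volume list, and the list object itself is only locked for the brief update that drops failed volumes. All class, field, and method names below are illustrative stand-ins; they are not taken from the attached patch or from the actual FsVolumeList code.

{code}
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only: the directory scan holds a private mutex instead of
 * the volume-list lock, so getNextVolume() and heartbeat paths are not blocked
 * behind a multi-second disk scan.
 */
class VolumeListSketch<V> {

  /** Callback that performs the actual directory check for one volume (stand-in). */
  interface DirChecker<T> {
    boolean isHealthy(T volume);
  }

  /** Volumes are published as an immutable snapshot behind a volatile reference. */
  private volatile List<V> volumes = new ArrayList<V>();

  /** Serializes concurrent checkDirs() calls without holding the list lock. */
  private final Object checkDirsMutex = new Object();

  private int nextIndex = 0;

  /** Scans every volume; returns the volumes that failed and were removed. */
  List<V> checkDirs(DirChecker<V> checker) {
    synchronized (checkDirsMutex) {            // one scan at a time
      List<V> failed = new ArrayList<V>();
      for (V v : volumes) {                    // slow I/O: list lock NOT held here
        if (!checker.isHealthy(v)) {
          failed.add(v);
        }
      }
      if (!failed.isEmpty()) {
        synchronized (this) {                  // short critical section: swap the snapshot
          List<V> updated = new ArrayList<V>(volumes);
          updated.removeAll(failed);
          volumes = updated;
        }
      }
      return failed;
    }
  }

  /** Round-robin choice; contends only on the short list-update section above. */
  synchronized V getNextVolume() {
    List<V> snapshot = volumes;
    if (snapshot.isEmpty()) {
      throw new IllegalStateException("no volumes available");
    }
    nextIndex = (nextIndex + 1) % snapshot.size();
    return snapshot.get(nextIndex);
  }

  synchronized void addVolume(V volume) {
    List<V> updated = new ArrayList<V>(volumes);
    updated.add(volume);
    volumes = updated;
  }
}
{code}

With this split, callers like getNextVolume() or the heartbeat's storage-report path only ever wait on the brief snapshot swap, which matches the "locked <...> (a java.lang.Object)" frame in the post-patch thread dump above rather than a lock on the whole FsVolumeList.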