[ https://issues.apache.org/jira/browse/HDFS-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241905#comment-14241905 ]

Colin Patrick McCabe commented on HDFS-7489:
--------------------------------------------

Thanks for the patch, [~nlorang].

> Incorrect locking in FsVolumeList#checkDirs can hang datanodes
> --------------------------------------------------------------
>
>                 Key: HDFS-7489
>                 URL: https://issues.apache.org/jira/browse/HDFS-7489
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.5.0, 2.6.0
>            Reporter: Noah Lorang
>            Assignee: Noah Lorang
>            Priority: Critical
>             Fix For: 2.6.1
>
>         Attachments: HDFS-7489-v1.patch, HDFS-7489-v2.patch, 
> HDFS-7489-v2.patch.1
>
>
> After upgrading to 2.5.0 (CDH 5.2.1), we started to see datanodes hang their 
> heartbeats and stall requests from clients. After some digging, I identified 
> the culprit as the checkDiskError() calls triggered by caught IOExceptions 
> (in our case, SocketExceptions triggered on one datanode by 
> ReplicaAlreadyExistsExceptions on another datanode).
> Thread dumps reveal that the checkDiskErrors() thread is holding a lock on 
> the FsVolumeList:
> {code}
> "Thread-409" daemon prio=10 tid=0x00007f4e50200800 nid=0x5b8e runnable 
> [0x00007f4e2f855000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.list(Native Method)
>         at java.io.File.list(File.java:973)
>         at java.io.File.listFiles(File.java:1051)
>         at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:89)
>         at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
>         at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:257)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:210)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:180)
>         - locked <0x000000063b182ea0> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1396)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2832)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> Other things would then lock the FsDatasetImpl while waiting for the 
> FsVolumeList, e.g.:
> {code}
> "DataXceiver for client  at /10.10.0.52:46643 [Receiving block 
> BP-1573746465-127.0.1.1-1352244533715:blk_1073770670_1099996962574]" daemon 
> prio=10 tid=0x00007f4e55561000 nid=0x406d waiting for monitor entry 
> [0x00007f4e3106d000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.getNextVolume(FsVolumeList.java:64)
>         - waiting to lock <0x000000063b182ea0> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:927)
>         - locked <0x000000063b1f9a48> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:101)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:167)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:604)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> That lock on the FsDatasetImpl then causes other threads to block:
> {code}
> "Thread-127" daemon prio=10 tid=0x00007f4e4c67d800 nid=0x2e02 waiting for 
> monitor entry [0x00007f4e33390000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:228)
>         - waiting to lock <0x000000063b1f9a48> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyBlock(BlockPoolSliceScanner.java:436)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyFirstBlock(BlockPoolSliceScanner.java:523)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:684)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:650)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:101)
> "DataNode: [[[DISK]file:/data/0/dfs/dn/, [DISK]file:/data/1/dfs/dn/, 
> [DISK]file:/data/2/dfs/dn/, [DISK]file:/data/3/dfs/dn/, 
> [DISK]file:/data/4/dfs/dn/, [DISK]file:/data/5/dfs/dn/, 
> [DISK]file:/data/6/dfs/dn/, [DISK]file:/data/7/dfs/dn/, 
> [DISK]file:/data/8/dfs/dn/, [DISK]file:/data/9/dfs/dn/, 
> [DISK]file:/data/10/dfs/dn/, [DISK]file:/data/11/dfs/dn/, 
> [DISK]file:/data/12/dfs/dn/, [DISK]file:/data/13/dfs/dn/, 
> [DISK]file:/data/14/dfs/dn/, [DISK]file:/data/15/dfs/dn/, 
> [DISK]file:/data/16/dfs/dn/, [DISK]file:/data/17/dfs/dn/, 
> [DISK]file:/data/18/dfs/dn/, [DISK]file:/data/19/dfs/dn/, 
> [DISK]file:/data/20/dfs/dn/, [DISK]file:/data/21/dfs/dn/, 
> [DISK]file:/data/22/dfs/dn/, [DISK]file:/data/23/dfs/dn/, 
> [DISK]file:/data/24/dfs/dn/, [DISK]file:/data/25/dfs/dn/, 
> [DISK]file:/data/26/dfs/dn/, [DISK]file:/data/27/dfs/dn/, 
> [DISK]file:/data/28/dfs/dn/, [DISK]file:/data/29/dfs/dn/, 
> [DISK]file:/data/30/dfs/dn/, [DISK]file:/data/31/dfs/dn/, 
> [DISK]file:/data/32/dfs/dn/, [DISK]file:/data/33/dfs/dn/, 
> [DISK]file:/data/34/dfs/dn/, [DISK]file:/data/35/dfs/dn/]]  heartbeating to 
> bigdata-01.sc-chi-int.37signals.com/10.10.0.211:8020" daemon prio=10 
> tid=0x00007f4e553a5800 nid=0x2d66 waiting for monitor entry 
> [0x00007f4e361be000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:116)
>         - waiting to lock <0x000000063b1f9a48> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:132)
>         - locked <0x000000063b182d80> (a java.lang.Object)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:572)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:677)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:847)
>         at java.lang.Thread.run(Thread.java:662)
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1948416574_103 at 
> /10.10.0.169:44229 [Receiving block 
> BP-1573746465-127.0.1.1-1352244533715:blk_1073776794_1099996963072]" daemon 
> prio=10 tid=0x00007f4e55431000 nid=0x4ab8 waiting for monitor entry 
> [0x00007f4e2a448000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:782)
>         - waiting to lock <0x000000063b1f9a48> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:101)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:171)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:604)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> This occurs, I believe, because FsVolumeList#checkDirs() locks the entire 
> FsVolumeList to prevent multiple simultaneous checkDirs() calls. On a slow 
> system under high IO or with many disks, the checkDirs() call can take dozens 
> of seconds or longer (empirically).
> I think this can be improved by holding a separate mutex limited to 
> checkDirs() (much like datanode.checkDiskErrors() does), and only locking the 
> full FsVolumeList when the volume list actually needs updating because disks 
> have been removed. I'll attach a patch that does this. 
> We're running this patch in production and it's working as expected -- a lock 
> is held locally instead of on the entire FsVolumeList while checkDirs() is 
> running:
> {code}
> "Thread-614" daemon prio=10 tid=0x000000004037b000 nid=0x7331 runnable 
> [0x00007f4d45391000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.createDirectory(Native Method)
>         at java.io.File.mkdir(File.java:1157)
>         at 
> org.apache.hadoop.util.DiskChecker.mkdirsWithExistsCheck(DiskChecker.java:67)
>         at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:104)
>         at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:88)
>         at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
>         at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:257)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:210)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:182)
>         - locked <0x000000063b24f540> (a java.lang.Object)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1396)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2832)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> Feedback would be appreciated!
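For readers following along, the locking change described in the report can be sketched roughly like this (a minimal, self-contained illustration; the class and method names are simplified stand-ins, not the actual HDFS code):

```java
// Sketch of the fix: serialize checkDirs() with its own mutex so a slow disk
// scan no longer pins the whole volume-list monitor. All names here
// (VolumeList, dirIsHealthy, ...) are illustrative, not the real HDFS classes.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VolumeList {
    private final List<String> volumes = new ArrayList<>();
    // Separate lock: blocks only concurrent checkDirs() calls, mirroring the
    // patch's approach, instead of synchronizing on the VolumeList itself.
    private final Object checkDirsMutex = new Object();

    public VolumeList(List<String> initial) {
        volumes.addAll(initial);
    }

    // Readers like getNextVolume() take the monitor only briefly, so they are
    // no longer blocked for the duration of a disk scan.
    public synchronized String getNextVolume() {
        return volumes.isEmpty() ? null : volumes.get(0);
    }

    // Before the fix, this whole method was synchronized on the VolumeList,
    // so a slow scan held that monitor for dozens of seconds.
    public List<String> checkDirs() {
        synchronized (checkDirsMutex) {
            List<String> snapshot;
            synchronized (this) {               // brief: copy the list
                snapshot = new ArrayList<>(volumes);
            }
            List<String> failed = new ArrayList<>();
            for (String v : snapshot) {         // slow part: no monitor held
                if (!dirIsHealthy(v)) {
                    failed.add(v);
                }
            }
            if (!failed.isEmpty()) {
                synchronized (this) {           // brief: drop failed disks
                    volumes.removeAll(failed);
                }
            }
            return failed;
        }
    }

    // Stand-in for the slow DiskChecker.checkDirs() directory walk.
    private boolean dirIsHealthy(String dir) {
        return !dir.contains("bad");
    }

    public static void main(String[] args) {
        VolumeList list = new VolumeList(
                Arrays.asList("/data/0/dfs/dn", "/data/bad/dfs/dn"));
        System.out.println("failed=" + list.checkDirs());
        System.out.println("next=" + list.getNextVolume());
    }
}
```

The key point is that the VolumeList monitor is taken only for the two short list operations (snapshot and removal), so heartbeat and DataXceiver threads calling getNextVolume() and friends proceed even while a scan is in flight.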



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
