[ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669652#comment-15669652
 ] 

Jingcheng Du commented on HDFS-9668:
------------------------------------

Thanks for the comments [~eddyxu]!
bq. Could you help to clarify what does datasetWriteLock protect? I thought it 
protected FsDatasetImpl#volumes. Because for FsDatasetImpl#volumeMap, some 
functions like createRbw and finalizeBlock use readLock to protect 
volumeMap#add, while in moveBlock() it uses writeLock to protect 
finalizedReplica(), which seems to protect volumeMap.add() as well?
You are right, the dataset lock is used to protected the volumes. I use write 
lock for the write operations on the volumes and read lock for the read 
operations.
In the methods readRbw, etc., it only reads from volumes, a read lock is enough 
to protect the modification of the volumes during the read on volumes and 
replicaMap. Meanwhile in this method, we need to synchronize the operations for 
the same block, so I add a block-related lock.
About the lock in moveBlock(), I think we can use the combination of a read 
lock and block-related lock here. Using 
bq. And in unfinalizedBlock, it uses read lock to protect volumeMap.remove()?
Actually, it uses a read lock and a block-related lock to protect the 
volueMap.remove().
First of all, the replicaMap has a mutex to synchronize the read and write 
methods in it to avoid ConcurrentModificationExxcption (Now in the trunk, it is 
a lock not a mutex, but I change it back to mutex in the patch). Now the 
question is if it's safe to allow different blocks access the createRbw and 
other similar block-related methods at the same time? Yes?
bq. In a few places, it looks to me that we should use read lock, i.e., 
FsDatasetImpl#getBlockReports(), getFinalizedBlocks() moveBlockAcrossStorage(), 
moveBlock().
For moveBlockAcrossStorage(), moveBlock(), you are right, a read lock is enough 
since these methods only read volumes.
For getBlockReports(), getFinalizedBlocks(), I guess we have to use write lock. 
These methods iterate the replicas from repliaMap, and it is not safe to allow 
these methods and createRbw, etc. to run at the same time 
(ConcurrentModificationException might occur during the iteration if the 
replicaMap is changed by createRbw). 
bq. If the replicaMap is protected by read/write lock, does replicaMap still 
need the mutex?
It does. It is a good way to synchronize the read and write operations in the 
replicaMap if we use a read-write lock in FsDatasetImpl.
bq. Should moveBlock() hold readLock and block lock?
I think it can.




> Optimize the locking in FsDatasetImpl
> -------------------------------------
>
>                 Key: HDFS-9668
>                 URL: https://issues.apache.org/jira/browse/HDFS-9668
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-14.patch, HDFS-9668-15.patch, 
> HDFS-9668-16.patch, HDFS-9668-17.patch, HDFS-9668-18.patch, 
> HDFS-9668-19.patch, HDFS-9668-19.patch, HDFS-9668-2.patch, 
> HDFS-9668-20.patch, HDFS-9668-21.patch, HDFS-9668-22.patch, 
> HDFS-9668-23.patch, HDFS-9668-23.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, 
> HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, 
> HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>    java.lang.Thread.State: BLOCKED
>       at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1111)
>       - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>       at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
>       at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>       at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>       at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>       - None
>       
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>    java.lang.Thread.State: RUNNABLE
>       at java.io.UnixFileSystem.createFileExclusively(Native Method)
>       at java.io.File.createNewFile(File.java:1012)
>       at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>       at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>       at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>       at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>       - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>       at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
>       at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>       at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>       at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>       - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or block-level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to