[jira] [Comment Edited] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

ludun (Jira) Wed, 10 Jun 2020 03:04:13 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17130342#comment-17130342
 ]


ludun edited comment on HDFS-15160 at 6/10/20, 10:03 AM:
---------------------------------------------------------

[~brahmareddy] pls check this issue also.  in our custome enviremont.  
Directory Scanner  block the write for a long time(300s+). although they have 
too many blocks for datanode. but we should not hold write lock for scanner.
{code}
2020-06-10 12:17:06,869 | INFO  | 
java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
queue] | BlockPool BP-1104115233-xxxxxx-1571300215588 Total blocks: 11149530, 
missing metadata files:472, missing block files:472, missing blocks in 
memory:0, mismatched blocks:0 | DirectoryScanner.java:473
2020-06-10 12:17:06,869 | WARN  | 
java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
queue] | Lock held time above threshold: lock identifier: 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl 
lockHeldTimeMs=329854 ms. Suppressed 0 lock warnings. The stack trace is: 
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:475)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
 | InstrumentedLock.java:143
{code}


was (Author: pilchard):
[~brahmareddy] pls check this issue also.  in our custome enviremont.  
Directory Scanner  block the write for a long time(300s+). although they have 
too many blocks for datanode. but we should not hold write lock for scanner.
{code}
2020-06-10 12:17:06,869 | INFO  | 
java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
queue] | BlockPool BP-1104115233-10.2.245.4-1571300215588 Total blocks: 
11149530, missing metadata files:472, missing block files:472, missing blocks 
in memory:0, mismatched blocks:0 | DirectoryScanner.java:473
2020-06-10 12:17:06,869 | WARN  | 
java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
queue] | Lock held time above threshold: lock identifier: 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl 
lockHeldTimeMs=329854 ms. Suppressed 0 lock warnings. The stack trace is: 
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:475)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
 | InstrumentedLock.java:143
{code}

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15160
>                 URL: https://issues.apache.org/jira/browse/HDFS-15160
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 3.3.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>         Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, 
> HDFS-15160.003.patch, HDFS-15160.004.patch, HDFS-15160.005.patch, 
> HDFS-15160.006.patch, image-2020-04-10-17-18-08-128.png, 
> image-2020-04-10-17-18-55-938.png
>
>
> Now we have HDFS-15150, we can start to move some DN operations to use the 
> read lock rather than the write lock to improve concurrence. The first step 
> is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (eg 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read only fashion, so they can also be switched to using a readLock.
> Next is the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) are various "low hanging fruit" items in BlockSender 
> and fsdatasetImpl where is it fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> its better to do any larger refactoring or risky changes each in their own 
> Jira.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Comment Edited] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

Reply via email to