[jira] [Comment Edited] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379425#comment-17379425 ] Ahmed Hussein edited comment on HDFS-15160 at 7/13/21, 1:08 PM:

Created [PR-3196|https://github.com/apache/hadoop/pull/3196] for branch-2.10. The main conflicts are caused by HDFS-10636 (Modify ReplicaInfo to remove the assumption that replica metadata and data are stored in java.io.File):
- {{FsDatasetImpl.java#validateBlockFile}} conflicts due to HDFS-10636. I see that the 3.x patch removes the lock protecting {{r = volumeMap.get(bpid, blockId)}}, but I am not sure this can be applied to 2.10, since the caller of {{validateBlockFile()}} does not acquire the dataRWLock, as far as I can see.
- {{FsDatasetImpl.java#getFile}} only exists in branch-2.10 and was removed by HDFS-10636.
- {{DiskBalancer}} does not exist in branch-2.10.
- {{FsDatasetImpl.java#moveBlockAcrossStorage}} is different in branch-2.10.

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl
> methods should use datanode readlock
> ----------------------------------------------------------------------
>
>                 Key: HDFS-15160
>                 URL: https://issues.apache.org/jira/browse/HDFS-15160
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 3.3.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.3.1, 3.4.0
>
>         Attachments: HDFS-15160-branch-3.3-001.patch, HDFS-15160.001.patch,
> HDFS-15160.002.patch, HDFS-15160.003.patch, HDFS-15160.004.patch,
> HDFS-15160.005.patch, HDFS-15160.006.patch, HDFS-15160.007.patch,
> HDFS-15160.008.patch, HDFS-15160.branch-3-3.001.patch,
> image-2020-04-10-17-18-08-128.png, image-2020-04-10-17-18-55-938.png
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Now that we have HDFS-15150, we can start to move some DN operations to use the
> read lock rather than the write lock to improve concurrency. The first step
> is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (e.g.
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result
> in a read-only fashion, so they can also be switched to using a readLock.
> Next are the directory scanner and disk balancer, which only require a read
> lock.
> Finally (for this Jira) there are various "low hanging fruit" items in BlockSender
> and FsDatasetImpl where it is fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think
> it's better to do any larger refactoring or risky changes each in their own
> Jira.
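For readers following the lock discussion above, here is a minimal sketch of a read-lock-protected lookup in the style of {{r = volumeMap.get(bpid, blockId)}}. This is a simplified, hypothetical stand-in for ReplicaMap, not the actual Hadoop class: lookups take the read lock so they can run concurrently with each other, while mutations take the write lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified stand-in for ReplicaMap (hypothetical, not the HDFS class).
public class ReplicaMapSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // block pool id -> (block id -> replica description)
    private final Map<String, Map<Long, String>> map = new HashMap<>();

    public String get(String bpid, long blockId) {
        lock.readLock().lock();          // read-only lookup: read lock suffices
        try {
            Map<Long, String> bp = map.get(bpid);
            return bp == null ? null : bp.get(blockId);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void add(String bpid, long blockId, String replica) {
        lock.writeLock().lock();         // mutation: still needs the write lock
        try {
            map.computeIfAbsent(bpid, k -> new HashMap<>()).put(blockId, replica);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        ReplicaMapSketch volumeMap = new ReplicaMapSketch();
        volumeMap.add("BP-1", 1001L, "replica-1001");
        System.out.println(volumeMap.get("BP-1", 1001L)); // prints replica-1001
    }
}
```

The branch-2.10 concern above is exactly about dropping the lock around such a `get`: without the caller holding any lock, the lookup races with concurrent `add`/`remove` calls.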
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130497#comment-17130497 ] hemanthboyina edited comment on HDFS-15160 at 6/10/20, 10:48 AM:

Thanks [~pilchard] for the report. HDFS-15150 introduced a read lock and a write lock in the datanode. With HDFS-15160 we acquire the read lock for the scanner, so the write won't be blocked.
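The effect described in the comment above can be demonstrated with a small self-contained sketch using plain java.util.concurrent (not the HDFS lock classes): while a scanner-style thread holds the read lock, other readers still get in, and a writer is excluded only for as long as a reader is actually inside the lock. Using tryLock from helper threads keeps the outcome deterministic.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ScannerLockSketch {
    // Attempt the lock from a separate thread, so reentrancy on the
    // current thread does not skew the result.
    static boolean tryFromOtherThread(Lock l) throws InterruptedException {
        final boolean[] got = {false};
        Thread t = new Thread(() -> {
            got[0] = l.tryLock();
            if (got[0]) {
                l.unlock();
            }
        });
        t.start();
        t.join();
        return got[0];
    }

    public static void main(String[] args) throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();   // the "scanner" takes the read lock
        try {
            System.out.println(tryFromOtherThread(lock.readLock()));  // true: readers share
            System.out.println(tryFromOtherThread(lock.writeLock())); // false: writer must wait
        } finally {
            lock.readLock().unlock();
        }
    }
}
```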
[jira] [Comment Edited] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130342#comment-17130342 ] ludun edited comment on HDFS-15160 at 6/10/20, 10:03 AM:

[~brahmareddy] Please check this issue as well. In our customer environment the Directory Scanner blocked writes for a long time (300s+). Although this datanode does hold a very large number of blocks, we should not hold the write lock for the scanner.

{code}
2020-06-10 12:17:06,869 | INFO | java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty queue] | BlockPool BP-1104115233-xx-1571300215588 Total blocks: 11149530, missing metadata files:472, missing block files:472, missing blocks in memory:0, mismatched blocks:0 | DirectoryScanner.java:473
2020-06-10 12:17:06,869 | WARN | java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty queue] | Lock held time above threshold: lock identifier: org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl lockHeldTimeMs=329854 ms. Suppressed 0 lock warnings. The stack trace is:
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:475)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748) | InstrumentedLock.java:748
{code}
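The warning in the log above comes from Hadoop's InstrumentedLock. As a rough illustration of the underlying idea (a simplified sketch, not the actual InstrumentedLock implementation), a lock wrapper can time each hold and record a warning when it exceeds a threshold; the clock is injected here so the behavior is testable without sleeping:

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;

// Hedged sketch of an instrumented lock: field names and the warning text
// are illustrative, not copied from Hadoop.
public class HoldTimeLock {
    private final ReentrantLock lock = new ReentrantLock();
    private final long warnThresholdMs;
    private final LongSupplier clock;   // injectable clock, for testing
    private long acquiredAt;
    public String lastWarning;          // captured instead of real logging

    public HoldTimeLock(long warnThresholdMs, LongSupplier clock) {
        this.warnThresholdMs = warnThresholdMs;
        this.clock = clock;
    }

    public void lock() {
        lock.lock();
        acquiredAt = clock.getAsLong();   // stamp the acquisition time
    }

    public void unlock() {
        long heldMs = clock.getAsLong() - acquiredAt;
        lock.unlock();
        if (heldMs > warnThresholdMs) {   // warn only on long holds
            lastWarning = "Lock held time above threshold: lockHeldTimeMs=" + heldMs + " ms.";
        }
    }

    public static void main(String[] args) {
        long[] now = {0};
        HoldTimeLock l = new HoldTimeLock(300_000, () -> now[0]);
        l.lock();
        now[0] = 329_854;   // simulate the 329854 ms hold from the log above
        l.unlock();
        System.out.println(l.lastWarning);
    }
}
```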
[jira] [Comment Edited] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120455#comment-17120455 ] Jiang Xin edited comment on HDFS-15160 at 6/2/20, 9:25 AM:

Hi [~sodonnell], thanks for your quick reply. I have one more question: is it safe to change FsDatasetImpl#getBlockLocalPathInfo and DataNode#transferReplicaForPipelineRecovery to the read lock? As you mentioned above, both of them may change the generationStamp, and the `synchronized(replica)` in FsDatasetImpl#getBlockLocalPathInfo does not seem to protect the generationStamp from being changed by other methods. Would you mind double-checking? Besides, FsDatasetImpl#getBlockReports could also be changed to the read lock, in my opinion. What do you think? Thanks.
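To make the concern in the comment above concrete: `synchronized(replica)` only excludes other code that synchronizes on the same object. The sketch below uses a hypothetical Replica stand-in (not the HDFS class) to show that an unsynchronized writer is not excluded by a synchronized reader; nothing in the language forces the writer to take the monitor.

```java
public class SyncScopeSketch {
    // Hypothetical stand-in for a replica with a mutable generation stamp.
    static class Replica {
        long genStamp;
    }

    // Reader in the style of getBlockLocalPathInfo: holds the replica monitor.
    static long readGenStamp(Replica r) {
        synchronized (r) {
            return r.genStamp;
        }
    }

    // Writer that does NOT synchronize on the replica: it can run
    // concurrently with the synchronized reader above, which is exactly
    // why synchronized(replica) alone does not protect genStamp.
    static void bumpGenStamp(Replica r) {
        r.genStamp++;   // unsynchronized update
    }

    public static void main(String[] args) {
        Replica r = new Replica();
        bumpGenStamp(r);   // compiles and runs with no mutual exclusion
        System.out.println(readGenStamp(r)); // prints 1
    }
}
```

The only guarantees come from every writer also entering `synchronized(replica)` (or the dataset write lock); the discussion above is about whether that invariant actually holds for the generationStamp updaters.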
[jira] [Comment Edited] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118394#comment-17118394 ] Jiang Xin edited comment on HDFS-15160 at 5/28/20, 7:28 AM:

[~sodonnell] Thanks for your great work. I'm going to apply patch 005 and run it on our production cluster. But the `synchronized(replica) ...` code in getBlockLocalPathInfo confuses me: it is holding the write lock, so it doesn't need to worry about genStamp updates. I assume you intended to switch getBlockLocalPathInfo to the read lock, am I right? Thanks.
[jira] [Comment Edited] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056054#comment-17056054 ] zhuqi edited comment on HDFS-15160 at 3/10/20, 3:23 PM:

Thanks [~sodonnell] for your great work. LGTM. I agree with [~hexiaoqiao] that DataNode#transferReplicaForPipelineRecovery should change data.acquireDatasetLock() to data.acquireDatasetReadLock() when it only needs to read replica information.
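The suggested change can be sketched with a minimal AutoCloseable lock handle in the style of Hadoop's AutoCloseableLock; the Dataset class and method names below are simplified stand-ins for illustration, not the HDFS API. The point is that the call site only changes which handle it acquires, and try-with-resources guarantees release either way.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DatasetLockSketch {
    // Minimal AutoCloseable handle: locks on construction, unlocks on close.
    static class AutoCloseableLock implements AutoCloseable {
        private final Lock lock;
        AutoCloseableLock(Lock lock) {
            this.lock = lock;
            this.lock.lock();
        }
        @Override
        public void close() {
            lock.unlock();
        }
    }

    // Simplified stand-in for the dataset, exposing both lock flavors.
    static class Dataset {
        private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        int replicaCount = 3;
        AutoCloseableLock acquireDatasetLock()     { return new AutoCloseableLock(rw.writeLock()); }
        AutoCloseableLock acquireDatasetReadLock() { return new AutoCloseableLock(rw.readLock()); }
    }

    public static void main(String[] args) {
        Dataset data = new Dataset();
        // Before: the exclusive lock was taken just to read replica info.
        // After: a read lock is enough, so concurrent readers are not blocked.
        int n;
        try (AutoCloseableLock l = data.acquireDatasetReadLock()) {
            n = data.replicaCount;   // read-only access under the read lock
        }
        System.out.println(n); // prints 3
    }
}
```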