[jira] [Comment Edited] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2021-07-13 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379425#comment-17379425
 ] 

Ahmed Hussein edited comment on HDFS-15160 at 7/13/21, 1:08 PM:


Created [PR-3196|https://github.com/apache/hadoop/pull/3196] for branch-2.10.
 - The main conflicts are caused by HDFS-10636 (Modify ReplicaInfo to remove the assumption that replica metadata and data are stored in java.io.File).
 - {{FsDatasetImpl.java#validateBlockFile}} conflicts due to HDFS-10636. I see that the 3.x patch removes the lock protecting {{r = volumeMap.get(bpid, blockId)}}, but I am not sure this can be applied to 2.10, since as far as I can see the caller of {{validateBlockFile()}} does not acquire the dataRWLock.
 - {{FsDatasetImpl.java#getFile}} only exists in branch-2.10; it was removed in HDFS-10636.
 - {{DiskBalancer}} does not exist in branch-2.10.
 - {{FsDatasetImpl.java#moveBlockAcrossStorage}} is different in branch-2.10.


was (Author: ahussein):
Created [PR-3196|https://github.com/apache/hadoop/pull/3196] for branch-2.10
 - main conflicts caused by HDFS-10636. (Modify ReplicaInfo to remove the 
assumption that replica metadata and data are stored in java.io.File)
 - {{FsDatasetImpl.java#validateBlockFile}} due to HDFS-10636 . I see that 3.x 
patch removes the lock protecting the {{r = volumeMap.get(bpid, blockId);}} ; 
but I am not sure if this can be applied to 2.10 since the caller of 
{{validateBlockFile()}} does not acquire the dataRWLock as far as I can see.
 - {{FsDatasetImpl.java#getFile}} only exists in branch-2.10 and was removed in 
HDFS-10636
 - {{DiskBalancer}} does not exist in branch-2.10
 - {{FsDatasetImpl.java#moveBlockAcrossStorage}} is different in branch-2.10

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15160-branch-3.3-001.patch, HDFS-15160.001.patch, 
> HDFS-15160.002.patch, HDFS-15160.003.patch, HDFS-15160.004.patch, 
> HDFS-15160.005.patch, HDFS-15160.006.patch, HDFS-15160.007.patch, 
> HDFS-15160.008.patch, HDFS-15160.branch-3-3.001.patch, 
> image-2020-04-10-17-18-08-128.png, image-2020-04-10-17-18-55-938.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Now we have HDFS-15150, we can start to move some DN operations to use the 
> read lock rather than the write lock to improve concurrency. The first step 
> is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (e.g. 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read-only fashion, so they can also be switched to using a readLock.
> Next are the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) are various "low hanging fruit" items in BlockSender 
> and FsDatasetImpl where it is fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> it's better to do any larger refactoring or risky changes each in their own 
> Jira.
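The change described above, readers of the volume map taking a shared lock while mutating operations keep the exclusive lock, can be sketched with a fair `ReentrantReadWriteLock`. This is a simplified illustration of the idea, not the actual FsDatasetImpl code (which wraps its lock in instrumentation classes); the class and field names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy volume map guarded by a read/write lock, mirroring the idea in this Jira. */
public class VolumeMapSketch {
    // Fair mode, so a waiting writer is not starved by a stream of readers.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    private final Map<Long, String> replicas = new HashMap<>();

    /** Read-only lookup: many readers may hold the lock concurrently. */
    public String get(long blockId) {
        lock.readLock().lock();
        try {
            return replicas.get(blockId);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Mutation: still exclusive, blocking both readers and other writers. */
    public void add(long blockId, String replica) {
        lock.writeLock().lock();
        try {
            replicas.put(blockId, replica);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        VolumeMapSketch map = new VolumeMapSketch();
        map.add(42L, "replica-42");
        System.out.println(map.get(42L)); // prints "replica-42"
    }
}
```

A long-running read such as a directory scan then blocks writers only for the duration of individual map lookups, not for the whole scan.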



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2021-07-13 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379425#comment-17379425
 ] 

Ahmed Hussein edited comment on HDFS-15160 at 7/13/21, 1:08 PM:


Created [PR-3196|https://github.com/apache/hadoop/pull/3196] for branch-2.10.
 - The main conflicts are caused by HDFS-10636 (Modify ReplicaInfo to remove the assumption that replica metadata and data are stored in java.io.File).
 - {{FsDatasetImpl.java#validateBlockFile}} conflicts due to HDFS-10636. I see that the 3.x patch removes the lock protecting {{r = volumeMap.get(bpid, blockId);}}, but I am not sure this can be applied to 2.10, since as far as I can see the caller of {{validateBlockFile()}} does not acquire the dataRWLock.
 - {{FsDatasetImpl.java#getFile}} only exists in branch-2.10; it was removed in HDFS-10636.
 - {{DiskBalancer}} does not exist in branch-2.10.
 - {{FsDatasetImpl.java#moveBlockAcrossStorage}} is different in branch-2.10.


was (Author: ahussein):
Created [PR-3196|https://github.com/apache/hadoop/pull/3196] for branch-2.10

 - main conflicts caused by HDFS-10636. (Modify ReplicaInfo to remove the 
assumption that replica metadata and data are stored in java.io.File)
- {{FsDatasetImpl.java#validateBlockFile}} due to HDFS-10636
- {{FsDatasetImpl.java#getFile}} only exists in branch-2.10 and was removed 
in HDFS-10636
- {{DiskBalancer}} does not exist in branch-2.10
- {{FsDatasetImpl.java#moveBlockAcrossStorage}} is different in branch-2.10




[jira] [Comment Edited] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2020-06-10 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130497#comment-17130497
 ] 

hemanthboyina edited comment on HDFS-15160 at 6/10/20, 10:48 AM:
-

Thanks [~pilchard] for the report. HDFS-15150 introduced a read and a write lock in the datanode.

With HDFS-15160 we acquire the read lock for the scanner, so writes won't be blocked.

was (Author: hemanthboyina):
thanks [~pilchard] for the report ,  HDFS-15150 has introduced read and write 
lock in datanode 




[jira] [Comment Edited] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2020-06-10 Thread ludun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130342#comment-17130342
 ] 

ludun edited comment on HDFS-15160 at 6/10/20, 10:03 AM:
-

[~brahmareddy], please check this issue as well. In one of our customer environments the Directory Scanner blocked writes for a long time (300s+). The datanode does hold a very large number of blocks, but even so we should not hold the write lock for the scanner.
{code}
2020-06-10 12:17:06,869 | INFO  | 
java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
queue] | BlockPool BP-1104115233-xx-1571300215588 Total blocks: 11149530, 
missing metadata files:472, missing block files:472, missing blocks in 
memory:0, mismatched blocks:0 | DirectoryScanner.java:473
2020-06-10 12:17:06,869 | WARN  | 
java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
queue] | Lock held time above threshold: lock identifier: 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl 
lockHeldTimeMs=329854 ms. Suppressed 0 lock warnings. The stack trace is: 
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:475)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
 | InstrumentedLock.java:143
{code}
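The warning in the log above comes from Hadoop's lock instrumentation, which records the acquire time and logs when the hold time exceeds a threshold on unlock. A stripped-down sketch of that idea follows; it is a simplified assumption of the mechanism, not the real `InstrumentedLock` (which also suppresses repeated warnings and captures the stack trace).

```java
import java.util.concurrent.locks.ReentrantLock;

/** Simplified sketch of a lock that reports long hold times when released. */
public class InstrumentedLockSketch {
    private final ReentrantLock lock = new ReentrantLock(true);
    private final long warnThresholdMs;
    private volatile long acquiredAtMs;

    public InstrumentedLockSketch(long warnThresholdMs) {
        this.warnThresholdMs = warnThresholdMs;
    }

    public void lock() {
        lock.lock();
        acquiredAtMs = System.currentTimeMillis();
    }

    /** Unlocks; returns true (and logs) if the hold time exceeded the threshold. */
    public boolean unlockAndCheck() {
        long heldMs = System.currentTimeMillis() - acquiredAtMs;
        lock.unlock();
        if (heldMs > warnThresholdMs) {
            System.out.println("Lock held time above threshold: " + heldMs + " ms");
            return true;
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        InstrumentedLockSketch l = new InstrumentedLockSketch(50);
        l.lock();
        Thread.sleep(100);              // simulate a slow scan under the lock
        boolean warned = l.unlockAndCheck(); // true: held ~100 ms against a 50 ms threshold
        System.out.println(warned);
    }
}
```

In the log above the scanner held the dataset lock for 329854 ms, which is exactly the situation this Jira addresses by moving the scan to a read lock.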


was (Author: pilchard):
[~brahmareddy] pls check this issue also.  in our custome enviremont.  
Directory Scanner  block the write for a long time(300s+). although they have 
too many blocks for datanode. but we should not hold write lock for scanner.


[jira] [Comment Edited] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2020-06-10 Thread ludun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130342#comment-17130342
 ] 

ludun edited comment on HDFS-15160 at 6/10/20, 7:35 AM:


[~brahmareddy], please check this issue as well. In one of our customer environments the Directory Scanner blocked writes for a long time (300s+). The datanode does hold a very large number of blocks, but even so we should not hold the write lock for the scanner.


was (Author: pilchard):
[~brahmareddy] pls check this issue also.  in our custome enviremont.  
Directory Scanner  block the write for a long time.


[jira] [Comment Edited] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2020-06-02 Thread Jiang Xin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120455#comment-17120455
 ] 

Jiang Xin edited comment on HDFS-15160 at 6/2/20, 9:25 AM:
---

Hi [~sodonnell],

Thanks for your quick reply.

I have one more question: is it safe to change FsDatasetImpl#getBlockLocalPathInfo and DataNode#transferReplicaForPipelineRecovery to the read lock? As you mentioned above, both of them may change the generationStamp, and the `synchronized(replica)` in FsDatasetImpl#getBlockLocalPathInfo does not seem to protect the generationStamp from being changed by other methods. Would you mind double-checking?

Besides, in my opinion FsDatasetImpl#getBlockReports could also be changed to the read lock. What do you think?

Thanks.
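The concern above is that `synchronized(replica)` only excludes other threads that also synchronize on the same replica object; a code path that updates the generation stamp without entering that monitor is not excluded. A minimal sketch illustrating the two access patterns (the `Replica` class here is a hypothetical stand-in, not the Hadoop one):

```java
public class ReplicaSyncSketch {
    /** Hypothetical stand-in for a replica with a mutable generation stamp. */
    static class Replica {
        private long genStamp;
        synchronized long getGenStamp() { return genStamp; }            // takes the replica monitor
        synchronized void setGenStampSynced(long gs) { genStamp = gs; } // mutually excluded with the above
        void setGenStampUnsynced(long gs) { genStamp = gs; }            // never takes the monitor
    }

    public static void main(String[] args) {
        Replica r = new Replica();
        r.setGenStampSynced(1);
        // synchronized(replica) only excludes code paths that also synchronize
        // on the same replica object; this setter bypasses the monitor, so a
        // reader holding the monitor is not actually protected against it.
        r.setGenStampUnsynced(2);
        System.out.println(r.getGenStamp()); // prints 2
    }
}
```

So whether the read lock is sufficient depends on every writer of the generation stamp going through the same monitor, which is exactly what the comment asks to be double-checked.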


was (Author: jiang xin):
Hi [~sodonnell] ,

Thanks for your quick reply. I have one more question, is it safe to change 
FsDatasetImpl#getBlockLocalPathInfo and 
DataNode#transferReplicaForPipelineRecovery to read lock?

As you mentioned above, both of them would change generationStamp, the 
`synchronized(replica) ` in FsDatasetImpl#getBlockLocalPathInfo seems not 
protected generationStamp from being changed in other methods. Would you like 
to have a review?

Thanks.






[jira] [Comment Edited] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2020-05-28 Thread Jiang Xin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118394#comment-17118394
 ] 

Jiang Xin edited comment on HDFS-15160 at 5/28/20, 7:28 AM:


[~sodonnell], thanks for your great work.

I'm going to apply patch 005 and run it on our production cluster, but the `synchronized(replica) ...` block in getBlockLocalPathInfo confuses me. Since it already holds the write lock, there should be no need to worry about the genStamp being updated. I assume you intended getBlockLocalPathInfo to take the read lock instead, am I right?

Thanks


was (Author: jiang xin):
[~sodonnell]  I'm going to apply patch 005 and run on our production cluster. 
But the code `synchronized(replica) ... ` in method getBlockLocalPathInfo 
confuse me. It's holding the write lock, doesn't need to worry about updating 
genStamps . I assume that you wanted to change method getBlockLocalPathInfo in 
read lock, am I right?




[jira] [Comment Edited] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2020-03-10 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056054#comment-17056054
 ] 

zhuqi edited comment on HDFS-15160 at 3/10/20, 3:23 PM:


Thanks [~sodonnell] for your great work.

LGTM. I agree with [~hexiaoqiao] that DataNode#transferReplicaForPipelineRecovery should change data.acquireDatasetLock() to data.acquireDatasetReadLock() to get the replica information.
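The suggested change amounts to taking the dataset lock in shared rather than exclusive mode for a read-only lookup. A sketch of such an AutoCloseable lock API follows; the method names are taken from the comment, but the wrapper shape is an assumption for illustration, not the exact Hadoop signatures.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch of an AutoCloseable dataset lock usable in try-with-resources. */
public class DatasetLockSketch implements AutoCloseable {
    private static final ReentrantReadWriteLock RW = new ReentrantReadWriteLock(true);
    private final Lock held;

    private DatasetLockSketch(Lock l) {
        l.lock();
        this.held = l;
    }

    /** Exclusive mode: blocks all other readers and writers. */
    public static DatasetLockSketch acquireDatasetLock() {
        return new DatasetLockSketch(RW.writeLock());
    }

    /** Shared mode: concurrent readers are not blocked. */
    public static DatasetLockSketch acquireDatasetReadLock() {
        return new DatasetLockSketch(RW.readLock());
    }

    @Override
    public void close() {
        held.unlock();
    }

    public static void main(String[] args) {
        // Read-only replica lookup takes the lock in shared mode, so other
        // readers (scanner, block reports) proceed concurrently.
        try (DatasetLockSketch l = acquireDatasetReadLock()) {
            // read replica information here
        }
    }
}
```

With this shape, switching a caller from exclusive to shared mode is a one-line change from acquireDatasetLock() to acquireDatasetReadLock().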


was (Author: zhuqi):
Thanks [~sodonnell] for your great works.

LGTM, i agree with  [~hexiaoqiao]  that the 
DataNode#transferReplicaForPipelineRecovery should change 

data.acquireDatasetLock() to data.acquireDatasetReadLock() to get replica 
information.
