[jira] [Commented] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock

2020-09-14 Thread Jiang Xin (Jira)


[ https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17195317#comment-17195317 ]

Jiang Xin commented on HDFS-15382:
--

Thanks [~Aiphag0] for your proposal. It seems like it would help a lot on 
IO-heavy DNs, and we have been planning to do this recently. Would you submit 
a sample patch or push it forward?

> Split FsDatasetImpl from blockpool lock to blockpool volume lock 
> -
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.2.1
>
> Attachments: image-2020-06-02-1.png, image-2020-06-03-1.png
>
>
> In HDFS-15180 we split the lock to blockpool grain size. But when one volume 
> is under heavy load, it blocks other requests in the same blockpool even on 
> different volumes. So we split the lock into two levels to avoid this and to 
> improve datanode performance.
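
As a rough sketch of what the two-level idea could look like (illustrative 
class and field names only, not the actual patch): each blockpool keeps a 
coarse lock, and each (blockpool, volume) pair keeps its own finer lock, so a 
slow volume no longer blocks requests to sibling volumes.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch only, not the actual patch: one lock per
// blockpool plus one lock per (blockpool, volume) pair.
public class TwoLevelDatasetLock {
  private final Map<String, ReadWriteLock> poolLocks =
      new ConcurrentHashMap<>();
  private final Map<String, ReadWriteLock> volumeLocks =
      new ConcurrentHashMap<>();

  // Per-volume operations hold the pool lock in shared (read) mode and
  // their own volume lock exclusively, so two operations on different
  // volumes of the same blockpool no longer serialize on each other.
  public AutoCloseable lockVolume(String bpid, String volume) {
    ReadWriteLock pool =
        poolLocks.computeIfAbsent(bpid, k -> new ReentrantReadWriteLock());
    ReadWriteLock vol = volumeLocks.computeIfAbsent(bpid + "/" + volume,
        k -> new ReentrantReadWriteLock());
    pool.readLock().lock();
    vol.writeLock().lock();
    return () -> {
      vol.writeLock().unlock();
      pool.readLock().unlock();
    };
  }

  // Pool-wide operations (e.g. removing a blockpool) take the pool
  // lock exclusively, which waits out all in-flight volume operations.
  public AutoCloseable lockPool(String bpid) {
    ReadWriteLock pool =
        poolLocks.computeIfAbsent(bpid, k -> new ReentrantReadWriteLock());
    pool.writeLock().lock();
    return () -> pool.writeLock().unlock();
  }
}
{code}
A caller would use try (AutoCloseable l = locks.lockVolume(bpid, volume)) 
{ ... } so the locks are always released in reverse acquisition order.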






[jira] [Comment Edited] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2020-08-16 Thread Jiang Xin (Jira)


[ https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17178491#comment-17178491 ]

Jiang Xin edited comment on HDFS-11187 at 8/16/20, 2:27 PM:


Hi [~weichiu], thanks for your patch, but I don't think it handles every 
scenario; please correct me if I misunderstood.

Consider this case: a reader thread reads a finalized replica whose last 
partial chunk checksum (LPCC) has not been loaded into memory yet, so its 
LPCC is null, and the thread reaches getPartialChunkChecksumForFinalized() 
in BlockSender#init().
{code:java}
private ChunkChecksum getPartialChunkChecksumForFinalized(
    FinalizedReplica finalized) throws IOException {
  ...
  final long replicaVisibleLength = replica.getVisibleLength();
  if (replicaVisibleLength % CHUNK_SIZE != 0 &&
      finalized.getLastPartialChunkChecksum() == null) {

    //
    // The reader thread blocks here, while another writer thread
    // appends to this replica and succeeds.
    //

    // the finalized replica does not have precomputed last partial
    // chunk checksum. Recompute now.
    try {
      finalized.loadLastPartialChunkChecksum();
      return new ChunkChecksum(finalized.getVisibleLength(),
          finalized.getLastPartialChunkChecksum());
    } catch (FileNotFoundException e) {
      ...
    }
{code}
At the same time, another thread appends to this replica and succeeds, so 
getPartialChunkChecksumForFinalized() recomputes the LPCC and sets it on the 
replica. In this situation the `finalized` object is stale: its visibleLength 
is old, but after loadLastPartialChunkChecksum() its LPCC is new, so the 
mismatch happens.

I'm not sure whether we need to worry about this; any advice?
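
For illustration only, here is a plain-Java sketch (all names are mine, not 
from HDFS) of one way to rule out such a torn pair: publish the visible 
length and the LPCC together as a single immutable value, so a reader can 
never observe the old length next to the new checksum.
{code:java}
import java.util.concurrent.atomic.AtomicReference;

// Self-contained toy, not from any patch: keep the visible length and
// the last partial chunk checksum in one immutable value and swap it
// atomically, so no reader can pair an old length with a new checksum.
final class LengthAndChecksum {
  final long visibleLength;
  final byte[] checksum;

  LengthAndChecksum(long visibleLength, byte[] checksum) {
    this.visibleLength = visibleLength;
    this.checksum = checksum;
  }
}

final class ReplicaChecksumHolder {
  private final AtomicReference<LengthAndChecksum> current =
      new AtomicReference<>(new LengthAndChecksum(0, new byte[0]));

  // Append path: publish the new length and checksum in a single step.
  void publish(long newLength, byte[] newChecksum) {
    current.set(new LengthAndChecksum(newLength, newChecksum));
  }

  // Read path (what BlockSender would do): one volatile read yields a
  // pair that was actually valid together at some instant.
  LengthAndChecksum snapshot() {
    return current.get();
  }
}
{code}
Whichever way the append races, a single snapshot() always returns a 
(length, checksum) pair that existed together at some point in time.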

 


was (Author: jiang xin):
Hi [~weichiu], thanks for your patch, but I don't think it handles every 
scenario; please correct me if I misunderstood.

Consider this case: a reader thread reads a finalized replica whose last 
partial chunk checksum (LPCC) has not been loaded into memory yet, so its 
LPCC is null, and the thread reaches getPartialChunkChecksumForFinalized() 
in BlockSender#init().
{code:java}
private ChunkChecksum getPartialChunkChecksumForFinalized(
    FinalizedReplica finalized) throws IOException {
  ...
  final long replicaVisibleLength = replica.getVisibleLength();
  if (replicaVisibleLength % CHUNK_SIZE != 0 &&
      finalized.getLastPartialChunkChecksum() == null) {

    //
    // The reader thread blocks here, while another writer thread
    // appends to this replica and succeeds.
    //

    // the finalized replica does not have precomputed last partial
    // chunk checksum. Recompute now.
    try {
      finalized.loadLastPartialChunkChecksum();
      return new ChunkChecksum(finalized.getVisibleLength(),
          finalized.getLastPartialChunkChecksum());
    } catch (FileNotFoundException e) {
      ...
    }
{code}
At the same time, another thread appends to this replica and succeeds, so 
getPartialChunkChecksumForFinalized() recomputes the LPCC and sets it on the 
replica. In this situation the `finalized` object is stale: its visibleLength 
is old, but after loadLastPartialChunkChecksum() its LPCC is new, so the 
mismatch happens.

I'm not sure whether we need to worry about this; any advice?

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.3
>
> Attachments: HDFS-11187-branch-2.001.patch, 
> HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, 
> HDFS-11187-branch-2.004.patch, HDFS-11187-branch-2.7.001.patch, 
> HDFS-11187.001.patch, HDFS-11187.002.patch, HDFS-11187.003.patch, 
> HDFS-11187.004.patch, HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of the 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk, while holding the FsDatasetImpl lock, 
> for every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial checksum in memory, reducing disk access.
> I am separating the optimization into a new Jira, because maintaining the 
> state of the in-memory checksum requires a lot more work.
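
A minimal sketch of the in-memory idea, with illustrative names rather than 
the actual implementation: the first reader pays one disk access, later 
readers hit the cache, and the write path refreshes the cached value whenever 
it extends the replica.
{code:java}
import java.io.IOException;

// Illustrative sketch, not the actual implementation: cache the last
// partial chunk checksum and only touch the disk when the cache is
// empty; the write path refreshes the cache on every change.
final class CachedPartialChunkChecksum {
  private volatile byte[] cached; // null until first load or next write

  byte[] get() throws IOException {
    byte[] c = cached;
    if (c == null) {
      c = readLastPartialChunkFromMetafile(); // disk access, first read only
      cached = c;
    }
    return c;
  }

  // Called by the write path after every flush that changes the last
  // partial chunk, keeping the in-memory copy up to date.
  void onWrite(byte[] newChecksum) {
    cached = newChecksum;
  }

  private byte[] readLastPartialChunkFromMetafile() throws IOException {
    // Placeholder for seeking to the metafile's final checksum entry.
    return new byte[0];
  }
}
{code}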






[jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2020-08-16 Thread Jiang Xin (Jira)


[ https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17178491#comment-17178491 ]

Jiang Xin commented on HDFS-11187:
--

Hi [~weichiu], thanks for your patch, but I don't think it handles every 
scenario; please correct me if I misunderstood.

Consider this case: a reader thread reads a finalized replica whose last 
partial chunk checksum (LPCC) has not been loaded into memory yet, so its 
LPCC is null, and the thread reaches getPartialChunkChecksumForFinalized() 
in BlockSender#init().

{code:java}
private ChunkChecksum getPartialChunkChecksumForFinalized(
    FinalizedReplica finalized) throws IOException {
  ...
  final long replicaVisibleLength = replica.getVisibleLength();
  if (replicaVisibleLength % CHUNK_SIZE != 0 &&
      finalized.getLastPartialChunkChecksum() == null) {

    //
    // The reader thread blocks here, while another writer thread
    // appends to this replica and succeeds.
    //

    // the finalized replica does not have precomputed last partial
    // chunk checksum. Recompute now.
    try {
      finalized.loadLastPartialChunkChecksum();
      return new ChunkChecksum(finalized.getVisibleLength(),
          finalized.getLastPartialChunkChecksum());
    } catch (FileNotFoundException e) {
      ...
    }
{code}
At the same time, another thread appends to this replica and succeeds, so 
getPartialChunkChecksumForFinalized() recomputes the LPCC and sets it on the 
replica. In this situation the `finalized` object is stale: its visibleLength 
is old, but after loadLastPartialChunkChecksum() its LPCC is new, so the 
mismatch happens.

I'm not sure whether we need to worry about this; any advice?

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.3
>
> Attachments: HDFS-11187-branch-2.001.patch, 
> HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, 
> HDFS-11187-branch-2.004.patch, HDFS-11187-branch-2.7.001.patch, 
> HDFS-11187.001.patch, HDFS-11187.002.patch, HDFS-11187.003.patch, 
> HDFS-11187.004.patch, HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of the 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk, while holding the FsDatasetImpl lock, 
> for every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial checksum in memory, reducing disk access.
> I am separating the optimization into a new Jira, because maintaining the 
> state of the in-memory checksum requires a lot more work.






[jira] [Commented] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2020-06-11 Thread Jiang Xin (Jira)


[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133904#comment-17133904 ]

Jiang Xin commented on HDFS-15160:
--

Hi [~sodonnell], I applied this patch on a small cluster of about a hundred 
datanodes; so far it works well and significantly decreases the number of 
blocked threads! I'm going to do a gradual (grey-release) rollout on our 
largest cluster next week, and after it has run stably for a while I'll 
report back with feedback.

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, 
> HDFS-15160.003.patch, HDFS-15160.004.patch, HDFS-15160.005.patch, 
> HDFS-15160.006.patch, image-2020-04-10-17-18-08-128.png, 
> image-2020-04-10-17-18-55-938.png
>
>
> Now we have HDFS-15150, we can start to move some DN operations to use the 
> read lock rather than the write lock to improve concurrency. The first step 
> is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (e.g. 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read-only fashion, so they can also be switched to using a readLock.
> Next is the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) are various "low hanging fruit" items in BlockSender 
> and FsDatasetImpl where it is fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> it's better to do any larger refactoring or risky changes each in their own 
> Jira.
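
As a minimal sketch of the locking pattern being proposed (illustrative 
names, not the actual FsDatasetImpl code): read-only scans share the read 
lock and run concurrently, while mutations keep the exclusive write lock.
{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch of the pattern, not the actual FsDatasetImpl
// code: read-only scans (the getBlockReports / deepCopyReplica style)
// take the shared read lock, while mutations take the write lock.
class VolumeMapSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<Long, String> replicas = new HashMap<>();

  // Read-only: many scanner/report threads may hold this at once.
  List<String> copyReplicas() {
    lock.readLock().lock();
    try {
      return new ArrayList<>(replicas.values());
    } finally {
      lock.readLock().unlock();
    }
  }

  // Mutation: excludes all readers and other writers.
  void addReplica(long blockId, String replicaInfo) {
    lock.writeLock().lock();
    try {
      replicas.put(blockId, replicaInfo);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}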






[jira] [Comment Edited] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2020-06-02 Thread Jiang Xin (Jira)


[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120455#comment-17120455 ]

Jiang Xin edited comment on HDFS-15160 at 6/2/20, 9:25 AM:
---

Hi [~sodonnell],

Thanks for your quick reply.

I have one more question: is it safe to switch 
FsDatasetImpl#getBlockLocalPathInfo and 
DataNode#transferReplicaForPipelineRecovery to the read lock? As you mentioned 
above, both of them can change the generationStamp, and the 
`synchronized(replica)` in FsDatasetImpl#getBlockLocalPathInfo does not seem 
to protect the generationStamp from being changed by other methods. Would you 
double-check?

Besides, in my opinion FsDatasetImpl#getBlockReports could also be switched 
to the read lock. What do you think?

Thanks.
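
To illustrate the concern with a toy example (illustrative names, not HDFS 
code): synchronized(replica) only excludes code paths that also synchronize 
on the same replica object, so a writer that updates the generation stamp 
without entering that monitor is not excluded.
{code:java}
// Toy illustration of the concern (illustrative names, not HDFS code).
class ReplicaToy {
  private long generationStamp;

  // What getBlockLocalPathInfo-style code does under the read lock.
  long readUnderMonitor() {
    synchronized (this) {
      return generationStamp;
    }
  }

  // A writer that never synchronizes on the replica: the monitor
  // above offers it no mutual exclusion at all, so under a shared
  // read lock the two paths can interleave freely.
  void bumpWithoutMonitor(long newGs) {
    generationStamp = newGs;
  }
}
{code}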


was (Author: jiang xin):
Hi [~sodonnell],

Thanks for your quick reply. I have one more question: is it safe to switch 
FsDatasetImpl#getBlockLocalPathInfo and 
DataNode#transferReplicaForPipelineRecovery to the read lock?

As you mentioned above, both of them can change the generationStamp, and the 
`synchronized(replica)` in FsDatasetImpl#getBlockLocalPathInfo does not seem 
to protect the generationStamp from being changed by other methods. Would you 
review this?

Thanks.

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, 
> HDFS-15160.003.patch, HDFS-15160.004.patch, HDFS-15160.005.patch, 
> HDFS-15160.006.patch, image-2020-04-10-17-18-08-128.png, 
> image-2020-04-10-17-18-55-938.png
>
>
> Now we have HDFS-15150, we can start to move some DN operations to use the 
> read lock rather than the write lock to improve concurrency. The first step 
> is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (e.g. 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read-only fashion, so they can also be switched to using a readLock.
> Next is the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) are various "low hanging fruit" items in BlockSender 
> and FsDatasetImpl where it is fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> it's better to do any larger refactoring or risky changes each in their own 
> Jira.






[jira] [Comment Edited] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2020-05-31 Thread Jiang Xin (Jira)


[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120455#comment-17120455 ]

Jiang Xin edited comment on HDFS-15160 at 5/31/20, 9:03 AM:


Hi [~sodonnell],

Thanks for your quick reply. I have one more question: is it safe to switch 
FsDatasetImpl#getBlockLocalPathInfo and 
DataNode#transferReplicaForPipelineRecovery to the read lock?

As you mentioned above, both of them can change the generationStamp, and the 
`synchronized(replica)` in FsDatasetImpl#getBlockLocalPathInfo does not seem 
to protect the generationStamp from being changed by other methods. Would you 
review this?

Thanks.


was (Author: jiang xin):
Hi [~sodonnell],

Thanks for your quick reply. I have one more question: is it safe to switch 
FsDatasetImpl#getBlockLocalPathInfo and 
DataNode#transferReplicaForPipelineRecovery to the read lock?

As you mentioned above, both of them can change the generationStamp, and the 
`synchronized(replica)` in FsDatasetImpl#getBlockLocalPathInfo does not seem 
to protect the generationStamp from being changed by other methods. Would you 
review it again?

Thanks.

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, 
> HDFS-15160.003.patch, HDFS-15160.004.patch, HDFS-15160.005.patch, 
> HDFS-15160.006.patch, image-2020-04-10-17-18-08-128.png, 
> image-2020-04-10-17-18-55-938.png
>
>
> Now we have HDFS-15150, we can start to move some DN operations to use the 
> read lock rather than the write lock to improve concurrency. The first step 
> is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (e.g. 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read-only fashion, so they can also be switched to using a readLock.
> Next is the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) are various "low hanging fruit" items in BlockSender 
> and FsDatasetImpl where it is fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> it's better to do any larger refactoring or risky changes each in their own 
> Jira.






[jira] [Commented] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2020-05-31 Thread Jiang Xin (Jira)


[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120455#comment-17120455 ]

Jiang Xin commented on HDFS-15160:
--

Hi [~sodonnell],

Thanks for your quick reply. I have one more question: is it safe to switch 
FsDatasetImpl#getBlockLocalPathInfo and 
DataNode#transferReplicaForPipelineRecovery to the read lock?

As you mentioned above, both of them can change the generationStamp, and the 
`synchronized(replica)` in FsDatasetImpl#getBlockLocalPathInfo does not seem 
to protect the generationStamp from being changed by other methods. Would you 
review it again?

Thanks.

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, 
> HDFS-15160.003.patch, HDFS-15160.004.patch, HDFS-15160.005.patch, 
> HDFS-15160.006.patch, image-2020-04-10-17-18-08-128.png, 
> image-2020-04-10-17-18-55-938.png
>
>
> Now we have HDFS-15150, we can start to move some DN operations to use the 
> read lock rather than the write lock to improve concurrency. The first step 
> is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (e.g. 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read-only fashion, so they can also be switched to using a readLock.
> Next is the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) are various "low hanging fruit" items in BlockSender 
> and FsDatasetImpl where it is fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> it's better to do any larger refactoring or risky changes each in their own 
> Jira.






[jira] [Comment Edited] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2020-05-28 Thread Jiang Xin (Jira)


[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118394#comment-17118394 ]

Jiang Xin edited comment on HDFS-15160 at 5/28/20, 7:28 AM:


[~sodonnell], thanks for the great work.

I'm going to apply patch 005 and run it on our production cluster. But the 
`synchronized(replica) ...` code in getBlockLocalPathInfo confuses me: it 
already holds the write lock, so it doesn't need to worry about genStamp 
updates. I assume you intended to switch getBlockLocalPathInfo to the read 
lock, am I right?

Thanks


was (Author: jiang xin):
[~sodonnell], I'm going to apply patch 005 and run it on our production 
cluster. But the `synchronized(replica) ...` code in getBlockLocalPathInfo 
confuses me: it already holds the write lock, so it doesn't need to worry 
about genStamp updates. I assume you intended to switch 
getBlockLocalPathInfo to the read lock, am I right?

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, 
> HDFS-15160.003.patch, HDFS-15160.004.patch, HDFS-15160.005.patch, 
> image-2020-04-10-17-18-08-128.png, image-2020-04-10-17-18-55-938.png
>
>
> Now we have HDFS-15150, we can start to move some DN operations to use the 
> read lock rather than the write lock to improve concurrency. The first step 
> is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (e.g. 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read-only fashion, so they can also be switched to using a readLock.
> Next is the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) are various "low hanging fruit" items in BlockSender 
> and FsDatasetImpl where it is fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> it's better to do any larger refactoring or risky changes each in their own 
> Jira.






[jira] [Commented] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2020-05-28 Thread Jiang Xin (Jira)


[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118394#comment-17118394 ]

Jiang Xin commented on HDFS-15160:
--

[~sodonnell], I'm going to apply patch 005 and run it on our production 
cluster. But the `synchronized(replica) ...` code in getBlockLocalPathInfo 
confuses me: it already holds the write lock, so it doesn't need to worry 
about genStamp updates. I assume you intended to switch 
getBlockLocalPathInfo to the read lock, am I right?

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, 
> HDFS-15160.003.patch, HDFS-15160.004.patch, HDFS-15160.005.patch, 
> image-2020-04-10-17-18-08-128.png, image-2020-04-10-17-18-55-938.png
>
>
> Now we have HDFS-15150, we can start to move some DN operations to use the 
> read lock rather than the write lock to improve concurrency. The first step 
> is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (e.g. 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read-only fashion, so they can also be switched to using a readLock.
> Next is the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) are various "low hanging fruit" items in BlockSender 
> and FsDatasetImpl where it is fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> it's better to do any larger refactoring or risky changes each in their own 
> Jira.


