[jira] [Resolved] (HDFS-17554) OIV: Print the storage policy name in OIV delimited output
[ https://issues.apache.org/jira/browse/HDFS-17554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hualong Zhang resolved HDFS-17554.
----------------------------------
    Resolution: Not A Problem

> OIV: Print the storage policy name in OIV delimited output
> -----------------------------------------------------------
>
>                 Key: HDFS-17554
>                 URL: https://issues.apache.org/jira/browse/HDFS-17554
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: tools
>    Affects Versions: 3.5.0
>            Reporter: Hualong Zhang
>            Assignee: Hualong Zhang
>            Priority: Major
>
> Refers to adding the storage policy name to the OIV output instead of the erasure coding policy.
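As a sketch of what the proposal describes (the exact column layout is an assumption here, not taken from a patch), the OfflineImageViewer's Delimited processor is invoked as usual, and the storage policy name (e.g. HOT, COLD, ALL_SSD) would appear where the erasure coding policy column is today:

    # dump an fsimage to delimited text with the OfflineImageViewer
    hdfs oiv -p Delimited -delimiter "," -i fsimage_0000000000000000042 -o fsimage.csv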
[jira] [Resolved] (HDFS-17528) FsImageValidation: set txid when saving a new image
[ https://issues.apache.org/jira/browse/HDFS-17528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz-wo Sze resolved HDFS-17528.
-------------------------------
    Fix Version/s: 3.5.0
     Hadoop Flags: Reviewed
       Resolution: Fixed

The pull request is now merged.

> FsImageValidation: set txid when saving a new image
> ---------------------------------------------------
>
>                 Key: HDFS-17528
>                 URL: https://issues.apache.org/jira/browse/HDFS-17528
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>            Reporter: Tsz-wo Sze
>            Assignee: Tsz-wo Sze
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
> - When the fsimage is specified as a file and the FsImageValidation tool saves a new image (for removing inaccessible inodes), the txid is not set. The resulting image will then have 0 as its txid.
> - When the fsimage is specified as a directory, the txid is set. However, the tool will get an NPE since the NameNode metrics are uninitialized (although the metrics are not used by FsImageValidation).
[jira] [Commented] (HDFS-17546) Implementing Timeout for HostFileReader when FS hangs
[ https://issues.apache.org/jira/browse/HDFS-17546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856023#comment-17856023 ]

ASF GitHub Bot commented on HDFS-17546:
---------------------------------------

ctrezzo commented on PR #6891:
URL: https://github.com/apache/hadoop/pull/6891#issuecomment-2176655680

   @NyteKnight can you take a look at the new SpotBugs warning? It looks like it is complaining about an unchecked conversion.

> Implementing Timeout for HostFileReader when FS hangs
> -----------------------------------------------------
>
>                 Key: HDFS-17546
>                 URL: https://issues.apache.org/jira/browse/HDFS-17546
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Simbarashe Dzinamarira
>            Assignee: Heagan
>            Priority: Minor
>              Labels: pull-request-available
>
> Certain Hadoop deployments keep the dfs.hosts file on NAS/NFS, potentially behind symlinks. If the FS hangs for any reason, the refreshNodes call would hang indefinitely in the HostsFileReader until the FS returns.
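A minimal, self-contained sketch of the timeout technique (illustrative only; the class and method names below are not from the actual patch): perform the read on a helper thread and bound the wait, so a hung NFS mount cannot block the caller (e.g. refreshNodes) indefinitely.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.List;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;

    public class TimedHostsFileRead {
      // Single daemon thread so a permanently hung read cannot pile up threads.
      private static final ExecutorService READER =
          Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "hosts-file-reader");
            t.setDaemon(true);
            return t;
          });

      // Read the hosts file, giving up after timeoutMs instead of hanging.
      public static List<String> readHostsFile(String path, long timeoutMs)
          throws IOException {
        Future<List<String>> future =
            READER.submit(() -> Files.readAllLines(Paths.get(path)));
        try {
          return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
          future.cancel(true); // best effort; an NFS-hung thread may ignore the interrupt
          throw new IOException("Timed out reading hosts file " + path, e);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IOException("Interrupted while reading hosts file " + path, e);
        } catch (ExecutionException e) {
          throw new IOException("Failed to read hosts file " + path, e.getCause());
        }
      }
    }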
[jira] [Commented] (HDFS-17528) FsImageValidation: set txid when saving a new image
[ https://issues.apache.org/jira/browse/HDFS-17528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17855989#comment-17855989 ]

Tsz-wo Sze commented on HDFS-17528:
-----------------------------------

[~smajeti], there were still some build issues. I have just pushed an empty commit to trigger a new build.

> FsImageValidation: set txid when saving a new image
> ---------------------------------------------------
>
>                 Key: HDFS-17528
>                 URL: https://issues.apache.org/jira/browse/HDFS-17528
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>            Reporter: Tsz-wo Sze
>            Assignee: Tsz-wo Sze
>            Priority: Major
>              Labels: pull-request-available
>
> - When the fsimage is specified as a file and the FsImageValidation tool saves a new image (for removing inaccessible inodes), the txid is not set. The resulting image will then have 0 as its txid.
> - When the fsimage is specified as a directory, the txid is set. However, the tool will get an NPE since the NameNode metrics are uninitialized (although the metrics are not used by FsImageValidation).
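For reference, the empty-commit trick used here is plain git and works with any CI that triggers on new commits:

    # create a commit with no changes, then push to re-trigger the build
    git commit --allow-empty -m "Empty commit to trigger a new build"
    git push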
[jira] [Created] (HDFS-17554) OIV: Print the storage policy name in OIV delimited output
Hualong Zhang created HDFS-17554:
------------------------------------

             Summary: OIV: Print the storage policy name in OIV delimited output
                 Key: HDFS-17554
                 URL: https://issues.apache.org/jira/browse/HDFS-17554
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: tools
    Affects Versions: 3.5.0
            Reporter: Hualong Zhang
            Assignee: Hualong Zhang

Refers to adding the storage policy name to the OIV output instead of the erasure coding policy.
[jira] [Updated] (HDFS-17553) DFSOutputStream.java#closeImpl should have configurable retries upon flushInternal failures
[ https://issues.apache.org/jira/browse/HDFS-17553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zinan Zhuang updated HDFS-17553:
--------------------------------
    Summary: DFSOutputStream.java#closeImpl should have configurable retries upon flushInternal failures  (was: DFSOutputStream.java#closeImpl should configurable retries upon flushInternal failures)

> DFSOutputStream.java#closeImpl should have configurable retries upon flushInternal failures
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17553
>                 URL: https://issues.apache.org/jira/browse/HDFS-17553
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: dfsclient
>    Affects Versions: 3.3.1, 3.4.0
>            Reporter: Zinan Zhuang
>            Priority: Major
>
> HDFS-15865 introduced an interrupt in the DataStreamer class to interrupt the waitForAckedSeqno call when the timeout has been exceeded, which throws an InterruptedIOException. This method is being used in [DFSOutputStream.java#flushInternal|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L773], one of whose use cases is [DFSOutputStream.java#closeImpl|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L870] to close a file.
> What we saw was that we were getting InterruptedIOExceptions during the flushInternal call when we were closing out a file, which was unhandled by the DFSClient and got thrown to the caller. There's a known issue HDFS-4504 that when a file failed to close on the HDFS side, block recovery was not called and the lease got leaked until the DFSClient gets recycled. In our HBase setups, DFSClients remain long-lived in regionservers, which means these files remain undead until the corresponding regionservers get restarted.
> This issue was observed during datanode decommission because it was stuck on open files caused by the above leakage. As it's good to close an HDFS file as smoothly as possible, retries of flushInternal during closeImpl operations would be beneficial to reduce such leakages. The number of retries can be based on the DFSClient config. [For example|https://github.com/apache/hadoop/blob/63633812a417a3d548be3bdcecebd2ae893d03e0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1660]
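A minimal sketch of the retry shape being proposed (the Flusher interface and the retry count below are stand-ins for illustration; the real change would live in DFSOutputStream#closeImpl with the count taken from a DFSClient config key):

    import java.io.IOException;
    import java.io.InterruptedIOException;

    public final class CloseRetrySketch {

      // Stand-in for the flushInternal() call made inside closeImpl().
      interface Flusher {
        void flushInternal() throws IOException;
      }

      // Retry flushInternal() up to maxRetries extra times before surfacing
      // the InterruptedIOException, so a transient ack timeout does not leave
      // the file open with its lease leaked.
      static void closeWithRetries(Flusher stream, int maxRetries)
          throws IOException {
        for (int attempt = 0; ; attempt++) {
          try {
            stream.flushInternal();
            return; // all packets acked; safe to complete the file
          } catch (InterruptedIOException e) {
            if (attempt >= maxRetries) {
              throw e; // out of retries; caller sees the original failure
            }
            // fall through and retry the flush
          }
        }
      }
    }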
[jira] [Updated] (HDFS-17553) DFSOutputStream.java#closeImpl should configurable retries upon flushInternal failures
[ https://issues.apache.org/jira/browse/HDFS-17553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zinan Zhuang updated HDFS-17553:
--------------------------------
    Description: 
HDFS-15865 introduced an interrupt in the DataStreamer class to interrupt the waitForAckedSeqno call when the timeout has been exceeded, which throws an InterruptedIOException. This method is being used in [DFSOutputStream.java#flushInternal|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L773], one of whose use cases is [DFSOutputStream.java#closeImpl|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L870] to close a file.

What we saw was that we were getting InterruptedIOExceptions during the flushInternal call when we were closing out a file, which was unhandled by the DFSClient and got thrown to the caller. There's a known issue HDFS-4504 that when a file failed to close on the HDFS side, block recovery was not called and the lease got leaked until the DFSClient gets recycled. In our HBase setups, DFSClients remain long-lived in regionservers, which means these files remain undead until the corresponding regionservers get restarted.

This issue was observed during datanode decommission because it was stuck on open files caused by the above leakage. As it's good to close an HDFS file as smoothly as possible, retries of flushInternal during closeImpl operations would be beneficial to reduce such leakages. The number of retries can be based on the DFSClient config. [For example|https://github.com/apache/hadoop/blob/63633812a417a3d548be3bdcecebd2ae893d03e0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1660]

  was:
[HDFS-15865|https://issues.apache.org/jira/browse/HDFS-15865] introduced an interrupt in the DataStreamer class to interrupt the waitForAckedSeqno call when the timeout has been exceeded, which throws an InterruptedIOException. This method is being used in [DFSOutputStream.java#flushInternal|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L773], one of whose use cases is [DFSOutputStream.java#closeImpl|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L870] to close a file.

What we saw was that we were getting InterruptedIOExceptions during the flushInternal call when we were closing out a file, which was unhandled by the DFSClient and got thrown to the caller. There's a known issue HDFS-4504 that when a file failed to close on the HDFS side, block recovery was not called and the lease got leaked until the DFSClient gets recycled. In our HBase setups, DFSClients remain long-lived in regionservers, which means these files remain undead until the corresponding regionservers get restarted.

This issue was observed during datanode decommission because it was stuck on open files caused by the above leakage. As it's good to close an HDFS file as smoothly as possible, a retry of flushInternal during closeImpl operations would be beneficial to reduce such leakages.

    Summary: DFSOutputStream.java#closeImpl should configurable retries upon flushInternal failures  (was: DFSOutputStream.java#closeImpl should have a retry upon flushInternal failures)

> DFSOutputStream.java#closeImpl should configurable retries upon flushInternal failures
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-17553
>                 URL: https://issues.apache.org/jira/browse/HDFS-17553
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: dfsclient
>    Affects Versions: 3.3.1, 3.4.0
>            Reporter: Zinan Zhuang
>            Priority: Major
>
> HDFS-15865 introduced an interrupt in the DataStreamer class to interrupt the waitForAckedSeqno call when the timeout has been exceeded, which throws an InterruptedIOException. This method is being used in [DFSOutputStream.java#flushInternal|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L773], one of whose use cases is [DFSOutputStream.java#closeImpl|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L870] to close a file.
> What we saw was that we were getting InterruptedIOExceptions during the flushInternal call when we were closing out a file, which was unhandled by the DFSClient and got thrown to the caller. There's a known issue HDFS-4504 that when a file failed to close on the HDFS side, block recovery was not called and the lea
[jira] [Updated] (HDFS-17553) DFSOutputStream.java#closeImpl should have a retry upon flushInternal failures
[ https://issues.apache.org/jira/browse/HDFS-17553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zinan Zhuang updated HDFS-17553:
--------------------------------
    Description: 
[HDFS-15865|https://issues.apache.org/jira/browse/HDFS-15865] introduced an interrupt in the DataStreamer class to interrupt the waitForAckedSeqno call when the timeout has been exceeded, which throws an InterruptedIOException. This method is being used in [DFSOutputStream.java#flushInternal|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L773], one of whose use cases is [DFSOutputStream.java#closeImpl|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L870] to close a file.

What we saw was that we were getting InterruptedIOExceptions during the flushInternal call when we were closing out a file, which was unhandled by the DFSClient and got thrown to the caller. There's a known issue HDFS-4504 that when a file failed to close on the HDFS side, block recovery was not called and the lease got leaked until the DFSClient gets recycled. In our HBase setups, DFSClients remain long-lived in regionservers, which means these files remain undead until the corresponding regionservers get restarted.

This issue was observed during datanode decommission because it was stuck on open files caused by the above leakage. As it's good to close an HDFS file as smoothly as possible, a retry of flushInternal during closeImpl operations would be beneficial to reduce such leakages.

  was:
HDFS-15865 introduced an interrupt in the DataStreamer class to interrupt the waitForAckedSeqno call when the timeout has been exceeded, which throws an InterruptedIOException. This method is being used in [DFSOutputStream.java#flushInternal|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L773], one of whose use cases is [DFSOutputStream.java#closeImpl|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L870] to close a file.

What we saw was that we were getting InterruptedIOExceptions during the flushInternal call when we were closing out a file, which was unhandled by the DFSClient and got thrown to the caller. There's a known issue HDFS-4504 that when a file failed to close on the HDFS side, block recovery was not called and the lease got leaked until the DFSClient gets recycled. In our HBase setups, DFSClients remain long-lived in regionservers, which means these files remain undead until the corresponding regionservers get restarted.

This issue was observed during datanode decommission because it was stuck on open files caused by the above leakage. As it's good to close an HDFS file as smoothly as possible, a retry of flushInternal during closeImpl operations would be beneficial to reduce such leakages.

> DFSOutputStream.java#closeImpl should have a retry upon flushInternal failures
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-17553
>                 URL: https://issues.apache.org/jira/browse/HDFS-17553
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: dfsclient
>    Affects Versions: 3.3.1, 3.4.0
>            Reporter: Zinan Zhuang
>            Priority: Major
>
> [HDFS-15865|https://issues.apache.org/jira/browse/HDFS-15865] introduced an interrupt in the DataStreamer class to interrupt the waitForAckedSeqno call when the timeout has been exceeded, which throws an InterruptedIOException.
> This method is being used in [DFSOutputStream.java#flushInternal|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L773], one of whose use cases is [DFSOutputStream.java#closeImpl|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L870] to close a file.
> What we saw was that we were getting InterruptedIOExceptions during the flushInternal call when we were closing out a file, which was unhandled by the DFSClient and got thrown to the caller. There's a known issue HDFS-4504 that when a file failed to close on the HDFS side, block recovery was not called and the lease got leaked until the DFSClient gets recycled. In our HBase setups, DFSClients remain long-lived in regionservers, which means these files remain undead until the corresponding regionservers get restarted.
> This issue was observed during datanode decommission because it was stuck on open files caused by the above leakage. As it's good to close an HDFS file as smoothly as possible, a retry of flushI
[jira] [Updated] (HDFS-17553) DFSOutputStream.java#closeImpl should have a retry upon flushInternal failures
[ https://issues.apache.org/jira/browse/HDFS-17553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zinan Zhuang updated HDFS-17553:
--------------------------------
    Description: 
HDFS-15865 introduced an interrupt in the DataStreamer class to interrupt the waitForAckedSeqno call when the timeout has been exceeded, which throws an InterruptedIOException. This method is being used in [DFSOutputStream.java#flushInternal|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L773], one of whose use cases is [DFSOutputStream.java#closeImpl|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L870] to close a file.

What we saw was that we were getting InterruptedIOExceptions during the flushInternal call when we were closing out a file, which was unhandled by the DFSClient and got thrown to the caller. There's a known issue HDFS-4504 that when a file failed to close on the HDFS side, block recovery was not called and the lease got leaked until the DFSClient gets recycled. In our HBase setups, DFSClients remain long-lived in regionservers, which means these files remain undead until the corresponding regionservers get restarted.

This issue was observed during datanode decommission because it was stuck on open files caused by the above leakage. As it's good to close an HDFS file as smoothly as possible, a retry of flushInternal during closeImpl operations would be beneficial to reduce such leakages.

  was:
[HDFS-15865|https://issues.apache.org/jira/browse/HDFS-15865] introduced an interrupt in the DataStreamer class to interrupt the waitForAckedSeqno call when the timeout has been exceeded, which throws an InterruptedIOException. This method is being used in [DFSOutputStream.java#flushInternal|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L773], one of whose use cases is [DFSOutputStream.java#closeImpl|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L870] to close a file.

What we saw was that we were getting InterruptedIOExceptions during the flushInternal call when we were closing out a file, which was unhandled by the DFSClient and got thrown to the caller. There's a known issue [HDFS-4504|https://issues.apache.org/jira/browse/HDFS-4504] that when a file failed to close on the HDFS side, the lease got leaked until the DFSClient gets recycled. In our HBase setups, DFSClients remain long-lived in regionservers, which means these files remain undead until the corresponding regionservers get restarted.

This issue was observed during datanode decommission because it was stuck on open files caused by the above leakage. As it's good to close an HDFS file as smoothly as possible, a retry of flushInternal during closeImpl operations would be beneficial to reduce such leakages.

> DFSOutputStream.java#closeImpl should have a retry upon flushInternal failures
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-17553
>                 URL: https://issues.apache.org/jira/browse/HDFS-17553
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: dfsclient
>    Affects Versions: 3.3.1, 3.4.0
>            Reporter: Zinan Zhuang
>            Priority: Major
>
> HDFS-15865 introduced an interrupt in the DataStreamer class to interrupt the waitForAckedSeqno call when the timeout has been exceeded, which throws an InterruptedIOException.
> This method is being used in [DFSOutputStream.java#flushInternal|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L773], one of whose use cases is [DFSOutputStream.java#closeImpl|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L870] to close a file.
> What we saw was that we were getting InterruptedIOExceptions during the flushInternal call when we are closing out a file, which was unhandled by the DFSClient and got thrown to the caller. There's a known issue HDFS-4504 that when a file failed to close on the HDFS side, block recovery was not called and the lease got leaked until the DFSClient gets recycled. In our HBase setups, DFSClients remain long-lived in regionservers, which means these files remain undead until the corresponding regionservers get restarted.
> This issue was observed during datanode decommission because it was stuck on open files caused by the above leakage. As it's good to close an HDFS file as smoothly as possible, a retry of flushInternal during closeImpl operation
[jira] [Updated] (HDFS-17553) DFSOutputStream.java#closeImpl should have a retry upon flushInternal failures
[ https://issues.apache.org/jira/browse/HDFS-17553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zinan Zhuang updated HDFS-17553:
--------------------------------
    Description: 
[HDFS-15865|https://issues.apache.org/jira/browse/HDFS-15865] introduced an interrupt in the DataStreamer class to interrupt the waitForAckedSeqno call when the timeout has been exceeded, which throws an InterruptedIOException. This method is being used in [DFSOutputStream.java#flushInternal|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L773], one of whose use cases is [DFSOutputStream.java#closeImpl|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L870] to close a file.

What we saw was that we were getting InterruptedIOExceptions during the flushInternal call when we were closing out a file, which was unhandled by the DFSClient and got thrown to the caller. There's a known issue [HDFS-4504|https://issues.apache.org/jira/browse/HDFS-4504] that when a file failed to close on the HDFS side, the lease got leaked until the DFSClient gets recycled. In our HBase setups, DFSClients remain long-lived in regionservers, which means these files remain undead until the corresponding regionservers get restarted.

This issue was observed during datanode decommission because it was stuck on open files caused by the above leakage. As it's good to close an HDFS file as smoothly as possible, a retry of flushInternal during closeImpl operations would be beneficial to reduce such leakages.

  was:
[HDFS-15865|https://issues.apache.org/jira/browse/HDFS-15865] introduced an interrupt in the DataStreamer class to interrupt the waitForAckedSeqno call when the timeout has been exceeded, which throws an InterruptedIOException. This method is being used in [DFSOutputStream.java#flushInternal|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L773], one of whose use cases is [DFSOutputStream.java#closeImpl|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L870] to close a file.

What we saw was that we were getting InterruptedIOExceptions during the flushInternal call when we are closing out a file, which was unhandled by the DFSClient and got thrown to the caller. There's a known issue [HDFS-4504|https://issues.apache.org/jira/browse/HDFS-4504] that when a file failed to close on the HDFS side, the lease got leaked until the DFSClient gets recycled. In our HBase setups, DFSClients remain long-lived in each regionserver, which means these files remain undead until the regionserver gets restarted.

This issue was observed during datanode decommission because it was stuck on open files caused by the above leakage. As it's good to close an HDFS file as smoothly as possible, a retry of flushInternal during closeImpl operations would be beneficial to reduce such leakages.

> DFSOutputStream.java#closeImpl should have a retry upon flushInternal failures
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-17553
>                 URL: https://issues.apache.org/jira/browse/HDFS-17553
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: dfsclient
>    Affects Versions: 3.3.1, 3.4.0
>            Reporter: Zinan Zhuang
>            Priority: Major
>
> [HDFS-15865|https://issues.apache.org/jira/browse/HDFS-15865] introduced an interrupt in the DataStreamer class to interrupt the waitForAckedSeqno call when the timeout has been exceeded, which throws an InterruptedIOException. This method is being used in [DFSOutputStream.java#flushInternal|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L773], one of whose use cases is [DFSOutputStream.java#closeImpl|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L870] to close a file.
> What we saw was that we were getting InterruptedIOExceptions during the flushInternal call when we were closing out a file, which was unhandled by the DFSClient and got thrown to the caller. There's a known issue [HDFS-4504|https://issues.apache.org/jira/browse/HDFS-4504] that when a file failed to close on the HDFS side, the lease got leaked until the DFSClient gets recycled. In our HBase setups, DFSClients remain long-lived in regionservers, which means these files remain undead until the corresponding regionservers get restarted.
> This issue was observed during datanode decommission because it was stuck on open files
[jira] [Updated] (HDFS-17553) DFSOutputStream.java#closeImpl should have a retry upon flushInternal failures
[ https://issues.apache.org/jira/browse/HDFS-17553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zinan Zhuang updated HDFS-17553:
--------------------------------
    Description: 
[HDFS-15865|https://issues.apache.org/jira/browse/HDFS-15865] introduced an interrupt in the DataStreamer class to interrupt the waitForAckedSeqno call when the timeout has been exceeded, which throws an InterruptedIOException. This method is being used in [DFSOutputStream.java#flushInternal|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L773], one of whose use cases is [DFSOutputStream.java#closeImpl|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L870] to close a file.

What we saw was that we were getting InterruptedIOExceptions during the flushInternal call when we are closing out a file, which was unhandled by the DFSClient and got thrown to the caller. There's a known issue [HDFS-4504|https://issues.apache.org/jira/browse/HDFS-4504] that when a file failed to close on the HDFS side, the lease got leaked until the DFSClient gets recycled. In our HBase setups, DFSClients remain long-lived in each regionserver, which means these files remain undead until the regionserver gets restarted.

This issue was observed during datanode decommission because it was stuck on open files caused by the above leakage. As it's good to close an HDFS file as smoothly as possible, a retry of flushInternal during closeImpl operations would be beneficial to reduce such leakages.

  was:
[HDFS-15865|https://issues.apache.org/jira/browse/HDFS-15865] introduced an interrupt in the DataStreamer class to interrupt the waitForAckedSeqno call when the timeout has been exceeded. This method is being used in [DFSOutputStream.java#flushInternal|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L773], one of whose use cases is [DFSOutputStream.java#closeImpl|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L870] to close a file.

What we saw was that we were getting more interrupts during the flushInternal call when we are closing out a file, which was unhandled by the DFSClient and got thrown to the caller. There's a known issue [HDFS-4504|https://issues.apache.org/jira/browse/HDFS-4504] that when a file failed to close on the HDFS side, the lease got leaked until the DFSClient gets recycled. In our HBase setups, DFSClients remain long-lived in each regionserver, which means these files remain undead until the regionserver gets restarted.

This issue was observed during datanode decommission because it was stuck on open files caused by the above leakage. As it's good to close an HDFS file as smoothly as possible, a retry of flushInternal during closeImpl operations would be beneficial to reduce such leakages.

> DFSOutputStream.java#closeImpl should have a retry upon flushInternal failures
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-17553
>                 URL: https://issues.apache.org/jira/browse/HDFS-17553
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: dfsclient
>    Affects Versions: 3.3.1, 3.4.0
>            Reporter: Zinan Zhuang
>            Priority: Major
>
> [HDFS-15865|https://issues.apache.org/jira/browse/HDFS-15865] introduced an interrupt in the DataStreamer class to interrupt the waitForAckedSeqno call when the timeout has been exceeded, which throws an InterruptedIOException.
> This method is being used in [DFSOutputStream.java#flushInternal|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L773], one of whose use cases is [DFSOutputStream.java#closeImpl|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L870] to close a file.
> What we saw was that we were getting InterruptedIOExceptions during the flushInternal call when we are closing out a file, which was unhandled by the DFSClient and got thrown to the caller. There's a known issue [HDFS-4504|https://issues.apache.org/jira/browse/HDFS-4504] that when a file failed to close on the HDFS side, the lease got leaked until the DFSClient gets recycled. In our HBase setups, DFSClients remain long-lived in each regionserver, which means these files remain undead until the regionserver gets restarted.
> This issue was observed during datanode decommission because it was stuck on open files caused by the above leakage. As it's good to close an HDFS file as smoothly a
[jira] [Created] (HDFS-17553) DFSOutputStream.java#closeImpl should have a retry upon flushInternal failures
Zinan Zhuang created HDFS-17553:
-----------------------------------

             Summary: DFSOutputStream.java#closeImpl should have a retry upon flushInternal failures
                 Key: HDFS-17553
                 URL: https://issues.apache.org/jira/browse/HDFS-17553
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: dfsclient
    Affects Versions: 3.4.0, 3.3.1
            Reporter: Zinan Zhuang

[HDFS-15865|https://issues.apache.org/jira/browse/HDFS-15865] introduced an interrupt in the DataStreamer class to interrupt the waitForAckedSeqno call when the timeout has been exceeded. This method is being used in [DFSOutputStream.java#flushInternal|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L773], one of whose use cases is [DFSOutputStream.java#closeImpl|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L870] to close a file.

What we saw was that we were getting more interrupts during the flushInternal call when we are closing out a file, which was unhandled by the DFSClient and got thrown to the caller. There's a known issue [HDFS-4504|https://issues.apache.org/jira/browse/HDFS-4504] that when a file failed to close on the HDFS side, the lease got leaked until the DFSClient gets recycled. In our HBase setups, DFSClients remain long-lived in each regionserver, which means these files remain undead until the regionserver gets restarted.

This issue was observed during datanode decommission because it was stuck on open files caused by the above leakage. As it's good to close an HDFS file as smoothly as possible, a retry of flushInternal during closeImpl operations would be beneficial to reduce such leakages.
[jira] [Resolved] (HDFS-17439) Improve NNThroughputBenchmark to allow non super user to use the tool
[ https://issues.apache.org/jira/browse/HDFS-17439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen O'Donnell resolved HDFS-17439.
--------------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

> Improve NNThroughputBenchmark to allow non super user to use the tool
> ----------------------------------------------------------------------
>
>                 Key: HDFS-17439
>                 URL: https://issues.apache.org/jira/browse/HDFS-17439
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: benchmarks, namenode
>            Reporter: Fateh Singh
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
> The NNThroughputBenchmark can only be run as the hdfs user or another user with super user privileges, since entering/exiting safemode is a privileged operation. However, when running as a super user, ACL checks are skipped, which renders the tool useless for testing namenode performance together with authorization frameworks such as Apache Ranger.
> An optional argument such as -nonSuperUser can be used to skip privileged operations such as entering/exiting safemode. This optional argument makes the tool useful for incorporating authorization frameworks into performance estimation flows.
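A usage sketch with the proposed flag (-nonSuperUser is the option proposed in this issue, not a long-standing documented argument; the rest is the standard invocation):

    # run the create benchmark as a non-privileged user, skipping safemode calls
    hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark \
        -op create -threads 16 -files 10000 -nonSuperUser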
[jira] [Commented] (HDFS-17546) Implementing Timeout for HostFileReader when FS hangs
[ https://issues.apache.org/jira/browse/HDFS-17546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17855879#comment-17855879 ]

ASF GitHub Bot commented on HDFS-17546:
---------------------------------------

hadoop-yetus commented on PR #6891:
URL: https://github.com/apache/hadoop/pull/6891#issuecomment-2175850768

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------|:-------:|
| +0 :ok: | reexec | 0m 42s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| | | | | _ branch-3.3 Compile Tests _ |
| +0 :ok: | mvndep | 13m 34s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 39m 44s | | branch-3.3 passed |
| +1 :green_heart: | compile | 3m 56s | | branch-3.3 passed |
| +1 :green_heart: | checkstyle | 1m 4s | | branch-3.3 passed |
| +1 :green_heart: | mvnsite | 2m 24s | | branch-3.3 passed |
| +1 :green_heart: | javadoc | 2m 9s | | branch-3.3 passed |
| -1 :x: | spotbugs | 2m 40s | [/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6891/2/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html) | hadoop-hdfs-project/hadoop-hdfs-client in branch-3.3 has 2 extant spotbugs warnings. |
| +1 :green_heart: | shadedclient | 41m 42s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 32s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 2m 8s | | the patch passed |
| +1 :green_heart: | compile | 3m 49s | | the patch passed |
| -1 :x: | javac | 3m 49s | [/results-compile-javac-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6891/2/artifact/out/results-compile-javac-hadoop-hdfs-project.txt) | hadoop-hdfs-project generated 1 new + 608 unchanged - 1 fixed = 609 total (was 609) |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 57s | | the patch passed |
| +1 :green_heart: | mvnsite | 2m 8s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 56s | | the patch passed |
| +1 :green_heart: | spotbugs | 6m 0s | | the patch passed |
| +1 :green_heart: | shadedclient | 41m 54s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 :green_heart: | unit | 2m 16s | | hadoop-hdfs-client in the patch passed. |
| -1 :x: | unit | 242m 21s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6891/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 51s | | The patch does not generate ASF License warnings. |
| | | | 413m 58s | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.protocol.TestBlockListAsLongs |
| | hadoop.hdfs.server.namenode.TestNameNodeMXBean |
| | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
| | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6891/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6891 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
| uname | Linux 6bdbb87477ba 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | branch-3.3 / 6ecc20ecef50726ebd71b0ade7b059bdd2b7f613 |
| Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~18.04-b09 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6891/2/testReport/ |
| Max. process+thread count | 1986 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project
[jira] [Commented] (HDFS-17545) [ARR] router async rpc client.
[ https://issues.apache.org/jira/browse/HDFS-17545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17855822#comment-17855822 ]

ASF GitHub Bot commented on HDFS-17545:
---------------------------------------

KeeProMise commented on code in PR #6871:
URL: https://github.com/apache/hadoop/pull/6871#discussion_r1643993096

##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java:
##

@@ -428,6 +456,25 @@ public RouterRpcServer(Configuration conf, Router router,
     initRouterFedRename();
   }

+  private void initAsyncThreadPool() {
+    int asyncHandlerCount = conf.getInt(DFS_ROUTER_RPC_ASYNC_HANDLER_COUNT,
+        DFS_ROUTER_RPC_ASYNC_HANDLER_COUNT_DEFAULT);
+    int asyncResponderCount = conf.getInt(DFS_ROUTER_RPC_ASYNC_RESPONDER_COUNT,
+        DFS_ROUTER_RPC_ASYNC_RESPONDER_COUNT_DEFAULT);
+    synchronized (RouterRpcServer.class) {
+      if (asyncRouterHandler == null) {
+        LOG.info("init router async handler count: {}", asyncHandlerCount);
+        asyncRouterHandler = Executors.newFixedThreadPool(
+            asyncHandlerCount, new AsyncThreadFactory("router async handler "));
+      }
+      if (asyncRouterResponder == null) {
+        LOG.info("init router async responder count: {}", asyncResponderCount);
+        asyncRouterResponder = Executors.newFixedThreadPool(
+            asyncHandlerCount, new AsyncThreadFactory("router async responder "));

Review Comment:
   @hfutatzhanghb Yes, I will change this to asyncResponderCount later. Thank you for your careful review.

> [ARR] router async rpc client.
> ------------------------------
>
>                 Key: HDFS-17545
>                 URL: https://issues.apache.org/jira/browse/HDFS-17545
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Jian Zhang
>            Assignee: Jian Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> *Describe*
> 1. Mainly uses AsyncUtil to implement {*}RouterAsyncRpcClient{*}; this class extends RouterRpcClient, enabling the {*}invokeAll{*}, {*}invokeMethod{*}, {*}invokeSequential{*}, {*}invokeConcurrent{*}, and *invokeSingle* methods to support asynchrony.
> 2. Uses two thread pools, *asyncRouterHandler* and {*}asyncRouterResponder{*}, to handle asynchronous requests and responses, respectively.
> 3. Adds {*}DFS_ROUTER_RPC_ENABLE_ASYNC{*}, {*}DFS_ROUTER_RPC_ASYNC_HANDLER_COUNT{*}, and *DFS_ROUTER_RPC_ASYNC_RESPONDER_COUNT* to configure whether to use the async router, as well as the number of asyncRouterHandlers and asyncRouterResponders.
> 4. Uses *ThreadLocalContext* to maintain thread-local variables, ensuring that they can be correctly passed between the handler, asyncRouterHandler, and asyncRouterResponder.
>
> *Test*
> Currently, I let the handler wait synchronously for the response of the async responder to test the function of RouterAsyncRpcClient.
> Note: For discussions on *AsyncUtil* and the client {*}protocolPB{*}, please refer to HDFS-17543 and HDFS-17544.
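Spelled out, the corrected line this exchange converges on simply sizes the responder pool with asyncResponderCount (the reviewer-proposed fix, not yet merged code):

    // use the responder count, not the handler count, for the responder pool
    asyncRouterResponder = Executors.newFixedThreadPool(
        asyncResponderCount, new AsyncThreadFactory("router async responder "));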
[jira] [Commented] (HDFS-17545) [ARR] router async rpc client.
[ https://issues.apache.org/jira/browse/HDFS-17545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17855817#comment-17855817 ]

ASF GitHub Bot commented on HDFS-17545:
---------------------------------------

hfutatzhanghb commented on code in PR #6871:
URL: https://github.com/apache/hadoop/pull/6871#discussion_r1643984134

##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java:
##

@@ -428,6 +456,25 @@ public RouterRpcServer(Configuration conf, Router router,
     initRouterFedRename();
   }

+  private void initAsyncThreadPool() {
+    int asyncHandlerCount = conf.getInt(DFS_ROUTER_RPC_ASYNC_HANDLER_COUNT,
+        DFS_ROUTER_RPC_ASYNC_HANDLER_COUNT_DEFAULT);
+    int asyncResponderCount = conf.getInt(DFS_ROUTER_RPC_ASYNC_RESPONDER_COUNT,
+        DFS_ROUTER_RPC_ASYNC_RESPONDER_COUNT_DEFAULT);
+    synchronized (RouterRpcServer.class) {
+      if (asyncRouterHandler == null) {
+        LOG.info("init router async handler count: {}", asyncHandlerCount);
+        asyncRouterHandler = Executors.newFixedThreadPool(
+            asyncHandlerCount, new AsyncThreadFactory("router async handler "));
+      }
+      if (asyncRouterResponder == null) {
+        LOG.info("init router async responder count: {}", asyncResponderCount);
+        asyncRouterResponder = Executors.newFixedThreadPool(
+            asyncHandlerCount, new AsyncThreadFactory("router async responder "));

Review Comment:
   @KeeProMise Hi, sir. This should be asyncResponderCount, right?

> [ARR] router async rpc client.
> ------------------------------
>
>                 Key: HDFS-17545
>                 URL: https://issues.apache.org/jira/browse/HDFS-17545
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Jian Zhang
>            Assignee: Jian Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> *Describe*
> 1. Mainly uses AsyncUtil to implement {*}RouterAsyncRpcClient{*}; this class extends RouterRpcClient, enabling the {*}invokeAll{*}, {*}invokeMethod{*}, {*}invokeSequential{*}, {*}invokeConcurrent{*}, and *invokeSingle* methods to support asynchrony.
> 2. Uses two thread pools, *asyncRouterHandler* and {*}asyncRouterResponder{*}, to handle asynchronous requests and responses, respectively.
> 3. Adds {*}DFS_ROUTER_RPC_ENABLE_ASYNC{*}, {*}DFS_ROUTER_RPC_ASYNC_HANDLER_COUNT{*}, and *DFS_ROUTER_RPC_ASYNC_RESPONDER_COUNT* to configure whether to use the async router, as well as the number of asyncRouterHandlers and asyncRouterResponders.
> 4. Uses *ThreadLocalContext* to maintain thread-local variables, ensuring that they can be correctly passed between the handler, asyncRouterHandler, and asyncRouterResponder.
>
> *Test*
> Currently, I let the handler wait synchronously for the response of the async responder to test the function of RouterAsyncRpcClient.
> Note: For discussions on *AsyncUtil* and the client {*}protocolPB{*}, please refer to HDFS-17543 and HDFS-17544.