[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2022-01-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=715049=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-715049
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 25/Jan/22 17:09
Start Date: 25/Jan/22 17:09
Worklog Time Spent: 10m 
  Work Description: sodonnel merged pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 715049)
Time Spent: 5h 40m  (was: 5.5h)

> Async refresh of cached locations in DFSInputStream
> ---
>
> Key: HDFS-16262
> URL: https://issues.apache.org/jira/browse/HDFS-16262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> HDFS-15119 added the ability to invalidate cached block locations in 
> DFSInputStream. As written, the feature will affect all DFSInputStreams 
> regardless of whether they need it or not. The invalidation also only applies 
> on the next request, so the next request will pay the cost of calling 
> openInfo before reading the data.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). I'd like 
> to utilize the idea started in HDFS-15119 in order to update DFSInputStreams 
> after blocks have been moved to local hosts.
> I was considering using the feature as is, but some of our clusters are quite 
> large and I'm concerned about the impact on the namenode:
>  * We have some clusters with over 350k StoreFiles, so that'd be 350k 
> DFSInputStreams. With such a large number and very active usage, having the 
> refresh be in-line makes it too hard to ensure we don't DDOS the NameNode.
>  * Currently we need to pay the price of openInfo the next time a 
> DFSInputStream is invoked. Moving that async would minimize the latency hit. 
> Also, some StoreFiles might be far less frequently accessed, so they may live 
> on for a long time before ever refreshing. We'd like to be able to know that 
> all DFSInputStreams are refreshed by a given time.
>  * We may have 350k files, but only a small percentage of them are ever 
> non-local at a given time. Refreshing only if necessary will save a lot of 
> work.
> In order to make this as painless to end users as possible, I'd like to:
>  * Update the implementation to utilize an async thread for managing 
> refreshes. This will give more control over rate limiting across all 
> DFSInputStreams in a DFSClient, and also ensure that all DFSInputStreams are 
> refreshed.
>  * Only refresh files which are lacking a local replica or have known 
> deadNodes to be cleaned up
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2022-01-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=714868=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-714868
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 25/Jan/22 16:53
Start Date: 25/Jan/22 16:53
Worklog Time Spent: 10m 
  Work Description: bbeaudreault commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-1021196335


   Thank you @sodonnel!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 714868)
Time Spent: 5.5h  (was: 5h 20m)

> Async refresh of cached locations in DFSInputStream
> ---
>
> Key: HDFS-16262
> URL: https://issues.apache.org/jira/browse/HDFS-16262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> HDFS-15119 added the ability to invalidate cached block locations in 
> DFSInputStream. As written, the feature will affect all DFSInputStreams 
> regardless of whether they need it or not. The invalidation also only applies 
> on the next request, so the next request will pay the cost of calling 
> openInfo before reading the data.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). I'd like 
> to utilize the idea started in HDFS-15119 in order to update DFSInputStreams 
> after blocks have been moved to local hosts.
> I was considering using the feature as is, but some of our clusters are quite 
> large and I'm concerned about the impact on the namenode:
>  * We have some clusters with over 350k StoreFiles, so that'd be 350k 
> DFSInputStreams. With such a large number and very active usage, having the 
> refresh be in-line makes it too hard to ensure we don't DDOS the NameNode.
>  * Currently we need to pay the price of openInfo the next time a 
> DFSInputStream is invoked. Moving that async would minimize the latency hit. 
> Also, some StoreFiles might be far less frequently accessed, so they may live 
> on for a long time before ever refreshing. We'd like to be able to know that 
> all DFSInputStreams are refreshed by a given time.
>  * We may have 350k files, but only a small percentage of them are ever 
> non-local at a given time. Refreshing only if necessary will save a lot of 
> work.
> In order to make this as painless to end users as possible, I'd like to:
>  * Update the implementation to utilize an async thread for managing 
> refreshes. This will give more control over rate limiting across all 
> DFSInputStreams in a DFSClient, and also ensure that all DFSInputStreams are 
> refreshed.
>  * Only refresh files which are lacking a local replica or have known 
> deadNodes to be cleaned up
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2022-01-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=714724=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-714724
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 25/Jan/22 16:42
Start Date: 25/Jan/22 16:42
Worklog Time Spent: 10m 
  Work Description: sodonnel commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-1021100619


   @bbeaudreault The CI looks good - that test often fails, so I don't believe 
it is related to this.
   
   I gave the changes another check and I don't see any issues, plus you have 
been running this change for a long time in production without issues, which 
further validates it.
   
   Thanks for your patience and sorry it took me so long to get back to this!
   
   +1 from me, I will commit it shortly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 714724)
Time Spent: 5h 20m  (was: 5h 10m)

> Async refresh of cached locations in DFSInputStream
> ---
>
> Key: HDFS-16262
> URL: https://issues.apache.org/jira/browse/HDFS-16262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> HDFS-15119 added the ability to invalidate cached block locations in 
> DFSInputStream. As written, the feature will affect all DFSInputStreams 
> regardless of whether they need it or not. The invalidation also only applies 
> on the next request, so the next request will pay the cost of calling 
> openInfo before reading the data.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). I'd like 
> to utilize the idea started in HDFS-15119 in order to update DFSInputStreams 
> after blocks have been moved to local hosts.
> I was considering using the feature as is, but some of our clusters are quite 
> large and I'm concerned about the impact on the namenode:
>  * We have some clusters with over 350k StoreFiles, so that'd be 350k 
> DFSInputStreams. With such a large number and very active usage, having the 
> refresh be in-line makes it too hard to ensure we don't DDOS the NameNode.
>  * Currently we need to pay the price of openInfo the next time a 
> DFSInputStream is invoked. Moving that async would minimize the latency hit. 
> Also, some StoreFiles might be far less frequently accessed, so they may live 
> on for a long time before ever refreshing. We'd like to be able to know that 
> all DFSInputStreams are refreshed by a given time.
>  * We may have 350k files, but only a small percentage of them are ever 
> non-local at a given time. Refreshing only if necessary will save a lot of 
> work.
> In order to make this as painless to end users as possible, I'd like to:
>  * Update the implementation to utilize an async thread for managing 
> refreshes. This will give more control over rate limiting across all 
> DFSInputStreams in a DFSClient, and also ensure that all DFSInputStreams are 
> refreshed.
>  * Only refresh files which are lacking a local replica or have known 
> deadNodes to be cleaned up
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2022-01-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=714471=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-714471
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 25/Jan/22 13:44
Start Date: 25/Jan/22 13:44
Worklog Time Spent: 10m 
  Work Description: bbeaudreault commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-1021196335


   Thank you @sodonnel!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 714471)
Time Spent: 5h 10m  (was: 5h)

> Async refresh of cached locations in DFSInputStream
> ---
>
> Key: HDFS-16262
> URL: https://issues.apache.org/jira/browse/HDFS-16262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> HDFS-15119 added the ability to invalidate cached block locations in 
> DFSInputStream. As written, the feature will affect all DFSInputStreams 
> regardless of whether they need it or not. The invalidation also only applies 
> on the next request, so the next request will pay the cost of calling 
> openInfo before reading the data.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). I'd like 
> to utilize the idea started in HDFS-15119 in order to update DFSInputStreams 
> after blocks have been moved to local hosts.
> I was considering using the feature as is, but some of our clusters are quite 
> large and I'm concerned about the impact on the namenode:
>  * We have some clusters with over 350k StoreFiles, so that'd be 350k 
> DFSInputStreams. With such a large number and very active usage, having the 
> refresh be in-line makes it too hard to ensure we don't DDOS the NameNode.
>  * Currently we need to pay the price of openInfo the next time a 
> DFSInputStream is invoked. Moving that async would minimize the latency hit. 
> Also, some StoreFiles might be far less frequently accessed, so they may live 
> on for a long time before ever refreshing. We'd like to be able to know that 
> all DFSInputStreams are refreshed by a given time.
>  * We may have 350k files, but only a small percentage of them are ever 
> non-local at a given time. Refreshing only if necessary will save a lot of 
> work.
> In order to make this as painless to end users as possible, I'd like to:
>  * Update the implementation to utilize an async thread for managing 
> refreshes. This will give more control over rate limiting across all 
> DFSInputStreams in a DFSClient, and also ensure that all DFSInputStreams are 
> refreshed.
>  * Only refresh files which are lacking a local replica or have known 
> deadNodes to be cleaned up
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2022-01-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=714413=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-714413
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 25/Jan/22 11:42
Start Date: 25/Jan/22 11:42
Worklog Time Spent: 10m 
  Work Description: sodonnel merged pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 714413)
Time Spent: 5h  (was: 4h 50m)

> Async refresh of cached locations in DFSInputStream
> ---
>
> Key: HDFS-16262
> URL: https://issues.apache.org/jira/browse/HDFS-16262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> HDFS-15119 added the ability to invalidate cached block locations in 
> DFSInputStream. As written, the feature will affect all DFSInputStreams 
> regardless of whether they need it or not. The invalidation also only applies 
> on the next request, so the next request will pay the cost of calling 
> openInfo before reading the data.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). I'd like 
> to utilize the idea started in HDFS-15119 in order to update DFSInputStreams 
> after blocks have been moved to local hosts.
> I was considering using the feature as is, but some of our clusters are quite 
> large and I'm concerned about the impact on the namenode:
>  * We have some clusters with over 350k StoreFiles, so that'd be 350k 
> DFSInputStreams. With such a large number and very active usage, having the 
> refresh be in-line makes it too hard to ensure we don't DDOS the NameNode.
>  * Currently we need to pay the price of openInfo the next time a 
> DFSInputStream is invoked. Moving that async would minimize the latency hit. 
> Also, some StoreFiles might be far less frequently accessed, so they may live 
> on for a long time before ever refreshing. We'd like to be able to know that 
> all DFSInputStreams are refreshed by a given time.
>  * We may have 350k files, but only a small percentage of them are ever 
> non-local at a given time. Refreshing only if necessary will save a lot of 
> work.
> In order to make this as painless to end users as possible, I'd like to:
>  * Update the implementation to utilize an async thread for managing 
> refreshes. This will give more control over rate limiting across all 
> DFSInputStreams in a DFSClient, and also ensure that all DFSInputStreams are 
> refreshed.
>  * Only refresh files which are lacking a local replica or have known 
> deadNodes to be cleaned up
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2022-01-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=714412=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-714412
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 25/Jan/22 11:40
Start Date: 25/Jan/22 11:40
Worklog Time Spent: 10m 
  Work Description: sodonnel commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-1021100619


   @bbeaudreault The CI looks good - that test often fails, so I don't believe 
it is related to this.
   
   I gave the changes another check and I don't see any issues, plus you have 
been running this change for a long time in production without issues, which 
further validates it.
   
   Thanks for your patience and sorry it took me so long to get back to this!
   
   +1 from me, I will commit it shortly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 714412)
Time Spent: 4h 50m  (was: 4h 40m)

> Async refresh of cached locations in DFSInputStream
> ---
>
> Key: HDFS-16262
> URL: https://issues.apache.org/jira/browse/HDFS-16262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> HDFS-15119 added the ability to invalidate cached block locations in 
> DFSInputStream. As written, the feature will affect all DFSInputStreams 
> regardless of whether they need it or not. The invalidation also only applies 
> on the next request, so the next request will pay the cost of calling 
> openInfo before reading the data.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). I'd like 
> to utilize the idea started in HDFS-15119 in order to update DFSInputStreams 
> after blocks have been moved to local hosts.
> I was considering using the feature as is, but some of our clusters are quite 
> large and I'm concerned about the impact on the namenode:
>  * We have some clusters with over 350k StoreFiles, so that'd be 350k 
> DFSInputStreams. With such a large number and very active usage, having the 
> refresh be in-line makes it too hard to ensure we don't DDOS the NameNode.
>  * Currently we need to pay the price of openInfo the next time a 
> DFSInputStream is invoked. Moving that async would minimize the latency hit. 
> Also, some StoreFiles might be far less frequently accessed, so they may live 
> on for a long time before ever refreshing. We'd like to be able to know that 
> all DFSInputStreams are refreshed by a given time.
>  * We may have 350k files, but only a small percentage of them are ever 
> non-local at a given time. Refreshing only if necessary will save a lot of 
> work.
> In order to make this as painless to end users as possible, I'd like to:
>  * Update the implementation to utilize an async thread for managing 
> refreshes. This will give more control over rate limiting across all 
> DFSInputStreams in a DFSClient, and also ensure that all DFSInputStreams are 
> refreshed.
>  * Only refresh files which are lacking a local replica or have known 
> deadNodes to be cleaned up
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2022-01-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=709629=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-709629
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 16/Jan/22 15:06
Start Date: 16/Jan/22 15:06
Worklog Time Spent: 10m 
  Work Description: bbeaudreault commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-1013892555


   It seems like TestRollingUpgrade has some larger issues. I was curious about 
this so I looked up recent merged PRs: 
https://github.com/apache/hadoop/pulls?q=is%3Apr+is%3Aclosed+TestRollingUpgrade.
 Looks like a bunch of recently merged PRs had failure sin TestRollingUpgrade.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 709629)
Time Spent: 4h 40m  (was: 4.5h)

> Async refresh of cached locations in DFSInputStream
> ---
>
> Key: HDFS-16262
> URL: https://issues.apache.org/jira/browse/HDFS-16262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> HDFS-15119 added the ability to invalidate cached block locations in 
> DFSInputStream. As written, the feature will affect all DFSInputStreams 
> regardless of whether they need it or not. The invalidation also only applies 
> on the next request, so the next request will pay the cost of calling 
> openInfo before reading the data.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). I'd like 
> to utilize the idea started in HDFS-15119 in order to update DFSInputStreams 
> after blocks have been moved to local hosts.
> I was considering using the feature as is, but some of our clusters are quite 
> large and I'm concerned about the impact on the namenode:
>  * We have some clusters with over 350k StoreFiles, so that'd be 350k 
> DFSInputStreams. With such a large number and very active usage, having the 
> refresh be in-line makes it too hard to ensure we don't DDOS the NameNode.
>  * Currently we need to pay the price of openInfo the next time a 
> DFSInputStream is invoked. Moving that async would minimize the latency hit. 
> Also, some StoreFiles might be far less frequently accessed, so they may live 
> on for a long time before ever refreshing. We'd like to be able to know that 
> all DFSInputStreams are refreshed by a given time.
>  * We may have 350k files, but only a small percentage of them are ever 
> non-local at a given time. Refreshing only if necessary will save a lot of 
> work.
> In order to make this as painless to end users as possible, I'd like to:
>  * Update the implementation to utilize an async thread for managing 
> refreshes. This will give more control over rate limiting across all 
> DFSInputStreams in a DFSClient, and also ensure that all DFSInputStreams are 
> refreshed.
>  * Only refresh files which are lacking a local replica or have known 
> deadNodes to be cleaned up
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2022-01-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=709625=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-709625
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 16/Jan/22 14:48
Start Date: 16/Jan/22 14:48
Worklog Time Spent: 10m 
  Work Description: bbeaudreault commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-1013889398


   @sodonnel So there is one failing test, TestRollingUpgrade. I've now run the 
test locally 3 times and it always succeeds. I don't really have a lot of 
context on this test, but I hope it won't block this PR. The changes here 
should be unrelated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 709625)
Time Spent: 4.5h  (was: 4h 20m)

> Async refresh of cached locations in DFSInputStream
> ---
>
> Key: HDFS-16262
> URL: https://issues.apache.org/jira/browse/HDFS-16262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> HDFS-15119 added the ability to invalidate cached block locations in 
> DFSInputStream. As written, the feature will affect all DFSInputStreams 
> regardless of whether they need it or not. The invalidation also only applies 
> on the next request, so the next request will pay the cost of calling 
> openInfo before reading the data.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). I'd like 
> to utilize the idea started in HDFS-15119 in order to update DFSInputStreams 
> after blocks have been moved to local hosts.
> I was considering using the feature as is, but some of our clusters are quite 
> large and I'm concerned about the impact on the namenode:
>  * We have some clusters with over 350k StoreFiles, so that'd be 350k 
> DFSInputStreams. With such a large number and very active usage, having the 
> refresh be in-line makes it too hard to ensure we don't DDOS the NameNode.
>  * Currently we need to pay the price of openInfo the next time a 
> DFSInputStream is invoked. Moving that async would minimize the latency hit. 
> Also, some StoreFiles might be far less frequently accessed, so they may live 
> on for a long time before ever refreshing. We'd like to be able to know that 
> all DFSInputStreams are refreshed by a given time.
>  * We may have 350k files, but only a small percentage of them are ever 
> non-local at a given time. Refreshing only if necessary will save a lot of 
> work.
> In order to make this as painless to end users as possible, I'd like to:
>  * Update the implementation to utilize an async thread for managing 
> refreshes. This will give more control over rate limiting across all 
> DFSInputStreams in a DFSClient, and also ensure that all DFSInputStreams are 
> refreshed.
>  * Only refresh files which are lacking a local replica or have known 
> deadNodes to be cleaned up
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2022-01-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=709592=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-709592
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 16/Jan/22 06:33
Start Date: 16/Jan/22 06:33
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-1013819948


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 38s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  13m 21s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  21m 48s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   5m 40s |  |  trunk passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   5m 18s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 13s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 25s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 46s |  |  trunk passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   2m 10s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   5m 37s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 16s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 27s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m  6s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 29s |  |  the patch passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   5m 29s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 10s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   5m 10s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  1s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  3s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m 10s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   1m 27s |  |  the patch passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 56s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   5m 46s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m  4s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 22s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  | 227m 23s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/16/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 47s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 358m 32s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestRollingUpgrade |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/16/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3527 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell xml |
   | uname | Linux 6be1db3767ad 4.15.0-161-generic #169-Ubuntu SMP Fri Oct 15 
13:41:54 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 52f925d800ae2ad7fb170954d174e78e417b6767 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 

[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2022-01-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=707956=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707956
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 12/Jan/22 23:33
Start Date: 12/Jan/22 23:33
Worklog Time Spent: 10m 
  Work Description: bbeaudreault commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-1011548807


   Hey @sodonnel , the build has finished. Unfortunately there was 1 failure,  
yet I don't think it's related. It's a timeout in RollingUpgradeTest. Otherwise 
the checkstyle issues are cleaned up as are all the other failures. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 707956)
Time Spent: 4h 10m  (was: 4h)

> Async refresh of cached locations in DFSInputStream
> ---
>
> Key: HDFS-16262
> URL: https://issues.apache.org/jira/browse/HDFS-16262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> HDFS-15119 added the ability to invalidate cached block locations in 
> DFSInputStream. As written, the feature will affect all DFSInputStreams 
> regardless of whether they need it or not. The invalidation also only applies 
> on the next request, so the next request will pay the cost of calling 
> openInfo before reading the data.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). I'd like 
> to utilize the idea started in HDFS-15119 in order to update DFSInputStreams 
> after blocks have been moved to local hosts.
> I was considering using the feature as is, but some of our clusters are quite 
> large and I'm concerned about the impact on the namenode:
>  * We have some clusters with over 350k StoreFiles, so that'd be 350k 
> DFSInputStreams. With such a large number and very active usage, having the 
> refresh be in-line makes it too hard to ensure we don't DDOS the NameNode.
>  * Currently we need to pay the price of openInfo the next time a 
> DFSInputStream is invoked. Moving that async would minimize the latency hit. 
> Also, some StoreFiles might be far less frequently accessed, so they may live 
> on for a long time before ever refreshing. We'd like to be able to know that 
> all DFSInputStreams are refreshed by a given time.
>  * We may have 350k files, but only a small percentage of them are ever 
> non-local at a given time. Refreshing only if necessary will save a lot of 
> work.
> In order to make this as painless to end users as possible, I'd like to:
>  * Update the implementation to utilize an async thread for managing 
> refreshes. This will give more control over rate limiting across all 
> DFSInputStreams in a DFSClient, and also ensure that all DFSInputStreams are 
> refreshed.
>  * Only refresh files which are lacking a local replica or have known 
> deadNodes to be cleaned up
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2022-01-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=707929=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707929
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 12/Jan/22 22:27
Start Date: 12/Jan/22 22:27
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-1011507918


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 39s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  13m 28s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  25m 21s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   6m 16s |  |  trunk passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   5m 47s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 22s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 30s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 55s |  |  trunk passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   2m 27s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   6m 49s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  25m 56s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 28s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 23s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   6m 21s |  |  the patch passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   6m 21s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 36s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   5m 36s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  4s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m 14s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   1m 31s |  |  the patch passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 59s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   6m 32s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  26m 31s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 30s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  | 232m  0s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/15/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 43s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 379m 44s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestRollingUpgrade |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/15/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3527 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell xml |
   | uname | Linux a1c99da05f39 4.15.0-161-generic #169-Ubuntu SMP Fri Oct 15 
13:41:54 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 54096317ed298d57ba490bd1ddfc693c331546ba |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 

[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2022-01-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=707603=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707603
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 12/Jan/22 16:07
Start Date: 12/Jan/22 16:07
Worklog Time Spent: 10m 
  Work Description: bbeaudreault commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-1011205367


   I just pushed a rebase of this branch on latest trunk, with checkstyle 
issues fixed. I'll keep an eye on the build and ping you once it looks good.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 707603)
Time Spent: 3h 50m  (was: 3h 40m)

> Async refresh of cached locations in DFSInputStream
> ---
>
> Key: HDFS-16262
> URL: https://issues.apache.org/jira/browse/HDFS-16262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> HDFS-15119 added the ability to invalidate cached block locations in 
> DFSInputStream. As written, the feature will affect all DFSInputStreams 
> regardless of whether they need it or not. The invalidation also only applies 
> on the next request, so the next request will pay the cost of calling 
> openInfo before reading the data.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). I'd like 
> to utilize the idea started in HDFS-15119 in order to update DFSInputStreams 
> after blocks have been moved to local hosts.
> I was considering using the feature as is, but some of our clusters are quite 
> large and I'm concerned about the impact on the namenode:
>  * We have some clusters with over 350k StoreFiles, so that'd be 350k 
> DFSInputStreams. With such a large number and very active usage, having the 
> refresh be in-line makes it too hard to ensure we don't DDOS the NameNode.
>  * Currently we need to pay the price of openInfo the next time a 
> DFSInputStream is invoked. Moving that async would minimize the latency hit. 
> Also, some StoreFiles might be far less frequently accessed, so they may live 
> on for a long time before ever refreshing. We'd like to be able to know that 
> all DFSInputStreams are refreshed by a given time.
>  * We may have 350k files, but only a small percentage of them are ever 
> non-local at a given time. Refreshing only if necessary will save a lot of 
> work.
> In order to make this as painless to end users as possible, I'd like to:
>  * Update the implementation to utilize an async thread for managing 
> refreshes. This will give more control over rate limiting across all 
> DFSInputStreams in a DFSClient, and also ensure that all DFSInputStreams are 
> refreshed.
>  * Only refresh files which are lacking a local replica or have known 
> deadNodes to be cleaned up
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2022-01-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=707594=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707594
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 12/Jan/22 15:51
Start Date: 12/Jan/22 15:51
Worklog Time Spent: 10m 
  Work Description: bbeaudreault commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-1011185700


   Sorry! This had been succeeding last time i looked at it, I should have 
verified that it was still succeeding before pinging you. I'll take a look at 
the failures shortly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 707594)
Time Spent: 3h 40m  (was: 3.5h)

> Async refresh of cached locations in DFSInputStream
> ---
>
> Key: HDFS-16262
> URL: https://issues.apache.org/jira/browse/HDFS-16262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> HDFS-15119 added the ability to invalidate cached block locations in 
> DFSInputStream. As written, the feature will affect all DFSInputStreams 
> regardless of whether they need it or not. The invalidation also only applies 
> on the next request, so the next request will pay the cost of calling 
> openInfo before reading the data.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). I'd like 
> to utilize the idea started in HDFS-15119 in order to update DFSInputStreams 
> after blocks have been moved to local hosts.
> I was considering using the feature as is, but some of our clusters are quite 
> large and I'm concerned about the impact on the namenode:
>  * We have some clusters with over 350k StoreFiles, so that'd be 350k 
> DFSInputStreams. With such a large number and very active usage, having the 
> refresh be in-line makes it too hard to ensure we don't DDOS the NameNode.
>  * Currently we need to pay the price of openInfo the next time a 
> DFSInputStream is invoked. Moving that async would minimize the latency hit. 
> Also, some StoreFiles might be far less frequently accessed, so they may live 
> on for a long time before ever refreshing. We'd like to be able to know that 
> all DFSInputStreams are refreshed by a given time.
>  * We may have 350k files, but only a small percentage of them are ever 
> non-local at a given time. Refreshing only if necessary will save a lot of 
> work.
> In order to make this as painless to end users as possible, I'd like to:
>  * Update the implementation to utilize an async thread for managing 
> refreshes. This will give more control over rate limiting across all 
> DFSInputStreams in a DFSClient, and also ensure that all DFSInputStreams are 
> refreshed.
>  * Only refresh files which are lacking a local replica or have known 
> deadNodes to be cleaned up
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2022-01-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=707590=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707590
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 12/Jan/22 15:48
Start Date: 12/Jan/22 15:48
Worklog Time Spent: 10m 
  Work Description: sodonnel commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-1011182118


   The last CI runs flagged some checkstyle issues. Could you check those 
please and fix them? A few others parts of that run failed too, but it may have 
been some wider issue. Fixing the checkstyle will trigger a new run and we can 
see how it looks then.
   
   The patch looks mostly good to me, but there is a lot of change in 
DFSInputStream, so I will need to take some more time to go over that part of 
the PR again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 707590)
Time Spent: 3.5h  (was: 3h 20m)

> Async refresh of cached locations in DFSInputStream
> ---
>
> Key: HDFS-16262
> URL: https://issues.apache.org/jira/browse/HDFS-16262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> HDFS-15119 added the ability to invalidate cached block locations in 
> DFSInputStream. As written, the feature will affect all DFSInputStreams 
> regardless of whether they need it or not. The invalidation also only applies 
> on the next request, so the next request will pay the cost of calling 
> openInfo before reading the data.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). I'd like 
> to utilize the idea started in HDFS-15119 in order to update DFSInputStreams 
> after blocks have been moved to local hosts.
> I was considering using the feature as is, but some of our clusters are quite 
> large and I'm concerned about the impact on the namenode:
>  * We have some clusters with over 350k StoreFiles, so that'd be 350k 
> DFSInputStreams. With such a large number and very active usage, having the 
> refresh be in-line makes it too hard to ensure we don't DDOS the NameNode.
>  * Currently we need to pay the price of openInfo the next time a 
> DFSInputStream is invoked. Moving that async would minimize the latency hit. 
> Also, some StoreFiles might be far less frequently accessed, so they may live 
> on for a long time before ever refreshing. We'd like to be able to know that 
> all DFSInputStreams are refreshed by a given time.
>  * We may have 350k files, but only a small percentage of them are ever 
> non-local at a given time. Refreshing only if necessary will save a lot of 
> work.
> In order to make this as painless to end users as possible, I'd like to:
>  * Update the implementation to utilize an async thread for managing 
> refreshes. This will give more control over rate limiting across all 
> DFSInputStreams in a DFSClient, and also ensure that all DFSInputStreams are 
> refreshed.
>  * Only refresh files which are lacking a local replica or have known 
> deadNodes to be cleaned up
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2021-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=677998=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-677998
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 06/Nov/21 02:17
Start Date: 06/Nov/21 02:17
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-962372657


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 54s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  12m 38s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  21m 22s |  |  trunk passed  |
   | -1 :x: |  compile  |   2m  4s | 
[/branch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/14/artifact/out/branch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt)
 |  hadoop-hdfs-project in trunk failed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.  |
   | -1 :x: |  compile  |   1m 47s | 
[/branch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/14/artifact/out/branch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  hadoop-hdfs-project in trunk failed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.  |
   | +1 :green_heart: |  checkstyle  |   1m 12s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 19s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 38s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   2m  7s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 33s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 28s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 30s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 59s |  |  the patch passed  |
   | -1 :x: |  compile  |   1m 51s | 
[/patch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/14/artifact/out/patch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt)
 |  hadoop-hdfs-project in the patch failed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.  |
   | -1 :x: |  javac  |   1m 51s | 
[/patch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/14/artifact/out/patch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt)
 |  hadoop-hdfs-project in the patch failed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.  |
   | -1 :x: |  compile  |   1m 40s | 
[/patch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/14/artifact/out/patch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  hadoop-hdfs-project in the patch failed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.  |
   | -1 :x: |  javac  |   1m 40s | 
[/patch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/14/artifact/out/patch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  hadoop-hdfs-project in the patch failed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  5s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/14/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 24 new + 105 unchanged - 0 fixed = 
129 total (was 105)  |
   | +1 :green_heart: |  mvnsite  |   2m  3s |  |  the patch passed  |

[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2021-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=677673=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-677673
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 05/Nov/21 20:23
Start Date: 05/Nov/21 20:23
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-961496237


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 41s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  29m 53s |  |  Maven dependency ordering for branch  |
   | -1 :x: |  mvninstall  |  21m 18s | 
[/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/branch-mvninstall-root.txt)
 |  root in trunk failed.  |
   | -1 :x: |  compile  |   2m  2s | 
[/branch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/branch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt)
 |  hadoop-hdfs-project in trunk failed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.  |
   | -1 :x: |  compile  |   1m 47s | 
[/branch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/branch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  hadoop-hdfs-project in trunk failed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.  |
   | +1 :green_heart: |  checkstyle  |   1m 12s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 20s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 38s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   2m  8s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 31s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 39s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 27s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 58s |  |  the patch passed  |
   | -1 :x: |  compile  |   1m 51s | 
[/patch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/patch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt)
 |  hadoop-hdfs-project in the patch failed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.  |
   | -1 :x: |  javac  |   1m 51s | 
[/patch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/patch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt)
 |  hadoop-hdfs-project in the patch failed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.  |
   | -1 :x: |  compile  |   1m 39s | 
[/patch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/patch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  hadoop-hdfs-project in the patch failed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.  |
   | -1 :x: |  javac  |   1m 39s | 
[/patch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/patch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  hadoop-hdfs-project in the patch failed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  5s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch 

[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2021-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=677280=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-677280
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 05/Nov/21 19:32
Start Date: 05/Nov/21 19:32
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-961496237


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 41s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  29m 53s |  |  Maven dependency ordering for branch  |
   | -1 :x: |  mvninstall  |  21m 18s | 
[/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/branch-mvninstall-root.txt)
 |  root in trunk failed.  |
   | -1 :x: |  compile  |   2m  2s | 
[/branch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/branch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt)
 |  hadoop-hdfs-project in trunk failed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.  |
   | -1 :x: |  compile  |   1m 47s | 
[/branch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/branch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  hadoop-hdfs-project in trunk failed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.  |
   | +1 :green_heart: |  checkstyle  |   1m 12s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 20s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 38s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   2m  8s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 31s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 39s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 27s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 58s |  |  the patch passed  |
   | -1 :x: |  compile  |   1m 51s | 
[/patch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/patch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt)
 |  hadoop-hdfs-project in the patch failed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.  |
   | -1 :x: |  javac  |   1m 51s | 
[/patch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/patch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt)
 |  hadoop-hdfs-project in the patch failed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.  |
   | -1 :x: |  compile  |   1m 39s | 
[/patch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/patch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  hadoop-hdfs-project in the patch failed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.  |
   | -1 :x: |  javac  |   1m 39s | 
[/patch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/patch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  hadoop-hdfs-project in the patch failed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  5s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch 

[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2021-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=676778=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-676778
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 04/Nov/21 23:02
Start Date: 04/Nov/21 23:02
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-961496237


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 41s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  29m 53s |  |  Maven dependency ordering for branch  |
   | -1 :x: |  mvninstall  |  21m 18s | 
[/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/branch-mvninstall-root.txt)
 |  root in trunk failed.  |
   | -1 :x: |  compile  |   2m  2s | 
[/branch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/branch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt)
 |  hadoop-hdfs-project in trunk failed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.  |
   | -1 :x: |  compile  |   1m 47s | 
[/branch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/branch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  hadoop-hdfs-project in trunk failed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.  |
   | +1 :green_heart: |  checkstyle  |   1m 12s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 20s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 38s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   2m  8s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 31s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 39s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 27s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 58s |  |  the patch passed  |
   | -1 :x: |  compile  |   1m 51s | 
[/patch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/patch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt)
 |  hadoop-hdfs-project in the patch failed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.  |
   | -1 :x: |  javac  |   1m 51s | 
[/patch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/patch-compile-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt)
 |  hadoop-hdfs-project in the patch failed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.  |
   | -1 :x: |  compile  |   1m 39s | 
[/patch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/patch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  hadoop-hdfs-project in the patch failed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.  |
   | -1 :x: |  javac  |   1m 39s | 
[/patch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/patch-compile-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  hadoop-hdfs-project in the patch failed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  5s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/13/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch 

[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2021-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=671164=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-671164
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 28/Oct/21 02:13
Start Date: 28/Oct/21 02:13
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-953443821


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 38s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  12m 54s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  21m 17s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   5m 11s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   4m 48s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m 11s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 20s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 40s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   2m  9s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 33s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 32s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 28s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 59s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m  4s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   5m  4s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   4m 39s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   4m 39s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  5s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/12/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 24 new + 105 unchanged - 0 fixed = 
129 total (was 105)  |
   | +1 :green_heart: |  mvnsite  |   2m  4s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   1m 21s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 50s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 38s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 58s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 21s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | +1 :green_heart: |  unit  | 224m 43s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 46s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 350m 46s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/12/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3527 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell xml |
   | uname | Linux 1246b90a28e4 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 
23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 9604ba952c1dda4738030b0b8bc2e814e64e6980 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 

[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2021-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=670298=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-670298
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 26/Oct/21 18:46
Start Date: 26/Oct/21 18:46
Worklog Time Spent: 10m 
  Work Description: bbeaudreault commented on a change in pull request 
#3527:
URL: https://github.com/apache/hadoop/pull/3527#discussion_r736825878



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
##
@@ -228,53 +222,57 @@ boolean deadNodesContain(DatanodeInfo nodeInfo) {
 return deadNodes.containsKey(nodeInfo);
   }
 
-  @VisibleForTesting
-  void setReadTimeStampsForTesting(long timeStamp) {
-setLocatedBlocksTimeStamp(timeStamp);
-  }
-
-  private void setLocatedBlocksTimeStamp() {
-setLocatedBlocksTimeStamp(Time.monotonicNow());
-  }
-
-  private void setLocatedBlocksTimeStamp(long timeStamp) {
-this.locatedBlocksTimeStamp = timeStamp;
-  }
-
   /**
* Grab the open-file info from namenode
* @param refreshLocatedBlocks whether to re-fetch locatedblocks
*/
   void openInfo(boolean refreshLocatedBlocks) throws IOException {
 final DfsClientConf conf = dfsClient.getConf();
 synchronized(infoLock) {
-  lastBlockBeingWrittenLength =
-  fetchLocatedBlocksAndGetLastBlockLength(refreshLocatedBlocks);
   int retriesForLastBlockLength = 
conf.getRetryTimesForGetLastBlockLength();
-  while (retriesForLastBlockLength > 0) {
+
+  while (true) {

Review comment:
   This `openInfo` rewrite is functionally identical to the previous 
implementation, but is cleaner and makes it possible to share code between this 
method and the new `refreshBlockLocations` method below. I didn't want to 
simply re-use this method for the background refresh, because it does multiple 
RPC calls within the infoLock. 
   
   The background `refreshBlockLocations` is meant to be minimally impactful to 
the hot path of reads, so it does these RPC calls outside of any locks and then 
quickly swaps them in with the lock.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 670298)
Time Spent: 2.5h  (was: 2h 20m)

> Async refresh of cached locations in DFSInputStream
> ---
>
> Key: HDFS-16262
> URL: https://issues.apache.org/jira/browse/HDFS-16262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> HDFS-15119 added the ability to invalidate cached block locations in 
> DFSInputStream. As written, the feature will affect all DFSInputStreams 
> regardless of whether they need it or not. The invalidation also only applies 
> on the next request, so the next request will pay the cost of calling 
> openInfo before reading the data.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). I'd like 
> to utilize the idea started in HDFS-15119 in order to update DFSInputStreams 
> after blocks have been moved to local hosts.
> I was considering using the feature as is, but some of our clusters are quite 
> large and I'm concerned about the impact on the namenode:
>  * We have some clusters with over 350k StoreFiles, so that'd be 350k 
> DFSInputStreams. With such a large number and very active usage, having the 
> refresh be in-line makes it too hard to ensure we don't DDOS the NameNode.
>  * Currently we need to pay the price of openInfo the next time a 
> DFSInputStream is invoked. Moving that async would minimize the latency hit. 
> Also, some StoreFiles might be far less frequently accessed, so they may live 
> on for a long time before ever refreshing. We'd like to be able to know that 
> all DFSInputStreams are refreshed by a given time.
>  * We may have 350k files, but only a small percentage of them are ever 
> non-local at a given time. Refreshing only if necessary will save a lot of 
> work.
> In order to make this as painless to end users as possible, I'd like to:
>  * Update the implementation to utilize an async thread for managing 
> refreshes. This will give more control over rate limiting across all 
> DFSInputStreams in a DFSClient, and also ensure that all 

[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2021-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=669916=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669916
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 26/Oct/21 06:37
Start Date: 26/Oct/21 06:37
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-951607494


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 36s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  13m 10s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  21m 26s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   5m  7s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   4m 51s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m 14s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 22s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 39s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   2m  7s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 33s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 36s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 27s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m  2s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m  1s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   5m  1s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   4m 45s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   4m 45s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  5s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/10/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 20 new + 105 unchanged - 0 fixed = 
125 total (was 105)  |
   | +1 :green_heart: |  mvnsite  |   2m  7s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   1m 22s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 53s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 40s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m 14s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 21s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | +1 :green_heart: |  unit  | 222m 16s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 46s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 348m 58s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/10/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3527 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell xml |
   | uname | Linux 90d8a338ecc6 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 
23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 2ceae4ca2a14e5a01b8f0236770148ed1f4e2088 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 

[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2021-10-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=669264=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669264
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 24/Oct/21 05:36
Start Date: 24/Oct/21 05:36
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-950263662


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 44s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  12m 58s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  21m 17s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   5m 15s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   4m 51s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m 14s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 21s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 40s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   2m  7s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 33s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 45s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 27s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 58s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m  0s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   5m  0s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   4m 40s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   4m 40s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  4s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/8/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 19 new + 105 unchanged - 0 fixed = 
124 total (was 105)  |
   | +1 :green_heart: |  mvnsite  |   2m  7s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   1m 20s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 53s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 40s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m 24s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 24s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | +1 :green_heart: |  unit  | 224m 23s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 47s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 351m  8s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/8/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3527 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell xml |
   | uname | Linux 88a44b140e02 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 
23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / e6b877bd29afb4680d67226fd96611e1eb8f1633 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 

[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2021-10-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=669241=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669241
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 23/Oct/21 20:08
Start Date: 23/Oct/21 20:08
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-950206416


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 37s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  13m 37s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  21m 18s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   5m 11s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   4m 52s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m 14s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 23s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 41s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   2m 10s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 32s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 32s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 27s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m  3s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m  4s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   5m  4s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   4m 45s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   4m 45s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  1s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  6s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/7/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 19 new + 105 unchanged - 0 fixed = 
124 total (was 105)  |
   | +1 :green_heart: |  mvnsite  |   2m  4s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   1m 23s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 56s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 40s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m 20s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 20s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  | 222m 20s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/7/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 47s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 349m 43s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.tools.TestHdfsConfigFields |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/7/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3527 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell xml |
   | uname | Linux 6f7dc2c4d149 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 
23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | 

[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2021-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=669142=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669142
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 23/Oct/21 02:44
Start Date: 23/Oct/21 02:44
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-950044744


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 40s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  12m 49s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  21m 16s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   5m 13s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   4m 55s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m 16s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 24s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 41s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   2m  8s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 32s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 34s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 27s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m  2s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m  2s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   5m  2s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   4m 43s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   4m 43s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  5s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/6/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 19 new + 105 unchanged - 0 fixed = 
124 total (was 105)  |
   | +1 :green_heart: |  mvnsite  |   2m  5s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   1m 23s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 53s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 43s |  |  the patch passed  |
   | -1 :x: |  shadedclient  |  21m 19s |  |  patch has errors when building 
and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 21s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  | 258m 46s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 50s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 385m 32s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks |
   |   | hadoop.hdfs.TestPread |
   |   | hadoop.hdfs.TestEncryptedTransfer |
   |   | hadoop.hdfs.TestErasureCodingPolicyWithSnapshotWithRandomECPolicy |
   |   | hadoop.hdfs.TestDataTransferProtocol |
   |   | hadoop.hdfs.TestByteBufferPread |
   |   | hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs |
   |   | hadoop.hdfs.TestFileConcurrentReader |
   |   | hadoop.hdfs.server.datanode.TestDiskError |
   |   | hadoop.hdfs.TestErasureCodeBenchmarkThroughput |
   |   | hadoop.hdfs.TestAppendSnapshotTruncate |
   |   | hadoop.hdfs.TestFileAppendRestart |
 

[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2021-10-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=662456=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-662456
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 08/Oct/21 01:58
Start Date: 08/Oct/21 01:58
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-938277521


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 42s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  12m 54s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  20m 38s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   5m  5s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   4m 39s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m 14s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 21s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 40s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   2m  6s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 35s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 19s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 28s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m  2s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   4m 49s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   4m 49s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   4m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   4m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  7s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/5/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 18 new + 105 unchanged - 0 fixed = 
123 total (was 105)  |
   | +1 :green_heart: |  mvnsite  |   2m  5s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   1m 22s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 56s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 45s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 53s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 21s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | +1 :green_heart: |  unit  | 226m  9s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 46s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 350m 53s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/5/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3527 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell xml |
   | uname | Linux 78e514d2a3be 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / ceba806587e37fac6b309c9e788eb6cb428f5f75 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 

[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2021-10-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=662448=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-662448
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 08/Oct/21 01:29
Start Date: 08/Oct/21 01:29
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-938268341


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 50s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  12m 57s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  21m 54s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   5m 26s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   5m  0s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m 11s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 16s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 39s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   2m  5s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 38s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 24s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 27s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m  2s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 27s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   5m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m  0s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   5m  0s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  9s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/4/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 18 new + 105 unchanged - 0 fixed = 
123 total (was 105)  |
   | +1 :green_heart: |  mvnsite  |   2m 11s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   1m 26s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 55s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 47s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 10s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 22s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | +1 :green_heart: |  unit  | 236m  0s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 41s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 365m 24s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3527 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell xml |
   | uname | Linux 0f16589d8394 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 
23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / f932816ed48cada018af35ff6bd859847f4a1a0d |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 

[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2021-10-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=662350=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-662350
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 07/Oct/21 23:37
Start Date: 07/Oct/21 23:37
Worklog Time Spent: 10m 
  Work Description: bbeaudreault opened a new pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527


   ### Description of PR
   
   Refactor refreshing of cached block locations so that it happens as part of 
an async process, with rate limiting. Add the ability to limit to only refresh 
DFSInputStreams if necessary. This defaults to false to preserve backwards 
compatibility with the old behavior from 
https://issues.apache.org/jira/browse/HDFS-15119
   
   See https://issues.apache.org/jira/browse/HDFS-16262
   
   ### How was this patch tested?
   
   I added a new test class TestLocatedBlocksRefresher. I am in the process of 
deploying this internally on one of our hadoop-3.3 clusters, will report back.
   
   ### For code changes:
   
   - [x] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 662350)
Time Spent: 1h 20m  (was: 1h 10m)

> Async refresh of cached locations in DFSInputStream
> ---
>
> Key: HDFS-16262
> URL: https://issues.apache.org/jira/browse/HDFS-16262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> HDFS-15119 added the ability to invalidate cached block locations in 
> DFSInputStream. As written, the feature will affect all DFSInputStreams 
> regardless of whether they need it or not. The invalidation also only applies 
> on the next request, so the next request will pay the cost of calling 
> openInfo before reading the data.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). I'd like 
> to utilize the idea started in HDFS-15119 in order to update DFSInputStreams 
> after blocks have been moved to local hosts.
> I was considering using the feature as is, but some of our clusters are quite 
> large and I'm concerned about the impact on the namenode:
>  * We have some clusters with over 350k StoreFiles, so that'd be 350k 
> DFSInputStreams. With such a large number and very active usage, having the 
> refresh be in-line makes it too hard to ensure we don't DDOS the NameNode.
>  * Currently we need to pay the price of openInfo the next time a 
> DFSInputStream is invoked. Moving that async would minimize the latency hit. 
> Also, some StoreFiles might be far less frequently accessed, so they may live 
> on for a long time before ever refreshing. We'd like to be able to know that 
> all DFSInputStreams are refreshed by a given time.
>  * We may have 350k files, but only a small percentage of them are ever 
> non-local at a given time. Refreshing only if necessary will save a lot of 
> work.
> In order to make this as painless to end users as possible, I'd like to:
>  * Update the implementation to utilize an async thread for managing 
> refreshes. This will give more control over rate limiting across all 
> DFSInputStreams in a DFSClient, and also ensure that all DFSInputStreams are 
> refreshed.
>  * Only refresh files which are lacking a local replica or have known 
> deadNodes to be cleaned up
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2021-10-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=662156=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-662156
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 07/Oct/21 23:21
Start Date: 07/Oct/21 23:21
Worklog Time Spent: 10m 
  Work Description: bbeaudreault commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-938136446


   I've had this running in one of our test clusters, under load and with block 
moves occurring. I had it tuned to a short interval of 10s just to put it in an 
extreme condition. It works really well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 662156)
Time Spent: 1h 10m  (was: 1h)

> Async refresh of cached locations in DFSInputStream
> ---
>
> Key: HDFS-16262
> URL: https://issues.apache.org/jira/browse/HDFS-16262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> HDFS-15119 added the ability to invalidate cached block locations in 
> DFSInputStream. As written, the feature will affect all DFSInputStreams 
> regardless of whether they need it or not. The invalidation also only applies 
> on the next request, so the next request will pay the cost of calling 
> openInfo before reading the data.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). I'd like 
> to utilize the idea started in HDFS-15119 in order to update DFSInputStreams 
> after blocks have been moved to local hosts.
> I was considering using the feature as is, but some of our clusters are quite 
> large and I'm concerned about the impact on the namenode:
>  * We have some clusters with over 350k StoreFiles, so that'd be 350k 
> DFSInputStreams. With such a large number and very active usage, having the 
> refresh be in-line makes it too hard to ensure we don't DDOS the NameNode.
>  * Currently we need to pay the price of openInfo the next time a 
> DFSInputStream is invoked. Moving that async would minimize the latency hit. 
> Also, some StoreFiles might be far less frequently accessed, so they may live 
> on for a long time before ever refreshing. We'd like to be able to know that 
> all DFSInputStreams are refreshed by a given time.
>  * We may have 350k files, but only a small percentage of them are ever 
> non-local at a given time. Refreshing only if necessary will save a lot of 
> work.
> In order to make this as painless to end users as possible, I'd like to:
>  * Update the implementation to utilize an async thread for managing 
> refreshes. This will give more control over rate limiting across all 
> DFSInputStreams in a DFSClient, and also ensure that all DFSInputStreams are 
> refreshed.
>  * Only refresh files which are lacking a local replica or have known 
> deadNodes to be cleaned up
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2021-10-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=661958=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-661958
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 07/Oct/21 23:04
Start Date: 07/Oct/21 23:04
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-937438221






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 661958)
Time Spent: 1h  (was: 50m)

> Async refresh of cached locations in DFSInputStream
> ---
>
> Key: HDFS-16262
> URL: https://issues.apache.org/jira/browse/HDFS-16262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> HDFS-15119 added the ability to invalidate cached block locations in 
> DFSInputStream. As written, the feature will affect all DFSInputStreams 
> regardless of whether they need it or not. The invalidation also only applies 
> on the next request, so the next request will pay the cost of calling 
> openInfo before reading the data.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). I'd like 
> to utilize the idea started in HDFS-15119 in order to update DFSInputStreams 
> after blocks have been moved to local hosts.
> I was considering using the feature as is, but some of our clusters are quite 
> large and I'm concerned about the impact on the namenode:
>  * We have some clusters with over 350k StoreFiles, so that'd be 350k 
> DFSInputStreams. With such a large number and very active usage, having the 
> refresh be in-line makes it too hard to ensure we don't DDOS the NameNode.
>  * Currently we need to pay the price of openInfo the next time a 
> DFSInputStream is invoked. Moving that async would minimize the latency hit. 
> Also, some StoreFiles might be far less frequently accessed, so they may live 
> on for a long time before ever refreshing. We'd like to be able to know that 
> all DFSInputStreams are refreshed by a given time.
>  * We may have 350k files, but only a small percentage of them are ever 
> non-local at a given time. Refreshing only if necessary will save a lot of 
> work.
> In order to make this as painless to end users as possible, I'd like to:
>  * Update the implementation to utilize an async thread for managing 
> refreshes. This will give more control over rate limiting across all 
> DFSInputStreams in a DFSClient, and also ensure that all DFSInputStreams are 
> refreshed.
>  * Only refresh files which are lacking a local replica or have known 
> deadNodes to be cleaned up
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2021-10-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=661897=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-661897
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 07/Oct/21 20:35
Start Date: 07/Oct/21 20:35
Worklog Time Spent: 10m 
  Work Description: bbeaudreault commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-938136446


   I've had this running in one of our test clusters, under load and with block 
moves occurring. I had it tuned to a short interval of 10s just to put it in an 
extreme condition. It works really well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 661897)
Time Spent: 50m  (was: 40m)

> Async refresh of cached locations in DFSInputStream
> ---
>
> Key: HDFS-16262
> URL: https://issues.apache.org/jira/browse/HDFS-16262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> HDFS-15119 added the ability to invalidate cached block locations in 
> DFSInputStream. As written, the feature will affect all DFSInputStreams 
> regardless of whether they need it or not. The invalidation also only applies 
> on the next request, so the next request will pay the cost of calling 
> openInfo before reading the data.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). I'd like 
> to utilize the idea started in HDFS-15119 in order to update DFSInputStreams 
> after blocks have been moved to local hosts.
> I was considering using the feature as is, but some of our clusters are quite 
> large and I'm concerned about the impact on the namenode:
>  * We have some clusters with over 350k StoreFiles, so that'd be 350k 
> DFSInputStreams. With such a large number and very active usage, having the 
> refresh be in-line makes it too hard to ensure we don't DDOS the NameNode.
>  * Currently we need to pay the price of openInfo the next time a 
> DFSInputStream is invoked. Moving that async would minimize the latency hit. 
> Also, some StoreFiles might be far less frequently accessed, so they may live 
> on for a long time before ever refreshing. We'd like to be able to know that 
> all DFSInputStreams are refreshed by a given time.
>  * We may have 350k files, but only a small percentage of them are ever 
> non-local at a given time. Refreshing only if necessary will save a lot of 
> work.
> In order to make this as painless to end users as possible, I'd like to:
>  * Update the implementation to utilize an async thread for managing 
> refreshes. This will give more control over rate limiting across all 
> DFSInputStreams in a DFSClient, and also ensure that all DFSInputStreams are 
> refreshed.
>  * Only refresh files which are lacking a local replica or have known 
> deadNodes to be cleaned up
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2021-10-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=661552=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-661552
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 07/Oct/21 11:41
Start Date: 07/Oct/21 11:41
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-937711713


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m  0s |  |  Docker mode activated.  |
   | -1 :x: |  patch  |   0m 19s |  |  
https://github.com/apache/hadoop/pull/3527 does not apply to trunk. Rebase 
required? Wrong Branch? See 
https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help.  
|
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/3/console |
   | versions | git=2.17.1 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 661552)
Time Spent: 40m  (was: 0.5h)

> Async refresh of cached locations in DFSInputStream
> ---
>
> Key: HDFS-16262
> URL: https://issues.apache.org/jira/browse/HDFS-16262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HDFS-15119 added the ability to invalidate cached block locations in 
> DFSInputStream. As written, the feature will affect all DFSInputStreams 
> regardless of whether they need it or not. The invalidation also only applies 
> on the next request, so the next request will pay the cost of calling 
> openInfo before reading the data.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). I'd like 
> to utilize the idea started in HDFS-15119 in order to update DFSInputStreams 
> after blocks have been moved to local hosts.
> I was considering using the feature as is, but some of our clusters are quite 
> large and I'm concerned about the impact on the namenode:
>  * We have some clusters with over 350k StoreFiles, so that'd be 350k 
> DFSInputStreams. With such a large number and very active usage, having the 
> refresh be in-line makes it too hard to ensure we don't DDOS the NameNode.
>  * Currently we need to pay the price of openInfo the next time a 
> DFSInputStream is invoked. Moving that async would minimize the latency hit. 
> Also, some StoreFiles might be far less frequently accessed, so they may live 
> on for a long time before ever refreshing. We'd like to be able to know that 
> all DFSInputStreams are refreshed by a given time.
>  * We may have 350k files, but only a small percentage of them are ever 
> non-local at a given time. Refreshing only if necessary will save a lot of 
> work.
> In order to make this as painless to end users as possible, I'd like to:
>  * Update the implementation to utilize an async thread for managing 
> refreshes. This will give more control over rate limiting across all 
> DFSInputStreams in a DFSClient, and also ensure that all DFSInputStreams are 
> refreshed.
>  * Only refresh files which are lacking a local replica or have known 
> deadNodes to be cleaned up
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2021-10-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=661457=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-661457
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 07/Oct/21 09:12
Start Date: 07/Oct/21 09:12
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-937604480


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 59s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  12m 43s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  26m 19s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   5m 33s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   5m 10s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m 16s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 30s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 44s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   2m 10s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   6m 20s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  24m 38s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 28s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m  3s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   4m 51s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | -1 :x: |  javac  |   4m 51s | 
[/results-compile-javac-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/2/artifact/out/results-compile-javac-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt)
 |  hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 1 new + 644 unchanged - 1 
fixed = 645 total (was 645)  |
   | +1 :green_heart: |  compile  |   4m 31s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | -1 :x: |  javac  |   4m 31s | 
[/results-compile-javac-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/2/artifact/out/results-compile-javac-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt)
 |  hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 
with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 1 new + 
624 unchanged - 1 fixed = 625 total (was 625)  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  5s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/2/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 15 new + 105 unchanged - 0 fixed = 
120 total (was 105)  |
   | +1 :green_heart: |  mvnsite  |   2m  7s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 23s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 54s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 38s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 56s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 22s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  | 227m  0s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 39s |  |  

[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2021-10-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=661307=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-661307
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 07/Oct/21 04:35
Start Date: 07/Oct/21 04:35
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-937438221


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 43s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  12m 51s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  20m 57s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   4m 56s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   4m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m 15s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 22s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 40s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   2m  6s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 30s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m  3s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 27s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m  0s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m  2s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   5m  2s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   4m 37s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   4m 37s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  9s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/1/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 15 new + 105 unchanged - 0 fixed = 
120 total (was 105)  |
   | +1 :green_heart: |  mvnsite  |   2m  4s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 23s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 53s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 39s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 59s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 22s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  | 231m 45s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 46s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 356m 40s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.tools.TestHdfsConfigFields |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3527 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 3b643776302b 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 
23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 

[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2021-10-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=661230=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-661230
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 06/Oct/21 22:37
Start Date: 06/Oct/21 22:37
Worklog Time Spent: 10m 
  Work Description: bbeaudreault opened a new pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527


   ### Description of PR
   
   Refactor refreshing of cached block locations so that it happens as part of 
an async process, with rate limiting. Add the ability to limit to only refresh 
DFSInputStreams if necessary. This defaults to false to preserve backwards 
compatibility with the old behavior from 
https://issues.apache.org/jira/browse/HDFS-15119
   
   See https://issues.apache.org/jira/browse/HDFS-16262
   
   ### How was this patch tested?
   
   I added a new test class TestLocatedBlocksRefresher. I am in the process of 
deploying this internally on one of our hadoop-3.3 clusters, will report back.
   
   ### For code changes:
   
   - [x] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 661230)
Remaining Estimate: 0h
Time Spent: 10m

> Async refresh of cached locations in DFSInputStream
> ---
>
> Key: HDFS-16262
> URL: https://issues.apache.org/jira/browse/HDFS-16262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HDFS-15119 added the ability to invalidate cached block locations in 
> DFSInputStream. As written, the feature will affect all DFSInputStreams 
> regardless of whether they need it or not. The invalidation also only applies 
> on the next request, so the next request will pay the cost of calling 
> openInfo before reading the data.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). I'd like 
> to utilize the idea started in HDFS-15119 in order to update DFSInputStreams 
> after blocks have been moved to local hosts.
> I was considering using the feature as is, but some of our clusters are quite 
> large and I'm concerned about the impact on the namenode:
>  * We have some clusters with over 350k StoreFiles, so that'd be 350k 
> DFSInputStreams. With such a large number and very active usage, having the 
> refresh be in-line makes it too hard to ensure we don't DDOS the NameNode.
>  * Currently we need to pay the price of openInfo the next time a 
> DFSInputStream is invoked. Moving that async would minimize the latency hit. 
> Also, some StoreFiles might be far less frequently accessed, so they may live 
> on for a long time before ever refreshing. We'd like to be able to know that 
> all DFSInputStreams are refreshed by a given time.
>  * We may have 350k files, but only a small percentage of them are ever 
> non-local at a given time. Refreshing only if necessary will save a lot of 
> work.
> In order to make this as painless to end users as possible, I'd like to:
>  * Update the implementation to utilize an async thread for managing 
> refreshes. This will give more control over rate limiting across all 
> DFSInputStreams in a DFSClient, and also ensure that all DFSInputStreams are 
> refreshed.
>  * Only refresh files which are lacking a local replica or have known 
> deadNodes to be cleaned up
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org