[jira] [Updated] (YARN-10854) Support marking inactive node as untracked without configured include path

2021-08-01 Thread Tao Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-10854:

Attachment: YARN-10854.005.patch

> Support marking inactive node as untracked without configured include path
> --
>
> Key: YARN-10854
> URL: https://issues.apache.org/jira/browse/YARN-10854
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-10854.001.patch, YARN-10854.002.patch, 
> YARN-10854.003.patch, YARN-10854.004.patch, YARN-10854.005.patch
>
>
> Currently inactive nodes which have been decommissioned/shutdown/lost for a 
> while(specified expiration time defined via 
> {{yarn.resourcemanager.node-removal-untracked.timeout-ms}}, 60 seconds by 
> default) and not exist in both include and exclude files can be marked as 
> untracked nodes and can be removed from RM state (YARN-4311). It's very 
> useful when auto-scaling is enabled in elastic cloud environment, which can 
> avoid unlimited increase of inactive nodes (mostly are decommissioned nodes).
> But this only works when the include path is configured, mismatched for most 
> of our cloud environments without configured white list of nodes, which can 
> lead to easily control for the auto-scaling of nodes without further security 
> requirements.
> So I propose to support marking inactive node as untracked without configured 
> include path, to be compatible with the former versions, we can add a switch 
> config for this.
> Any thoughts/suggestions/feedbacks are welcome!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10854) Support marking inactive node as untracked without configured include path

2021-08-01 Thread Tao Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391256#comment-17391256
 ] 

Tao Yang commented on YARN-10854:
-

Thanks [~zhuqi] for the review. 
Attached v5 patch to replace illegal import class 
'com.google.common.collect.Sets' with 'org.apache.hadoop.util.Sets'.

> Support marking inactive node as untracked without configured include path
> --
>
> Key: YARN-10854
> URL: https://issues.apache.org/jira/browse/YARN-10854
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-10854.001.patch, YARN-10854.002.patch, 
> YARN-10854.003.patch, YARN-10854.004.patch, YARN-10854.005.patch
>
>
> Currently inactive nodes which have been decommissioned/shutdown/lost for a 
> while(specified expiration time defined via 
> {{yarn.resourcemanager.node-removal-untracked.timeout-ms}}, 60 seconds by 
> default) and not exist in both include and exclude files can be marked as 
> untracked nodes and can be removed from RM state (YARN-4311). It's very 
> useful when auto-scaling is enabled in elastic cloud environment, which can 
> avoid unlimited increase of inactive nodes (mostly are decommissioned nodes).
> But this only works when the include path is configured, mismatched for most 
> of our cloud environments without configured white list of nodes, which can 
> lead to easily control for the auto-scaling of nodes without further security 
> requirements.
> So I propose to support marking inactive node as untracked without configured 
> include path, to be compatible with the former versions, we can add a switch 
> config for this.
> Any thoughts/suggestions/feedbacks are welcome!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10875) CLI queue usage command only reflects default partition usage

2021-08-01 Thread Rajshree Mishra (Jira)
Rajshree Mishra created YARN-10875:
--

 Summary: CLI queue usage command only reflects default partition 
usage
 Key: YARN-10875
 URL: https://issues.apache.org/jira/browse/YARN-10875
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Rajshree Mishra
 Attachments: queueA_scheduler.png, queueA_usage.png

Test step:
 # Hadoop cluster with nodelabels -> default, label1
 # Job is submitted to queueA using resources of accessible nodelabel label1 
 # Check queue usage for queueA using CLI command "yarn queue -status queueA"

Output: Current capacity is displayed as 00%

Expected: queueA is being utilized under label1 resource pool, and status 
command should reflect the same. 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10876) CLI queue usage should indicate absolute usage

2021-08-01 Thread Rajshree Mishra (Jira)
Rajshree Mishra created YARN-10876:
--

 Summary: CLI queue usage should indicate absolute usage
 Key: YARN-10876
 URL: https://issues.apache.org/jira/browse/YARN-10876
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Rajshree Mishra
 Attachments: schedulerUsage.png, usageCLI.png

For large cluster with multiple users, WebUI proves to be very slow. 

Users use the CLI to check the usage information, however the output displays 
percentages above 100. 

Users wants to know the available resources to judge if more jobs can be 
submitted and these percentages don't give a clear picture about this 
information.

CLI output should be made more user friendly to provide information about used 
and available resources in a queue, as user may not know total resource of a 
large cluster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8992) Fair scheduler can delete a dynamic queue while an application attempt is being added to the queue

2021-08-01 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-8992:

Fix Version/s: (was: 3.2.1)
   3.3.0

> Fair scheduler can delete a dynamic queue while an application attempt is 
> being added to the queue
> --
>
> Key: YARN-8992
> URL: https://issues.apache.org/jira/browse/YARN-8992
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.1.1
>Reporter: Haibo Chen
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-8992.001.patch, YARN-8992.002.patch
>
>
> As discovered in YARN-8990, QueueManager can see a leaf queue being empty 
> while FSLeafQueue.addApp() is called in the middle of  
> {code:java}
> return queue.getNumRunnableApps() == 0 &&
>   leafQueue.getNumNonRunnableApps() == 0 &&
>   leafQueue.getNumAssignedApps() == 0;{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8990) Fix fair scheduler race condition in app submit and queue cleanup

2021-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-8990:
-
Labels: pull-request-available  (was: )

> Fix fair scheduler race condition in app submit and queue cleanup
> -
>
> Key: YARN-8990
> URL: https://issues.apache.org/jira/browse/YARN-8990
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.2.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.2.0, 3.3.0
>
> Attachments: YARN-8990.001.patch, YARN-8990.002.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> With the introduction of the dynamic queue deletion in YARN-8191 a race 
> condition was introduced that can cause a queue to be removed while an 
> application submit is in progress.
> The issue occurs in {{FairScheduler.addApplication()}} when an application is 
> submitted to a dynamic queue which is empty or the queue does not exist yet. 
> If during the processing of the application submit the 
> {{AllocationFileLoaderService}} kicks of for an update the queue clean up 
> will be run first. The application submit first creates the queue and get a 
> reference back to the queue. 
> Other checks are performed and as the last action before getting ready to 
> generate an AppAttempt the queue is updated to show the submitted application 
> ID..
> The time between the queue creation and the queue update to show the submit 
> is long enough for the queue to be removed. The application however is lost 
> and will never get any resources assigned.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8990) Fix fair scheduler race condition in app submit and queue cleanup

2021-08-01 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391388#comment-17391388
 ] 

Akira Ajisaka commented on YARN-8990:
-

Backport PR opened: https://github.com/apache/hadoop/pull/3254

> Fix fair scheduler race condition in app submit and queue cleanup
> -
>
> Key: YARN-8990
> URL: https://issues.apache.org/jira/browse/YARN-8990
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.2.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.2.0, 3.3.0
>
> Attachments: YARN-8990.001.patch, YARN-8990.002.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> With the introduction of the dynamic queue deletion in YARN-8191 a race 
> condition was introduced that can cause a queue to be removed while an 
> application submit is in progress.
> The issue occurs in {{FairScheduler.addApplication()}} when an application is 
> submitted to a dynamic queue which is empty or the queue does not exist yet. 
> If during the processing of the application submit the 
> {{AllocationFileLoaderService}} kicks of for an update the queue clean up 
> will be run first. The application submit first creates the queue and get a 
> reference back to the queue. 
> Other checks are performed and as the last action before getting ready to 
> generate an AppAttempt the queue is updated to show the submitted application 
> ID..
> The time between the queue creation and the queue update to show the submit 
> is long enough for the queue to be removed. The application however is lost 
> and will never get any resources assigned.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8990) Fix fair scheduler race condition in app submit and queue cleanup

2021-08-01 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-8990:

   Fix Version/s: (was: 3.2.0)
Target Version/s: 3.2.3
  Labels: pull-request-available release-blocker  (was: 
pull-request-available)

> Fix fair scheduler race condition in app submit and queue cleanup
> -
>
> Key: YARN-8990
> URL: https://issues.apache.org/jira/browse/YARN-8990
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.2.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Blocker
>  Labels: pull-request-available, release-blocker
> Fix For: 3.3.0
>
> Attachments: YARN-8990.001.patch, YARN-8990.002.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> With the introduction of the dynamic queue deletion in YARN-8191 a race 
> condition was introduced that can cause a queue to be removed while an 
> application submit is in progress.
> The issue occurs in {{FairScheduler.addApplication()}} when an application is 
> submitted to a dynamic queue which is empty or the queue does not exist yet. 
> If during the processing of the application submit the 
> {{AllocationFileLoaderService}} kicks of for an update the queue clean up 
> will be run first. The application submit first creates the queue and get a 
> reference back to the queue. 
> Other checks are performed and as the last action before getting ready to 
> generate an AppAttempt the queue is updated to show the submitted application 
> ID..
> The time between the queue creation and the queue update to show the submit 
> is long enough for the queue to be removed. The application however is lost 
> and will never get any resources assigned.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8990) Fix fair scheduler race condition in app submit and queue cleanup

2021-08-01 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-8990:

Fix Version/s: 3.2.0

> Fix fair scheduler race condition in app submit and queue cleanup
> -
>
> Key: YARN-8990
> URL: https://issues.apache.org/jira/browse/YARN-8990
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.2.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Blocker
>  Labels: pull-request-available, release-blocker
> Fix For: 3.2.0, 3.3.0
>
> Attachments: YARN-8990.001.patch, YARN-8990.002.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> With the introduction of the dynamic queue deletion in YARN-8191 a race 
> condition was introduced that can cause a queue to be removed while an 
> application submit is in progress.
> The issue occurs in {{FairScheduler.addApplication()}} when an application is 
> submitted to a dynamic queue which is empty or the queue does not exist yet. 
> If during the processing of the application submit the 
> {{AllocationFileLoaderService}} kicks of for an update the queue clean up 
> will be run first. The application submit first creates the queue and get a 
> reference back to the queue. 
> Other checks are performed and as the last action before getting ready to 
> generate an AppAttempt the queue is updated to show the submitted application 
> ID..
> The time between the queue creation and the queue update to show the submit 
> is long enough for the queue to be removed. The application however is lost 
> and will never get any resources assigned.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-8990) Fix fair scheduler race condition in app submit and queue cleanup

2021-08-01 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka reopened YARN-8990:
-

> Fix fair scheduler race condition in app submit and queue cleanup
> -
>
> Key: YARN-8990
> URL: https://issues.apache.org/jira/browse/YARN-8990
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.2.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Blocker
>  Labels: pull-request-available, release-blocker
> Fix For: 3.3.0
>
> Attachments: YARN-8990.001.patch, YARN-8990.002.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> With the introduction of the dynamic queue deletion in YARN-8191 a race 
> condition was introduced that can cause a queue to be removed while an 
> application submit is in progress.
> The issue occurs in {{FairScheduler.addApplication()}} when an application is 
> submitted to a dynamic queue which is empty or the queue does not exist yet. 
> If during the processing of the application submit the 
> {{AllocationFileLoaderService}} kicks of for an update the queue clean up 
> will be run first. The application submit first creates the queue and get a 
> reference back to the queue. 
> Other checks are performed and as the last action before getting ready to 
> generate an AppAttempt the queue is updated to show the submitted application 
> ID..
> The time between the queue creation and the queue update to show the submit 
> is long enough for the queue to be removed. The application however is lost 
> and will never get any resources assigned.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org