[jira] [Commented] (YARN-10504) Implement weight mode in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261984#comment-17261984 ] Hadoop QA commented on YARN-10504:
--
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 17s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 1s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 9 new or modified test files. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 23m 32s | | trunk passed |
| +1 | compile | 0m 58s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | compile | 0m 50s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 | checkstyle | 0m 48s | | trunk passed |
| +1 | mvnsite | 0m 53s | | trunk passed |
| +1 | shadedclient | 18m 33s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 38s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | javadoc | 0m 34s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| 0 | spotbugs | 1m 48s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 1m 46s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 52s | | the patch passed |
| +1 | compile | 0m 54s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | javac | 0m 54s | | the patch passed |
| +1 | compile | 0m 45s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 | javac | 0m 45s | | the patch passed |
| -0 | checkstyle | 0m 43s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/450/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 78 new + 758 unchanged - 13 fixed = 836 total (was 771) |
| +1 | mvnsite | 0m 48s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 16m 27s | | patch has no errors when building and testing our client artifacts. |
[jira] [Commented] (YARN-10504) Implement weight mode in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261962#comment-17261962 ] Hadoop QA commented on YARN-10504:
--
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 15s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 1s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 9 new or modified test files. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 23m 13s | | trunk passed |
| +1 | compile | 0m 59s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | compile | 0m 50s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 | checkstyle | 0m 48s | | trunk passed |
| +1 | mvnsite | 0m 54s | | trunk passed |
| +1 | shadedclient | 18m 30s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 39s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | javadoc | 0m 34s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| 0 | spotbugs | 1m 48s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 1m 46s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 50s | | the patch passed |
| +1 | compile | 0m 55s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | javac | 0m 55s | | the patch passed |
| +1 | compile | 0m 44s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 | javac | 0m 44s | | the patch passed |
| -0 | checkstyle | 0m 44s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/449/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 78 new + 758 unchanged - 13 fixed = 836 total (was 771) |
| +1 | mvnsite | 0m 46s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 16m 28s | | patch has no errors when building and testing our client artifacts. |
[jira] [Commented] (YARN-10504) Implement weight mode in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261961#comment-17261961 ] Benjamin Teke commented on YARN-10504:
--
[~wangda], [~zhuqi] Added a small new patch. ver.6 had a small issue in ParentQueue.getCapacityConfigurationTypeForQueues: if the passed queues collection was empty, the iterator used for the mixed mode check threw a NoSuchElementException, causing all of the AutoCreated tests to fail. setChildQueues has a similar mixed mode check in which the root queue is included, causing an unnecessary exception, so I extended that condition as well. Additionally, I added [~zhuqi]'s suggestion to LeafQueue.updateClusterResource. There is one more issue: 2 of the TestAbsoluteResourceWithAutoQueue tests are now failing because the apps are stuck in the SUBMITTED state. [~wangda], do you have an idea on this one?
> Implement weight mode in Capacity Scheduler
> ---
>
> Key: YARN-10504
> URL: https://issues.apache.org/jira/browse/YARN-10504
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Benjamin Teke
> Assignee: Benjamin Teke
> Priority: Major
> Attachments: YARN-10504.001.patch, YARN-10504.002.patch, YARN-10504.003.patch, YARN-10504.004.patch, YARN-10504.005.patch, YARN-10504.006.patch, YARN-10504.007.patch, YARN-10504.ver-1.patch, YARN-10504.ver-2.patch, YARN-10504.ver-3.patch
>
> To allow flexible queue creation in Capacity Scheduler, a weight mode should be introduced. The existing {{capacity}} property should be used with a different syntax, i.e.:
> root.users.capacity = (1.0) or ~1.0 or ^1.0 or @1.0
> root.users.capacity = 1.0w
> root.users.capacity = w:1.0
> Weight support should not impact the existing functionality.
>
> The new functionality should:
> * accept and validate the new weight values
> * enforce a singular mode on the whole queue tree
> * (re)calculate the relative (percentage-based) capacities based on the weights during launch and every time the queue structure changes
--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
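For illustration of the empty-collection problem described in the comment above, here is a minimal, self-contained sketch of guarding an iterator-based mixed mode check so an empty queue collection no longer throws NoSuchElementException. This is not the actual ParentQueue code; the enum, helper name and default value are invented for the example.
{code:java}
import java.util.Collection;
import java.util.Iterator;

// Hypothetical stand-in for the real capacity configuration type detection.
final class MixedModeCheckSketch {
  enum CapacityConfigType { PERCENTAGE, WEIGHT, ABSOLUTE_RESOURCE }

  static CapacityConfigType typeOf(Collection<CapacityConfigType> childTypes) {
    // Guard the empty case up front: an unguarded iterator().next() on an
    // empty collection is what would throw NoSuchElementException.
    if (childTypes == null || childTypes.isEmpty()) {
      return CapacityConfigType.PERCENTAGE; // assumed default for the sketch
    }
    Iterator<CapacityConfigType> it = childTypes.iterator();
    CapacityConfigType first = it.next();
    while (it.hasNext()) {
      if (it.next() != first) {
        throw new IllegalArgumentException(
            "Mixed capacity modes are not allowed under one parent queue");
      }
    }
    return first;
  }
}
{code}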
[jira] [Updated] (YARN-10504) Implement weight mode in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Teke updated YARN-10504:
-
Attachment: YARN-10504.007.patch
> Implement weight mode in Capacity Scheduler
> ---
>
> Key: YARN-10504
> URL: https://issues.apache.org/jira/browse/YARN-10504
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Benjamin Teke
> Assignee: Benjamin Teke
> Priority: Major
> Attachments: YARN-10504.001.patch, YARN-10504.002.patch, YARN-10504.003.patch, YARN-10504.004.patch, YARN-10504.005.patch, YARN-10504.006.patch, YARN-10504.007.patch, YARN-10504.ver-1.patch, YARN-10504.ver-2.patch, YARN-10504.ver-3.patch
>
> To allow flexible queue creation in Capacity Scheduler, a weight mode should be introduced. The existing {{capacity}} property should be used with a different syntax, i.e.:
> root.users.capacity = (1.0) or ~1.0 or ^1.0 or @1.0
> root.users.capacity = 1.0w
> root.users.capacity = w:1.0
> Weight support should not impact the existing functionality.
>
> The new functionality should:
> * accept and validate the new weight values
> * enforce a singular mode on the whole queue tree
> * (re)calculate the relative (percentage-based) capacities based on the weights during launch and every time the queue structure changes
--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10504) Implement weight mode in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261954#comment-17261954 ] Benjamin Teke commented on YARN-10504:
--
[~wangda]/[~zhuqi]/[~gandras], let me take care of the suggestions in [~zhuqi]'s comment above. I'll base it on the ver.6 patch.
> Implement weight mode in Capacity Scheduler
> ---
>
> Key: YARN-10504
> URL: https://issues.apache.org/jira/browse/YARN-10504
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Benjamin Teke
> Assignee: Benjamin Teke
> Priority: Major
> Attachments: YARN-10504.001.patch, YARN-10504.002.patch, YARN-10504.003.patch, YARN-10504.004.patch, YARN-10504.005.patch, YARN-10504.006.patch, YARN-10504.ver-1.patch, YARN-10504.ver-2.patch, YARN-10504.ver-3.patch
>
> To allow flexible queue creation in Capacity Scheduler, a weight mode should be introduced. The existing {{capacity}} property should be used with a different syntax, i.e.:
> root.users.capacity = (1.0) or ~1.0 or ^1.0 or @1.0
> root.users.capacity = 1.0w
> root.users.capacity = w:1.0
> Weight support should not impact the existing functionality.
>
> The new functionality should:
> * accept and validate the new weight values
> * enforce a singular mode on the whole queue tree
> * (re)calculate the relative (percentage-based) capacities based on the weights during launch and every time the queue structure changes
--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
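As background for the requirement quoted above to "(re)calculate the relative (percentage-based) capacities based on the weights", here is a small illustrative sketch of the normalization idea: each sibling's weight is divided by the sum of its siblings' weights. The class, helper and queue names are invented for the example and are not CapacityScheduler APIs.
{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public final class WeightToCapacitySketch {
  // Turns per-queue weights into relative percentages under one parent.
  public static Map<String, Float> toRelativeCapacities(Map<String, Float> weights) {
    float sum = 0f;
    for (float w : weights.values()) {
      sum += w;
    }
    Map<String, Float> relative = new LinkedHashMap<>();
    for (Map.Entry<String, Float> e : weights.entrySet()) {
      // Each sibling gets weight / sum-of-sibling-weights of the parent's capacity.
      relative.put(e.getKey(), sum == 0f ? 0f : (e.getValue() / sum) * 100f);
    }
    return relative;
  }

  public static void main(String[] args) {
    Map<String, Float> weights = new LinkedHashMap<>();
    weights.put("root.users.alice", 3.0f);   // e.g. capacity = 3.0w
    weights.put("root.users.bob", 1.0f);     // e.g. capacity = 1.0w
    // Prints {root.users.alice=75.0, root.users.bob=25.0}
    System.out.println(toRelativeCapacities(weights));
  }
}
{code}
Recomputing this mapping whenever the queue structure changes is what lets weight-configured siblings keep proportional shares as queues are added or removed.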
[jira] [Updated] (YARN-10504) Implement weight mode in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-10504:
--
Attachment: YARN-10504.006.patch
> Implement weight mode in Capacity Scheduler
> ---
>
> Key: YARN-10504
> URL: https://issues.apache.org/jira/browse/YARN-10504
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Benjamin Teke
> Assignee: Benjamin Teke
> Priority: Major
> Attachments: YARN-10504.001.patch, YARN-10504.002.patch, YARN-10504.003.patch, YARN-10504.004.patch, YARN-10504.005.patch, YARN-10504.006.patch, YARN-10504.ver-1.patch, YARN-10504.ver-2.patch, YARN-10504.ver-3.patch
>
> To allow flexible queue creation in Capacity Scheduler, a weight mode should be introduced. The existing {{capacity}} property should be used with a different syntax, i.e.:
> root.users.capacity = (1.0) or ~1.0 or ^1.0 or @1.0
> root.users.capacity = 1.0w
> root.users.capacity = w:1.0
> Weight support should not impact the existing functionality.
>
> The new functionality should:
> * accept and validate the new weight values
> * enforce a singular mode on the whole queue tree
> * (re)calculate the relative (percentage-based) capacities based on the weights during launch and every time the queue structure changes
--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10504) Implement weight mode in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261943#comment-17261943 ] Wangda Tan commented on YARN-10504:
---
Updated the ver.6 patch, which includes the following:
1) Unit tests covering end-to-end capability and the mixed percentage/weight mode.
2) A rewrite of ParentQueue.setChildQueues. It was a bit messy before; now it enforces stricter checks, and the check statements were rewritten for better readability.
[~zhuqi]/[~bteke]/[~gandras], I haven't addressed your comments yet, so it would be nice if you could help make those changes. (And again, please add a comment if you plan to do that, to avoid editing the same code.)
> Implement weight mode in Capacity Scheduler
> ---
>
> Key: YARN-10504
> URL: https://issues.apache.org/jira/browse/YARN-10504
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Benjamin Teke
> Assignee: Benjamin Teke
> Priority: Major
> Attachments: YARN-10504.001.patch, YARN-10504.002.patch, YARN-10504.003.patch, YARN-10504.004.patch, YARN-10504.005.patch, YARN-10504.ver-1.patch, YARN-10504.ver-2.patch, YARN-10504.ver-3.patch
>
> To allow flexible queue creation in Capacity Scheduler, a weight mode should be introduced. The existing {{capacity}} property should be used with a different syntax, i.e.:
> root.users.capacity = (1.0) or ~1.0 or ^1.0 or @1.0
> root.users.capacity = 1.0w
> root.users.capacity = w:1.0
> Weight support should not impact the existing functionality.
>
> The new functionality should:
> * accept and validate the new weight values
> * enforce a singular mode on the whole queue tree
> * (re)calculate the relative (percentage-based) capacities based on the weights during launch and every time the queue structure changes
--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
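As a rough illustration of the "stricter checks" mentioned for ParentQueue.setChildQueues, here is a simplified sketch of validating that siblings do not mix capacity modes and that percentage capacities stay within bounds. The class, enum and method are invented for the example and are not the actual rewrite in the patch.
{code:java}
import java.util.List;

final class ChildQueueValidationSketch {
  enum Mode { PERCENTAGE, WEIGHT, ABSOLUTE_RESOURCE }

  static void validate(String parent, List<Mode> childModes, float percentageSum) {
    // 1) all children of one parent must use the same capacity mode
    for (Mode m : childModes) {
      if (m != childModes.get(0)) {
        throw new IllegalArgumentException("Queue " + parent
            + " has children configured with mixed capacity modes");
      }
    }
    // 2) in percentage mode, the configured capacities must add up to at most 100
    if (!childModes.isEmpty() && childModes.get(0) == Mode.PERCENTAGE
        && percentageSum > 100.0f + 1e-6f) {
      throw new IllegalArgumentException("Illegal capacity sum " + percentageSum
          + " for children of queue " + parent);
    }
  }
}
{code}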
[jira] [Commented] (YARN-10504) Implement weight mode in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261921#comment-17261921 ] Wangda Tan commented on YARN-10504:
---
Makes sense, [~zhuqi]. [~bteke]/[~gandras], if you're not making changes to the patch, [~zhuqi], can you take care of the issue? I plan to add a few more test cases covering the weight mode today/tomorrow. I will not touch any existing logic.
> Implement weight mode in Capacity Scheduler
> ---
>
> Key: YARN-10504
> URL: https://issues.apache.org/jira/browse/YARN-10504
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Benjamin Teke
> Assignee: Benjamin Teke
> Priority: Major
> Attachments: YARN-10504.001.patch, YARN-10504.002.patch, YARN-10504.003.patch, YARN-10504.004.patch, YARN-10504.005.patch, YARN-10504.ver-1.patch, YARN-10504.ver-2.patch, YARN-10504.ver-3.patch
>
> To allow flexible queue creation in Capacity Scheduler, a weight mode should be introduced. The existing {{capacity}} property should be used with a different syntax, i.e.:
> root.users.capacity = (1.0) or ~1.0 or ^1.0 or @1.0
> root.users.capacity = 1.0w
> root.users.capacity = w:1.0
> Weight support should not impact the existing functionality.
>
> The new functionality should:
> * accept and validate the new weight values
> * enforce a singular mode on the whole queue tree
> * (re)calculate the relative (percentage-based) capacities based on the weights during launch and every time the queue structure changes
--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10504) Implement weight mode in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261872#comment-17261872 ] zhuqi edited comment on YARN-10504 at 1/9/21, 2:07 PM:
---
[~wangda] [~bteke] [~gandras]
1. {{updateAbsoluteCapacitiesAndRelatedFields}} should update maxApplications, but it breaks down in some cases, for example in TestCapacitySchedulerAutoQueueCreation -> testAutoCreatedQueueActivationDeactivation:
{code:java}
//submit user_3 app. This cant be allocated since there is no capacity
// in NO_LABEL, SSD but can be in GPU label
submitApp(mockRM, parentQueue, USER3, USER3, 4, 1);
final CSQueue user3LeafQueue = cs.getQueue(USER3);
validateCapacities((AutoCreatedLeafQueue) user3LeafQueue, 0.0f, 0.0f, 1.0f,
    1.0f);
validateCapacitiesByLabel((ManagedParentQueue) parentQueue,
    (AutoCreatedLeafQueue) user3LeafQueue, NODEL_LABEL_GPU);
{code}
In this case there is no capacity for the user_3 AutoCreatedLeafQueue, so in {{updateAbsoluteCapacitiesAndRelatedFields}}:
{code:java}
private void updateAbsoluteCapacitiesAndRelatedFields() {
  updateAbsoluteCapacities();
  CapacitySchedulerConfiguration schedulerConf = csContext.getConfiguration();
  // If maxApplications not set, use the system total max app, apply newly
  // calculated abs capacity of the queue.
  if (maxApplications <= 0) {
    int maxSystemApps = schedulerConf.getMaximumSystemApplications();
    maxApplications = (int) (maxSystemApps * queueCapacities.getAbsoluteCapacity());
  }
  maxApplicationsPerUser = Math.min(maxApplications,
      (int) (maxApplications * (usersManager.getUserLimit() / 100.0f)
          * usersManager.getUserLimitFactor()));
}

// because capacities will update to 0
if (availableCapacity >= leafQueueTemplateCapacities
    .getAbsoluteCapacity(nodeLabel)) {
  updateCapacityFromTemplate(capacities, nodeLabel);
  activate(leafQueue, nodeLabel);
} else {
  updateToZeroCapacity(capacities, nodeLabel);
}

// And because the update will be after reinitializeFromTemplate
final AutoCreatedLeafQueueConfig initialLeafQueueTemplate =
    queueManagementPolicy.getInitialLeafQueueConfiguration(leafQueue);
leafQueue.reinitializeFromTemplate(initialLeafQueueTemplate);

// Do one update cluster resource call to make sure all absolute resources
// effective resources are updated.
updateClusterResource(this.csContext.getClusterResource(),
    new ResourceLimits(this.csContext.getClusterResource()));
{code}
maxApplications and maxApplicationsPerUser will be 0. So we should handle this in the new logic at the {{//TODO recalculate max applications because they can depend on capacity}} marker. The TODO should be removed; either just let the AutoCreatedLeafQueue case pass for now, or add logic that sets this case's maxApplications to a fixed default number.
2. As mentioned by [~bteke]: "Sharing my latest findings on the TestAbsoluteResourceWithAutoQueue failure: {{AutoCreatedLeafQueue#reinitializeFromTemplate}} was refactored, now getting and merging the QueueCapacities happens *before* calling {{ParentQueue#updateClusterResource}} (and {{LeafQueue#updateClusterResource}}). In {{LeafQueue#updateClusterResource}}, {{AbstractCSQueue#updateEffectiveResources}} is called, where the effectiveMinResource of the created queue is overridden with the template's effectiveMinResources, which is exactly what the test is getting in the asserts."
We should change {{LeafQueue#updateClusterResource}} to:
{code:java}
public void updateClusterResource(Resource clusterResource,
    ResourceLimits currentResourceLimits) {
  writeLock.lock();
  try {
    ...
    if (!(this instanceof AutoCreatedLeafQueue)) {
      super.updateEffectiveResources(clusterResource);
    }
  }
}
{code}
It will fix the absolute resource case in TestAbsoluteResourceWithAutoQueue. Do you have any other advice? Thanks.
was (Author: zhuqi):
[~wangda] [~bteke]
1. {{updateAbsoluteCapacitiesAndRelatedFields}} should update maxApplications, but it breaks down in some cases, for example in TestCapacitySchedulerAutoQueueCreation -> testAutoCreatedQueueActivationDeactivation:
{code:java}
//submit user_3 app. This cant be allocated since there is no capacity
// in NO_LABEL, SSD but can be in GPU label
submitApp(mockRM, parentQueue, USER3, USER3, 4, 1);
final CSQueue user3LeafQueue = cs.getQueue(USER3);
validateCapacities((AutoCreatedLeafQueue) user3LeafQueue, 0.0f, 0.0f, 1.0f,
    1.0f);
validateCapacitiesByLabel((ManagedParentQueue) parentQueue,
    (AutoCreatedLeafQueue) user3LeafQueue, NODEL_LABEL_GPU);
{code}
In this case there is no capacity for the user_3 AutoCreatedLeafQueue, so in {{updateAbsoluteCapacitiesAndRelatedFields}}:
{code:java}
private void updateAbsoluteCapacitiesAndRelatedFields() {
  updateAbsoluteCapacities();
  CapacitySchedulerConfiguration schedulerConf = csContext.getConfiguration();
  // If maxApplications not set, use the system total max app, apply newly
{code}
[jira] [Commented] (YARN-10504) Implement weight mode in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261872#comment-17261872 ] zhuqi commented on YARN-10504:
--
[~wangda] [~bteke]
1. {{updateAbsoluteCapacitiesAndRelatedFields}} should update maxApplications, but it breaks down in some cases, for example in TestCapacitySchedulerAutoQueueCreation -> testAutoCreatedQueueActivationDeactivation:
{code:java}
//submit user_3 app. This cant be allocated since there is no capacity
// in NO_LABEL, SSD but can be in GPU label
submitApp(mockRM, parentQueue, USER3, USER3, 4, 1);
final CSQueue user3LeafQueue = cs.getQueue(USER3);
validateCapacities((AutoCreatedLeafQueue) user3LeafQueue, 0.0f, 0.0f, 1.0f,
    1.0f);
validateCapacitiesByLabel((ManagedParentQueue) parentQueue,
    (AutoCreatedLeafQueue) user3LeafQueue, NODEL_LABEL_GPU);
{code}
In this case there is no capacity for the user_3 AutoCreatedLeafQueue, so in {{updateAbsoluteCapacitiesAndRelatedFields}}:
{code:java}
private void updateAbsoluteCapacitiesAndRelatedFields() {
  updateAbsoluteCapacities();
  CapacitySchedulerConfiguration schedulerConf = csContext.getConfiguration();
  // If maxApplications not set, use the system total max app, apply newly
  // calculated abs capacity of the queue.
  if (maxApplications <= 0) {
    int maxSystemApps = schedulerConf.getMaximumSystemApplications();
    maxApplications = (int) (maxSystemApps * queueCapacities.getAbsoluteCapacity());
  }
  maxApplicationsPerUser = Math.min(maxApplications,
      (int) (maxApplications * (usersManager.getUserLimit() / 100.0f)
          * usersManager.getUserLimitFactor()));
}

// because capacities will update to 0
if (availableCapacity >= leafQueueTemplateCapacities
    .getAbsoluteCapacity(nodeLabel)) {
  updateCapacityFromTemplate(capacities, nodeLabel);
  activate(leafQueue, nodeLabel);
} else {
  updateToZeroCapacity(capacities, nodeLabel);
}

// And because the update will be after reinitializeFromTemplate
final AutoCreatedLeafQueueConfig initialLeafQueueTemplate =
    queueManagementPolicy.getInitialLeafQueueConfiguration(leafQueue);
leafQueue.reinitializeFromTemplate(initialLeafQueueTemplate);

// Do one update cluster resource call to make sure all absolute resources
// effective resources are updated.
updateClusterResource(this.csContext.getClusterResource(),
    new ResourceLimits(this.csContext.getClusterResource()));
{code}
maxApplications and maxApplicationsPerUser will be 0. So we should handle this in the new logic at the {{//TODO recalculate max applications because they can depend on capacity}} marker. The TODO should be removed; either just let the AutoCreatedLeafQueue case pass for now, or add logic that sets this case's maxApplications to a fixed default number.
2. As mentioned by [~bteke]: "Sharing my latest findings on the TestAbsoluteResourceWithAutoQueue failure: {{AutoCreatedLeafQueue#reinitializeFromTemplate}} was refactored, now getting and merging the QueueCapacities happens *before* calling {{ParentQueue#updateClusterResource}} (and {{LeafQueue#updateClusterResource}}). In {{LeafQueue#updateClusterResource}}, {{AbstractCSQueue#updateEffectiveResources}} is called, where the effectiveMinResource of the created queue is overridden with the template's effectiveMinResources, which is exactly what the test is getting in the asserts."
We should change {{LeafQueue#updateClusterResource}} to:
{code:java}
public void updateClusterResource(Resource clusterResource,
    ResourceLimits currentResourceLimits) {
  writeLock.lock();
  try {
    ...
    if (!(this instanceof AutoCreatedLeafQueue)) {
      super.updateEffectiveResources(clusterResource);
    }
  }
}
{code}
It will fix the absolute resource case in TestAbsoluteResourceWithAutoQueue. Do you have any other advice? Thanks.
> Implement weight mode in Capacity Scheduler
> ---
>
> Key: YARN-10504
> URL: https://issues.apache.org/jira/browse/YARN-10504
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Benjamin Teke
> Assignee: Benjamin Teke
> Priority: Major
> Attachments: YARN-10504.001.patch, YARN-10504.002.patch, YARN-10504.003.patch, YARN-10504.004.patch, YARN-10504.005.patch, YARN-10504.ver-1.patch, YARN-10504.ver-2.patch, YARN-10504.ver-3.patch
>
> To allow flexible queue creation in Capacity Scheduler, a weight mode should be introduced. The existing {{capacity}} property should be used with a different syntax, i.e.:
> root.users.capacity = (1.0) or ~1.0 or ^1.0 or @1.0
> root.users.capacity = 1.0w
> root.users.capacity = w:1.0
> Weight support should not impact the existing functionality.
>
> The new functionality should:
> * accept and validate the new weight values
> * enforce a singular mode on the whole queue tree
> * (re)calculate the relative (percentage-based) capacities based on the weights during launch and every time the queue structure changes
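One possible shape of the "fixed default number" suggestion in the comment above, written as a standalone sketch; the class, method and default value are assumptions for illustration, not the committed change.
{code:java}
// If the absolute capacity is 0 (e.g. a deactivated auto-created leaf queue),
// fall back to a fixed default instead of ending up with maxApplications == 0.
final class MaxApplicationsSketch {
  static final int DEFAULT_MAX_APPS_WHEN_ZERO_CAPACITY = 1; // assumed default

  static int computeMaxApplications(int configuredMaxApps, int maxSystemApps,
      float absoluteCapacity) {
    if (configuredMaxApps > 0) {
      return configuredMaxApps;             // explicit configuration wins
    }
    int derived = (int) (maxSystemApps * absoluteCapacity);
    // absoluteCapacity == 0 would otherwise yield 0 and block every submission
    return derived > 0 ? derived : DEFAULT_MAX_APPS_WHEN_ZERO_CAPACITY;
  }
}
{code}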
[jira] [Updated] (YARN-10558) Fix failure of TestDistributedShell#testDSShellWithOpportunisticContainers
[ https://issues.apache.org/jira/browse/YARN-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated YARN-10558: Fix Version/s: 3.3.1 > Fix failure of TestDistributedShell#testDSShellWithOpportunisticContainers > -- > > Key: YARN-10558 > URL: https://issues.apache.org/jira/browse/YARN-10558 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.1 > > Time Spent: 50m > Remaining Estimate: 0h > > The TestDistributedShell#testDSShellWithOpportunisticContainers always fails > due to insufficient test configuration. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
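The JIRA text above does not spell out which setting was missing. Assuming the gap is the opportunistic-container allocation switch, a hedged sketch of how a test setup could enable it might look like this; the helper class is invented for the example.
{code:java}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

final class OpportunisticTestConfSketch {
  static YarnConfiguration withOpportunisticContainers() {
    YarnConfiguration conf = new YarnConfiguration();
    // Lets the RM hand out OPPORTUNISTIC containers, which the
    // testDSShellWithOpportunisticContainers scenario relies on.
    conf.setBoolean(YarnConfiguration.OPP_CONTAINER_ALLOCATION_ENABLED, true);
    return conf;
  }
}
{code}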
[jira] [Updated] (YARN-10334) TestDistributedShell leaks resources on timeout/failure
[ https://issues.apache.org/jira/browse/YARN-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated YARN-10334: Fix Version/s: 3.3.1 > TestDistributedShell leaks resources on timeout/failure > --- > > Key: YARN-10334 > URL: https://issues.apache.org/jira/browse/YARN-10334 > Project: Hadoop YARN > Issue Type: Bug > Components: distributed-shell, test, yarn >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: newbie, pull-request-available, test > Fix For: 3.4.0, 3.3.1 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > {{TestDistributedShell}} times out on trunk. I found that the application, > and containers will stay running in the background long after the unit test > has failed. > This causes failure of other test cases and several false positives failures > as result of: > * Ports will stay busy, so other tests cases fail to launch. > * Unit tests fail because of memory restrictions. > Although the unit test is already broken on trunk, we do not want its > failures to other unit tests. > {{TestDistributedShell}} needs to be revisited to make sure that all > {{YarnClients}}, and {{YarnApplications}} are closed properly at the end of > the each unit test (including exception and timeouts) > Steps to reproduce: > {code:bash} > mvn test -Dtest=TestDistributedShell#testDSShellWithOpportunisticContainers > ## this will timeout as > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 90.234 s <<< FAILURE! - in > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell > [ERROR] > testDSShellWithOpportunisticContainers(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 90.018 s <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 9 > milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:1117) > at > org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:1089) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers(TestDistributedShell.java:1438) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > [INFO] > [INFO] Results: > [INFO] > [ERROR] Errors: > [ERROR] TestDistributedShell.testDSShellWithOpportunisticContainers:1438 » > TestTimedOut > [INFO] > [ERROR] Tests run: 
1, Failures: 0, Errors: 1, Skipped: 0 > {code} > Using {{ps}} command, you can find the yarn processes are still in the > background > {code:bash} > /bin/bash -c $JRE_HOME/bin/java -Xmx512m > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster > --container_type OPPORTUNISTIC --container_memory 128 --container_vcores 1 > --num_containers 2 --priority 0 --appname DistributedShell --homedir > file:/Users/ahussein > 1>$WORK_DIR8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1593554710896_0001/container_1593554710896_0001_01_01/AppMaster.stdout > > 2>$WORK_DIR8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1593554710896_0001/container_1593554710896_0001_01_01/AppMaster.stderr > $JRE_HOME/bin/java -Xmx512m > org.apache.hadoop.yarn
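To illustrate the cleanup the description above calls for, here is a minimal sketch (not the actual patch) of a JUnit teardown that kills any leftover application and stops the client even when the test body failed or timed out; the field names are placeholders for whatever the test creates in its setup.
{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.junit.After;

public class DistributedShellCleanupSketch {
  private YarnClient yarnClient;          // created in the test setup
  private ApplicationId launchedAppId;    // recorded when the app is submitted

  @After
  public void tearDown() {
    if (yarnClient != null) {
      try {
        if (launchedAppId != null) {
          // Kill whatever the test left behind so ports and memory are freed.
          yarnClient.killApplication(launchedAppId);
        }
      } catch (Exception e) {
        // Best effort: the app may already be finished or the RM gone.
      } finally {
        yarnClient.stop();
      }
    }
  }
}
{code}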
[jira] [Updated] (YARN-10536) Client in distributedShell swallows interrupt exceptions
[ https://issues.apache.org/jira/browse/YARN-10536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated YARN-10536: Fix Version/s: 3.3.1 > Client in distributedShell swallows interrupt exceptions > > > Key: YARN-10536 > URL: https://issues.apache.org/jira/browse/YARN-10536 > Project: Hadoop YARN > Issue Type: Bug > Components: client, distributed-shell >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.1 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > In {{applications.distributedshell.Client}} , the method > {{monitorApplication}} loops waiting for the following conditions: > * Application fails: reaches {{YarnApplicationState.KILLED}}, or > {{YarnApplicationState.FAILED}} > * Application succeeds: {{FinalApplicationStatus.SUCCEEDED}} or > {{YarnApplicationState.FINISHED}} > * the time spent waiting is longer than {{clientTimeout}} (if it exists in > the parameters). > When the Client thread is interrupted, it ignores the exception: > {code:java} > // Check app status every 1 second. > try { > Thread.sleep(1000); > } catch (InterruptedException e) { > LOG.debug("Thread sleep in monitoring loop interrupted"); > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
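For reference, one common way to avoid swallowing the interrupt in a loop like the one quoted above is to restore the interrupt flag and stop monitoring. The sketch below is illustrative and not necessarily the committed fix; the helper class and method are invented, while the sleep and log message mirror the quoted snippet.
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative helper, not the distributedshell Client class itself.
final class MonitorSleepSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(MonitorSleepSketch.class);

  /** @return true to keep monitoring, false once interrupted. */
  static boolean sleepBetweenChecks() {
    try {
      Thread.sleep(1000);                  // check app status every 1 second
      return true;
    } catch (InterruptedException e) {
      LOG.debug("Thread sleep in monitoring loop interrupted");
      Thread.currentThread().interrupt();  // re-assert the interrupt for callers
      return false;                        // leave the monitoring loop
    }
  }
}
{code}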