[jira] [Updated] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weiwei Yang updated YARN-8771:
------------------------------
    Fix Version/s:     (was: 3.0.4)

> CapacityScheduler fails to unreserve when cluster resource contains empty
> resource type
> ----------------------------------------------------------------------------
>
>                 Key: YARN-8771
>                 URL: https://issues.apache.org/jira/browse/YARN-8771
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.2.0
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Critical
>             Fix For: 3.2.0, 3.1.2
>
>         Attachments: YARN-8771.001.patch, YARN-8771.002.patch,
>                      YARN-8771.003.patch, YARN-8771.004.patch
>
>
> We found this problem when the cluster was almost, but not fully, exhausted
> (93% used): the scheduler kept allocating for an app but always failed to
> commit. This can block requests from other apps and leave part of the
> cluster resource unusable.
> To reproduce this problem:
> (1) use DominantResourceCalculator;
> (2) the cluster resource has an empty resource type, for example gpu=0;
> (3) the scheduler allocates a container for app1, which has reserved
> containers and whose queue limit or user limit is reached
> (used + required > limit).
> Reference code in RegularContainerAllocator#assignContainer:
> {code:java}
> // How much need to unreserve equals to:
> // max(required - headroom, amountNeedUnreserve)
> Resource headRoom = Resources.clone(currentResoureLimits.getHeadroom());
> Resource resourceNeedToUnReserve =
>     Resources.max(rc, clusterResource,
>         Resources.subtract(capability, headRoom),
>         currentResoureLimits.getAmountNeededUnreserve());
> boolean needToUnreserve =
>     Resources.greaterThan(rc, clusterResource,
>         resourceNeedToUnReserve, Resources.none());
> {code}
> For example, resourceNeedToUnReserve can be <8GB, -6 vcores, 0 gpu> when
> {{headRoom=<0GB, 8 vcores, 0 gpu>}} and {{capability=<8GB, 2 vcores, 0 gpu>}};
> needToUnreserve, the result of {{Resources#greaterThan}}, will then be
> {{false}}.
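The arithmetic above can be replayed with a self-contained model. This is an illustration, not the real Hadoop Resources/DominantResourceCalculator classes; the cluster size <100GB, 100 vcores, 0 gpu> is an assumed value for the example. Subtracting headroom from capability yields a vector with a negative vcores component, and when the zero-valued gpu type is folded into a dominant-share comparison, 0.0/0.0 evaluates to NaN in Java and Math.max propagates it, which is one plausible way a greater-than check against an all-zero resource can come out false:

```java
// Simplified sketch of a dominant-share comparison over resource vectors
// <memoryGB, vcores, gpus>. Not the actual Hadoop implementation.
public class UnreserveCheck {

  // Component-wise subtraction: required - headroom.
  static double[] subtract(double[] a, double[] b) {
    double[] r = new double[a.length];
    for (int i = 0; i < a.length; i++) {
      r[i] = a[i] - b[i];
    }
    return r;
  }

  // Dominant share: the maximum per-type share relative to the cluster
  // total. When a cluster total is 0 (gpu=0 here), 0.0/0.0 is NaN, and
  // Math.max(NaN, x) is NaN, so the whole share becomes NaN.
  static double dominantShare(double[] res, double[] cluster) {
    double max = Double.NEGATIVE_INFINITY;
    for (int i = 0; i < res.length; i++) {
      max = Math.max(max, res[i] / cluster[i]);
    }
    return max;
  }

  public static void main(String[] args) {
    double[] cluster    = {100, 100, 0}; // assumed cluster size, gpu=0
    double[] headroom   = {0, 8, 0};     // <0GB, 8 vcores, 0 gpu>
    double[] capability = {8, 2, 0};     // <8GB, 2 vcores, 0 gpu>

    double[] need = subtract(capability, headroom);
    System.out.println(java.util.Arrays.toString(need)); // [8.0, -6.0, 0.0]

    // "greaterThan(need, none)" via dominant shares: both sides are NaN,
    // NaN > NaN is false, so the unreserve flag stays false even though
    // memory clearly exceeds the headroom.
    boolean needToUnreserve =
        dominantShare(need, cluster)
            > dominantShare(new double[]{0, 0, 0}, cluster);
    System.out.println(needToUnreserve); // prints false
  }
}
```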
> This is not reasonable, because the required resource does exceed the
> headroom and unreserving is needed.
> After that, when the unreserve process in
> RegularContainerAllocator#assignContainer is reached, it is skipped because
> shouldAllocOrReserveNewContainer is true (required containers > reserved
> containers) while needToUnreserve has been wrongly calculated as false:
> {code:java}
> if (availableContainers > 0) {
>   if (rmContainer == null && reservationsContinueLooking
>       && node.getLabels().isEmpty()) {
>     // The unreserve process can be wrongly skipped when
>     // shouldAllocOrReserveNewContainer=true and needToUnreserve=false,
>     // even though the required resource did exceed the headroom.
>     if (!shouldAllocOrReserveNewContainer || needToUnreserve) {
>       ...
>     }
>   }
> }
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
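The guard quoted above reduces to a two-variable condition; this small sketch (the helper name is hypothetical) replays the branch with the values from the reported scenario:

```java
public class UnreserveGuard {
  // Mirrors the guard from RegularContainerAllocator#assignContainer:
  // the unreserve branch is entered only when this returns true.
  static boolean entersUnreserveBranch(boolean shouldAllocOrReserveNewContainer,
                                       boolean needToUnreserve) {
    return !shouldAllocOrReserveNewContainer || needToUnreserve;
  }

  public static void main(String[] args) {
    // Reported scenario: required containers > reserved containers, so
    // shouldAllocOrReserveNewContainer = true, and needToUnreserve was
    // miscomputed as false -> the unreserve branch is skipped.
    System.out.println(entersUnreserveBranch(true, false)); // prints false
  }
}
```

With a correctly computed needToUnreserve (true here, since required exceeds headroom), the same guard would enter the unreserve branch.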
[jira] [Updated] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Yang updated YARN-8771:
---------------------------
    Attachment: YARN-8771.004.patch
[jira] [Updated] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Yang updated YARN-8771:
---------------------------
    Attachment: YARN-8771.003.patch
[jira] [Updated] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-8771:
-----------------------------
    Target Version/s: 3.1.1, 3.2.0
[jira] [Updated] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Yang updated YARN-8771:
---------------------------
    Description: (full text quoted in the first message above)
[jira] [Updated] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Yang updated YARN-8771:
---------------------------
    Description: (full text quoted in the first message above)
[jira] [Updated] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Yang updated YARN-8771:
---------------------------
    Description: (full text quoted in the first message above)
[jira] [Updated] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Yang updated YARN-8771:
---------------------------
    Attachment: YARN-8771.002.patch
[jira] [Updated] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Yang updated YARN-8771:
---------------------------
    Attachment: YARN-8771.001.patch