[jira] [Commented] (YUNIKORN-820) Update SI dependency in the core repo
[ https://issues.apache.org/jira/browse/YUNIKORN-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403522#comment-17403522 ] Ting Yao,Huang commented on YUNIKORN-820: - Since this issue is urgent and we can't fix this UT problem now, we created [YUNIKORN-821|https://issues.apache.org/jira/browse/YUNIKORN-821] to track it. > Update SI dependency in the core repo > - > > Key: YUNIKORN-820 > URL: https://issues.apache.org/jira/browse/YUNIKORN-820 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - common >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Labels: pull-request-available > Fix For: 1.0.0 > > > Need to update the core dep to the latest SI. > The recent gRPC and protobuf version changes introduce minor changes. A > UT, {{TestSIFromAlloc}}, is also failing after them; we need to fix that in > a follow-up issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
[jira] [Updated] (YUNIKORN-821) UT:TestSIFromAlloc() is failing after gRPC and protobuf version changes
[ https://issues.apache.org/jira/browse/YUNIKORN-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ting Yao,Huang updated YUNIKORN-821: Description: After the gRPC and protobuf version changes, the UT `TestSIFromAlloc()` is failing. Since YUNIKORN-820 is urgent and we can't fix this UT problem in that issue, we created this issue to track that UT problem. c.c. [~wwei] was: After the gRPC and protobuf version changes, the UT `TestSIFromAlloc()` is failing. Since YUNIKORN-820 is urgent and we can't fix this UT problem in that issue, we created this issue to fix that UT problem. c.c. [~wwei] > UT:TestSIFromAlloc() is failing after gRPC and protobuf version changes > --- > > Key: YUNIKORN-821 > URL: https://issues.apache.org/jira/browse/YUNIKORN-821 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - common >Reporter: Ting Yao,Huang >Priority: Critical > Fix For: 1.0.0 > > > After the gRPC and protobuf version changes, the UT `TestSIFromAlloc()` is failing. > Since YUNIKORN-820 is urgent and we can't fix this UT problem in that issue, > we created this issue to track that UT problem. c.c. [~wwei]
[jira] [Updated] (YUNIKORN-821) UT:TestSIFromAlloc() is failing after gRPC and protobuf version changes
[ https://issues.apache.org/jira/browse/YUNIKORN-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YUNIKORN-821: - Fix Version/s: 1.0.0 > UT:TestSIFromAlloc() is failing after gRPC and protobuf version changes > --- > > Key: YUNIKORN-821 > URL: https://issues.apache.org/jira/browse/YUNIKORN-821 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - common >Reporter: Ting Yao,Huang >Priority: Critical > Fix For: 1.0.0 > > > After the gRPC and protobuf version changes, the UT `TestSIFromAlloc()` is failing. > Since YUNIKORN-820 is urgent and we can't fix this UT problem in that issue, > we created this issue to fix that UT problem. c.c. [~wwei]
[jira] [Updated] (YUNIKORN-821) UT:TestSIFromAlloc() is failing after gRPC and protobuf version changes
[ https://issues.apache.org/jira/browse/YUNIKORN-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YUNIKORN-821: - Priority: Critical (was: Major) > UT:TestSIFromAlloc() is failing after gRPC and protobuf version changes > --- > > Key: YUNIKORN-821 > URL: https://issues.apache.org/jira/browse/YUNIKORN-821 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - common >Reporter: Ting Yao,Huang >Priority: Critical > > After the gRPC and protobuf version changes, the UT `TestSIFromAlloc()` is failing. > Since YUNIKORN-820 is urgent and we can't fix this UT problem in that issue, > we created this issue to fix that UT problem. c.c. [~wwei]
[jira] [Resolved] (YUNIKORN-820) Update SI dependency in the core repo
[ https://issues.apache.org/jira/browse/YUNIKORN-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ting Yao,Huang resolved YUNIKORN-820. - Target Version: 1.0.0 Resolution: Fixed > Update SI dependency in the core repo > - > > Key: YUNIKORN-820 > URL: https://issues.apache.org/jira/browse/YUNIKORN-820 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - common >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Labels: pull-request-available > Fix For: 1.0.0 > > > Need to update the core dep to the latest SI. > The recent gRPC and protobuf version changes introduce minor changes. A > UT, {{TestSIFromAlloc}}, is also failing after them; we need to fix that in > a follow-up issue.
[jira] [Created] (YUNIKORN-821) UT:TestSIFromAlloc() is failing after gRPC and protobuf version changes
Ting Yao,Huang created YUNIKORN-821: --- Summary: UT:TestSIFromAlloc() is failing after gRPC and protobuf version changes Key: YUNIKORN-821 URL: https://issues.apache.org/jira/browse/YUNIKORN-821 Project: Apache YuniKorn Issue Type: Bug Components: core - common Reporter: Ting Yao,Huang After the gRPC and protobuf version changes, the UT `TestSIFromAlloc()` is failing. Since YUNIKORN-820 is urgent and we can't fix this UT problem in that issue, we created this issue to fix that UT problem. c.c. [~wwei]
[jira] [Commented] (YUNIKORN-819) Update go version dependency to 1.15
[ https://issues.apache.org/jira/browse/YUNIKORN-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403405#comment-17403405 ] Weiwei Yang commented on YUNIKORN-819: -- Hi [~chia7712], the main reason is that our current tests run against 1.15, for example https://github.com/apache/incubator-yunikorn-k8shim/blob/25a71ba3fc646cbd96dff5b2882388eceb27bf9c/.github/workflows/main.yml#L15. Upgrading to 1.17 might be a bigger change. > Update go version dependency to 1.15 > > > Key: YUNIKORN-819 > URL: https://issues.apache.org/jira/browse/YUNIKORN-819 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - cache, scheduler-interface, shim - kubernetes >Reporter: Weiwei Yang >Priority: Major > Labels: newbie > Fix For: 1.0.0 > > > We need to update all Go version dependencies to 1.15. > This is defined in the top-level go.mod file.
[jira] [Commented] (YUNIKORN-819) Update go version dependency to 1.15
[ https://issues.apache.org/jira/browse/YUNIKORN-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403350#comment-17403350 ] Chia-Ping Tsai commented on YUNIKORN-819: - Just curious: why not update to Go 1.17 (the latest version)? > Update go version dependency to 1.15 > > > Key: YUNIKORN-819 > URL: https://issues.apache.org/jira/browse/YUNIKORN-819 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - cache, scheduler-interface, shim - kubernetes >Reporter: Weiwei Yang >Priority: Major > Labels: newbie > Fix For: 1.0.0 > > > We need to update all Go version dependencies to 1.15. > This is defined in the top-level go.mod file.
[jira] [Updated] (YUNIKORN-813) The capacity of undefined resource should NOT be considered zero
[ https://issues.apache.org/jira/browse/YUNIKORN-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-813: Labels: pull-request-available (was: ) > The capacity of undefined resource should NOT be considered zero > > > Key: YUNIKORN-813 > URL: https://issues.apache.org/jira/browse/YUNIKORN-813 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Chia-Ping Tsai >Assignee: Chia-Ping Tsai >Priority: Major > Labels: pull-request-available > > {code} > resources: > max: > memory: 1 > {code} > If the above configuration is added to a leaf queue, the queue can't run any > application since the "vcore" is assumed to be zero. That prevents us from > limiting only a subset of resources.
[jira] [Commented] (YUNIKORN-813) The capacity of undefined resource should NOT be considered zero
[ https://issues.apache.org/jira/browse/YUNIKORN-813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403288#comment-17403288 ] Chia-Ping Tsai commented on YUNIKORN-813: - {quote} If a resource is undefined I think it should be considered max value/unlimited {quote} After tracing the code again, it seems to me that an "undefined resource" of a child queue should reference the resource from its parent. As root always has max resources for vcore/memory, any other queue can get a "valid" max resource from its parent. > The capacity of undefined resource should NOT be considered zero > > > Key: YUNIKORN-813 > URL: https://issues.apache.org/jira/browse/YUNIKORN-813 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Chia-Ping Tsai >Assignee: Chia-Ping Tsai >Priority: Major > > {code} > resources: > max: > memory: 1 > {code} > If the above configuration is added to a leaf queue, the queue can't run any > application since the "vcore" is assumed to be zero. That prevents us from > limiting only a subset of resources.
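The parent-lookup idea in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `queue` type, the `effectiveMax` method, and the root limits of 1000 and 64 are all hypothetical and are not taken from the YuniKorn code base. The sketch simply returns the nearest limit defined on the path to root; a real scheduler would presumably also cap a child's limit by its parent's.

```go
package main

import "fmt"

// queue is a hypothetical, simplified queue; it is not YuniKorn's real type.
type queue struct {
	name   string
	parent *queue
	max    map[string]int64 // only the explicitly configured resource types
}

// effectiveMax returns the limit for one resource type, walking up the
// hierarchy until some queue on the path defines it. The boolean is false
// when no queue up to and including root defines the type.
func (q *queue) effectiveMax(resource string) (int64, bool) {
	for cur := q; cur != nil; cur = cur.parent {
		if limit, ok := cur.max[resource]; ok {
			return limit, true
		}
	}
	return 0, false
}

func main() {
	// Assumed values: root defines both types, the leaf only limits memory.
	root := &queue{name: "root", max: map[string]int64{"memory": 1000, "vcore": 64}}
	leaf := &queue{name: "root.leaf", parent: root, max: map[string]int64{"memory": 1}}

	mem, _ := leaf.effectiveMax("memory")
	vcore, _ := leaf.effectiveMax("vcore")
	fmt.Println(mem, vcore) // memory comes from the leaf, vcore from root
}
```

With this lookup, the `memory: 1` configuration from the issue description limits memory only, while vcore falls back to whatever root allows instead of zero.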
[jira] [Updated] (YUNIKORN-820) Update SI dependency in the core repo
[ https://issues.apache.org/jira/browse/YUNIKORN-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-820: Labels: pull-request-available (was: ) > Update SI dependency in the core repo > - > > Key: YUNIKORN-820 > URL: https://issues.apache.org/jira/browse/YUNIKORN-820 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - common >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Labels: pull-request-available > Fix For: 1.0.0 > > > Need to update the core dep to the latest SI. > The recent gRPC and protobuf version changes introduce minor changes. A > UT, {{TestSIFromAlloc}}, is also failing after them; we need to fix that in > a follow-up issue.
[jira] [Created] (YUNIKORN-820) Update SI dependency in the core repo
Weiwei Yang created YUNIKORN-820: Summary: Update SI dependency in the core repo Key: YUNIKORN-820 URL: https://issues.apache.org/jira/browse/YUNIKORN-820 Project: Apache YuniKorn Issue Type: Bug Components: core - common Reporter: Weiwei Yang Fix For: 1.0.0 Need to update the core dep to the latest SI. The recent gRPC and protobuf version changes introduce minor changes. A UT, {{TestSIFromAlloc}}, is also failing after them; we need to fix that in a follow-up issue.
[jira] [Assigned] (YUNIKORN-820) Update SI dependency in the core repo
[ https://issues.apache.org/jira/browse/YUNIKORN-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang reassigned YUNIKORN-820: Assignee: Weiwei Yang > Update SI dependency in the core repo > - > > Key: YUNIKORN-820 > URL: https://issues.apache.org/jira/browse/YUNIKORN-820 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - common >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Fix For: 1.0.0 > > > Need to update the core dep to the latest SI. > The recent gRPC and protobuf version changes introduce minor changes. A > UT, {{TestSIFromAlloc}}, is also failing after them; we need to fix that in > a follow-up issue.
[jira] [Resolved] (YUNIKORN-810) Add a field in scheduler-interface to represent required node for the scheduler
[ https://issues.apache.org/jira/browse/YUNIKORN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang resolved YUNIKORN-810. -- Resolution: Fixed > Add a field in scheduler-interface to represent required node for the > scheduler > --- > > Key: YUNIKORN-810 > URL: https://issues.apache.org/jira/browse/YUNIKORN-810 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: scheduler-interface >Reporter: Ting Yao,Huang >Assignee: Ting Yao,Huang >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > >
[jira] [Created] (YUNIKORN-819) Update go version dependency to 1.15
Weiwei Yang created YUNIKORN-819: Summary: Update go version dependency to 1.15 Key: YUNIKORN-819 URL: https://issues.apache.org/jira/browse/YUNIKORN-819 Project: Apache YuniKorn Issue Type: Bug Components: core - cache, scheduler-interface, shim - kubernetes Reporter: Weiwei Yang Fix For: 1.0.0 We need to update all Go version dependencies to 1.15. This is defined in the top-level go.mod file.
[jira] [Resolved] (YUNIKORN-815) Fix scheduler interface makefile issues
[ https://issues.apache.org/jira/browse/YUNIKORN-815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang resolved YUNIKORN-815. -- Resolution: Fixed > Fix scheduler interface makefile issues > --- > > Key: YUNIKORN-815 > URL: https://issues.apache.org/jira/browse/YUNIKORN-815 > Project: Apache YuniKorn > Issue Type: Bug > Components: scheduler-interface >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > After YUNIKORN-760, we are seeing issues updating the scheduler-interface > dependency in other repos. This is because YUNIKORN-760 removed the gRPC > generated code, but we have useful code in the core repo that still needs > it, such as > https://github.com/apache/incubator-yunikorn-core/tree/master/cmd.
[jira] [Reopened] (YUNIKORN-810) Add a field in scheduler-interface to represent required node for the scheduler
[ https://issues.apache.org/jira/browse/YUNIKORN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang reopened YUNIKORN-810: -- > Add a field in scheduler-interface to represent required node for the > scheduler > --- > > Key: YUNIKORN-810 > URL: https://issues.apache.org/jira/browse/YUNIKORN-810 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: scheduler-interface >Reporter: Ting Yao,Huang >Assignee: Ting Yao,Huang >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > >
[jira] [Resolved] (YUNIKORN-738) Update helm index for v0.11
[ https://issues.apache.org/jira/browse/YUNIKORN-738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kinga Marton resolved YUNIKORN-738. --- Fix Version/s: 0.11 Resolution: Fixed > Update helm index for v0.11 > --- > > Key: YUNIKORN-738 > URL: https://issues.apache.org/jira/browse/YUNIKORN-738 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: release >Reporter: Tao Yang >Assignee: Kinga Marton >Priority: Major > Labels: pull-request-available > Fix For: 0.11 > >
[jira] [Updated] (YUNIKORN-738) Update helm index for v0.11
[ https://issues.apache.org/jira/browse/YUNIKORN-738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-738: Labels: pull-request-available (was: ) > Update helm index for v0.11 > --- > > Key: YUNIKORN-738 > URL: https://issues.apache.org/jira/browse/YUNIKORN-738 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: release >Reporter: Tao Yang >Assignee: Kinga Marton >Priority: Major > Labels: pull-request-available >
[jira] [Commented] (YUNIKORN-704) [Umbrella] Use the same mechanism to schedule daemon set pods as the default scheduler
[ https://issues.apache.org/jira/browse/YUNIKORN-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403234#comment-17403234 ] Chaoran Yu commented on YUNIKORN-704: - [~Huang Ting Yao] No worries! Thanks for working on this feature! > [Umbrella] Use the same mechanism to schedule daemon set pods as the default > scheduler > -- > > Key: YUNIKORN-704 > URL: https://issues.apache.org/jira/browse/YUNIKORN-704 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes >Reporter: Chaoran Yu >Assignee: Ting Yao,Huang >Priority: Blocker > Fix For: 1.0.0 > > Attachments: fluent-bit-describe.yaml, fluent-bit.yaml > > > We sometimes see DaemonSet pods fail to be scheduled. Please see attached > files for the YAML and _kubectl describe_ output of one such pod. We > originally suspected [node > reservation|https://github.com/apache/incubator-yunikorn-core/blob/v0.10.0/pkg/scheduler/context.go#L41] > was to blame. But even after setting the DISABLE_RESERVATION environment > variable to true, we still see such scheduling failures. The issue is > especially severe when K8s nodes have disk pressure that causes lots of pods > to be evicted. Newly created pods will stay in pending forever. We have to > temporarily uninstall YuniKorn and let the default scheduler do the > scheduling for these pods. > This issue is critical because lots of important pods belong to a DaemonSet, > such as Fluent Bit, a common logging solution. This is probably the last > remaining roadblock for us to have the confidence to have YuniKorn entirely > replace the default scheduler.
[jira] [Commented] (YUNIKORN-807) Improve performance of node sorting
[ https://issues.apache.org/jira/browse/YUNIKORN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403200#comment-17403200 ] Craig Condit commented on YUNIKORN-807: --- [~sunilg], thanks for the feedback. I agree with using a followup Jira to address specific scheduler behavior – this one is explicitly about refactoring / performance improvement. I don't foresee hotspotting in nodeUpdated handling being a real problem. Updates should be relatively infrequent per node, and the locks are taken per-node. > Improve performance of node sorting > --- > > Key: YUNIKORN-807 > URL: https://issues.apache.org/jira/browse/YUNIKORN-807 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - scheduler >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Attachments: Node Sorting Performance Improvement.pdf > > > YuniKorn currently sorts all nodes on demand whenever scheduling of a > container occurs. This causes significant performance degradation as the > number of nodes increases. > If we replace the on-demand sorting with a B-Tree sorted proactively, we can > improve performance considerably. > This is a similar approach to YUNIKORN-21, but without the associated > behavioral changes. > I've attached a design document with the details of the approach and the > performance improvement gained.
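As a rough illustration of sorting proactively rather than on demand, the sketch below keeps the node collection ordered at all times and repositions a single node when it is updated. Note the substitution: to stay within the standard library it uses a sorted slice with binary search, not the B-Tree that the issue proposes, so an insert or removal is linear where a B-Tree would be logarithmic. The `node` type, its `available` field, and the ascending ordering are assumptions made for the example and are not taken from the attached design document.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical stand-in for a scheduler node.
type node struct {
	id        string
	available int64 // assumed sort key: free capacity
}

// less orders nodes by available capacity, breaking ties by id.
func less(a, b node) bool {
	if a.available != b.available {
		return a.available < b.available
	}
	return a.id < b.id
}

// sortedNodes keeps its slice ordered after every mutation, so readers
// never have to sort at scheduling time.
type sortedNodes struct {
	nodes []node
}

// position returns the index of the first node that is not less than n.
func (s *sortedNodes) position(n node) int {
	return sort.Search(len(s.nodes), func(i int) bool { return !less(s.nodes[i], n) })
}

// insert places n at its sorted position.
func (s *sortedNodes) insert(n node) {
	i := s.position(n)
	s.nodes = append(s.nodes, node{})
	copy(s.nodes[i+1:], s.nodes[i:])
	s.nodes[i] = n
}

// remove deletes n if present and reports whether it was found.
func (s *sortedNodes) remove(n node) bool {
	i := s.position(n)
	if i == len(s.nodes) || s.nodes[i] != n {
		return false
	}
	s.nodes = append(s.nodes[:i], s.nodes[i+1:]...)
	return true
}

// update repositions one node after its available capacity changed.
func (s *sortedNodes) update(old node, available int64) {
	if s.remove(old) {
		s.insert(node{id: old.id, available: available})
	}
}

func main() {
	var s sortedNodes
	s.insert(node{"a", 5})
	s.insert(node{"b", 2})
	s.insert(node{"c", 9})
	s.update(node{"a", 5}, 10) // only node "a" is moved
	for _, n := range s.nodes {
		fmt.Println(n.id, n.available)
	}
}
```

The point of the design is that a node update touches only that node's position, which is why infrequent per-node updates stay cheap even as the node count grows.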
[jira] [Commented] (YUNIKORN-704) [Umbrella] Use the same mechanism to schedule daemon set pods as the default scheduler
[ https://issues.apache.org/jira/browse/YUNIKORN-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403143#comment-17403143 ] Ting Yao,Huang commented on YUNIKORN-704: - Hi [~yuchaoran2011] [~chenya_zhang], Since we decided to implement this another way, this issue might be delayed. Please refer to [Yunikorn-810|https://issues.apache.org/jira/browse/YUNIKORN-810] [Yunikorn-811|https://issues.apache.org/jira/browse/YUNIKORN-811] [Yunikorn-812|https://issues.apache.org/jira/browse/YUNIKORN-812]. Sorry for the delay. > [Umbrella] Use the same mechanism to schedule daemon set pods as the default > scheduler > -- > > Key: YUNIKORN-704 > URL: https://issues.apache.org/jira/browse/YUNIKORN-704 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes >Reporter: Chaoran Yu >Assignee: Ting Yao,Huang >Priority: Blocker > Fix For: 1.0.0 > > Attachments: fluent-bit-describe.yaml, fluent-bit.yaml > > > We sometimes see DaemonSet pods fail to be scheduled. Please see attached > files for the YAML and _kubectl describe_ output of one such pod. We > originally suspected [node > reservation|https://github.com/apache/incubator-yunikorn-core/blob/v0.10.0/pkg/scheduler/context.go#L41] > was to blame. But even after setting the DISABLE_RESERVATION environment > variable to true, we still see such scheduling failures. The issue is > especially severe when K8s nodes have disk pressure that causes lots of pods > to be evicted. Newly created pods will stay in pending forever. We have to > temporarily uninstall YuniKorn and let the default scheduler do the > scheduling for these pods. > This issue is critical because lots of important pods belong to a DaemonSet, > such as Fluent Bit, a common logging solution. This is probably the last > remaining roadblock for us to have the confidence to have YuniKorn entirely > replace the default scheduler.
[jira] [Commented] (YUNIKORN-813) The capacity of undefined resource should NOT be considered zero
[ https://issues.apache.org/jira/browse/YUNIKORN-813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402999#comment-17402999 ] Chia-Ping Tsai commented on YUNIKORN-813: - [~kmarton] thanks for your feedback. {quote} If a resource is undefined I think it should be considered max value/unlimited {quote} We are on the same page :) > The capacity of undefined resource should NOT be considered zero > > > Key: YUNIKORN-813 > URL: https://issues.apache.org/jira/browse/YUNIKORN-813 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Chia-Ping Tsai >Assignee: Chia-Ping Tsai >Priority: Major > > {code} > resources: > max: > memory: 1 > {code} > If the above configuration is added to a leaf queue, the queue can't run any > application since the "vcore" is assumed to be zero. That prevents us from > limiting only a subset of resources.
[jira] [Commented] (YUNIKORN-813) The capacity of undefined resource should NOT be considered zero
[ https://issues.apache.org/jira/browse/YUNIKORN-813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402981#comment-17402981 ] Kinga Marton commented on YUNIKORN-813: --- If a resource is undefined I think it should be considered max value/unlimited. We are already using this approach in a couple of places in the code, such as in the queue quota check. > The capacity of undefined resource should NOT be considered zero > > > Key: YUNIKORN-813 > URL: https://issues.apache.org/jira/browse/YUNIKORN-813 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Chia-Ping Tsai >Assignee: Chia-Ping Tsai >Priority: Major > > {code} > resources: > max: > memory: 1 > {code} > If the above configuration is added to a leaf queue, the queue can't run any > application since the "vcore" is assumed to be zero. That prevents us from > limiting only a subset of resources.
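A minimal sketch of the "undefined means unlimited" rule described in this comment, assuming a plain map from resource name to quantity. The `resource` type and the `fitsIn` helper are hypothetical names for illustration only; this is not the actual quota check in the YuniKorn core.

```go
package main

import "fmt"

// resource maps a resource type name (e.g. "memory", "vcore") to a quantity.
// Hypothetical type for illustration; not YuniKorn's real Resource type.
type resource map[string]int64

// fitsIn reports whether request fits within quota. A resource type that is
// absent from quota is treated as unlimited rather than as zero.
func fitsIn(request, quota resource) bool {
	for name, want := range request {
		limit, defined := quota[name]
		if !defined {
			continue // undefined in the quota: no limit applies
		}
		if want > limit {
			return false
		}
	}
	return true
}

func main() {
	// Mirrors the configuration from the issue: only memory is limited.
	quota := resource{"memory": 1}

	fmt.Println(fitsIn(resource{"memory": 1, "vcore": 4}, quota)) // vcore is not limited
	fmt.Println(fitsIn(resource{"memory": 2}, quota))             // memory exceeds the limit
}
```

Under this rule a queue that only sets `memory` can still run applications that request vcore, which is the behavior the issue asks for.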