[jira] [Resolved] (YUNIKORN-638) Make placeholder image configurable
[ https://issues.apache.org/jira/browse/YUNIKORN-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilfred Spiegelenburg resolved YUNIKORN-638.
Fix Version/s: 1.0.0
Resolution: Fixed

both repos committed

> Make placeholder image configurable
>
> Key: YUNIKORN-638
> URL: https://issues.apache.org/jira/browse/YUNIKORN-638
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: shim - kubernetes
> Reporter: Chaoran Yu
> Assignee: Peter Bacsko
> Priority: Critical
> Labels: pull-request-available
> Fix For: 1.0.0
>
> The placeholder image is currently hard-coded as a constant at
> https://github.com/apache/incubator-yunikorn-k8shim/blob/v0.10.0/pkg/common/constants/constants.go#L55.
> In many sectors and enterprises, it's common to have restricted internet
> access. When replacing the placeholder image with something else, the entire
> k8shim image also needs to be rebuilt. It's more inconvenient when different
> deployment environments (dev/test, staging and prod) can't access images in
> another environment.
> It would be better if the placeholder image can be configured in the Helm
> chart:
> https://github.com/apache/incubator-yunikorn-release/blob/master/helm-charts/yunikorn/values.yaml.
> That would make CI/CD easier too

--
This message was sent by Atlassian Jira (v8.20.1#820001)

To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org
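The request in YUNIKORN-638 can be pictured as a lookup that prefers a configured value and falls back to the compiled-in constant. The Go sketch below is illustrative only: the function name, the `PLACEHOLDER_IMAGE` environment variable and the default image string are assumptions made for this example, not the names used by the committed fix.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// defaultPlaceholderImage stands in for the hard-coded constant mentioned in
// the issue. The value is an assumption made for illustration.
const defaultPlaceholderImage = "k8s.gcr.io/pause"

// placeholderImage returns the configured image when one is set and the
// built-in default otherwise.
func placeholderImage(configured string) string {
	if img := strings.TrimSpace(configured); img != "" {
		return img
	}
	return defaultPlaceholderImage
}

func main() {
	// PLACEHOLDER_IMAGE is a hypothetical variable that a Helm chart value
	// could populate in the scheduler deployment.
	fmt.Println(placeholderImage(os.Getenv("PLACEHOLDER_IMAGE")))
}
```

With a lookup like this, an air-gapped environment would only need to change a deployment value to point at its internal registry, without rebuilding the k8shim image.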
[jira] [Resolved] (YUNIKORN-951) Add perf-tool description into benchmarking tutorial page
[ https://issues.apache.org/jira/browse/YUNIKORN-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weiwei Yang resolved YUNIKORN-951.
Fix Version/s: 1.0.0
Resolution: Fixed

> Add perf-tool description into benchmarking tutorial page
>
> Key: YUNIKORN-951
> URL: https://issues.apache.org/jira/browse/YUNIKORN-951
> Project: Apache YuniKorn
> Issue Type: Task
> Components: release, website
> Reporter: Chen Yu Teng
> Assignee: Chen Yu Teng
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Describe the performance tool and how to use it.
> Add the perf-tool documentation to the YuniKorn website
> (https://yunikorn.apache.org/docs/performance/performance_tutorial).
> Expected content:
> # Case settings in conf.yaml
> ** Describe the perf cases with their default parameters according to conf.yaml
> ** Parameter descriptions
> # How to start a test
> ** Commands
> # Meaning of the outputs
> ** Explain which diagrams are produced with the default conf.yaml
[jira] [Updated] (YUNIKORN-1124) Avoid passing empty nodeAttributes in UpdateNode request
[ https://issues.apache.org/jira/browse/YUNIKORN-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilfred Spiegelenburg updated YUNIKORN-1124:
Issue Type: Bug (was: Improvement)

> Avoid passing empty nodeAttributes in UpdateNode request
>
> Key: YUNIKORN-1124
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1124
> Project: Apache YuniKorn
> Issue Type: Bug
> Reporter: Manikandan R
> Assignee: Manikandan R
> Priority: Major
>
> YUNIKORN-1123 fixed YUNIKORN-1090 by passing empty nodeAttributes in the
> UpdateNode request. But it is better to handle these empty checks in the core
> itself and avoid passing an empty attributes map from the shim side. In
> addition, we need to assess the following:
>
> # Does the core use the OccupiedResource and labels attributes being sent as
> part of the Create NodeRequest from the shim?
> # Does the shim need to pass the yunikorn.apache.org/nodeType attribute to
> the core? Either way, it needs to be made a constant.
[jira] [Updated] (YUNIKORN-1124) Avoid passing empty nodeAttributes in UpdateNode request
[ https://issues.apache.org/jira/browse/YUNIKORN-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilfred Spiegelenburg updated YUNIKORN-1124:
Target Version: 1.0.0

> Avoid passing empty nodeAttributes in UpdateNode request
>
> Key: YUNIKORN-1124
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1124
> Project: Apache YuniKorn
> Issue Type: Improvement
> Reporter: Manikandan R
> Assignee: Manikandan R
> Priority: Major
>
> YUNIKORN-1123 fixed YUNIKORN-1090 by passing empty nodeAttributes in the
> UpdateNode request. But it is better to handle these empty checks in the core
> itself and avoid passing an empty attributes map from the shim side. In
> addition, we need to assess the following:
>
> # Does the core use the OccupiedResource and labels attributes being sent as
> part of the Create NodeRequest from the shim?
> # Does the shim need to pass the yunikorn.apache.org/nodeType attribute to
> the core? Either way, it needs to be made a constant.
[jira] [Updated] (YUNIKORN-1124) Avoid passing empty nodeAttributes in UpdateNode request
[ https://issues.apache.org/jira/browse/YUNIKORN-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilfred Spiegelenburg updated YUNIKORN-1124:
Fix Version/s: (was: 1.0.0)

> Avoid passing empty nodeAttributes in UpdateNode request
>
> Key: YUNIKORN-1124
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1124
> Project: Apache YuniKorn
> Issue Type: Improvement
> Reporter: Manikandan R
> Assignee: Manikandan R
> Priority: Major
>
> YUNIKORN-1123 fixed YUNIKORN-1090 by passing empty nodeAttributes in the
> UpdateNode request. But it is better to handle these empty checks in the core
> itself and avoid passing an empty attributes map from the shim side. In
> addition, we need to assess the following:
>
> # Does the core use the OccupiedResource and labels attributes being sent as
> part of the Create NodeRequest from the shim?
> # Does the shim need to pass the yunikorn.apache.org/nodeType attribute to
> the core? Either way, it needs to be made a constant.
[jira] [Resolved] (YUNIKORN-1118) Config validation should soft-succeed if yunikorn is not reachable
[ https://issues.apache.org/jira/browse/YUNIKORN-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilfred Spiegelenburg resolved YUNIKORN-1118.
Fix Version/s: 1.0.0
Resolution: Fixed

config updates soft-succeed if the scheduler REST API is not reachable or returns garbage

> Config validation should soft-succeed if yunikorn is not reachable
>
> Key: YUNIKORN-1118
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1118
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: shim - kubernetes
> Reporter: Craig Condit
> Assignee: Craig Condit
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Currently, the admission controller fails to validate new configmaps if
> yunikorn is not running or not reachable. We should allow updates in this
> case since otherwise we may not be able to get YuniKorn running again.
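The soft-succeed behaviour described in YUNIKORN-1118 can be sketched roughly as follows. This is not the admission controller's code: `validationResult`, `validateFunc` and `validateConfig` are hypothetical names, and the call to the scheduler REST API is abstracted behind a function value so the example stays self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnreachable simulates a scheduler REST API that cannot be contacted.
var errUnreachable = errors.New("scheduler not reachable")

// validationResult is a hypothetical, simplified admission decision.
type validationResult struct {
	Allowed bool
	Reason  string
}

// validateFunc asks the scheduler whether a config is valid. It returns an
// error when the scheduler cannot be reached or its reply cannot be parsed.
type validateFunc func(config string) (valid bool, reason string, err error)

// validateConfig rejects a config only when the scheduler answered and said
// the config is invalid. Any failure to get a usable answer soft-succeeds.
func validateConfig(config string, ask validateFunc) validationResult {
	valid, reason, err := ask(config)
	if err != nil {
		return validationResult{Allowed: true, Reason: "validation skipped: " + err.Error()}
	}
	if !valid {
		return validationResult{Allowed: false, Reason: reason}
	}
	return validationResult{Allowed: true}
}

func main() {
	down := func(string) (bool, string, error) { return false, "", errUnreachable }
	fmt.Printf("%+v\n", validateConfig("partitions: []", down))
}
```

The design choice is that only a definite "invalid" answer blocks an update. An unreachable scheduler must not block the configmap change that might be needed to bring it back.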
[jira] [Resolved] (YUNIKORN-1121) MockScheduler addTask ignores resource settings
[ https://issues.apache.org/jira/browse/YUNIKORN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilfred Spiegelenburg resolved YUNIKORN-1121.
Fix Version/s: 1.0.0
Resolution: Fixed

change committed: one container will be added to the task with the resources set to the ask

> MockScheduler addTask ignores resource settings
>
> Key: YUNIKORN-1121
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1121
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: shim - kubernetes
> Reporter: Wilfred Spiegelenburg
> Assignee: Craig Condit
> Priority: Major
> Labels: newbie, pull-request-available
> Fix For: 1.0.0
>
> Reviewing YUNIKORN-1105 I found a bug in the mock scheduler…
> Looking through the changes I saw files being changed that I thought would
> not require any changes. I ran the tests and they failed without the change.
> I was wondering why we were seeing those failures. I ran the tests in the
> debugger without the changes that I thought were unneeded and saw weird
> things.
> The problem is here:
> {code:java}
> func (fc *MockScheduler) addTask(appID string, taskID string, ask *si.Resource)
> {code}
> This is hopelessly broken. The ask that gets passed in is completely ignored.
> That means every task that was created was always interpreted as
> PodQOSBestEffort and got memory set to 1, which used to be 1M. Now
> that we fixed things it gets set to 1,000,000, or the real 1M.
> The breakage is triggered by the function in the resource code which does the
> right thing:
> {code:java}
> func GetPodResource(pod *v1.Pod) (resource *si.Resource)
> {code}
> In the old setup, as long as the memory for best effort (i.e. 1) was smaller
> than the resource set for the task, things would just pass without an issue.
> Since 1 was the smallest possible it would always work. Accounting on nodes
> etc. was most likely way off but none of these tests checked that anyway.
> This causes *all* tests that use resources within a Task using the mock
> scheduler to not test the real thing, not even close.
> It also hinders us from testing failure cases. We can never create a task
> that does not fit on a node, as an example, unless the node is full.
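To make the failure mode in YUNIKORN-1121 concrete, here is a deliberately simplified Go model of it. All types and functions below (`resource`, `container`, `task`, `newTask`, `taskResource`) are hypothetical stand-ins rather than the shim's real `MockScheduler`, `si.Resource` or `GetPodResource`, and the best-effort default of `memory = 1` is taken from the description above only as an illustrative value.

```go
package main

import "fmt"

// resource is a simplified stand-in for a resource map such as si.Resource.
type resource map[string]int64

// container and task are hypothetical, stripped-down versions of pod types.
type container struct {
	Requests resource
}

type task struct {
	ID         string
	Containers []container
}

// newTask adds exactly one container whose requests are copied from the ask,
// so the requested resources are no longer dropped.
func newTask(taskID string, ask resource) task {
	requests := make(resource, len(ask))
	for name, value := range ask {
		requests[name] = value
	}
	return task{ID: taskID, Containers: []container{{Requests: requests}}}
}

// taskResource sums container requests. A task without any requests is treated
// as best effort and gets a minimal memory value of 1, which is what every
// mock task ended up with while the ask was ignored.
func taskResource(t task) resource {
	total := resource{}
	for _, c := range t.Containers {
		for name, value := range c.Requests {
			total[name] += value
		}
	}
	if len(total) == 0 {
		total["memory"] = 1
	}
	return total
}

func main() {
	ignored := task{ID: "task-1"} // old behaviour: the ask never reaches the task
	fixed := newTask("task-1", resource{"memory": 2000000, "vcore": 500})
	fmt.Println(taskResource(ignored), taskResource(fixed))
}
```

In this model a task built without the ask always resolves to the best-effort minimum, so a test could never construct a task too large for a node, which is the limitation the issue points out.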
[jira] [Created] (YUNIKORN-1125) remove unlimited node
Wilfred Spiegelenburg created YUNIKORN-1125:

Summary: remove unlimited node
Key: YUNIKORN-1125
URL: https://issues.apache.org/jira/browse/YUNIKORN-1125
Project: Apache YuniKorn
Issue Type: Task
Components: core - scheduler
Reporter: Wilfred Spiegelenburg

In YUNIKORN-791 we allowed an unlimited node to be registered in the core. This was used in the first plugin implementation.

The current plugin implementation does not use the unlimited node at this point in time. There is also no expectation that it will need it in the future.

After we are confident that the current plugin model will definitely not require the unlimited node we should remove the code. An unlimited node is not realistic.
[jira] [Commented] (YUNIKORN-1124) Avoid passing empty nodeAttributes in UpdateNode request
[ https://issues.apache.org/jira/browse/YUNIKORN-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17507915#comment-17507915 ]

Wilfred Spiegelenburg commented on YUNIKORN-1124:

The cases in which we pass in an empty list of attributes are when we delete, re-instate or decommission a node. For those cases we really do not need any attributes. We just need to know the node for which we need to change the status.

This is all linked to the fact that the shim does not set a node partition. We have YUNIKORN-802 open for this. Support for setting the partition to anything is not in the shim. That is why all nodes end up in the default partition. The core must have a partition; if it is missing, the _default_ partition is set. The setting of the partition to the default is the root cause of YUNIKORN-1123.

For the real update we never pass an empty list as we have at least a status to send. We *MUST* handle missing attributes in the message correctly on the core side. So we should initialise the map, if the map is nil, to set the partition.

> Avoid passing empty nodeAttributes in UpdateNode request
>
> Key: YUNIKORN-1124
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1124
> Project: Apache YuniKorn
> Issue Type: Improvement
> Reporter: Manikandan R
> Assignee: Manikandan R
> Priority: Major
> Fix For: 1.0.0
>
> YUNIKORN-1123 fixed YUNIKORN-1090 by passing empty nodeAttributes in the
> UpdateNode request. But it is better to handle these empty checks in the core
> itself and avoid passing an empty attributes map from the shim side. In
> addition, we need to assess the following:
>
> # Does the core use the OccupiedResource and labels attributes being sent as
> part of the Create NodeRequest from the shim?
> # Does the shim need to pass the yunikorn.apache.org/nodeType attribute to
> the core? Either way, it needs to be made a constant.
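The core-side handling suggested in the comment on YUNIKORN-1124 (initialise the attributes map when it is nil before setting the partition) might look roughly like the Go sketch below. The `nodeInfo` type, the `partitionKey` and `defaultPartition` values and the `setPartition` helper are assumptions made for illustration; they are not copied from the core or the scheduler interface.

```go
package main

import "fmt"

// nodeInfo is a hypothetical, reduced version of a node message; only the
// attributes map matters here.
type nodeInfo struct {
	NodeID     string
	Attributes map[string]string
}

// Illustrative names and values, assumed for this sketch.
const (
	partitionKey     = "partition"
	defaultPartition = "default"
)

// setPartition ensures a partition attribute exists. Writing to a nil map
// panics in Go, so the map is initialised first.
func setPartition(node *nodeInfo) {
	if node.Attributes == nil {
		node.Attributes = make(map[string]string)
	}
	if node.Attributes[partitionKey] == "" {
		node.Attributes[partitionKey] = defaultPartition
	}
}

func main() {
	node := &nodeInfo{NodeID: "node-1"} // Attributes is nil, as in a decommission request
	setPartition(node)
	fmt.Println(node.Attributes[partitionKey])
}
```

Guarding on the receiving side means a sender that omits attributes, such as a delete or decommission request, cannot trigger the "assignment to entry in nil map" panic.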
[jira] [Updated] (YUNIKORN-1118) Config validation should soft-succeed if yunikorn is not reachable
[ https://issues.apache.org/jira/browse/YUNIKORN-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-1118:
Labels: pull-request-available (was: )

> Config validation should soft-succeed if yunikorn is not reachable
>
> Key: YUNIKORN-1118
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1118
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: shim - kubernetes
> Reporter: Craig Condit
> Assignee: Craig Condit
> Priority: Major
> Labels: pull-request-available
>
> Currently, the admission controller fails to validate new configmaps if
> yunikorn is not running or not reachable. We should allow updates in this
> case since otherwise we may not be able to get YuniKorn running again.
[jira] [Updated] (YUNIKORN-1121) MockScheduler addTask ignores resource settings
[ https://issues.apache.org/jira/browse/YUNIKORN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-1121:
Labels: newbie pull-request-available (was: newbie)

> MockScheduler addTask ignores resource settings
>
> Key: YUNIKORN-1121
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1121
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: shim - kubernetes
> Reporter: Wilfred Spiegelenburg
> Assignee: Craig Condit
> Priority: Major
> Labels: newbie, pull-request-available
>
> Reviewing YUNIKORN-1105 I found a bug in the mock scheduler…
> Looking through the changes I saw files being changed that I thought would
> not require any changes. I ran the tests and they failed without the change.
> I was wondering why we were seeing those failures. I ran the tests in the
> debugger without the changes that I thought were unneeded and saw weird
> things.
> The problem is here:
> {code:java}
> func (fc *MockScheduler) addTask(appID string, taskID string, ask *si.Resource)
> {code}
> This is hopelessly broken. The ask that gets passed in is completely ignored.
> That means every task that was created was always interpreted as
> PodQOSBestEffort and got memory set to 1, which used to be 1M. Now
> that we fixed things it gets set to 1,000,000, or the real 1M.
> The breakage is triggered by the function in the resource code which does the
> right thing:
> {code:java}
> func GetPodResource(pod *v1.Pod) (resource *si.Resource)
> {code}
> In the old setup, as long as the memory for best effort (i.e. 1) was smaller
> than the resource set for the task, things would just pass without an issue.
> Since 1 was the smallest possible it would always work. Accounting on nodes
> etc. was most likely way off but none of these tests checked that anyway.
> This causes *all* tests that use resources within a Task using the mock
> scheduler to not test the real thing, not even close.
> It also hinders us from testing failure cases. We can never create a task
> that does not fit on a node, as an example, unless the node is full.
[jira] [Created] (YUNIKORN-1124) Avoid passing empty nodeAttributes in UpdateNode request
Manikandan R created YUNIKORN-1124:

Summary: Avoid passing empty nodeAttributes in UpdateNode request
Key: YUNIKORN-1124
URL: https://issues.apache.org/jira/browse/YUNIKORN-1124
Project: Apache YuniKorn
Issue Type: Improvement
Reporter: Manikandan R
Assignee: Manikandan R
Fix For: 1.0.0

YUNIKORN-1123 fixed YUNIKORN-1090 by passing empty nodeAttributes in the UpdateNode request. But it is better to handle these empty checks in the core itself and avoid passing an empty attributes map from the shim side. In addition, we need to assess the following:

# Does the core use the OccupiedResource and labels attributes being sent as part of the Create NodeRequest from the shim?
# Does the shim need to pass the yunikorn.apache.org/nodeType attribute to the core? Either way, it needs to be made a constant.
[jira] [Assigned] (YUNIKORN-1121) MockScheduler addTask ignores resource settings
[ https://issues.apache.org/jira/browse/YUNIKORN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit reassigned YUNIKORN-1121:
Assignee: Craig Condit

> MockScheduler addTask ignores resource settings
>
> Key: YUNIKORN-1121
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1121
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: shim - kubernetes
> Reporter: Wilfred Spiegelenburg
> Assignee: Craig Condit
> Priority: Major
> Labels: newbie
>
> Reviewing YUNIKORN-1105 I found a bug in the mock scheduler…
> Looking through the changes I saw files being changed that I thought would
> not require any changes. I ran the tests and they failed without the change.
> I was wondering why we were seeing those failures. I ran the tests in the
> debugger without the changes that I thought were unneeded and saw weird
> things.
> The problem is here:
> {code:java}
> func (fc *MockScheduler) addTask(appID string, taskID string, ask *si.Resource)
> {code}
> This is hopelessly broken. The ask that gets passed in is completely ignored.
> That means every task that was created was always interpreted as
> PodQOSBestEffort and got memory set to 1, which used to be 1M. Now
> that we fixed things it gets set to 1,000,000, or the real 1M.
> The breakage is triggered by the function in the resource code which does the
> right thing:
> {code:java}
> func GetPodResource(pod *v1.Pod) (resource *si.Resource)
> {code}
> In the old setup, as long as the memory for best effort (i.e. 1) was smaller
> than the resource set for the task, things would just pass without an issue.
> Since 1 was the smallest possible it would always work. Accounting on nodes
> etc. was most likely way off but none of these tests checked that anyway.
> This causes *all* tests that use resources within a Task using the mock
> scheduler to not test the real thing, not even close.
> It also hinders us from testing failure cases. We can never create a task
> that does not fit on a node, as an example, unless the node is full.
[jira] [Commented] (YUNIKORN-1107) Make health check occur in the background
[ https://issues.apache.org/jira/browse/YUNIKORN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17507689#comment-17507689 ]

Craig Condit commented on YUNIKORN-1107:

[~lowc1012], on a large cluster, the health check can take a considerable amount of time as it has to walk all the internal data structures, acquiring locks along the way that can block scheduler progress. An attacker would only need to spam lots of health check requests in a short period of time to essentially block the scheduler from making forward progress. We really only need to run the check maybe every 30-60 seconds.

The liveness probe doesn't really make sense for YuniKorn, as if the service is running, it is "live". The health check, in part because it needs to acquire and release many locks, can sometimes report incorrect information depending upon the timing of operations. It also may report issues that are really more relevant for the K8s cluster health as a whole and do not indicate a problem with YK itself. This is useful for diagnostics, but is not a reliable indicator that YK should be terminated and restarted.

> Make health check occur in the background
>
> Key: YUNIKORN-1107
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1107
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler
> Reporter: Craig Condit
> Assignee: Ryan Lo
> Priority: Major
>
> Currently, the health check endpoint in the REST API performs a lengthy
> process that could be used as a denial-of-service vector. We should schedule
> the health check in the background periodically, and have the REST API simply
> report the results of the latest check.
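The approach proposed in YUNIKORN-1107 (run the expensive check on a timer, let the REST endpoint return the cached result) can be sketched in Go as below. This is a minimal illustration under assumptions: `healthResult`, `backgroundChecker` and its methods are hypothetical names, the check itself is a stub, and the 30 second interval is just the lower end of the range mentioned in the comment.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// healthResult is a hypothetical summary of one health check run.
type healthResult struct {
	Healthy bool
	Details string
}

// backgroundChecker runs an expensive check on a timer and caches the result.
type backgroundChecker struct {
	mu     sync.RWMutex
	latest healthResult
	check  func() healthResult
}

func newBackgroundChecker(check func() healthResult) *backgroundChecker {
	return &backgroundChecker{check: check}
}

// runOnce performs the expensive check and stores its result.
func (c *backgroundChecker) runOnce() {
	result := c.check() // run outside the lock so readers are not blocked
	c.mu.Lock()
	c.latest = result
	c.mu.Unlock()
}

// Start runs the check immediately and then once per interval until stop is closed.
func (c *backgroundChecker) Start(interval time.Duration, stop <-chan struct{}) {
	c.runOnce()
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				c.runOnce()
			case <-stop:
				return
			}
		}
	}()
}

// Latest is what a REST handler would call: it only reads the cached result.
func (c *backgroundChecker) Latest() healthResult {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.latest
}

func main() {
	checker := newBackgroundChecker(func() healthResult {
		return healthResult{Healthy: true, Details: "all checks passed"}
	})
	stop := make(chan struct{})
	checker.Start(30*time.Second, stop)
	fmt.Printf("%+v\n", checker.Latest())
	close(stop)
}
```

Because `Latest` never triggers the check, the cost of a request no longer depends on cluster size, and a flood of requests cannot make the scheduler walk its data structures repeatedly.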
[jira] [Resolved] (YUNIKORN-1123) UpdateNode may cause the scheduler to crash
[ https://issues.apache.org/jira/browse/YUNIKORN-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chaoran Yu resolved YUNIKORN-1123.
Resolution: Fixed

> UpdateNode may cause the scheduler to crash
>
> Key: YUNIKORN-1123
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1123
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: Chaoran Yu
> Assignee: Manikandan R
> Priority: Critical
> Labels: pull-request-available
>
> [https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/rmproxy/rmproxy.go#L369]
> may cause the scheduler to crash because the node.Attributes map could be
> uninitialized.
> Example:
>
> {code:java}
> 2022-03-16T05:22:46.077Z INFO cache/nodes.go:216 report updated nodes to scheduler {"request": "nodes: action:DECOMISSION rmID:\"mycluster\" "}
> panic: assignment to entry in nil map
> goroutine 395199 [running]:
> github.com/apache/incubator-yunikorn-core/pkg/rmproxy.(*RMProxy).UpdateNode.func1(0xc02d9fa320, 0xc00030)
>     /go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20220221055154-ff851af3b358/pkg/rmproxy/rmproxy.go:368 +0x11d
> created by github.com/apache/incubator-yunikorn-core/pkg/rmproxy.(*RMProxy).UpdateNode
>     /go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20220221055154-ff851af3b358/pkg/rmproxy/rmproxy.go:364 +0x7a
> {code}
[jira] [Closed] (YUNIKORN-208) Increase unit test coverage for webservice.go
[ https://issues.apache.org/jira/browse/YUNIKORN-208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ting Yao,Huang closed YUNIKORN-208.
Resolution: Won't Fix

> Increase unit test coverage for webservice.go
>
> Key: YUNIKORN-208
> URL: https://issues.apache.org/jira/browse/YUNIKORN-208
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: webapp
> Reporter: Weiwei Yang
> Assignee: Ting Yao,Huang
> Priority: Major
>
> The webapp package coverage is less than 20%. We need to add more unit tests
> for this.
[jira] [Commented] (YUNIKORN-1122) Move constants to scheduler interface
[ https://issues.apache.org/jira/browse/YUNIKORN-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17507471#comment-17507471 ]

Peter Bacsko commented on YUNIKORN-1122:

One more thought from me: there is an attribute which is defined separately in the core and the shim, namely the "ready" setting, see:

https://github.com/apache/incubator-yunikorn-core/blob/f8f4c4bb9f0323697e28ac83610fd14d1dcc6f1a/pkg/scheduler/objects/node.go#L36
https://github.com/apache/incubator-yunikorn-k8shim/blob/af2fb201048bb9798501da4d62517098a91e7b41/pkg/common/constants/constants.go#L26

This could be moved to the SI, too.

> Move constants to scheduler interface
>
> Key: YUNIKORN-1122
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1122
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - common, scheduler-interface, shim - kubernetes
> Reporter: Weiwei Yang
> Assignee: TingYao Huang
> Priority: Major
>
> While reviewing YUNIKORN-1103, I found that quite a few constants are still
> defined in the shim/core repos. Since we have the ability to define
> constants in SI, we should move all COMMON constants to SI.
[jira] [Updated] (YUNIKORN-1123) UpdateNode may cause the scheduler to crash
[ https://issues.apache.org/jira/browse/YUNIKORN-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YUNIKORN-1123:
Labels: pull-request-available (was: )

> UpdateNode may cause the scheduler to crash
>
> Key: YUNIKORN-1123
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1123
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: Chaoran Yu
> Assignee: Manikandan R
> Priority: Critical
> Labels: pull-request-available
>
> [https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/rmproxy/rmproxy.go#L369]
> may cause the scheduler to crash because the node.Attributes map could be
> uninitialized.
> Example:
>
> {code:java}
> 2022-03-16T05:22:46.077Z INFO cache/nodes.go:216 report updated nodes to scheduler {"request": "nodes: action:DECOMISSION rmID:\"mycluster\" "}
> panic: assignment to entry in nil map
> goroutine 395199 [running]:
> github.com/apache/incubator-yunikorn-core/pkg/rmproxy.(*RMProxy).UpdateNode.func1(0xc02d9fa320, 0xc00030)
>     /go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20220221055154-ff851af3b358/pkg/rmproxy/rmproxy.go:368 +0x11d
> created by github.com/apache/incubator-yunikorn-core/pkg/rmproxy.(*RMProxy).UpdateNode
>     /go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20220221055154-ff851af3b358/pkg/rmproxy/rmproxy.go:364 +0x7a
> {code}
[jira] [Commented] (YUNIKORN-208) Increase unit test coverage for webservice.go
[ https://issues.apache.org/jira/browse/YUNIKORN-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17507430#comment-17507430 ]

Ting Yao,Huang commented on YUNIKORN-208:

This is just for unit tests. Since we have reached 78% coverage, and I don't think we need a smoke test, I think we can close this Jira with "won't fix".

> Increase unit test coverage for webservice.go
>
> Key: YUNIKORN-208
> URL: https://issues.apache.org/jira/browse/YUNIKORN-208
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: webapp
> Reporter: Weiwei Yang
> Assignee: Ting Yao,Huang
> Priority: Major
>
> The webapp package coverage is less than 20%. We need to add more unit tests
> for this.
[jira] [Assigned] (YUNIKORN-1123) UpdateNode may cause the scheduler to crash
[ https://issues.apache.org/jira/browse/YUNIKORN-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manikandan R reassigned YUNIKORN-1123:
Assignee: Manikandan R

> UpdateNode may cause the scheduler to crash
>
> Key: YUNIKORN-1123
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1123
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: Chaoran Yu
> Assignee: Manikandan R
> Priority: Critical
>
> [https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/rmproxy/rmproxy.go#L369]
> may cause the scheduler to crash because the node.Attributes map could be
> uninitialized.
> Example:
>
> {code:java}
> 2022-03-16T05:22:46.077Z INFO cache/nodes.go:216 report updated nodes to scheduler {"request": "nodes: action:DECOMISSION rmID:\"mycluster\" "}
> panic: assignment to entry in nil map
> goroutine 395199 [running]:
> github.com/apache/incubator-yunikorn-core/pkg/rmproxy.(*RMProxy).UpdateNode.func1(0xc02d9fa320, 0xc00030)
>     /go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20220221055154-ff851af3b358/pkg/rmproxy/rmproxy.go:368 +0x11d
> created by github.com/apache/incubator-yunikorn-core/pkg/rmproxy.(*RMProxy).UpdateNode
>     /go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20220221055154-ff851af3b358/pkg/rmproxy/rmproxy.go:364 +0x7a
> {code}
[jira] [Commented] (YUNIKORN-1107) Make health check occur in the background
[ https://issues.apache.org/jira/browse/YUNIKORN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17507364#comment-17507364 ]

Ryan Lo commented on YUNIKORN-1107:

Hi [~ccondit],

I was wondering why we should run the health check in the background within the YK scheduler. We have tried the k8s livenessProbe, but something caused the scheduler to shut down (https://github.com/apache/incubator-yunikorn-k8shim/pull/340/files).

1. Did your investigation turn up any clue about the shutdown?
2. I don't really understand why the health check endpoint could be a denial-of-service vector.

> Make health check occur in the background
>
> Key: YUNIKORN-1107
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1107
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler
> Reporter: Craig Condit
> Assignee: Ryan Lo
> Priority: Major
>
> Currently, the health check endpoint in the REST API performs a lengthy
> process that could be used as a denial-of-service vector. We should schedule
> the health check in the background periodically, and have the REST API simply
> report the results of the latest check.