[jira] [Resolved] (YUNIKORN-638) Make placeholder image configurable

2022-03-16 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-638.

Fix Version/s: 1.0.0
   Resolution: Fixed

both repos committed

> Make placeholder image configurable
> ---
>
> Key: YUNIKORN-638
> URL: https://issues.apache.org/jira/browse/YUNIKORN-638
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: shim - kubernetes
>Reporter: Chaoran Yu
>Assignee: Peter Bacsko
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> The placeholder image is currently hard-coded as a constant at 
> https://github.com/apache/incubator-yunikorn-k8shim/blob/v0.10.0/pkg/common/constants/constants.go#L55.
>  In many sectors and enterprises, it's common to have restricted internet 
> access. When replacing the placeholder image with something else, the entire 
> k8shim image also needs to be rebuilt. It's more inconvenient when different 
> deployment environments (dev/test, staging and prod) can't access images in 
> another environment. 
> It would be better if the placeholder image can be configured in the Helm 
> chart: 
> https://github.com/apache/incubator-yunikorn-release/blob/master/helm-charts/yunikorn/values.yaml.
>  That would make CI/CD easier too
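
A minimal sketch of the idea in Go, assuming the shim would read an override (for example from an environment variable populated by the Helm chart) and fall back to a compiled-in default. The names `resolvePlaceholderImage`, `PLACEHOLDER_IMAGE` and the default image string are hypothetical and are not taken from the k8shim code.

```go
package main

import (
	"fmt"
	"os"
)

// defaultPlaceholderImage is an illustrative default only; the real constant
// lives in the k8shim constants package and may differ.
const defaultPlaceholderImage = "example.invalid/pause:latest"

// placeholderImageEnv is a hypothetical environment variable name that a
// Helm chart value could be mapped to.
const placeholderImageEnv = "PLACEHOLDER_IMAGE"

// resolvePlaceholderImage returns the configured image, falling back to the
// compiled-in default when no override is given.
func resolvePlaceholderImage(override string) string {
	if override == "" {
		return defaultPlaceholderImage
	}
	return override
}

func main() {
	image := resolvePlaceholderImage(os.Getenv(placeholderImageEnv))
	fmt.Println("placeholder image:", image)
}
```

With this shape, an environment without access to the public registry only needs a different chart value; the k8shim image itself does not have to be rebuilt.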



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Resolved] (YUNIKORN-951) Add perf-tool description into benchmarking tutorial page

2022-03-16 Thread Weiwei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved YUNIKORN-951.
--
Fix Version/s: 1.0.0
   Resolution: Fixed

> Add perf-tool description into benchmarking tutorial page
> -
>
> Key: YUNIKORN-951
> URL: https://issues.apache.org/jira/browse/YUNIKORN-951
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: release, website
>Reporter: Chen Yu Teng
>Assignee: Chen Yu Teng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Describe the performance tool and how to use it.
> Update the perf-tool documentation on the YuniKorn 
> website ([https://yunikorn.apache.org/docs/performance/performance_tutorial]).
> Expected content:
>  #  Case settings in conf.yaml
>  ** Describe the perf cases and their default parameters as set in 
> conf.yaml
>  ** Parameter descriptions
>  #  How to start a test
>  ** Commands
>  #  Meaning of the outputs
>  ** Explain which diagrams are produced with the default conf.yaml






[jira] [Updated] (YUNIKORN-1124) Avoid passing empty nodeAttributes in UpdateNode request

2022-03-16 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-1124:

Issue Type: Bug  (was: Improvement)

> Avoid passing empty nodeAttributes in UpdateNode request
> 
>
> Key: YUNIKORN-1124
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1124
> Project: Apache YuniKorn
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>
> YUNIKORN-1123 fixed YUNIKORN-1090 by passing an empty nodeAttributes map in 
> the UpdateNode request. It would be better to handle the empty check in the 
> core itself and avoid passing an empty attributes map from the shim side. In 
> addition, we need to assess the following:
>  
>  # Does the core use the OccupiedResource and labels attributes that are sent 
> as part of the Create NodeRequest from the shim?
>  # Does the shim need to pass the yunikorn.apache.org/nodeType attribute to 
> the core? Either way, it should be defined as a constant.






[jira] [Updated] (YUNIKORN-1124) Avoid passing empty nodeAttributes in UpdateNode request

2022-03-16 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-1124:

Target Version: 1.0.0

> Avoid passing empty nodeAttributes in UpdateNode request
> 
>
> Key: YUNIKORN-1124
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1124
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>
> YUNIKORN-1123 fixed YUNIKORN-1090 by passing an empty nodeAttributes map in 
> the UpdateNode request. It would be better to handle the empty check in the 
> core itself and avoid passing an empty attributes map from the shim side. In 
> addition, we need to assess the following:
>  
>  # Does the core use the OccupiedResource and labels attributes that are sent 
> as part of the Create NodeRequest from the shim?
>  # Does the shim need to pass the yunikorn.apache.org/nodeType attribute to 
> the core? Either way, it should be defined as a constant.






[jira] [Updated] (YUNIKORN-1124) Avoid passing empty nodeAttributes in UpdateNode request

2022-03-16 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-1124:

Fix Version/s: (was: 1.0.0)

> Avoid passing empty nodeAttributes in UpdateNode request
> 
>
> Key: YUNIKORN-1124
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1124
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>
> YUNIKORN-1123 fixed YUNIKORN-1090 by passing an empty nodeAttributes map in 
> the UpdateNode request. It would be better to handle the empty check in the 
> core itself and avoid passing an empty attributes map from the shim side. In 
> addition, we need to assess the following:
>  
>  # Does the core use the OccupiedResource and labels attributes that are sent 
> as part of the Create NodeRequest from the shim?
>  # Does the shim need to pass the yunikorn.apache.org/nodeType attribute to 
> the core? Either way, it should be defined as a constant.






[jira] [Resolved] (YUNIKORN-1118) Config validation should soft-succeed if yunikorn is not reachable

2022-03-16 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-1118.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

config updates soft-succeed if the scheduler REST API is not reachable or 
returns garbage

> Config validation should soft-succeed if yunikorn is not reachable
> --
>
> Key: YUNIKORN-1118
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1118
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Currently, the admission controller fails to validate new configmaps if 
> yunikorn is not running or not reachable. We should allow updates in this 
> case since otherwise we may not be able to get YuniKorn running again.
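
The soft-succeed behaviour could look roughly like the following Go sketch. It is not the admission controller's actual code: `validateConfig`, `validationResponse` and the `call` parameter (a stand-in for the HTTP request to the scheduler REST API) are hypothetical names, and the response shape is assumed.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// errUnreachable simulates a transport failure in this sketch.
var errUnreachable = errors.New("connection refused")

// validationResponse is an assumed shape for the scheduler's reply.
type validationResponse struct {
	Allowed bool   `json:"allowed"`
	Reason  string `json:"reason"`
}

// validateConfig asks the scheduler to validate a config update. The call
// argument stands in for the real HTTP request. Transport errors and
// unparsable replies are treated as "allow" so that a broken or stopped
// scheduler cannot block the config change needed to repair it.
func validateConfig(call func() ([]byte, error)) (bool, string) {
	body, err := call()
	if err != nil {
		return true, "scheduler unreachable, allowing update: " + err.Error()
	}
	var resp validationResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return true, "unparsable scheduler response, allowing update"
	}
	return resp.Allowed, resp.Reason
}

func main() {
	calls := []func() ([]byte, error){
		func() ([]byte, error) { return nil, errUnreachable },
		func() ([]byte, error) { return []byte("<html>oops"), nil },
		func() ([]byte, error) {
			return []byte(`{"allowed":false,"reason":"invalid queue config"}`), nil
		},
	}
	for _, call := range calls {
		allowed, reason := validateConfig(call)
		fmt.Println(allowed, reason)
	}
}
```

Only a well-formed rejection from a reachable scheduler denies the update; every failure to get a usable answer lets it through.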






[jira] [Resolved] (YUNIKORN-1121) MockScheduler addTask ignores resource settings

2022-03-16 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-1121.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

change committed: one container will be added to the task with the resources 
set to the ask

> MockScheduler addTask ignores resource settings
> ---
>
> Key: YUNIKORN-1121
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1121
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Wilfred Spiegelenburg
>Assignee: Craig Condit
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 1.0.0
>
>
> Reviewing YUNIKORN-1105 I found a bug in the mock scheduler…
> Looking through the changes I saw files being changed that I thought would 
> not require any changes. I ran the tests and they failed without the change. 
> I was wondering why we were seeing those failures. I ran the tests in the 
> debugger without the changes that I thought were unneeded and saw weird 
> things.
> The problem is here:
> {code:java}
> func (fc *MockScheduler) addTask(appID string, taskID string, ask *si.Resource){code}
> This function is hopelessly broken. The ask that gets passed in is completely 
> ignored. That means every task that was created was always interpreted as 
> {{_PodQOSBestEffort_}} and got memory set to 1, which used to be 1M. Now that 
> we fixed things it gets set to 1,000,000, or the real 1M.
> The breakage is triggered by the function in the resource code which does the 
> right thing:
> {code:java}
> func GetPodResource(pod *v1.Pod) (resource *si.Resource){code}
> In the old setup as long as the memory for best effort (i.e. 1) was smaller 
> than the resource set for the task things would just pass without an issue. 
> Since 1 was the smallest possible it would always work. Accounting on nodes 
> etc was most likely way off but none of these tests checked that anyway.
> This causes *all* tests that use resources within a Task using the mock 
> scheduler to not test the real thing, not even close.
> It also hinders us from testing failure cases. We can never create a task 
> that does not fit on a node as an example unless the node is full.
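
The fix described in the resolution (one container added to the task, with its resources set to the ask) can be sketched as below. This uses simplified stand-in types rather than the real `si.Resource` and Kubernetes pod types; `Resource`, `Pod`, `Container`, `newTaskPod` and `podResource` are hypothetical names, and the best-effort fallback value is illustrative.

```go
package main

import "fmt"

// Resource is a simplified stand-in for si.Resource: name -> quantity.
type Resource map[string]int64

// Container and Pod are simplified stand-ins for the Kubernetes types.
type Container struct {
	Name     string
	Requests Resource
}

type Pod struct {
	Name       string
	Containers []Container
}

// podResource sums the requests of all containers, in the spirit of
// GetPodResource. A pod without any requests is treated as best effort
// and gets memory=1 (illustrative value).
func podResource(pod Pod) Resource {
	total := Resource{}
	for _, c := range pod.Containers {
		for name, qty := range c.Requests {
			total[name] += qty
		}
	}
	if len(total) == 0 {
		total["memory"] = 1
	}
	return total
}

// newTaskPod builds the pod for a mock task with one container whose
// requests are a copy of the ask, so the ask is no longer dropped.
func newTaskPod(taskID string, ask Resource) Pod {
	requests := Resource{}
	for name, qty := range ask {
		requests[name] = qty
	}
	return Pod{
		Name:       taskID,
		Containers: []Container{{Name: "container-01", Requests: requests}},
	}
}

func main() {
	pod := newTaskPod("task-1", Resource{"memory": 500, "vcore": 2})
	fmt.Println(podResource(pod))
}
```

Because the pod now carries the ask, a test can create a task that is larger than a node and check that it is not placed.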






[jira] [Created] (YUNIKORN-1125) remove unlimited node

2022-03-16 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-1125:
---

 Summary: remove unlimited node
 Key: YUNIKORN-1125
 URL: https://issues.apache.org/jira/browse/YUNIKORN-1125
 Project: Apache YuniKorn
  Issue Type: Task
  Components: core - scheduler
Reporter: Wilfred Spiegelenburg


In YUNIKORN-791 we allowed an unlimited node to be registered in the core. This 
was used in the first plugin implementation. The current plugin implementation 
does not use the unlimited node at this point in time. There is also no 
expectation that it will need it in the future.

After we are confident that the current plugin model will definitely not 
require the unlimited node we should remove the code. An unlimited node is not 
realistic.






[jira] [Commented] (YUNIKORN-1124) Avoid passing empty nodeAttributes in UpdateNode request

2022-03-16 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17507915#comment-17507915
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-1124:
-

We pass in an empty list of attributes when we delete, reinstate or 
decommission a node. For those cases we really do not need any attributes. We 
just need to know the node for which we need to change the status.

This is all linked to the fact that the shim does not set a node partition. We 
have YUNIKORN-802 open for this. Support for setting the partition to anything 
is not in the shim. That is why all nodes end up in the default partition. The 
core must have a partition: if it is missing, the _default_ partition is set. 
The setting of the partition to the default is the root cause of YUNIKORN-1123.

For the real update we never pass an empty list as we have at least a status to 
send.

We *MUST* handle missing attributes in the message correctly on the core side. 
So we should initialise the map, if the map is nil, to set the partition.
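
As a sketch of that core-side guard in Go: writing to a nil map panics with "assignment to entry in nil map", so the map has to be allocated before the partition is set. The `NodeInfo` type, the `ensurePartition` helper and the constant values below are simplified, hypothetical stand-ins and not the actual core code.

```go
package main

import "fmt"

// NodeInfo is a simplified stand-in for the node message sent by the shim.
type NodeInfo struct {
	NodeID     string
	Attributes map[string]string
}

// partitionKey and defaultPartition are illustrative values.
const (
	partitionKey     = "si/node-partition"
	defaultPartition = "default"
)

// ensurePartition sets the default partition when none is present.
// Reading from a nil map is safe in Go, but writing to it panics, so the
// map is allocated first when the shim sent no attributes at all.
func ensurePartition(node *NodeInfo) {
	if node.Attributes == nil {
		node.Attributes = make(map[string]string)
	}
	if node.Attributes[partitionKey] == "" {
		node.Attributes[partitionKey] = defaultPartition
	}
}

func main() {
	// Attributes is nil here, as it would be for a decommission request.
	node := &NodeInfo{NodeID: "node-1"}
	ensurePartition(node)
	fmt.Println(node.Attributes[partitionKey])
}
```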

> Avoid passing empty nodeAttributes in UpdateNode request
> 
>
> Key: YUNIKORN-1124
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1124
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Fix For: 1.0.0
>
>
> YUNIKORN-1123 fixed YUNIKORN-1090 by passing an empty nodeAttributes map in 
> the UpdateNode request. It would be better to handle the empty check in the 
> core itself and avoid passing an empty attributes map from the shim side. In 
> addition, we need to assess the following:
>  
>  # Does the core use the OccupiedResource and labels attributes that are sent 
> as part of the Create NodeRequest from the shim?
>  # Does the shim need to pass the yunikorn.apache.org/nodeType attribute to 
> the core? Either way, it should be defined as a constant.






[jira] [Updated] (YUNIKORN-1118) Config validation should soft-succeed if yunikorn is not reachable

2022-03-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-1118:
-
Labels: pull-request-available  (was: )

> Config validation should soft-succeed if yunikorn is not reachable
> --
>
> Key: YUNIKORN-1118
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1118
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
>
> Currently, the admission controller fails to validate new configmaps if 
> yunikorn is not running or not reachable. We should allow updates in this 
> case since otherwise we may not be able to get YuniKorn running again.






[jira] [Updated] (YUNIKORN-1121) MockScheduler addTask ignores resource settings

2022-03-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-1121:
-
Labels: newbie pull-request-available  (was: newbie)

> MockScheduler addTask ignores resource settings
> ---
>
> Key: YUNIKORN-1121
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1121
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Wilfred Spiegelenburg
>Assignee: Craig Condit
>Priority: Major
>  Labels: newbie, pull-request-available
>
> Reviewing YUNIKORN-1105 I found a bug in the mock scheduler…
> Looking through the changes I saw files being changed that I thought would 
> not require any changes. I ran the tests and they failed without the change. 
> I was wondering why we were seeing those failures. I ran the tests in the 
> debugger without the changes that I thought were unneeded and saw weird 
> things.
> The problem is here:
> {code:java}
> func (fc *MockScheduler) addTask(appID string, taskID string, ask *si.Resource){code}
> This function is hopelessly broken. The ask that gets passed in is completely 
> ignored. That means every task that was created was always interpreted as 
> {{_PodQOSBestEffort_}} and got memory set to 1, which used to be 1M. Now that 
> we fixed things it gets set to 1,000,000, or the real 1M.
> The breakage is triggered by the function in the resource code which does the 
> right thing:
> {code:java}
> func GetPodResource(pod *v1.Pod) (resource *si.Resource){code}
> In the old setup as long as the memory for best effort (i.e. 1) was smaller 
> than the resource set for the task things would just pass without an issue. 
> Since 1 was the smallest possible it would always work. Accounting on nodes 
> etc was most likely way off but none of these tests checked that anyway.
> This causes *all* tests that use resources within a Task using the mock 
> scheduler to not test the real thing, not even close.
> It also hinders us from testing failure cases. We can never create a task 
> that does not fit on a node as an example unless the node is full.






[jira] [Created] (YUNIKORN-1124) Avoid passing empty nodeAttributes in UpdateNode request

2022-03-16 Thread Manikandan R (Jira)
Manikandan R created YUNIKORN-1124:
--

 Summary: Avoid passing empty nodeAttributes in UpdateNode request
 Key: YUNIKORN-1124
 URL: https://issues.apache.org/jira/browse/YUNIKORN-1124
 Project: Apache YuniKorn
  Issue Type: Improvement
Reporter: Manikandan R
Assignee: Manikandan R
 Fix For: 1.0.0


YUNIKORN-1123 fixed YUNIKORN-1090 by passing an empty nodeAttributes map in 
the UpdateNode request. It would be better to handle the empty check in the 
core itself and avoid passing an empty attributes map from the shim side. In 
addition, we need to assess the following:

 # Does the core use the OccupiedResource and labels attributes that are sent 
as part of the Create NodeRequest from the shim?
 # Does the shim need to pass the yunikorn.apache.org/nodeType attribute to 
the core? Either way, it should be defined as a constant.






[jira] [Assigned] (YUNIKORN-1121) MockScheduler addTask ignores resource settings

2022-03-16 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit reassigned YUNIKORN-1121:
--

Assignee: Craig Condit

> MockScheduler addTask ignores resource settings
> ---
>
> Key: YUNIKORN-1121
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1121
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Wilfred Spiegelenburg
>Assignee: Craig Condit
>Priority: Major
>  Labels: newbie
>
> Reviewing YUNIKORN-1105 I found a bug in the mock scheduler…
> Looking through the changes I saw files being changed that I thought would 
> not require any changes. I ran the tests and they failed without the change. 
> I was wondering why we were seeing those failures. I ran the tests in the 
> debugger without the changes that I thought were unneeded and saw weird 
> things.
> The problem is here:
> {code:java}
> func (fc *MockScheduler) addTask(appID string, taskID string, ask *si.Resource){code}
> This function is hopelessly broken. The ask that gets passed in is completely 
> ignored. That means every task that was created was always interpreted as 
> {{_PodQOSBestEffort_}} and got memory set to 1, which used to be 1M. Now that 
> we fixed things it gets set to 1,000,000, or the real 1M.
> The breakage is triggered by the function in the resource code which does the 
> right thing:
> {code:java}
> func GetPodResource(pod *v1.Pod) (resource *si.Resource){code}
> In the old setup as long as the memory for best effort (i.e. 1) was smaller 
> than the resource set for the task things would just pass without an issue. 
> Since 1 was the smallest possible it would always work. Accounting on nodes 
> etc was most likely way off but none of these tests checked that anyway.
> This causes *all* tests that use resources within a Task using the mock 
> scheduler to not test the real thing, not even close.
> It also hinders us from testing failure cases. We can never create a task 
> that does not fit on a node as an example unless the node is full.






[jira] [Commented] (YUNIKORN-1107) Make health check occur in the background

2022-03-16 Thread Craig Condit (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17507689#comment-17507689
 ] 

Craig Condit commented on YUNIKORN-1107:


[~lowc1012], on a large cluster, the health check can take a considerable 
amount of time as it has to walk all the internal data structures, acquiring 
locks along the way that can block scheduler progress. An attacker would only 
need to spam lots of health check requests in a short period of time to 
essentially block the scheduler from making forward progress. We really only 
need to run the check maybe every 30-60 seconds.

The liveness probe doesn't really make sense for YuniKorn, as if the service is 
running, it is "live". The health check, in part because it needs to acquire 
and release many locks, can sometimes report incorrect information depending 
upon the timing of operations. It also may report issues that are really more 
relevant for the K8s cluster health as a whole and do not indicate a problem 
with YK itself. This is useful for diagnostics, but is not a reliable indicator 
that YK should be terminated and restarted.



> Make health check occur in the background
> -
>
> Key: YUNIKORN-1107
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1107
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Craig Condit
>Assignee: Ryan Lo
>Priority: Major
>
> Currently, the health check endpoint in the REST API performs a lengthy 
> process that could be used as a denial-of-service vector. We should schedule 
> the health check in the background periodically, and have the REST API simply 
> report the results of the latest check.
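
The proposal can be sketched in Go as a small cache that runs the expensive check on a timer and lets the REST handler read the last stored result. `HealthCache`, `HealthResult` and the `check` callback are hypothetical names for illustration and are not taken from the core code; the 30 second interval follows the range suggested in the comment above.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// HealthResult is a hypothetical snapshot of one health check run.
type HealthResult struct {
	Healthy   bool
	CheckedAt time.Time
}

// HealthCache runs a check periodically and remembers the last result.
type HealthCache struct {
	mu     sync.RWMutex
	latest HealthResult
	check  func() bool // stand-in for the expensive scheduler health check
}

func NewHealthCache(check func() bool) *HealthCache {
	return &HealthCache{check: check}
}

// RunOnce executes the check and stores the result. The check runs
// outside the cache lock so readers are never blocked by it.
func (h *HealthCache) RunOnce() {
	healthy := h.check()
	h.mu.Lock()
	h.latest = HealthResult{Healthy: healthy, CheckedAt: time.Now()}
	h.mu.Unlock()
}

// Start runs the check immediately, then every interval until ctx is done.
func (h *HealthCache) Start(ctx context.Context, interval time.Duration) {
	h.RunOnce()
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				h.RunOnce()
			}
		}
	}()
}

// Latest returns the cached result. A REST handler would serve this
// instead of triggering a new check for every request.
func (h *HealthCache) Latest() HealthResult {
	h.mu.RLock()
	defer h.mu.RUnlock()
	return h.latest
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	cache := NewHealthCache(func() bool { return true })
	cache.Start(ctx, 30*time.Second)
	fmt.Println("healthy:", cache.Latest().Healthy)
}
```

However many requests hit the endpoint, the expensive check runs at most once per interval, which removes the denial-of-service vector described in the issue.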






[jira] [Resolved] (YUNIKORN-1123) UpdateNode may cause the scheduler to crash

2022-03-16 Thread Chaoran Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoran Yu resolved YUNIKORN-1123.
--
Resolution: Fixed

> UpdateNode may cause the scheduler to crash
> ---
>
> Key: YUNIKORN-1123
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1123
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Chaoran Yu
>Assignee: Manikandan R
>Priority: Critical
>  Labels: pull-request-available
>
> [https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/rmproxy/rmproxy.go#L369]
>  may cause the scheduler to crash because the node.Attributes map could be 
> uninitialized. 
> Example:
>  
> {code:java}
> 2022-03-16T05:22:46.077Z INFO cache/nodes.go:216 report updated nodes to scheduler {"request": "nodes: action:DECOMISSION > rmID:\"mycluster\" "}
> panic: assignment to entry in nil map
> goroutine 395199 [running]:
> github.com/apache/incubator-yunikorn-core/pkg/rmproxy.(*RMProxy).UpdateNode.func1(0xc02d9fa320, 0xc00030)
>  /go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20220221055154-ff851af3b358/pkg/rmproxy/rmproxy.go:368 +0x11d
> created by github.com/apache/incubator-yunikorn-core/pkg/rmproxy.(*RMProxy).UpdateNode
>  /go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20220221055154-ff851af3b358/pkg/rmproxy/rmproxy.go:364 +0x7a
> {code}
>  






[jira] [Closed] (YUNIKORN-208) Increase unit test coverage for webservice.go

2022-03-16 Thread Ting Yao,Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ting Yao,Huang closed YUNIKORN-208.
---
Resolution: Won't Fix

> Increase unit test coverage for webservice.go
> -
>
> Key: YUNIKORN-208
> URL: https://issues.apache.org/jira/browse/YUNIKORN-208
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: webapp
>Reporter: Weiwei Yang
>Assignee: Ting Yao,Huang
>Priority: Major
>
> The webapp package coverage is less than 20%. We need to add more unit tests 
> for this.






[jira] [Commented] (YUNIKORN-1122) Move constants to scheduler interface

2022-03-16 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17507471#comment-17507471
 ] 

Peter Bacsko commented on YUNIKORN-1122:


One more thought from me: there is an attribute which is defined separately in 
the core and the shim, namely the "ready" setting, see:

https://github.com/apache/incubator-yunikorn-core/blob/f8f4c4bb9f0323697e28ac83610fd14d1dcc6f1a/pkg/scheduler/objects/node.go#L36
https://github.com/apache/incubator-yunikorn-k8shim/blob/af2fb201048bb9798501da4d62517098a91e7b41/pkg/common/constants/constants.go#L26

This could be moved to the SI, too.

> Move constants to scheduler interface
> -
>
> Key: YUNIKORN-1122
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1122
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - common, scheduler-interface, shim - kubernetes
>Reporter: Weiwei Yang
>Assignee: TingYao Huang
>Priority: Major
>
> While reviewing YUNIKORN-1103, I found that quite a few constants are still 
> defined in the shim and core repos. Since we have the ability to define 
> constants in the SI, we should move all COMMON constants to the SI.






[jira] [Updated] (YUNIKORN-1123) UpdateNode may cause the scheduler to crash

2022-03-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-1123:
-
Labels: pull-request-available  (was: )

> UpdateNode may cause the scheduler to crash
> ---
>
> Key: YUNIKORN-1123
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1123
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Chaoran Yu
>Assignee: Manikandan R
>Priority: Critical
>  Labels: pull-request-available
>
> [https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/rmproxy/rmproxy.go#L369]
>  may cause the scheduler to crash because the node.Attributes map could be 
> uninitialized. 
> Example:
>  
> {code:java}
> 2022-03-16T05:22:46.077Z INFO cache/nodes.go:216 report updated nodes to scheduler {"request": "nodes: action:DECOMISSION > rmID:\"mycluster\" "}
> panic: assignment to entry in nil map
> goroutine 395199 [running]:
> github.com/apache/incubator-yunikorn-core/pkg/rmproxy.(*RMProxy).UpdateNode.func1(0xc02d9fa320, 0xc00030)
>  /go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20220221055154-ff851af3b358/pkg/rmproxy/rmproxy.go:368 +0x11d
> created by github.com/apache/incubator-yunikorn-core/pkg/rmproxy.(*RMProxy).UpdateNode
>  /go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20220221055154-ff851af3b358/pkg/rmproxy/rmproxy.go:364 +0x7a
> {code}
>  






[jira] [Commented] (YUNIKORN-208) Increase unit test coverage for webservice.go

2022-03-16 Thread Ting Yao,Huang (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17507430#comment-17507430
 ] 

Ting Yao,Huang commented on YUNIKORN-208:
-

Just for unit test.

Since we have reached 78% coverage and I don't think we need a smoke test, I 
think we can close this Jira with "won't fix".

> Increase unit test coverage for webservice.go
> -
>
> Key: YUNIKORN-208
> URL: https://issues.apache.org/jira/browse/YUNIKORN-208
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: webapp
>Reporter: Weiwei Yang
>Assignee: Ting Yao,Huang
>Priority: Major
>
> The webapp package coverage is less than 20%. We need to add more unit tests 
> for this.






[jira] [Assigned] (YUNIKORN-1123) UpdateNode may cause the scheduler to crash

2022-03-16 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R reassigned YUNIKORN-1123:
--

Assignee: Manikandan R

> UpdateNode may cause the scheduler to crash
> ---
>
> Key: YUNIKORN-1123
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1123
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Chaoran Yu
>Assignee: Manikandan R
>Priority: Critical
>
> [https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/rmproxy/rmproxy.go#L369]
>  may cause the scheduler to crash because the node.Attributes map could be 
> uninitialized. 
> Example:
>  
> {code:java}
> 2022-03-16T05:22:46.077Z INFO cache/nodes.go:216 report updated nodes to scheduler {"request": "nodes: action:DECOMISSION > rmID:\"mycluster\" "}
> panic: assignment to entry in nil map
> goroutine 395199 [running]:
> github.com/apache/incubator-yunikorn-core/pkg/rmproxy.(*RMProxy).UpdateNode.func1(0xc02d9fa320, 0xc00030)
>  /go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20220221055154-ff851af3b358/pkg/rmproxy/rmproxy.go:368 +0x11d
> created by github.com/apache/incubator-yunikorn-core/pkg/rmproxy.(*RMProxy).UpdateNode
>  /go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20220221055154-ff851af3b358/pkg/rmproxy/rmproxy.go:364 +0x7a
> {code}
>  






[jira] [Commented] (YUNIKORN-1107) Make health check occur in the background

2022-03-16 Thread Ryan Lo (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17507364#comment-17507364
 ] 

Ryan Lo commented on YUNIKORN-1107:
---

Hi [~ccondit], I was wondering why we should run the health check in the 
background of the YK scheduler. We have tried the k8s livenessProbe, but 
something caused the scheduler to shut down 
([https://github.com/apache/incubator-yunikorn-k8shim/pull/340/files]).

1. Do you have any clue about the shutdown from your investigation?
2. I don't really understand why the health check endpoint could be a 
denial-of-service vector.

> Make health check occur in the background
> -
>
> Key: YUNIKORN-1107
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1107
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Craig Condit
>Assignee: Ryan Lo
>Priority: Major
>
> Currently, the health check endpoint in the REST API performs a lengthy 
> process that could be used as a denial-of-service vector. We should schedule 
> the health check in the background periodically, and have the REST API simply 
> report the results of the latest check.


