[jira] [Commented] (YUNIKORN-55) Add pod labels and annotations to allocation ask attributes
[ https://issues.apache.org/jira/browse/YUNIKORN-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097188#comment-17097188 ] Wilfred Spiegelenburg commented on YUNIKORN-55: --- As I commented in the PR, I think blindly adding all metadata from the pod to the asks is not the right thing to do. Some of it only makes sense on the app, and some, like the namespace, is already there. Can we get a more detailed description of what we want, expect or need from the labels and annotations? I think filtering for the ask would be the right way forward: * specific labels from the pod on each ask, remove app level ones * all labels with namespace {{yunikorn.apache.org}} * no annotations for now due to the structure and content > Add pod labels and annotations to allocation ask attributes > --- > > Key: YUNIKORN-55 > URL: https://issues.apache.org/jira/browse/YUNIKORN-55 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler, shim - kubernetes, webapp >Reporter: Weiwei Yang >Assignee: Adam Antal >Priority: Major > Labels: pull-request-available > Time Spent: 2h 38m > Remaining Estimate: 1h 28m > > In YUNIKORN-54, we simplified the way application IDs are generated. The side > effect is that when we look at the web UI, we lose some of the pod > info, such as pod name, namespace, etc. > A proper way to handle this is to get this info from the pod, add the labels and > annotations to the allocation ask as attributes, and then send them to > scheduler-core. The web REST API also needs to display these attributes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org
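A minimal Go sketch of the filtering proposed above, assuming labels in the {{yunikorn.apache.org}} namespace use a {{yunikorn.apache.org/}} key prefix (the usual Kubernetes "prefix/name" label form). The function name, the allow-list of ask labels, the set of app level keys and the example label keys are all hypothetical, not the shim's actual code; annotations are left out as suggested.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// yunikornLabelPrefix assumes labels in the yunikorn.apache.org namespace use
// the Kubernetes "prefix/name" key form.
const yunikornLabelPrefix = "yunikorn.apache.org/"

// filterAskAttributes (hypothetical) selects the pod labels to copy onto an
// allocation ask: keys in askLabels plus every yunikorn.apache.org/ label,
// minus the app level keys in appLevel. Annotations are not considered.
func filterAskAttributes(podLabels map[string]string, askLabels, appLevel map[string]bool) map[string]string {
	attrs := make(map[string]string)
	for key, value := range podLabels {
		if appLevel[key] {
			continue // belongs on the application, not on each ask
		}
		if askLabels[key] || strings.HasPrefix(key, yunikornLabelPrefix) {
			attrs[key] = value
		}
	}
	return attrs
}

func main() {
	podLabels := map[string]string{
		"applicationId":               "app-0001",
		"component":                   "executor",
		"yunikorn.apache.org/example": "value",
		"unrelated":                   "ignored",
	}
	attrs := filterAskAttributes(podLabels,
		map[string]bool{"component": true},
		map[string]bool{"applicationId": true})

	// Print in a stable order; map iteration order is random.
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, attrs[k])
	}
}
```

Checking the app level set first means an app level key is never copied, even if it also carries the YuniKorn prefix.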
[jira] [Created] (YUNIKORN-122) Document admission controller requirements for deployment
Wilfred Spiegelenburg created YUNIKORN-122: -- Summary: Document admission controller requirements for deployment Key: YUNIKORN-122 URL: https://issues.apache.org/jira/browse/YUNIKORN-122 Project: Apache YuniKorn Issue Type: Bug Components: documentation, shim - kubernetes Reporter: Wilfred Spiegelenburg In YUNIKORN-121 we added the pod security policy {{hostNetwork: true}} to the admission controller deployment. We need to document and explain why it is needed and when it is needed. In YUNIKORN-80 we added a global toleration for all taints as we have affinity with the scheduler setup. None of this is documented.
[jira] [Resolved] (YUNIKORN-121) The admission controller should use hostNetwork
[ https://issues.apache.org/jira/browse/YUNIKORN-121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang resolved YUNIKORN-121. -- Resolution: Fixed > The admission controller should use hostNetwork > --- > > Key: YUNIKORN-121 > URL: https://issues.apache.org/jira/browse/YUNIKORN-121 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Labels: pull-request-available > > When running on EKS with a custom network plugin enabled (e.g. calico), the API server will not be able to connect to the > webhook unless the admission controller uses hostNetwork.
[jira] [Updated] (YUNIKORN-121) The admission controller should use hostNetwork
[ https://issues.apache.org/jira/browse/YUNIKORN-121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-121: Labels: pull-request-available (was: ) > The admission controller should use hostNetwork > --- > > Key: YUNIKORN-121 > URL: https://issues.apache.org/jira/browse/YUNIKORN-121 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Labels: pull-request-available > > When running on EKS with a custom network plugin enabled (e.g. calico), the API server will not be able to connect to the > webhook unless the admission controller uses hostNetwork.
[jira] [Created] (YUNIKORN-121) The admission controller should use hostNetwork
Weiwei Yang created YUNIKORN-121: Summary: The admission controller should use hostNetwork Key: YUNIKORN-121 URL: https://issues.apache.org/jira/browse/YUNIKORN-121 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Weiwei Yang Assignee: Weiwei Yang When running on EKS with a custom network plugin enabled (e.g. calico), the API server will not be able to connect to the webhook unless the admission controller uses hostNetwork.
[jira] [Resolved] (YUNIKORN-116) Typo in the rest api to get nodes info
[ https://issues.apache.org/jira/browse/YUNIKORN-116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang resolved YUNIKORN-116. -- Fix Version/s: 0.9 Resolution: Fixed Hi [~kmarton], thanks for the quick fix! > Typo in the rest api to get nodes info > -- > > Key: YUNIKORN-116 > URL: https://issues.apache.org/jira/browse/YUNIKORN-116 > Project: Apache YuniKorn > Issue Type: Bug > Components: webapp >Reporter: Weiwei Yang >Assignee: Kinga Marton >Priority: Trivial > Labels: pull-request-available > Fix For: 0.9 > > > There is a typo in the output of the nodes API: > {code} > [ > { > "partitionName": "[mycluster]default", > "nodesInfo": [ > { > "nodeID": "ip-10-116-72-66.us-west-2.compute.internal", > "hostName": "ip-10-116-72-66.us-west-2.compute.internal", > "*RackName*": "/rack-default", > "capacity": "[attachable-volumes-aws-ebs:25 > ephemeral-storage:94477937300 hugepages-1Gi:0 hugepages-2Mi:0 memory:7463 > pods:29 vcore:1900]", > "allocated": "[memory:3958 vcore:1600]", > "occupied": "[vcore:110]", > "available": "[attachable-volumes-aws-ebs:25 > ephemeral-storage:94477937300 hugepages-1Gi:0 hugepages-2Mi:0 memory:3505 > pods:29 vcore:190]", > "allocations": [ > ... > {code} > RackName -> rackName
[jira] [Assigned] (YUNIKORN-115) Fix flaky test cases from scheduler_reservation_test.go
[ https://issues.apache.org/jira/browse/YUNIKORN-115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kinga Marton reassigned YUNIKORN-115: - Assignee: Kinga Marton > Fix flaky test cases from scheduler_reservation_test.go > --- > > Key: YUNIKORN-115 > URL: https://issues.apache.org/jira/browse/YUNIKORN-115 > Project: Apache YuniKorn > Issue Type: Bug > Components: test - smoke >Reporter: Kinga Marton >Assignee: Kinga Marton >Priority: Critical > > There are some flaky tests in scheduler_reservation_test.go that fail > intermittently, both locally and when triggered by a PR. Let's fix them.
[jira] [Created] (YUNIKORN-120) Publish pod related events to the shim side
Adam Antal created YUNIKORN-120: --- Summary: Publish pod related events to the shim side Key: YUNIKORN-120 URL: https://issues.apache.org/jira/browse/YUNIKORN-120 Project: Apache YuniKorn Issue Type: Sub-task Components: core - cache, shim - kubernetes Reporter: Adam Antal Assignee: Adam Antal
[jira] [Created] (YUNIKORN-119) Expose the event cache API
Adam Antal created YUNIKORN-119: --- Summary: Expose the event cache API Key: YUNIKORN-119 URL: https://issues.apache.org/jira/browse/YUNIKORN-119 Project: Apache YuniKorn Issue Type: Sub-task Reporter: Adam Antal Assignee: Adam Antal Let's create a REST API to query the events from the event cache.
[jira] [Created] (YUNIKORN-118) Register more granular events in the event cache
Adam Antal created YUNIKORN-118: --- Summary: Register more granular events in the event cache Key: YUNIKORN-118 URL: https://issues.apache.org/jira/browse/YUNIKORN-118 Project: Apache YuniKorn Issue Type: Sub-task Components: core - cache Reporter: Adam Antal Assignee: Adam Antal In YUNIKORN-117 we only store the last visited time for each object. We can store more granular information instead of just the last time the scheduler visited it (the event itself associated with the visit).
[jira] [Created] (YUNIKORN-117) Create event cache for queue and application events
Adam Antal created YUNIKORN-117: --- Summary: Create event cache for queue and application events Key: YUNIKORN-117 URL: https://issues.apache.org/jira/browse/YUNIKORN-117 Project: Apache YuniKorn Issue Type: Sub-task Components: core - cache, core - scheduler Reporter: Adam Antal Assignee: Adam Antal Create a simple preliminary implementation of the event cache of YUNIKORN-42. We have the following limited scope for this task: - implement it as a separate process from the scheduler (similar to {{PartitionManager}}) - only deal with queues and applications (the pods and nodes can be added later) - only store the last time the scheduler visited each app - clean up those objects that haven't been visited in the last 24h Other cache implementations can also be considered. As a starting point, channels are a safe choice for async communication with the scheduler, and we do not expect a significant performance loss from them.
[jira] [Commented] (YUNIKORN-42) Better to support POD events for YuniKorn to troubleshoot allocation failures
[ https://issues.apache.org/jira/browse/YUNIKORN-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096550#comment-17096550 ] Adam Antal commented on YUNIKORN-42: Had a discussion with [~wwei] and [~wilfreds] today, and here are some notes. First of all, the scope of this issue is huge, so I'll create subtasks to handle it more easily. The first milestone should be to have an event cache in core which only considers application and queue based events. Later we can think about pushing these events to the shim side, adding pod (AllocationAsk) and node based events as well, and an API to expose this information to the user, maybe even UI support. It is also important to have this run in a separate goroutine (similar to {{PartitionManager}}) and be independent of the scheduler. We don't want performance degradation as a side-effect of better event tracking. > Better to support POD events for YuniKorn to troubleshoot allocation failures > - > > Key: YUNIKORN-42 > URL: https://issues.apache.org/jira/browse/YUNIKORN-42 > Project: Apache YuniKorn > Issue Type: Task >Reporter: Wangda Tan >Assignee: Adam Antal >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Right now it is tricky to troubleshoot pod allocation; we need to better expose > this information in the pod description.
[jira] [Updated] (YUNIKORN-116) Typo in the rest api to get nodes info
[ https://issues.apache.org/jira/browse/YUNIKORN-116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-116: Labels: pull-request-available (was: ) > Typo in the rest api to get nodes info > -- > > Key: YUNIKORN-116 > URL: https://issues.apache.org/jira/browse/YUNIKORN-116 > Project: Apache YuniKorn > Issue Type: Bug > Components: webapp >Reporter: Weiwei Yang >Assignee: Kinga Marton >Priority: Trivial > Labels: pull-request-available > > There is a typo in the output of the nodes API: > {code} > [ > { > "partitionName": "[mycluster]default", > "nodesInfo": [ > { > "nodeID": "ip-10-116-72-66.us-west-2.compute.internal", > "hostName": "ip-10-116-72-66.us-west-2.compute.internal", > "*RackName*": "/rack-default", > "capacity": "[attachable-volumes-aws-ebs:25 > ephemeral-storage:94477937300 hugepages-1Gi:0 hugepages-2Mi:0 memory:7463 > pods:29 vcore:1900]", > "allocated": "[memory:3958 vcore:1600]", > "occupied": "[vcore:110]", > "available": "[attachable-volumes-aws-ebs:25 > ephemeral-storage:94477937300 hugepages-1Gi:0 hugepages-2Mi:0 memory:3505 > pods:29 vcore:190]", > "allocations": [ > ... > {code} > RackName -> rackName
[jira] [Assigned] (YUNIKORN-110) Configured queue capacity should not exceed configured max capacity
[ https://issues.apache.org/jira/browse/YUNIKORN-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kinga Marton reassigned YUNIKORN-110: - Assignee: Kinga Marton > Configured queue capacity should not exceed configured max capacity > --- > > Key: YUNIKORN-110 > URL: https://issues.apache.org/jira/browse/YUNIKORN-110 > Project: Apache YuniKorn > Issue Type: Bug >Reporter: Kinga Marton >Assignee: Kinga Marton >Priority: Major > Attachments: Screenshot 2020-04-23 at 16.24.13.png > > > When running some examples I saw in the UI that the configured capacity was > higher than the configured max capacity for the root queue. > I think this makes no sense: the max capacity for the root queue is the > same as the cluster capacity and it is a hard limit, so a higher configured value is meaningless. > As the documentation says, there should be no guaranteed resource set for the > root queue: > https://github.com/apache/incubator-yunikorn-core/blob/master/docs/queue_config.md
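A Go sketch of the validation this implies, assuming resources can be modelled as a map from resource name to quantity and that the root queue path is {{root}}. The function and type names are hypothetical, not the core's config validation code; resource types without a configured max are not checked.

```go
package main

import (
	"fmt"
	"sort"
)

// resource is a hypothetical simplification: resource name -> quantity.
type resource map[string]int64

// validateQueueCapacity rejects any guaranteed resource on the root queue and,
// for every queue, any resource type whose guaranteed quantity exceeds the
// configured max. Types without a max are not limited here.
func validateQueueCapacity(queuePath string, guaranteed, maxRes resource) error {
	if queuePath == "root" && len(guaranteed) > 0 {
		return fmt.Errorf("queue root: guaranteed resource must not be set")
	}
	// Sort names so the reported resource is deterministic.
	names := make([]string, 0, len(guaranteed))
	for name := range guaranteed {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		if limit, ok := maxRes[name]; ok && guaranteed[name] > limit {
			return fmt.Errorf("queue %s: guaranteed %s %d exceeds max %d",
				queuePath, name, guaranteed[name], limit)
		}
	}
	return nil
}

func main() {
	err := validateQueueCapacity("root.sandbox",
		resource{"memory": 4096, "vcore": 8000},
		resource{"memory": 2048, "vcore": 16000})
	fmt.Println(err)
}
```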
[jira] [Assigned] (YUNIKORN-107) Display additional queue information
[ https://issues.apache.org/jira/browse/YUNIKORN-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kinga Marton reassigned YUNIKORN-107: - Assignee: Kinga Marton > Display additional queue information > > > Key: YUNIKORN-107 > URL: https://issues.apache.org/jira/browse/YUNIKORN-107 > Project: Apache YuniKorn > Issue Type: Improvement > Components: webapp >Reporter: Kinga Marton >Assignee: Kinga Marton >Priority: Major > > Right now we are displaying only a few things about a queue (capacity related > values and its state), but QueueInfo has a lot of other values > that we should > display: [https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/cache/queue_info.go#L45-L66]
[jira] [Assigned] (YUNIKORN-107) Display additional queue information
[ https://issues.apache.org/jira/browse/YUNIKORN-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kinga Marton reassigned YUNIKORN-107: - Assignee: (was: Kinga Marton) > Display additional queue information > > > Key: YUNIKORN-107 > URL: https://issues.apache.org/jira/browse/YUNIKORN-107 > Project: Apache YuniKorn > Issue Type: Improvement > Components: webapp >Reporter: Kinga Marton >Priority: Major > > Right now we are displaying only a few things about a queue (capacity related > values and its state), but QueueInfo has a lot of other values > that we should > display: [https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/cache/queue_info.go#L45-L66]
[jira] [Commented] (YUNIKORN-53) Use asserts for nil error checks in test code
[ https://issues.apache.org/jira/browse/YUNIKORN-53?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096458#comment-17096458 ] Adam Antal commented on YUNIKORN-53: Yes, sure. My patch is has been in progress, I will upload when I finish with it. > Use asserts for nil error checks in test code > - > > Key: YUNIKORN-53 > URL: https://issues.apache.org/jira/browse/YUNIKORN-53 > Project: Apache YuniKorn > Issue Type: Bug > Components: test - smoke, test - unit >Reporter: Wilfred Spiegelenburg >Assignee: Adam Antal >Priority: Minor > Labels: newbie, pull-request-available > > In a lot of tests we use the construct: > {code} > if err != nil { > t.Fatal("some text") > }{code} > We should replace that with the simpler: > {code} > assert.NilError(t, err, "some text"){code} > that is part of the standard assert package we already use in the test code
[jira] [Comment Edited] (YUNIKORN-53) Use asserts for nil error checks in test code
[ https://issues.apache.org/jira/browse/YUNIKORN-53?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096458#comment-17096458 ] Adam Antal edited comment on YUNIKORN-53 at 4/30/20, 11:54 AM: --- Yes, sure. My patch has been in progress, I will upload when I finish with it. was (Author: adam.antal): Yes, sure. My patch is has been in progress, I will upload when I finish with it. > Use asserts for nil error checks in test code > - > > Key: YUNIKORN-53 > URL: https://issues.apache.org/jira/browse/YUNIKORN-53 > Project: Apache YuniKorn > Issue Type: Bug > Components: test - smoke, test - unit >Reporter: Wilfred Spiegelenburg >Assignee: Adam Antal >Priority: Minor > Labels: newbie, pull-request-available > > In a lot of tests we use the construct: > {code} > if err != nil { > t.Fatal("some text") > }{code} > We should replace that with the simpler: > {code} > assert.NilError(t, err, "some text"){code} > that is part of the standard assert package we already use in the test code
[jira] [Updated] (YUNIKORN-93) Queue maximum capacity cleanup
[ https://issues.apache.org/jira/browse/YUNIKORN-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kinga Marton updated YUNIKORN-93: - Attachment: queuesResponse.json > Queue maximum capacity cleanup > -- > > Key: YUNIKORN-93 > URL: https://issues.apache.org/jira/browse/YUNIKORN-93 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Kinga Marton >Assignee: Kinga Marton >Priority: Major > Labels: pull-request-available > Attachments: queuesResponse.json > > > Right now the absolute used capacity is hardcoded to 20%. > The usage bar is rendered from this value, so the bar is effectively hardcoded too. > Note that both capacity and max capacity could be 0.
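Instead of the hardcoded 20%, the absolute used capacity could be computed per resource type. A minimal Go sketch, assuming integer quantities and a hypothetical {{absUsedCapacity}} helper; it guards against a max capacity of 0 (which the issue says is possible) and clamps the result to 100 so the bar cannot overflow, which is a design choice rather than anything specified above. The same arithmetic would apply in the web UI code.

```go
package main

import "fmt"

// absUsedCapacity returns used/max as an integer percentage in [0, 100] for
// one resource type. A max of 0 yields 0 instead of a division by zero, and
// usage above max is clamped to 100.
func absUsedCapacity(used, maxCapacity int64) int64 {
	if maxCapacity <= 0 || used <= 0 {
		return 0
	}
	pct := used * 100 / maxCapacity // integer division truncates
	if pct > 100 {
		pct = 100
	}
	return pct
}

func main() {
	fmt.Println(absUsedCapacity(512, 2048))
	fmt.Println(absUsedCapacity(10, 0))
}
```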
[jira] [Commented] (YUNIKORN-93) Queue maximum capacity cleanup
[ https://issues.apache.org/jira/browse/YUNIKORN-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096379#comment-17096379 ] Kinga Marton commented on YUNIKORN-93: -- Sure, [~akhilpb]. I have attached a sample response to this Jira. > Queue maximum capacity cleanup > -- > > Key: YUNIKORN-93 > URL: https://issues.apache.org/jira/browse/YUNIKORN-93 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Kinga Marton >Assignee: Kinga Marton >Priority: Major > Labels: pull-request-available > Attachments: queuesResponse.json > > > Right now the absolute used capacity is hardcoded to 20%. > The usage bar is rendered from this value, so the bar is effectively hardcoded too. > Note that both capacity and max capacity could be 0.