[jira] [Commented] (YUNIKORN-941) split scheduler and admission controller deployment

2022-01-11 Thread Craig Condit (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17472951#comment-17472951
 ] 

Craig Condit commented on YUNIKORN-941:
---

Committed #346 to master for admission controller changes.

> split scheduler and admission controller deployment
> ---
>
> Key: YUNIKORN-941
> URL: https://issues.apache.org/jira/browse/YUNIKORN-941
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: shim - kubernetes
>Reporter: Kinga Marton
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Attachments: logs_322.zip
>
>
> To support proper YuniKorn upgrades and restarts we should move the admission 
> controller out of the scheduler deployment and make it a separate deployment.
> This could also allow the admission controller to be made highly available and 
> make simpler, no-downtime upgrades possible. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Commented] (YUNIKORN-941) split scheduler and admission controller deployment

2022-01-11 Thread Craig Condit (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17472861#comment-17472861
 ] 

Craig Condit commented on YUNIKORN-941:
---

Committed #60 for helm chart changes; will commit #346 once the e2e tests run 
successfully.




[jira] [Commented] (YUNIKORN-941) split scheduler and admission controller deployment

2022-01-10 Thread Craig Condit (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17472261#comment-17472261
 ] 

Craig Condit commented on YUNIKORN-941:
---

PR #331 opened for shim-side changes, and #346 for release (helm chart) changes.




[jira] [Commented] (YUNIKORN-941) split scheduler and admission controller deployment

2022-01-06 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17469824#comment-17469824
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-941:


The admission controller code has been updated since the draft of this change 
went in. I think it is better to finish the change from v1beta1 to the v1 
version via YUNIKORN-938. It can be handled separately and without making 
changes to the way we do the certs etc.

I have asked [~pbacsko] to look at that jira.




[jira] [Commented] (YUNIKORN-941) split scheduler and admission controller deployment

2021-12-17 Thread Craig Condit (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461723#comment-17461723
 ] 

Craig Condit commented on YUNIKORN-941:
---

PR #346 has been opened as an alternative approach with the admission 
controller doing its own cert management and webhook registration on startup. 
This avoids the race conditions, and also doesn't require an init container 
which simplifies the setup dramatically.
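
A minimal sketch of the startup sequence described above, written in Python as a toy in-memory model rather than against the real Kubernetes API. The names FakeCluster, ensure_certs, start_admission_controller and the webhook name "yunikorn-admission" are hypothetical and do not come from PR #346; only the secret name "webhook-server-tls" appears elsewhere in this thread. The idea it illustrates is that the controller creates what it needs before serving, so nothing has to exist before the pod starts and a restart is idempotent.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """In-memory stand-in for the API server (hypothetical, for illustration)."""
    secrets: dict = field(default_factory=dict)
    webhooks: dict = field(default_factory=dict)


def ensure_certs(cluster, name="webhook-server-tls"):
    """Create the TLS secret only if it is missing, then return it."""
    if name not in cluster.secrets:
        # A real controller would generate a CA and a server key pair here.
        cluster.secrets[name] = {"ca": "fake-ca", "cert": "fake-cert", "key": "fake-key"}
    return cluster.secrets[name]


def start_admission_controller(cluster):
    """Startup order: certs first, then webhook registration, then serve."""
    certs = ensure_certs(cluster)
    # Register (or refresh) the webhook so the API server trusts our CA.
    # "yunikorn-admission" is a made-up webhook name.
    cluster.webhooks["yunikorn-admission"] = {"caBundle": certs["ca"]}
    return "serving"


cluster = FakeCluster()
first = start_admission_controller(cluster)
second = start_admission_controller(cluster)  # simulated restart
print(first, second, sorted(cluster.secrets), cluster.webhooks)
```

Because both steps run inside the controller process, there is no separate hook or init container whose timing could race with the secret mount.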




[jira] [Commented] (YUNIKORN-941) split scheduler and admission controller deployment

2021-12-10 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17457032#comment-17457032
 ] 

Peter Bacsko commented on YUNIKORN-941:
---

I think the commit needs to be reverted and we should start working on the 
replacement of {{admission_util.sh}} and leverage the {{initContainers}} 
approach.

> split scheduler and admission controller deployment
> ---
>
> Key: YUNIKORN-941
> URL: https://issues.apache.org/jira/browse/YUNIKORN-941
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Kinga Marton
>Assignee: Peter Bacsko
>Priority: Blocker
>  Labels: pull-request-available
> Attachments: logs_322.zip
>
>
> To support proper YuniKorn upgrades and restarts we should move the admission 
> controller out of the scheduler deployment and make it a separate deployment.
> This could also allow the admission controller to be made highly available and 
> make simpler, no-downtime upgrades possible. 






[jira] [Commented] (YUNIKORN-941) split scheduler and admission controller deployment

2021-12-10 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17457031#comment-17457031
 ] 

Peter Bacsko commented on YUNIKORN-941:
---

[~wwei] As Kinga explained, she ran into some unexpected issues regarding 
secrets. This is what happens when k8s tries to start the admission controller:

{noformat}
Events:
  Type     Reason       Age                  From               Message
  ----     ------       ----                 ----               -------
  Normal   Scheduled    5m4s                 default-scheduler  Successfully assigned yunikorn/yunikorn-admission-controller-5c46b58647-spxwk to yk8s-worker
  Warning  FailedMount  3m1s                 kubelet            Unable to attach or mount volumes: unmounted volumes=[webhook-tls-certs], unattached volumes=[kube-api-access-55zht webhook-tls-certs]: timed out waiting for the condition
  Warning  FailedMount  54s (x10 over 5m4s)  kubelet            MountVolume.SetUp failed for volume "webhook-tls-certs" : secret "webhook-server-tls" not found
  Warning  FailedMount  47s                  kubelet            Unable to attach or mount volumes: unmounted volumes=[webhook-tls-certs], unattached volumes=[webhook-tls-certs kube-api-access-55zht]: timed out waiting for the condition
{noformat}

This is from 
https://github.com/apache/incubator-yunikorn-k8shim/runs/4440291100?check_suite_focus=true

We can no longer create the secrets in the {{postStart}} / {{exec}} section. 
See Kinga's comment 
[above|https://issues.apache.org/jira/browse/YUNIKORN-941?focusedCommentId=17455091=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17455091].




[jira] [Commented] (YUNIKORN-941) split scheduler and admission controller deployment

2021-12-09 Thread Weiwei Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17456681#comment-17456681
 ] 

Weiwei Yang commented on YUNIKORN-941:
--

PR https://github.com/apache/incubator-yunikorn-release/pull/50 caused the e2e 
failures; this needs more investigation. I attached the failure log to this 
JIRA. [~pbacsko], [~kmarton], please take a look. Thanks.




[jira] [Commented] (YUNIKORN-941) split scheduler and admission controller deployment

2021-12-08 Thread Chaoran Yu (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17455372#comment-17455372
 ] 

Chaoran Yu commented on YUNIKORN-941:
-

[~kmarton] Thanks for digging into it. I second your proposal.

> split scheduler and admission controller deployment
> ---
>
> Key: YUNIKORN-941
> URL: https://issues.apache.org/jira/browse/YUNIKORN-941
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Kinga Marton
>Assignee: Peter Bacsko
>Priority: Blocker
>  Labels: pull-request-available
>
> To support proper YuniKorn upgrades and restarts we should move the admission 
> controller out of the scheduler deployment and make it a separate deployment.
> This could also allow the admission controller to be made highly available and 
> make simpler, no-downtime upgrades possible. 






[jira] [Commented] (YUNIKORN-941) split scheduler and admission controller deployment

2021-12-08 Thread Kinga Marton (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17455091#comment-17455091
 ] 

Kinga Marton commented on YUNIKORN-941:
---

[~yuchaoran], [~wilfreds] I suggest moving this issue out of the 0.12 
release and removing the changes in the release repository from the 0.12 
branch once it is created.

I am suggesting this because I found the root cause of the failing precommit: 
the secret has not been created at the point where we want to mount it. The 
secret is created in the admission_util.sh script, which runs in a post-start 
hook. And here we have a chicken-and-egg problem: 
 * the secret needs the TLS certs, which are created from the admission 
controller code, so in the current setup we cannot create the secret in an init 
container.

Instead of continuing to hack around the admission controller, I suggest 
removing the admission_util.sh script and using init containers to create all 
the necessary certificates and secrets, but this is a bigger piece of work. 
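
The ordering problem can be illustrated with a toy model in Python. It is a sketch under simplifying assumptions, not kubelet code: start_pod and its arguments are hypothetical, and the model only encodes that secret volumes are mounted before the container starts and that the post-start hook runs only after the container has started. The secret name is taken from the failing events reported in this issue.

```python
def start_pod(existing_secrets, mounted_secret, post_start_creates):
    """Toy pod startup: mount volumes, start container, run postStart hook.

    Returns the list of steps that were reached.
    """
    steps = []
    secrets = set(existing_secrets)

    # Step 1: every secret volume must be mountable before the container starts.
    if mounted_secret not in secrets:
        steps.append(f'FailedMount: secret "{mounted_secret}" not found')
        return steps  # the container never starts, so postStart never runs
    steps.append("volumes mounted")

    # Step 2: the container starts.
    steps.append("container started")

    # Step 3: the postStart hook runs only now.
    secrets.add(post_start_creates)
    steps.append(f"postStart created {post_start_creates}")
    return steps


# The secret is only created by postStart, so a fresh install cannot start.
fresh = start_pod([], "webhook-server-tls", "webhook-server-tls")
# If the secret already exists (created beforehand), startup proceeds.
existing = start_pod(["webhook-server-tls"], "webhook-server-tls", "webhook-server-tls")
print(fresh)
print(existing)
```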

There is a good article about how we can create the admission controllers in a 
more elegant way than we are doing it now: 
[https://www.velotio.com/engineering-blog/managing-tls-certificate-for-kubernetes-admission-webhook]

> split scheduler and admission controller deployment
> ---
>
> Key: YUNIKORN-941
> URL: https://issues.apache.org/jira/browse/YUNIKORN-941
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Kinga Marton
>Assignee: Kinga Marton
>Priority: Blocker
>  Labels: pull-request-available
>
> To support proper YuniKorn upgrades and restarts we should move the admission 
> controller out of the scheduler deployment and make it a separate deployment.
> This could also allow the admission controller to be made highly available and 
> make simpler, no-downtime upgrades possible. 






[jira] [Commented] (YUNIKORN-941) split scheduler and admission controller deployment

2021-12-03 Thread Kinga Marton (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452833#comment-17452833
 ] 

Kinga Marton commented on YUNIKORN-941:
---

Thank you [~yuchaoran2011] for the review in the release repository. Can you 
please check the shim-side changes as well? These two depend on each other.

[https://github.com/apache/incubator-yunikorn-k8shim/pull/331]

 




[jira] [Commented] (YUNIKORN-941) split scheduler and admission controller deployment

2021-12-02 Thread Kinga Marton (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452463#comment-17452463
 ] 

Kinga Marton commented on YUNIKORN-941:
---

Some notes on the newly created charts:
 * As [~yuchaoran2011] suggested, we will use a subchart for the admission 
controller. 
 * Since during a YK upgrade we want to make sure that no pods are handled 
by the default scheduler during the YK downtime, it is essential to have the 
admission controller running while the scheduler is upgraded. By using 
subcharts this is possible with the following steps:
 ** Update the admission controller (helm upgrade only performs the upgrade if 
there are changes, so for this step we need to make sure that there are no 
changes in the scheduler).
 ** After the admission controller is updated, we can update the helm 
deployment again and include the scheduler changes as well. Since helm will 
detect that the admission controller is already up to date, it won't touch it.
 * We need to do the upgrade in these two steps because during a normal upgrade 
helm aggregates all the manifests into one set and then sorts them by type and 
then alphabetically, but does not wait for the dependencies to be installed 
first. See more details in the following Helm documentation: 
[https://helm.sh/docs/topics/charts/#operational-aspects-of-using-dependencies]
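
The ordering behaviour described in the last bullet can be sketched in Python. This is an illustrative model, not Helm's code: KIND_ORDER is an abbreviated subset of the kind ordering Helm uses, and the chart and manifest names are hypothetical. It shows that manifests from the parent chart and the subchart end up in one list sorted by kind and then by name, with no point at which the subchart's resources are waited on.

```python
# Abbreviated, illustrative subset of the kind ordering (assumption: the real
# list in Helm is longer).
KIND_ORDER = ["Namespace", "ServiceAccount", "Secret", "ConfigMap", "Service", "Deployment"]


def install_order(manifests):
    """Sort (chart, kind, name) tuples by kind rank, then by name."""
    return sorted(manifests, key=lambda m: (KIND_ORDER.index(m[1]), m[2]))


# Hypothetical manifests from a parent chart and an admission controller subchart.
manifests = [
    ("yunikorn", "Deployment", "yunikorn-scheduler"),
    ("yunikorn", "ConfigMap", "yunikorn-configs"),
    ("admission-controller", "Service", "yunikorn-admission-controller-service"),
    ("admission-controller", "Deployment", "yunikorn-admission-controller"),
]

for chart, kind, name in install_order(manifests):
    print(f"{kind:<12}{name:<40}(from chart {chart})")
```

In this model both Deployments are submitted in the same pass, so the scheduler rollout does not wait for the admission controller to become ready, which is the reason for splitting the upgrade into two helm upgrade runs.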

> split scheduler and admission controller deployment
> ---
>
> Key: YUNIKORN-941
> URL: https://issues.apache.org/jira/browse/YUNIKORN-941
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Kinga Marton
>Assignee: Kinga Marton
>Priority: Major
>  Labels: pull-request-available
>
> To support proper YuniKorn upgrades and restarts we should move the admission 
> controller out of the scheduler deployment and make it a separate deployment.
> This could also allow the admission controller to be made highly available and 
> make simpler, no-downtime upgrades possible. 






[jira] [Commented] (YUNIKORN-941) split scheduler and admission controller deployment

2021-11-23 Thread Kinga Marton (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17448112#comment-17448112
 ] 

Kinga Marton commented on YUNIKORN-941:
---

I created the following two PRs:

[https://github.com/apache/incubator-yunikorn-release/pull/50]

[https://github.com/apache/incubator-yunikorn-k8shim/pull/331]

In these PRs I just moved the admission controller related things out of the 
scheduler image. 

However, now that we have it in a separate deployment, independent of the 
scheduler, I would move forward and try to remove the admission_util.sh 
script, and handle the admission controller from helm charts or from code, 
without running shell scripts.

> split scheduler and admission controller deployment
> ---
>
> Key: YUNIKORN-941
> URL: https://issues.apache.org/jira/browse/YUNIKORN-941
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Kinga Marton
>Assignee: Kinga Marton
>Priority: Major
>
> To support proper YuniKorn upgrades and restarts we should move the admission 
> controller out of the scheduler deployment and make it a separate deployment.
> This could also allow the admission controller to be made highly available and 
> make simpler, no-downtime upgrades possible. 


