[ 
https://issues.apache.org/jira/browse/YUNIKORN-201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188256#comment-17188256
 ] 

Kinga Marton edited comment on YUNIKORN-201 at 9/1/20, 2:41 PM:
----------------------------------------------------------------

[~wwei],
{quote}Change the State field in app-CRD to "scheduling state", to indicate 
this only reflects the state in the scheduler
{quote}
I can change the app status to scheduling state, but I would keep the {{Status}} 
subresource itself, because it is a predefined subresource. With this change the 
Status will have the following format: 
{code:yaml}
status:           
  type: object
  properties:             
    scheduling_status:               
      type: string
    message:               
      type: string
    lastupdate:               
      type: string
{code}
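For illustration, a populated status on a concrete app CRD instance would then look 
something like the sketch below (the state name, message, and timestamp format are 
made-up examples, not final values):

{code:yaml}
status:
  scheduling_status: Running
  message: "The application is running"
  lastupdate: "2020-09-01T14:41:00Z"
{code}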
{quote}Make sure when there is no allocation in an app, make sure the app state 
is "Waiting".
{quote}
 If we change the status in the CRD, we will need to make changes on the core 
side as well, since we agreed that the core-side state is the source of truth, 
and I don't think it is a good idea to make an exception for this case. 
{quote}When {{SparkApplication}} is deleted, delete the app-CRD as well. And 
then remove this app from the scheduler.
{quote}
-I think we can handle this issue with YUNIKORN-266, where we will delete the 
related pods as well, when the application is deleted.-

[~Huang Ting Yao], is it possible to set the SparkApplication as the 
ownerReference for the CRD, so that it gets deleted when the SparkApplication 
is deleted? 
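If that works, the app CRD would carry an {{ownerReferences}} entry roughly like 
the sketch below (the apiVersion, names, and uid are made up for illustration); 
with such a reference in place, Kubernetes garbage collection would delete the app 
CRD automatically together with its owning SparkApplication:

{code:yaml}
metadata:
  name: example-app
  ownerReferences:
    - apiVersion: sparkoperator.k8s.io/v1beta2
      kind: SparkApplication
      name: example-spark-app
      uid: 0a1b2c3d-0000-0000-0000-000000000000
      controller: true
      blockOwnerDeletion: true
{code}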

Regarding YUNIKORN-26, I still think that we should allocate some time to fix 
that issue, because as long as it is open, I have the impression that we don't 
have a stable foundation for the application handling. 



> Application tracking API and CRD
> --------------------------------
>
>                 Key: YUNIKORN-201
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-201
>             Project: Apache YuniKorn
>          Issue Type: New Feature
>          Components: core - scheduler, scheduler-interface, shim - kubernetes
>            Reporter: Weiwei Yang
>            Assignee: Kinga Marton
>            Priority: Major
>
> Today, YK works behind the scenes, and the workflow is like
>  # app operator or job server launch a bunch of pods on K8s
>  # YK gets notified and group pods to apps based on appID
>  # YK schedules the pods with respect to the app info
> This provides a simple model to integrate with existing K8s and to support 
> workloads, but it has some user experience issues. Such as
>  # YK can hardly manage the app lifecycle end to end. An outstanding issue is 
> that we do not know when an app is finished if we only look at the pod status. 
>  # YK doesn't have the ability to admit apps. We need the ability to admit apps 
> based on various conditions, e.g. resource quota, cluster overhead, ACLs, etc. 
>  # Hard to track app status. Sometimes app might be pending in resource 
> queues, but we do not have a good way to expose such status info.
> To further improve the user experience, we need to introduce an application 
> tracking API and K8s custom resource definition (CRD). The CRD will be used 
> by app operator/job server to interact with YK, to get the lifecycle fully 
> controlled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
