[jira] [Commented] (YUNIKORN-201) Application tracking API and CRD

Weiwei Yang (Jira) Fri, 28 Aug 2020 15:35:32 -0700


    [ 
https://issues.apache.org/jira/browse/YUNIKORN-201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186834#comment-17186834
 ]


Weiwei Yang commented on YUNIKORN-201:
--------------------------------------

hi [~Huang Ting Yao], [~kmarton]

In this case, I would expect the state to be "Waiting" instead of 
"Running", according to 
http://yunikorn.apache.org/docs/next/design/scheduler_object_states. I am not 
sure whether there is a bug here.
This is caused by YUNIKORN-26, but the problem is a bit complicated. The 
reason is that YuniKorn doesn't know whether the job has completed, failed, or 
succeeded; only the operator knows that. Internally, spark-k8s-operator 
monitors the Spark driver/executor pods and changes the {{SparkApplication}} 
state based on certain conditions. This is per-app logic that can never be 
coded into YuniKorn. Based on this, I'd propose:
# Change the State field in the app-CRD to "scheduling state", to indicate 
that it only reflects the state in the scheduler.
# When an app has no allocations, make sure the app state is 
"Waiting".
# When the {{SparkApplication}} is deleted, delete the app-CRD as well, and 
then remove the app from the scheduler.
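
A minimal sketch of the three points above, for illustration only. It is not YuniKorn code: the names {{TrackedApp}}, {{AppTracker}}, {{scheduling_state}} and {{on_owner_deleted}} are hypothetical, and it models only the "Waiting" and "Running" states mentioned here, not the full state machine from the design doc.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class TrackedApp:
    """Hypothetical stand-in for the app-CRD as seen by the scheduler."""
    app_id: str
    # Point 1: the field only reflects the state inside the scheduler.
    scheduling_state: str = "Waiting"
    allocations: Set[str] = field(default_factory=set)

    def refresh_state(self) -> None:
        # Point 2: an app with no allocations is always "Waiting".
        self.scheduling_state = "Running" if self.allocations else "Waiting"


class AppTracker:
    """Hypothetical in-memory registry of apps known to the scheduler."""

    def __init__(self) -> None:
        self.apps: Dict[str, TrackedApp] = {}

    def add_allocation(self, app_id: str, alloc_id: str) -> TrackedApp:
        app = self.apps.setdefault(app_id, TrackedApp(app_id))
        app.allocations.add(alloc_id)
        app.refresh_state()
        return app

    def release_allocation(self, app_id: str, alloc_id: str) -> TrackedApp:
        app = self.apps[app_id]
        app.allocations.discard(alloc_id)
        app.refresh_state()
        return app

    def on_owner_deleted(self, app_id: str) -> bool:
        # Point 3: when the owning SparkApplication is deleted, drop the
        # app record (standing in for the app-CRD) from the scheduler too.
        return self.apps.pop(app_id, None) is not None
```

Whether the job itself completed, failed, or succeeded is deliberately not modelled; per the comment above, only the operator knows that.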


> Application tracking API and CRD
> --------------------------------
>
>                 Key: YUNIKORN-201
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-201
>             Project: Apache YuniKorn
>          Issue Type: New Feature
>          Components: core - scheduler, scheduler-interface, shim - kubernetes
>            Reporter: Weiwei Yang
>            Assignee: Kinga Marton
>            Priority: Major
>
> Today, YK works behind the scenes, and the workflow is as follows:
>  # An app operator or job server launches a bunch of pods on K8s
>  # YK gets notified and groups the pods into apps based on appID
>  # YK schedules the pods with respect to the app info
> This provides a simple model to integrate with existing K8s and to support 
> workloads, but it has some user experience issues, such as:
>  # YK can hardly manage the app lifecycle end to end. An outstanding issue is 
> that we do not know when an app has finished if we only look at the pod status. 
>  # YK doesn't have the ability to admit apps. We need the ability to admit apps 
> based on various conditions, e.g. resource quota, cluster overhead, ACL, etc. 
>  # App status is hard to track. Sometimes an app might be pending in resource 
> queues, but we do not have a good way to expose such status info.
> To further improve the user experience, we need to introduce an application 
> tracking API and a K8s custom resource definition (CRD). The CRD will be used 
> by the app operator/job server to interact with YK, so that the lifecycle is 
> fully controlled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

