[
https://issues.apache.org/jira/browse/YUNIKORN-201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186834#comment-17186834
]
Weiwei Yang commented on YUNIKORN-201:
--------------------------------------
Hi [~Huang Ting Yao], [~kmarton]
In this case, I would expect the state to be "Waiting" instead of
"Running", according to
http://yunikorn.apache.org/docs/next/design/scheduler_object_states. I am not
sure whether there is a bug here.
This is caused by YUNIKORN-26, but the problem is a bit complicated. The
reason is that YuniKorn doesn't know whether the job has completed, failed, or
succeeded; only the operator knows that. Internally, spark-k8s-operator
monitors the Spark driver/executor pods and changes the {{SparkApplication}}
state based on certain conditions. This is per-app logic, which can never be
coded into YuniKorn. Based on this, I'd propose:
# Change the State field in the app-CRD to "scheduling state", to indicate
that it only reflects the state in the scheduler.
# When an app has no allocations, make sure the app state is "Waiting".
# When the {{SparkApplication}} is deleted, delete the app-CRD as well, and
then remove the app from the scheduler.
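
To make the three points above concrete, here is a rough sketch of the
behaviour I have in mind. It is plain Python rather than YuniKorn code, and
every name in it ({{AppRecord}}, {{SchedulerView}},
{{on_spark_application_deleted}}) is hypothetical. It also covers only the two
states discussed here, "Waiting" and "Running"; the real state machine has
more.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class AppRecord:
    """Hypothetical stand-in for one app-CRD entry; not a YuniKorn type."""

    app_id: str
    allocations: Set[str] = field(default_factory=set)

    @property
    def scheduling_state(self) -> str:
        # Points 1 and 2: the field only reflects what the scheduler sees,
        # and an app without any allocation is reported as "Waiting".
        return "Running" if self.allocations else "Waiting"


class SchedulerView:
    """Hypothetical in-memory registry of apps known to the scheduler."""

    def __init__(self) -> None:
        self.apps: Dict[str, AppRecord] = {}

    def add_app(self, app_id: str) -> AppRecord:
        # Register the app if it is new; return the existing record otherwise.
        return self.apps.setdefault(app_id, AppRecord(app_id))

    def allocate(self, app_id: str, alloc_id: str) -> None:
        self.apps[app_id].allocations.add(alloc_id)

    def release(self, app_id: str, alloc_id: str) -> None:
        # discard() is a no-op if the allocation was already released.
        self.apps[app_id].allocations.discard(alloc_id)

    def on_spark_application_deleted(self, app_id: str) -> bool:
        # Point 3: deleting the SparkApplication drops the app record and
        # removes the app from the scheduler. Returns False if unknown.
        return self.apps.pop(app_id, None) is not None
```

The state is derived from the allocations instead of being stored, so it
cannot stay at "Running" after the last allocation is released. Whether the
job succeeded or failed is deliberately not modelled, since only the operator
knows that.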
> Application tracking API and CRD
> --------------------------------
>
> Key: YUNIKORN-201
> URL: https://issues.apache.org/jira/browse/YUNIKORN-201
> Project: Apache YuniKorn
> Issue Type: New Feature
> Components: core - scheduler, scheduler-interface, shim - kubernetes
> Reporter: Weiwei Yang
> Assignee: Kinga Marton
> Priority: Major
>
> Today, YK works behind the scenes, and the workflow is as follows:
> # The app operator or job server launches a bunch of pods on K8s
> # YK gets notified and groups pods into apps based on appID
> # YK schedules the pods with respect to the app info
> This provides a simple model to integrate with existing K8s and to support
> workloads, but it has some user experience issues, such as:
> # YK can hardly manage the app lifecycle end to end. An outstanding issue is
> that we do not know when an app is finished if we only look at the pod status.
> # YK doesn't have the ability to admit apps. We need the ability to admit apps
> based on various conditions, e.g. resource quota, cluster overhead, ACL, etc.
> # It is hard to track app status. Sometimes an app might be pending in resource
> queues, but we do not have a good way to expose such status info.
> To further improve the user experience, we need to introduce an application
> tracking API and K8s custom resource definition (CRD). The CRD will be used
> by app operator/job server to interact with YK, to get the lifecycle fully
> controlled.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)