[ 
https://issues.apache.org/jira/browse/YUNIKORN-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091556#comment-17091556
 ] 

Adam Antal commented on YUNIKORN-42:
------------------------------------

[~wwei] thanks for assigning this to me. I know that you are all kinda busy 
with the release, so I wrote down a few thoughts; you can respond to them when 
you have time.

I recently got through the design doc and browsed the Scheduler Interface to 
gain more insight into the purpose of this jira. I think [~wangda]'s original 
implementation plan is a good one; I would make a few suggestions based on my 
and [~wilfreds]'s previous comments.

I would like to approach the pod, application, queue and node cases separately.
For *pods* it's logical to pass these events from the scheduler to the shim, 
and the shim can then emit them to the k8s event system. End-users can then 
{{kubectl describe}} the pending pod to see any errors that the scheduler 
emits. I'd like to change how this is done, though.
Since new pods are requested through {{AllocationAsk}} in {{UpdateRequest}}, 
the proposed {{DiagnosticInformation}} in {{UpdateResponse}} is too broad for 
this purpose. I'd put it into {{RejectedAllocationAsk}}, but as far as I can 
see we already have a reason string there that describes the rejection. Could 
we perhaps leverage that?
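To make the idea concrete, here is a minimal sketch of how the shim could turn 
that existing reason string into a pod event message. Note the struct below is 
only an illustrative stand-in for the real {{RejectedAllocationAsk}} protobuf 
in the SI; the field names and the message format are my assumptions, not the 
actual schema:

```go
package main

import "fmt"

// RejectedAllocationAsk is a hypothetical mirror of the SI message of the
// same name (the real one is a protobuf); field names are assumptions
// made purely for illustration.
type RejectedAllocationAsk struct {
	AllocationKey string // identifies the original AllocationAsk
	ApplicationID string
	Reason        string // rejection reason set by the scheduler core
}

// podEventMessage sketches how the shim could reuse the existing reason
// string as the message of a k8s Event attached to the pending pod,
// instead of introducing a separate DiagnosticInformation field.
func podEventMessage(r RejectedAllocationAsk) string {
	return fmt.Sprintf("AllocationAsk %s of application %s rejected: %s",
		r.AllocationKey, r.ApplicationID, r.Reason)
}

func main() {
	msg := podEventMessage(RejectedAllocationAsk{
		AllocationKey: "ask-1",
		ApplicationID: "app-1",
		Reason:        "insufficient queue quota",
	})
	fmt.Println(msg)
}
```

The point being: if the reason string already carries the diagnostic, the shim 
only needs a small translation step, no new SI field.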

Since *nodes* are also ResourceManager-dependent objects, I'd do something 
similar for emitting node-related events. Searching the SI, I found the 
{{AcceptedNode}} and {{RejectedNode}} objects - can we also use these for the 
event system?
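The node case would then be analogous to the pod case. Again a hedged sketch: 
the struct is a stand-in for the SI's {{RejectedNode}} protobuf, with assumed 
field names:

```go
package main

import "fmt"

// RejectedNode is a hypothetical mirror of the SI message of the same
// name (the real one is a protobuf); field names are illustrative.
type RejectedNode struct {
	NodeID string
	Reason string
}

// nodeEventMessage shows how the shim could reuse RejectedNode's reason
// to emit a node-scoped k8s Event, analogous to the pod case above.
func nodeEventMessage(n RejectedNode) string {
	return fmt.Sprintf("node %s rejected by scheduler: %s", n.NodeID, n.Reason)
}

func main() {
	fmt.Println(nodeEventMessage(RejectedNode{
		NodeID: "node-0",
		Reason: "duplicate registration",
	}))
}
```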

*Queues* are scheduler-level concepts, so these should not be passed through 
the SI.

With regards to *applications*: I have the impression that applications are 
RM-level concepts, because they are included in the SI protocol. That being 
said, we also have to provide some diagnostics at that level, but there is no 
such utility as {{kubectl describe application}} on the k8s side - so the 
question is: do we really need to do that?
One idea is that the shim could also create CRD objects representing 
applications, and those objects could be the target of these events. This 
would obviously be handled by the shim, and could be synchronized with the 
state of Spark / other applications (where we actually have no need to 
communicate with the scheduler continuously).
I see some advantage of these CRDs in contexts like work-preserving recovery 
(I am not aware of how this is currently handled on the K8s side), but it 
would be pretty straightforward to just read back the CRDs when an RM has to 
resync its state.

As for the event cache in the scheduler component, I think [~wangda]'s proposal 
is good: we also need a way to approach the problem from the scheduler's 
perspective. I'd definitely like to keep that piece of the architecture.

Please share your opinion on this. I will create an updated POC document with 
the things we discuss in this thread. I welcome your thoughts/constructive 
criticism.

> Better to support POD events for YuniKorn to troubleshoot allocation failures
> -----------------------------------------------------------------------------
>
>                 Key: YUNIKORN-42
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-42
>             Project: Apache YuniKorn
>          Issue Type: Task
>            Reporter: Wangda Tan
>            Assignee: Adam Antal
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now it is tricky to do troubleshoot for pod allocation, we need better expose 
> this information to POD description.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org
