[jira] [Commented] (YUNIKORN-704) [Umbrella] Use the same mechanism to schedule daemon set pods as the default scheduler

Weiwei Yang (Jira) Wed, 18 Aug 2021 22:25:07 -0700


    [ 
https://issues.apache.org/jira/browse/YUNIKORN-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401485#comment-17401485
 ]


Weiwei Yang commented on YUNIKORN-704:
--------------------------------------

Hi [~wilfreds], I just checked cordoning and uncordoning a node, and you are
correct: the taint "node.kubernetes.io/unschedulable" is added automatically
to cordoned nodes. What you are describing is similar to what we wanted to do
with the additional placement constraint, which was part of the interface
design but is not implemented today. It would act like a short circuit when
evaluating node selectors. Dropping what we have done so far (the
shim/scheduler-interface changes) and moving in that direction would add a
lot of work.
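Because a cordoned node carries the "node.kubernetes.io/unschedulable" taint, a
normal taint/toleration check already keeps ordinary pods off it, while DaemonSet
pods (which the DaemonSet controller gives a toleration for that taint) still
fit. A minimal Go sketch of that check follows. Taint, Toleration and
podFitsTaints are hypothetical, trimmed-down stand-ins modelled on the
Kubernetes core/v1 types and matching rules; this is not the shim's code or the
scheduler framework's TaintToleration plugin.

```go
package main

import "fmt"

// Hypothetical, trimmed-down mirrors of the Kubernetes core/v1 Taint and
// Toleration types; only the fields needed for matching are kept.
type Taint struct {
	Key    string
	Value  string
	Effect string // "NoSchedule", "PreferNoSchedule" or "NoExecute"
}

type Toleration struct {
	Key      string
	Operator string // "Exists" or "Equal" (empty means "Equal")
	Value    string
	Effect   string // empty matches every effect
}

// Taint key that Kubernetes places on a cordoned node.
const UnschedulableTaintKey = "node.kubernetes.io/unschedulable"

// tolerates reports whether a single toleration matches a taint.
func tolerates(tol Toleration, taint Taint) bool {
	if tol.Effect != "" && tol.Effect != taint.Effect {
		return false
	}
	// An empty key (only valid with "Exists") matches every taint key.
	if tol.Key != "" && tol.Key != taint.Key {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return tol.Value == taint.Value
	}
	return false
}

// podFitsTaints reports whether every NoSchedule/NoExecute taint on a node
// is tolerated by at least one of the pod's tolerations; PreferNoSchedule
// taints are soft and are ignored here.
func podFitsTaints(taints []Taint, tols []Toleration) bool {
	for _, taint := range taints {
		if taint.Effect != "NoSchedule" && taint.Effect != "NoExecute" {
			continue
		}
		tolerated := false
		for _, tol := range tols {
			if tolerates(tol, taint) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}

func main() {
	cordoned := []Taint{{Key: UnschedulableTaintKey, Effect: "NoSchedule"}}
	// DaemonSet pods get a toleration like this from the DaemonSet controller.
	daemonSetTols := []Toleration{{Key: UnschedulableTaintKey, Operator: "Exists", Effect: "NoSchedule"}}
	fmt.Println(podFitsTaints(cordoned, nil))           // false
	fmt.Println(podFitsTaints(cordoned, daemonSetTols)) // true
}
```

An ordinary pod is rejected by the cordoned node, while the DaemonSet pod passes
without any special case in the placement logic.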

I just checked the changes [~Huang Ting Yao] has made, and they are pretty
straightforward: a node's unschedulable state is simply ignored when the ask
carries a specific attribute (set when the ask is converted from a DaemonSet
pod). That should be enough to solve this issue. I think it is better to go
with this approach, especially since [~chenya_zhang] is waiting on it. Does
that make sense?
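One reading of the proposed change, sketched in Go: when the core builds the
list of candidate nodes for an ask, cordoned (unschedulable) nodes are skipped
unless the ask was converted from a DaemonSet pod, in which case the shim-side
predicates such as taints and tolerations make the final call. Node, Ask,
candidateNodes and the tag "hypothetical.daemonset" are all hypothetical
illustrations, not YuniKorn's actual types or attribute names.

```go
package main

import "fmt"

// Node and Ask are hypothetical, minimal stand-ins for the core's node and
// allocation-ask objects; they are not YuniKorn's actual types.
type Node struct {
	ID          string
	Schedulable bool // false once the node is cordoned
}

type Ask struct {
	Key  string
	Tags map[string]string
}

// daemonSetTag is a hypothetical tag the shim would set when it converts a
// DaemonSet pod into an ask; the real attribute name may differ.
const daemonSetTag = "hypothetical.daemonset"

// isDaemonSetAsk reports whether the ask carries the DaemonSet attribute.
// Reading from a nil map is safe in Go and yields "".
func isDaemonSetAsk(ask Ask) bool {
	return ask.Tags[daemonSetTag] == "true"
}

// candidateNodes returns the IDs of the nodes the core would try for this
// ask: unschedulable nodes are skipped unless the ask came from a DaemonSet
// pod, which leaves the decision to the shim's predicates.
func candidateNodes(nodes []Node, ask Ask) []string {
	var ids []string
	for _, n := range nodes {
		if !n.Schedulable && !isDaemonSetAsk(ask) {
			continue
		}
		ids = append(ids, n.ID)
	}
	return ids
}

func main() {
	nodes := []Node{{ID: "node-1", Schedulable: true}, {ID: "node-2", Schedulable: false}}
	regular := Ask{Key: "app-pod"}
	daemon := Ask{Key: "fluent-bit-pod", Tags: map[string]string{daemonSetTag: "true"}}
	fmt.Println(candidateNodes(nodes, regular)) // [node-1]
	fmt.Println(candidateNodes(nodes, daemon))  // [node-1 node-2]
}
```

Keeping the exception keyed on a single ask attribute leaves the normal
scheduling path for every other pod unchanged.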

> [Umbrella] Use the same mechanism to schedule daemon set pods as the default 
> scheduler
> --------------------------------------------------------------------------------------
>
>                 Key: YUNIKORN-704
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-704
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: shim - kubernetes
>            Reporter: Chaoran Yu
>            Assignee: Ting Yao,Huang
>            Priority: Blocker
>             Fix For: 1.0.0
>
>         Attachments: fluent-bit-describe.yaml, fluent-bit.yaml
>
>
> We sometimes see DaemonSet pods fail to be scheduled. Please see attached 
> files for the YAML and _kubectl describe_ output of one such pod. We 
> originally suspected [node 
> reservation|https://github.com/apache/incubator-yunikorn-core/blob/v0.10.0/pkg/scheduler/context.go#L41]
>  was to blame. But even after setting the DISABLE_RESERVATION environment 
> variable to true, we still see such scheduling failures. The issue is 
> especially severe when K8s nodes have disk pressure that causes lots of pods 
> to be evicted. Newly created pods then stay pending forever, and we have to
> temporarily uninstall YuniKorn and let the default scheduler schedule these
> pods.
> This issue is critical because many important pods belong to a DaemonSet,
> such as those of Fluent Bit, a common logging solution. It is probably the
> last remaining roadblock before we are confident enough to have YuniKorn
> entirely replace the default scheduler.
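For background on the mechanism named in the issue title: in upstream
Kubernetes the DaemonSet controller does not bind its pods directly. It pins
each pod to its node with a required node-affinity term on the field
"metadata.name", adds tolerations for taints such as
"node.kubernetes.io/unschedulable" and "node.kubernetes.io/disk-pressure", and
lets the scheduler place the pod. The Go sketch below shows how a scheduler
could recover the target node from such a pod. NodeSelectorRequirement,
NodeSelectorTerm, PodSpec and targetNode are hypothetical, trimmed-down
illustrations, not the real k8s.io/api/core/v1 structs or YuniKorn shim code.

```go
package main

import "fmt"

// Hypothetical, trimmed-down mirrors of the core/v1 node-affinity types:
// only the required (hard) terms and their matchFields are modelled.
type NodeSelectorRequirement struct {
	Key      string
	Operator string
	Values   []string
}

type NodeSelectorTerm struct {
	MatchFields []NodeSelectorRequirement
}

type PodSpec struct {
	RequiredTerms []NodeSelectorTerm // requiredDuringSchedulingIgnoredDuringExecution
}

// targetNode returns the node a DaemonSet pod is pinned to, assuming the
// DaemonSet controller's pattern: a required term matching
// metadata.name In [<node name>].
func targetNode(spec PodSpec) (string, bool) {
	for _, term := range spec.RequiredTerms {
		for _, req := range term.MatchFields {
			if req.Key == "metadata.name" && req.Operator == "In" && len(req.Values) == 1 {
				return req.Values[0], true
			}
		}
	}
	return "", false
}

func main() {
	pod := PodSpec{RequiredTerms: []NodeSelectorTerm{{
		MatchFields: []NodeSelectorRequirement{{
			Key: "metadata.name", Operator: "In", Values: []string{"worker-3"},
		}},
	}}}
	node, ok := targetNode(pod)
	fmt.Println(node, ok) // worker-3 true
}
```

Since such a pod can only ever fit one node, a scheduler that skips that node
(for example because it is cordoned or under disk pressure) leaves the pod
pending indefinitely, which matches the symptom described above.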



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
