[ https://issues.apache.org/jira/browse/YUNIKORN-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401485#comment-17401485 ]
Weiwei Yang commented on YUNIKORN-704:
--------------------------------------

Hi [~wilfreds],

I just checked cordoning and uncordoning a node. You are correct that the taint "node.kubernetes.io/unschedulable" is added automatically to cordoned nodes. What you are describing is like what we want to do with the additional placement constraint, which was part of the interface design but is not implemented today. It is like adding a short circuit for evaluating node selectors. Dropping what we have done so far (shim/scheduler-interface changes) and moving in that direction would add a lot of work.

I just checked the changes [~Huang Ting Yao] has made. They are pretty straightforward: simply ignore the unschedulable node state when the ask has that certain attribute (set when the ask is converted from a daemon set pod). That should be enough to solve this issue. I think it is better to go with this, especially since [~chenya_zhang] is waiting on it. Does that make sense?

> [Umbrella] Use the same mechanism to schedule daemon set pods as the default
> scheduler
> --------------------------------------------------------------------------------------
>
>                 Key: YUNIKORN-704
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-704
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: shim - kubernetes
>            Reporter: Chaoran Yu
>            Assignee: Ting Yao,Huang
>            Priority: Blocker
>             Fix For: 1.0.0
>
>         Attachments: fluent-bit-describe.yaml, fluent-bit.yaml
>
>
> We sometimes see DaemonSet pods fail to be scheduled. Please see attached
> files for the YAML and _kubectl describe_ output of one such pod. We
> originally suspected [node
> reservation|https://github.com/apache/incubator-yunikorn-core/blob/v0.10.0/pkg/scheduler/context.go#L41]
> was to blame. But even after setting the DISABLE_RESERVATION environment
> variable to true, we still see such scheduling failures. The issue is
> especially severe when K8s nodes have disk pressure that causes lots of pods
> to be evicted.
> Newly created pods will stay in pending forever. We have to
> temporarily uninstall YuniKorn and let the default scheduler do the
> scheduling for these pods.
> This issue is critical because lots of important pods belong to a DaemonSet,
> such as Fluent Bit, a common logging solution. This is probably the last
> remaining roadblock for us to have the confidence to have YuniKorn entirely
> replace the default scheduler.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org
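
For illustration only, the approach discussed in the comment (let an ask that originates from a daemon set pod bypass the unschedulable check on a cordoned node) might look roughly like the Go sketch below. The `Node` and `Ask` types, the `FromDaemonSet` field, and the `nodeUsable` function are hypothetical names invented for this sketch; they are not the YuniKorn core or scheduler-interface API. Only the taint key comes from the discussion itself.

```go
package main

import "fmt"

// unschedulableTaint is the taint key Kubernetes adds to cordoned nodes.
const unschedulableTaint = "node.kubernetes.io/unschedulable"

// Node is a hypothetical, simplified node: only a name and taint keys.
type Node struct {
	Name   string
	Taints []string
}

// Ask is a hypothetical, simplified allocation request.
// FromDaemonSet stands in for "that certain attribute" set when the
// ask is converted from a daemon set pod (assumed name).
type Ask struct {
	PodName       string
	FromDaemonSet bool
}

// isCordoned reports whether the node carries the unschedulable taint.
func isCordoned(n Node) bool {
	for _, key := range n.Taints {
		if key == unschedulableTaint {
			return true
		}
	}
	return false
}

// nodeUsable returns true when the node may be considered for the ask.
// A cordoned node is rejected for ordinary asks, but the check is
// skipped for asks that come from a daemon set pod.
func nodeUsable(n Node, a Ask) bool {
	if !isCordoned(n) {
		return true
	}
	return a.FromDaemonSet
}

func main() {
	cordoned := Node{Name: "node-1", Taints: []string{unschedulableTaint}}
	daemonAsk := Ask{PodName: "fluent-bit-abc", FromDaemonSet: true}
	regularAsk := Ask{PodName: "web-xyz", FromDaemonSet: false}

	fmt.Println(nodeUsable(cordoned, daemonAsk))
	fmt.Println(nodeUsable(cordoned, regularAsk))
}
```

The sketch keeps the decision in a single predicate so that the existing node-filtering path stays unchanged for every ask that does not carry the daemon set attribute.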