[ https://issues.apache.org/jira/browse/YUNIKORN-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401961#comment-17401961 ]
Wilfred Spiegelenburg edited comment on YUNIKORN-704 at 8/20/21, 2:07 AM:
--------------------------------------------------------------------------

It does not line up with the way the default scheduler does things. The daemon set *must* only be scheduled on the specified node. If for some reason a different node, not the specified one, allows the pod to be placed, things break. The node that *needs* the daemon set pod might never become schedulable. This could break things even worse.

We need to keep in mind that the daemon set pod is not for just any node, only for the specified node. The daemon set controller has already chosen the node. The scheduler just needs to place it on that exact node.


was (Author: wifreds):
It does not line up with the way the default scheduler does things. The daemon set *must* only be scheduled on the specific node. If for some reason a node that is not specified as the required node in the spec allows the pod to be placed, the node that *needs* the daemon set pod might never become schedulable. This can break things even worse.

We need to keep in mind that the daemon set pod is not just for any node, just for the specified node.

> [Umbrella] Use the same mechanism to schedule daemon set pods as the default
> scheduler
> --------------------------------------------------------------------------------------
>
>                 Key: YUNIKORN-704
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-704
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: shim - kubernetes
>            Reporter: Chaoran Yu
>            Assignee: Ting Yao,Huang
>            Priority: Blocker
>             Fix For: 1.0.0
>
>         Attachments: fluent-bit-describe.yaml, fluent-bit.yaml
>
>
> We sometimes see DaemonSet pods fail to be scheduled. Please see attached
> files for the YAML and _kubectl describe_ output of one such pod. We
> originally suspected [node
> reservation|https://github.com/apache/incubator-yunikorn-core/blob/v0.10.0/pkg/scheduler/context.go#L41]
> was to blame. But even after setting the DISABLE_RESERVATION environment
> variable to true, we still see such scheduling failures. The issue is
> especially severe when K8s nodes have disk pressure that causes lots of pods
> to be evicted. Newly created pods will stay in pending forever. We have to
> temporarily uninstall YuniKorn and let the default scheduler do the
> scheduling for these pods.
> This issue is critical because lots of important pods belong to a DaemonSet,
> such as Fluent Bit, a common logging solution. This is probably the last
> remaining roadblock for us to have the confidence to have YuniKorn entirely
> replace the default scheduler.
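
For illustration only, here is a minimal Python sketch of the point made in the comment: the daemon set controller has already chosen the node, so the scheduler should consider that node and no other. It assumes the pod carries the required node affinity that the Kubernetes DaemonSet controller adds (a `matchFields` entry on `metadata.name` with operator `In` and a single value). The pod is modelled as a plain dict, and the helper names `required_node_name` and `candidate_nodes` are hypothetical; this is not the YuniKorn implementation.

```python
from typing import List, Optional


def required_node_name(pod: dict) -> Optional[str]:
    """Return the node a pod is pinned to by required node affinity, or None.

    Simplification (assumption): the first matchFields entry on
    metadata.name with operator "In" and exactly one value wins.
    """
    spec = pod.get("spec") or {}
    node_affinity = (spec.get("affinity") or {}).get("nodeAffinity") or {}
    required = node_affinity.get(
        "requiredDuringSchedulingIgnoredDuringExecution") or {}
    for term in required.get("nodeSelectorTerms") or []:
        for field in term.get("matchFields") or []:
            values = field.get("values") or []
            if (field.get("key") == "metadata.name"
                    and field.get("operator") == "In"
                    and len(values) == 1):
                return values[0]
    return None


def candidate_nodes(pod: dict, nodes: List[str]) -> List[str]:
    """Pinned pods may only be placed on their target node.

    If the target node is not in the list, the result is empty: the pod
    stays pending rather than landing on a different node.
    """
    target = required_node_name(pod)
    if target is None:
        return list(nodes)
    return [name for name in nodes if name == target]
```

Returning an empty list when the target node is unavailable reflects the concern raised above: placing the pod anywhere else would leave the node that needs it without its daemon set pod.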