[ https://issues.apache.org/jira/browse/YUNIKORN-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilfred Spiegelenburg resolved YUNIKORN-1708.
---------------------------------------------
    Fix Version/s: 1.3.0
       Resolution: Fixed

All placeholders now get the originator pod as the owner not some random owner of the originator pod.

> Filtered owner references for placeholder pods.
> -----------------------------------------------
>
>                 Key: YUNIKORN-1708
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1708
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: shim - kubernetes
>            Reporter: Junyoung Park
>            Assignee: Qi Zhu
>            Priority: Major
>              Labels: AWS, pull-request-available
>             Fix For: 1.3.0
>
>
> In AWS EMR on EKS service, the driver real pod's ownerReference is configmap.
> And placeholder's ownerReference is also the driver configmap.
> When user cancels emr-containers job, the job-submitter is terminated,
> but the placeholder still remains in pending state.
> [https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/emr-eks.html]
>
> *Environment*
> * EKS 1.22
> * EMR 6.9 release (Spark 3.3.0)
> * Yunikorn 1.2
> * gang scheduling enabled
>
> *placeholders event log*
> {code:java}
> Events:
>   Type     Reason              Age   From       Message
>   ----     ------              ----  ----       -------
>   Normal   Scheduling          19m   yunikorn   namespace/tg-driver-spark-000000031ttjn13iom3-0 is queued and waiting for allocation
>   Normal   PodUnschedulable    19m   yunikorn   Task namespace/tg-driver-spark-000000031ttjn13iom3-0 is pending for the requested resources become available
>   Warning  FailedProvisioning  19m   karpenter  Failed to provision new node
> {code}
>
> *placeholders spec*
> {code:java}
> apiVersion: v1
> kind: Pod
> metadata:
>   name: tg-driver-spark-000000031tu35ohgkc6-0
>   namespace: namespace
>   uid: 80601a03-565c-4d0e-88c7-8c66b590871e
>   resourceVersion: '546358515'
>   creationTimestamp: '2023-04-26T15:06:06Z'
>   labels:
>     applicationId: spark-000000031tu35ohgkc6
>     placeholder: 'true'
>     queue: root.beta
>   annotations:
>     yunikorn.apache.org/placeholder: 'true'
>     yunikorn.apache.org/schedulingPolicyParameters: placeholderTimeoutSeconds=300
>     yunikorn.apache.org/task-group-name: driver
>     yunikorn.apache.org/task-groups: >-
>       [{"name": "driver","minResource":{"cpu":
>       "1","memory":"2Gi"},"minMember":1,"nodeSelector":{"karpenter.sh/provisioner-name":"test"}},{"name":
>       "executor","minResource":{"cpu":
>       "1","memory":"5Gi"},"minMember":1,"nodeSelector":{"karpenter.sh/provisioner-name":"test"}}]
>   ownerReferences:
>     - apiVersion: batch/v1
>       kind: ConfigMap
>       name: 000000031tu35ohgkc6-spark-defaults
>       uid: a3044750-c8b5-47b4-9efa-81bd4b064798
>       controller: false
>       blockOwnerDeletion: true
>     - manager: k8s_yunikorn_scheduler
>       operation: Update
>       apiVersion: v1
>       time: '2023-04-26T15:06:08Z'
>       fieldsType: FieldsV1
>       fieldsV1:
>         f:status:
>           f:conditions:
>             .: {}
>             k:{"type":"PodScheduled"}:
>               .: {}
>               f:lastProbeTime: {}
>               f:lastTransitionTime: {}
>               f:message: {}
>               f:reason: {}
>               f:status: {}
>               f:type: {}
>       subresource: status
>   selfLink: >-
>     /api/v1/namespaces/namespace/pods/tg-driver-spark-000000031tu35ohgkc6-0
> status:
>   phase: Pending
>   conditions:
>     - type: PodScheduled
>       status: 'False'
>       lastProbeTime: null
>       lastTransitionTime: '2023-04-26T15:06:08Z'
>       reason: Unschedulable
>       message: request is waiting for cluster resources become available
>   qosClass: Burstable
> spec:
>   volumes:
>     - name: kube-api-access-gvxxk
>       projected:
>         sources:
>           - serviceAccountToken:
>               expirationSeconds: 3607
>               path: token
>           - configMap:
>               name: kube-root-ca.crt
>               items:
>                 - key: ca.crt
>                   path: ca.crt
>           - downwardAPI:
>               items:
>                 - path: namespace
>                   fieldRef:
>                     apiVersion: v1
>                     fieldPath: metadata.namespace
>         defaultMode: 420
>   containers:
>     - name: pause
>       image: registry.k8s.io/pause:3.7
>       resources:
>         requests:
>           cpu: '1'
>           memory: 2Gi
>       volumeMounts:
>         - name: kube-api-access-gvxxk
>           readOnly: true
>           mountPath: /var/run/secrets/kubernetes.io/serviceaccount
>       terminationMessagePath: /dev/termination-log
>       terminationMessagePolicy: File
>       imagePullPolicy: IfNotPresent
>   restartPolicy: Never
>   terminationGracePeriodSeconds: 30
>   nodeSelector:
>     karpenter.sh/provisioner-name: test
>   serviceAccountName: default
>   serviceAccount: default
>   securityContext:
>     runAsUser: 1000
>     runAsGroup: 3000
>   schedulerName: yunikorn
>   priority: 0
>   preemptionPolicy: PreemptLowerPriority
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: dev-h...@yunikorn.apache.org
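
Illustrative note on the resolution above: the change is described as giving every placeholder the originator pod as its owner, rather than reusing the originator pod's own owners (the ConfigMap in this report). The sketch below is only a rough model of that idea in Python, using plain dicts in place of Kubernetes objects. It is not the YuniKorn shim code: the function names, the driver pod name and the driver pod uid are hypothetical, and only the four required ownerReference fields are set.

```python
from typing import Any, Dict, List


def inherited_owner_references(originator_pod: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Behaviour reported in the issue: the placeholder reuses the
    originator pod's own owners (here, the driver ConfigMap)."""
    return list(originator_pod["metadata"].get("ownerReferences", []))


def originator_owner_reference(originator_pod: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Behaviour described in the resolution: the placeholder is owned by
    the originator pod itself, so removing that pod lets the garbage
    collector remove the placeholder as well."""
    meta = originator_pod["metadata"]
    return [{
        "apiVersion": "v1",
        "kind": "Pod",
        "name": meta["name"],
        "uid": meta["uid"],
    }]


# Hypothetical originator (driver) pod; name and uid are made up for the
# example. The ConfigMap owner is copied from the spec quoted in the issue.
driver_pod = {
    "metadata": {
        "name": "spark-driver-example",
        "uid": "00000000-0000-0000-0000-000000000001",
        "ownerReferences": [{
            "apiVersion": "batch/v1",
            "kind": "ConfigMap",
            "name": "000000031tu35ohgkc6-spark-defaults",
            "uid": "a3044750-c8b5-47b4-9efa-81bd4b064798",
        }],
    }
}

before = inherited_owner_references(driver_pod)
after = originator_owner_reference(driver_pod)
print([ref["kind"] for ref in before])
print([ref["kind"] for ref in after])
```

With an owner chain of placeholder -> driver pod, deleting the driver pod is enough for the placeholder to be cleaned up, which is the situation the reporter could not reach while the placeholder pointed at the ConfigMap.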