Paul Santa Clara created YUNIKORN-2678:
------------------------------------------

             Summary: Yunikorn does not appear to be considering Guaranteed 
resources when allocating Pending Pods.
                 Key: YUNIKORN-2678
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2678
             Project: Apache YuniKorn
          Issue Type: Bug
          Components: core - scheduler
    Affects Versions: 1.5.1
         Environment: EKS 1.29
            Reporter: Paul Santa Clara
         Attachments: jira-queues.yaml, jira-tier0-screenshot.png, 
jira-tier1-screenshot.png, jira-tier2-screenshot.png, jira-tier3-screenshot.png

Please see the attached queue configuration (jira-queues.yaml). 

I create 100 Pods in each of Tier0, Tier1, Tier2 and Tier3, 400 in total. Each 
Pod requests 1 vcore. Initially there are no suitable nodes to run the Pods, so 
all of them are Pending. Karpenter soon provisions nodes and Yunikorn reacts by 
binding the Pods. 

Given this 
[code|https://github.com/apache/yunikorn-core/blob/a786feb5761be28e802d08976d224c40639cd86b/pkg/scheduler/objects/sorters.go#L81C74-L81C95],
I would expect Yunikorn to distribute the allocations so that each of the 
tiered queues reaches its guaranteed resources. Instead, I observed a roughly 
even distribution of allocations across all of the queues. Tier0 fails to meet 
its guarantees while Tier3, for instance, dramatically overshoots them.
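
To make the expectation concrete, here is a minimal sketch of a guarantee-aware 
distribution. It is a simplified model, not Yunikorn's actual sorter: it hands 
out one vcore at a time to the queue with the lowest allocated/guaranteed 
ratio. The `distribute` helper and the guarantee values below are hypothetical; 
the real guarantees are in jira-queues.yaml.

```python
def distribute(capacity, guaranteed, pending):
    """Greedy model: give each vcore to the queue furthest below its guarantee."""
    allocated = {queue: 0 for queue in guaranteed}
    remaining = dict(pending)

    def usage_ratio(queue):
        # A queue without a guarantee is treated as already satisfied.
        if guaranteed[queue] == 0:
            return float("inf")
        return allocated[queue] / guaranteed[queue]

    for _ in range(capacity):
        candidates = [q for q in guaranteed if remaining[q] > 0]
        if not candidates:
            break
        # Lowest ratio wins; ties are broken by queue name.
        chosen = min(candidates, key=lambda q: (usage_ratio(q), q))
        allocated[chosen] += 1
        remaining[chosen] -= 1
    return allocated


# Hypothetical guarantees in vcores, NOT the values from jira-queues.yaml.
guaranteed = {"tier0": 40, "tier1": 30, "tier2": 20, "tier3": 10}
pending = {queue: 100 for queue in guaranteed}

result = distribute(60, guaranteed, pending)
print(result)
```

With these made-up guarantees, 60 vcores end up split 24/18/12/6, that is, 
proportional to the guarantees, rather than roughly 15 per queue as in the 
behaviour reported here.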



 
{code:bash}
> kubectl get pods -n finance | grep tier-0 | grep Pending | wc -l
   86
> kubectl get pods -n finance | grep tier-1 | grep Pending | wc -l
   83
> kubectl get pods -n finance | grep tier-2 | grep Pending | wc -l
   78
> kubectl get pods -n finance | grep tier-3 | grep Pending | wc -l
   77
{code}
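
The same counts can be collected in one pass from Python. This is only a 
sketch: `count_pending_by_tier` and `fetch_pods` are hypothetical helpers, and 
the tier is inferred from the Pod name used by the test script. Note that it 
filters on `status.phase`, which is close to but not identical to the STATUS 
column that the grep above matches (a Pod shown as ContainerCreating still has 
phase Pending).

```python
def count_pending_by_tier(pods, tiers=("tier-0", "tier-1", "tier-2", "tier-3")):
    """Count Pending Pods per tier from an iterable of (name, phase) pairs."""
    counts = {tier: 0 for tier in tiers}
    for name, phase in pods:
        if phase != "Pending":
            continue
        for tier in tiers:
            if tier in name:
                counts[tier] += 1
                break
    return counts


def fetch_pods(namespace="finance"):
    """Return (name, phase) pairs for all Pods in the namespace."""
    # Imported lazily so the counting logic can be used without a cluster.
    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()
    pod_list = v1.list_namespaced_pod(namespace)
    return [(pod.metadata.name, pod.status.phase) for pod in pod_list.items]


# Offline example with made-up Pods; against a cluster, pass fetch_pods() instead.
sample = [
    ("rolling-test-tier-0-exec-1", "Pending"),
    ("rolling-test-tier-0-exec-2", "Running"),
    ("rolling-test-tier-3-exec-7", "Pending"),
]
print(count_pending_by_tier(sample))
```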

Please see the attached screenshots for queue usage.


Note that this situation can also be reproduced without Karpenter by setting 
Yunikorn's `service.schedulingInterval` to a long duration, say 1m. Yunikorn 
then has to react to all 400 Pods (across 4 queues) at roughly the same time, 
which forces it to prioritize allocations between the queues.



Test code to generate Pods:

{code:python}
from kubernetes import client, config
config.load_kube_config()


v1 = client.CoreV1Api()

def create_pod_manifest(tier, exec_id):
    # Build the manifest for one test Pod that targets the given tier's queue.
    pod_manifest = {
        'apiVersion': 'v1',
        'kind': 'Pod',
        'metadata': {
            'name': f"rolling-test-tier-{tier}-exec-{exec_id}",
            'namespace': 'finance',
            'labels': {
                'applicationId': f"MyOwnApplicationId-tier-{tier}",
                'queue': f"root.tiers.{tier}"
            },
            # user.info is a Pod annotation, so it is nested under 'annotations'
            # rather than placed directly in 'metadata'.
            'annotations': {
                "yunikorn.apache.org/user.info":
                    '{"user":"system:serviceaccount:finance:spark","groups":["system:serviceaccounts","system:serviceaccounts:finance","system:authenticated"]}'
            }
        },

        'spec': {
            "affinity": {
                "nodeAffinity" : {
                    "requiredDuringSchedulingIgnoredDuringExecution" : {
                        "nodeSelectorTerms" : [
                            {
                                "matchExpressions" : [
                                    {
                                        "key" : "di.rbx.com/dedicated",
                                        "operator" : "In",
                                        "values" : ["spark"]
                                    }
                                ]
                            }
                        ]

                    }
                },
            },
            "tolerations" : [
                {
                    "effect" : "NoSchedule",
                    "key": "dedicated",
                    "operator" : "Equal",
                    "value" : "spark"
                },
            ],

            "schedulerName": "yunikorn",
            'restartPolicy': 'Always',
            'containers': [{
                "name": "ubuntu",
                'image': 'ubuntu',
                "command": ["sleep", "604800"],
                "imagePullPolicy": "IfNotPresent",
                "resources" : {
                    "limits" : {
                        'cpu' : "1"
                    },
                    "requests" : {
                        'cpu' : "1"
                    }
                }
            }]
        }
    }
    return pod_manifest

# Create 100 Pods in each of the 4 tiers.
for i in range(4):
    tier = str(i)
    for j in range(100):
        exec_id = str(j)
        pod_manifest = create_pod_manifest(tier, exec_id)
        print(pod_manifest)
        api_response = v1.create_namespaced_pod(body=pod_manifest, namespace="finance")
        print(f"created tier( {tier} ) exec( {exec_id} )")
{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
