Thanks Adam for this effort!

I have done some performance tests on a kubemark cluster. Even if we
improve some key phases in the scheduling process, the scheduler pod can
take 1~4 CPUs most of the time and 7~9 CPUs at peak times, and the memory
may easily go beyond 1G after tens of thousands of pods have been
scheduled in a cluster with thousands of nodes.

Utilization of the scheduler pod may depend on the scale of the cluster,
the number of running pods, and some other factors. The initial
configuration values Weiwei suggested make sense to me; I think they are
suitable for most usage scenarios. Moreover, I think it's better to make
them configurable in helm-charts/yunikorn/values.yaml, so that users can
easily update them on demand if they have a large-scale cluster or other
concerns.
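
To make that concrete, here is a minimal sketch of how this could look in
helm-charts/yunikorn/values.yaml, using the suggested defaults (the key
layout here is my assumption, not necessarily the final chart schema):

      # default resource requests/limits for the YuniKorn scheduler pod
      resources:
        requests:
          cpu: 200m
          memory: 1Gi
        limits:
          cpu: 4
          memory: 2Gi

Users with a large-scale cluster could then raise the limits at install
time, e.g. by passing --set resources.limits.cpu=8 --set
resources.limits.memory=4Gi to helm install (the paths assume the sketch
above).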

Thanks,
Tao

Weiwei Yang <abvclo...@gmail.com> wrote on Tue, Apr 21, 2020, at 1:22 PM:

> Hi Adam
>
> Thanks for the investigation. These data are useful, and what you
> suggested looks good to me.
>
>    - I think it makes sense to set relatively small requests; this is to
>    ensure we are not getting into trouble on small environments. But for
>    memory, can we request at least 1Gi? Memory cannot be throttled, so
>    1Gi is safer.
>    - For the CPU, it looks like we probably won't go beyond 4~5 CPUs, so
>    setting a limit of 4 makes sense to me. I also checked an EKS cluster
>    and saw usage around 1 CPU there.
>
> So I think we can do
>
>     resources:
>         requests:
>           cpu: 200m
>           memory: 1Gi
>         limits:
>           cpu: 4
>           memory: 2Gi
>
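> With the requests below the limits like this, the pod will land in the
> Burstable QoS class: it can burst up to the 4-CPU limit under load while
> only 200m/1Gi are guaranteed, and the container gets OOM-killed if it
> goes beyond 2Gi.
>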
> @taoy...@apache.org <taoy...@apache.org> could you help to review this?
> Let us know if this makes sense to you.
>
> Weiwei
>
>
> On Mon, Apr 20, 2020 at 6:16 AM Adam Antal <adam.an...@cloudera.com.invalid>
> wrote:
>
>> Hi,
>>
>> I am working on [YUNIKORN-86] (Set proper resource request and limit for
>> YuniKorn pods) and I would like to hear your feedback on this issue.
>> Link: https://issues.apache.org/jira/browse/YUNIKORN-86
>>
>> Currently I am working on finding justified request and limit values for
>> the scheduler and shim pods in k8s. So far:
>> - Researched a bit on the k8s default scheduler (kube-scheduler): even
>> though it can be configured, there are no defaults for that pod. On AKS
>> there is a default that I found on the internet:
>> >>>
>>       resources:
>>         requests:
>>           cpu: 100m
>>           memory: 128Mi
>>         limits:
>>           cpu: 4
>>           memory: 2Gi
>> >>>
>> - Tried to obtain some values from a deployed k8s cluster, but had
>> trouble with the metrics-server (in some older versions heapster was
>> deployed instead)
>> - Ran yunikorn-core/pkg/scheduler/tests/scheduler_perf_tests.go and
>> monitored the CPU/memory consumption. I don't have the full graph of the
>> run, but these are the approximate values I observed:
>>    - most of the time the CPU was at around 4 cores
>>    - at peak it was around 5.2 cores
>>    - memory consumption was moderate, around 4.3% of my laptop's 16GB,
>> which is roughly 700MB, by the time the perf test ended. Memory depends
>> on the stored/currently running applications, so I think 1GB should be
>> fine for general purposes.
>> Note that the CPU numbers reflect peak usage, when the scheduler was
>> under pressure; when only a small number of pods are scheduled, the
>> usage is much lower.
>>
>> I suggest the following values (similar to AKS):
>> >>>
>>       resources:
>>         requests:
>>           cpu: 200m
>>           memory: 512Mi
>>         limits:
>>           cpu: 4
>>           memory: 2Gi
>> >>>
>>
>> One additional thing: there were some measurements of YuniKorn in
>> kubemark. Could you please share your results if there's anything
>> related to the scheduler and the shim pods' resource usage?
>>
>> Regards,
>> Adam
>>
>
