Hello Tao,

During our discussion with Wilfred yesterday, he mentioned that you folks
at Alibaba have been running YuniKorn at some decent scale. We are also
running some large workloads (Spark batch jobs) on YuniKorn and would like
better visibility into scheduling performance, as well as alerts that help
us spot issues as soon as they happen. We found that the current list of
metrics available in the core is not comprehensive, and some of them seem
to be computed incorrectly. Which metrics have you found most helpful? Did
you add any new ones? More generally, how have you been monitoring
YuniKorn? Many thanks in advance.

If anyone else on the mailing list has ideas to share, that would be
awesome too.

Regards,
Chaoran
