Hi all,

I've recently started working on enhancing the YuniKorn (YK) metrics. This simply means collecting more data (counters, statistics, distributions, etc.) to help with debugging and troubleshooting various issues, mostly performance related. I created YUNIKORN-1049 <https://issues.apache.org/jira/browse/YUNIKORN-1049> for this purpose.

My idea was that the metrics Hadoop YARN exposes (https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Metrics.html) could be a good starting point because of the inherent similarities between the two projects. Obviously YARN is more than just a scheduler, but it's still useful as an input.

I documented my findings. It was originally an internal document, which I have now made public under YUNIKORN-1050 <https://issues.apache.org/jira/browse/YUNIKORN-1050>. It's nowhere near complete, so feel free to take a look and add your suggestions/comments (there are already some from Wilfred, Craig and Sunil). Everyone can view it, but suggesting and editing are restricted, so just ask if you need access.

I didn't want to create too many subtasks under the JIRA, so I only created some generic ones. Those can be broken down later if deemed necessary.

Thanks,
Peter