Hi Stephan,

Tracking the same set of metrics for all non-prod jobs could be somewhat expensive on both the collection and consumption sides. The only metrics we currently collect for non-prod jobs are MTTA/R, which help us monitor the scheduling rate in view of reduced cluster capacity (AURORA-774).

Perhaps we could put non-prod collection behind a set of command line switches (Arg<Boolean>)? E.g.:

SLA_COLLECT_NON_PROD_MEDIANS
SLA_COLLECT_NON_PROD_JOB_UPTIMES
SLA_COLLECT_NON_PROD_PLATFORM_UPTIMES

These could be defined in SlaModule and injected into MetricCalculator to let us fine-tune the required non-prod collection set.

What do you think?

Thanks,
Maxim

On Fri, May 29, 2015 at 7:09 AM, Erb, Stephan <[email protected]> wrote:

> Hi everyone,
>
> we are interested in the job uptime percentiles and the aggregate cluster
> uptime percentage not only for production jobs, but also for our
> non-production jobs.
>
> Are there any reasons why those are not available in a non-prod version,
> similar to the current handling of mtta and mttr [1]? If there are no
> objections, I will prepare a patch.
>
> Regards,
> Stephan
>
> [1]
> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java#L69
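
To make the switch idea a bit more concrete, here is a rough, self-contained sketch of how three boolean switches could select the non-prod collection set. It is only an illustration: the class NonProdSlaSwitches, the MetricGroup enum and the nonProdGroups() method are hypothetical names, plain booleans stand in for the Arg<Boolean> values, and none of this is the actual SlaModule or MetricCalculator code.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch, not Aurora code: plain booleans stand in for the
// proposed Arg<Boolean> command line switches.
public class NonProdSlaSwitches {

  // Assumed metric groups, one per proposed switch.
  public enum MetricGroup { MEDIANS, JOB_UPTIMES, PLATFORM_UPTIMES }

  private final boolean collectMedians;         // SLA_COLLECT_NON_PROD_MEDIANS
  private final boolean collectJobUptimes;      // SLA_COLLECT_NON_PROD_JOB_UPTIMES
  private final boolean collectPlatformUptimes; // SLA_COLLECT_NON_PROD_PLATFORM_UPTIMES

  public NonProdSlaSwitches(
      boolean collectMedians,
      boolean collectJobUptimes,
      boolean collectPlatformUptimes) {
    this.collectMedians = collectMedians;
    this.collectJobUptimes = collectJobUptimes;
    this.collectPlatformUptimes = collectPlatformUptimes;
  }

  // Returns the metric groups a calculator would compute for non-prod jobs.
  public Set<MetricGroup> nonProdGroups() {
    EnumSet<MetricGroup> groups = EnumSet.noneOf(MetricGroup.class);
    if (collectMedians) {
      groups.add(MetricGroup.MEDIANS);
    }
    if (collectJobUptimes) {
      groups.add(MetricGroup.JOB_UPTIMES);
    }
    if (collectPlatformUptimes) {
      groups.add(MetricGroup.PLATFORM_UPTIMES);
    }
    return groups;
  }

  public static void main(String[] args) {
    // Example: enable only job uptimes for non-prod jobs.
    NonProdSlaSwitches switches = new NonProdSlaSwitches(false, true, false);
    System.out.println(switches.nonProdGroups());
  }
}
```

In the real scheduler the three values would presumably come from the command line and be injected into the calculator, so that each non-prod metric group is computed only when its switch is on.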
