Hello devs,

We recently had an incident where the master was overloaded by the scheduler's ACKNOWLEDGE requests, causing HTTP API latencies to spike. I have two questions:

- What is the best way to instrument the HTTP API to emit latency metrics?
- What is the best way to monitor the master's load, in addition to the API latencies? Apparently monitoring CPU doesn't help much, as the master will never saturate a machine with more than 2 CPUs.

Any guidance on this would be much appreciated.

Thanks!
Eric
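
To make the first question more concrete, here is a rough sketch of measuring latency from the client side, as a stopgap until the API itself emits metrics. It is only an illustration under stated assumptions: the helper names (`time_call`, `percentile`, `probe`) are hypothetical, the endpoint URL is whatever the master exposes in your deployment, and nothing here is part of the master's own API.

```python
import math
import time
import urllib.request


def time_call(fn):
    """Run fn() and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def percentile(samples, pct):
    """Nearest-rank percentile (0 <= pct <= 100) of a non-empty list."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def probe(url, count=20, timeout=5.0):
    """Time `count` GET requests against `url` (a hypothetical endpoint).

    Returns the list of per-request latencies in seconds.
    """
    def fetch():
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()

    return [time_call(fetch) for _ in range(count)]
```

Calling `probe` against an HTTP endpoint on the master and then `percentile(samples, 99)` would give a coarse tail-latency figure that can be exported to an existing monitoring system. This only observes latency as seen by one client, so it complements rather than replaces server-side instrumentation.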
