David Robinson created MESOS-1028:
-------------------------------------

             Summary: expose internal metrics
                 Key: MESOS-1028
                 URL: https://issues.apache.org/jira/browse/MESOS-1028
             Project: Mesos
          Issue Type: Improvement
          Components: general
            Reporter: David Robinson


Mesos should export statistics that provide visibility into its internals. This 
would allow users to detect numerous problems without resorting to trawling 
through log files.

E.g. export counters for the following (some of these already exist, most don't):
cgroup create
cgroup destroy
cgroup destroy attempts
resource offers made
resource offers accepted
tasks launched
tasks destroyed
tasks lost
writes to replicated log
queue length
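
As a rough illustration of the counters listed above, here is a minimal sketch of a thread-safe counter registry. It is written in Python for brevity (Mesos itself is C++), and the `Counters` class and the counter names are hypothetical, not an existing Mesos API:

```python
import threading
from collections import defaultdict


class Counters:
    """Thread-safe named counters (hypothetical helper, not a Mesos API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = defaultdict(int)

    def increment(self, name, amount=1):
        # Called from the code path being instrumented, e.g. each destroy attempt.
        with self._lock:
            self._values[name] += amount

    def snapshot(self):
        # Return a plain copy so readers never observe a partial update.
        with self._lock:
            return dict(self._values)


counters = Counters()
counters.increment("cgroup_destroy_attempts")
counters.increment("cgroup_destroy_attempts")
counters.increment("tasks_launched")
```

A snapshot taken at this point would hold 2 for `cgroup_destroy_attempts` and 1 for `tasks_launched`.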

Export the 50th, 90th, 95th, and 99th percentiles of the time taken to:
start mesos (reach a certain state)
move tasks between two given states (starting -> started)
create a cgroup
destroy a cgroup
send a message from slave to master
start a task
stop a task
register in zookeeper
write to the replicated log
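
The percentile summary above could be computed from recorded durations roughly as follows. This is only a sketch: it assumes the nearest-rank definition of a percentile and keeps every sample in memory, whereas a production library would more likely use a bounded reservoir or histogram. The function names are hypothetical:

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer in 1..100."""
    if not samples:
        raise ValueError("no samples recorded")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def summarize(samples):
    """Return the four percentiles requested in this issue."""
    return {"p%d" % p: percentile(samples, p) for p in (50, 90, 95, 99)}


# Example: 100 made-up cgroup destroy durations of 1..100 ms.
durations_ms = list(range(1, 101))
summary = summarize(durations_ms)
# summary == {"p50": 50, "p90": 90, "p95": 95, "p99": 99}
```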

Ideally all these metrics would be exposed via an HTTP+JSON endpoint. See 
[metrics|http://metrics.codahale.com/getting-started/] for an example library 
(albeit in Java), or [medida|http://dln.github.io/medida/] for an 
unmaintained(?) C++ port.
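
To make the HTTP+JSON idea concrete, here is a minimal sketch using only the Python standard library. The `/metrics.json` path, the metric names, and the JSON layout are all assumptions made for illustration; none of them is an existing Mesos endpoint:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Made-up snapshot; a real implementation would read live counters and timers.
METRICS = {
    "counters": {"tasks_launched": 3, "cgroup_destroy_attempts": 7},
    "timers": {"cgroup_destroy_ms": {"p50": 12, "p99": 120}},
}


def render_metrics(metrics):
    """Serialize a metrics snapshot to UTF-8 encoded JSON."""
    return json.dumps(metrics, sort_keys=True).encode("utf-8")


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Hypothetical path; anything else is a 404.
        if self.path != "/metrics.json":
            self.send_error(404)
            return
        body = render_metrics(METRICS)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def make_server(port=0):
    """Bind to localhost; port 0 lets the OS choose a free port."""
    return HTTPServer(("127.0.0.1", port), MetricsHandler)
```

Calling `make_server(8080).serve_forever()` would then serve the snapshot at `http://127.0.0.1:8080/metrics.json`, where a monitoring system could poll it.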

We've previously seen problems where tasks were stuck in cgroup destroy with 
>30,000 attempts. Exposing metrics would allow us to easily detect problems 
like this.
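
With counters exported, detecting a case like this could be as simple as a threshold check in a monitoring script. The sketch below uses a hypothetical counter name and takes its threshold from the 30,000 figure mentioned above:

```python
def stuck_counters(snapshot, thresholds):
    """Return the names of counters whose value exceeds the given threshold."""
    return sorted(
        name for name, limit in thresholds.items()
        if snapshot.get(name, 0) > limit
    )


# Made-up snapshot resembling the stuck cgroup destroy case.
snapshot = {"cgroup_destroy_attempts": 30500, "tasks_launched": 12}
alerts = stuck_counters(snapshot, {"cgroup_destroy_attempts": 30000})
```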



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
