-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67627/#review204936
-----------------------------------------------------------

Mostly LGTM. Will the UI show 0s or empty spaces? Can you expand on why PID namespaces break metrics?


docs/reference/observer-configuration.md
Lines 27 (patched)
<https://reviews.apache.org/r/67627/#comment287754>

    also disk metrics


src/main/python/apache/aurora/tools/thermos_observer.py
Lines 68 (patched)
<https://reviews.apache.org/r/67627/#comment287753>

    also disk metrics


- Santhosh Kumar Shanmugham


On June 18, 2018, 1:57 a.m., Stephan Erb wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67627/
> -----------------------------------------------------------
> 
> (Updated June 18, 2018, 1:57 a.m.)
> 
> 
> Review request for Aurora, Renan DelValle, Reza Motamedi, and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Add observer command line option `--disable_task_resource_collection` to
> disable the collection of CPU, memory, and disk metrics for observed tasks.
> This is useful in setups where metrics cannot be gathered reliably (e.g. when
> using PID namespaces) or when collection is expensive due to hundreds of
> active tasks per host.
> 
> 
> Diffs
> -----
> 
>   RELEASE-NOTES.md edc081f502370190597ad028f3275cdfd572f5ca
>   docs/reference/observer-configuration.md c791b3480e5bf35e6eb0fbea908ff3242eab315d
>   src/main/python/apache/aurora/config/BUILD 12e7fe973f456d0847ce63d3b293131a7f4c3bdd
>   src/main/python/apache/aurora/tools/thermos_observer.py fd9465d2e2b3135f3fdf8230777117adaa89337c
>   src/main/python/apache/thermos/monitoring/resource.py 72ed4e5a82dfd8a09e0a8262f6da4992ac98542a
>   src/main/python/apache/thermos/observer/task_observer.py 94cd6c541bb7f8a4c153cc51caa63d2c08888a49
>   src/test/python/apache/thermos/monitoring/test_resource.py 44450647a180f86903ebd37f2a9f4327496597e9
> 
> Diff: https://reviews.apache.org/r/67627/diff/1/
> 
> 
> Testing
> -------
> 
> We are running our Mesos agents with PID namespaces enabled (i.e.
> `--isolation='namespaces/ipc,namespaces/pid,...'`). Sometimes the hosts are
> also tightly packed with many small tasks (e.g. `~130` active tasks and
> `~1000` finished tasks). Even with very relaxed scrape settings of
> `--task_process_collection_interval_secs=3000` and
> `--task_disk_collection_interval_secs=3000`, it can take between `150ms` and
> `2500ms` to render the observer landing page `/main`. This patch reduces this
> to about `100ms-150ms`. There is no immediate downside, as metrics reporting is
> broken anyway due to the PID namespacing.
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>
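The `--disable_task_resource_collection` option described in the quoted request can be pictured as swapping the per-task resource sampler for a no-op. The following is a minimal sketch of that idea only, assuming `argparse` for flag parsing; the names `ResourceSample`, `NullResourceMonitor`, `ProcessResourceMonitor`, `make_monitor`, and `parse_args` are hypothetical and do not reflect the actual code in `task_observer.py` or `resource.py`. Returning an all-zero sample from the no-op monitor is an illustrative choice here, not a statement about what the observer UI actually renders.

```python
import argparse


class ResourceSample:
    """Hypothetical container for per-task resource metrics."""

    def __init__(self, cpu=0.0, ram_bytes=0, disk_bytes=0):
        self.cpu = cpu
        self.ram_bytes = ram_bytes
        self.disk_bytes = disk_bytes


class NullResourceMonitor:
    """Hypothetical no-op monitor: returns an all-zero sample and does no work."""

    def sample(self):
        return ResourceSample()


class ProcessResourceMonitor:
    """Hypothetical stand-in for a monitor that scans processes and disk usage."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.samples_taken = 0

    def sample(self):
        # A real monitor would walk the task's process tree and sandbox here;
        # this sketch only counts calls and returns fixed placeholder values.
        self.samples_taken += 1
        return ResourceSample(cpu=0.5, ram_bytes=1024, disk_bytes=4096)


def make_monitor(task_id, disable_collection):
    """Choose a monitor for one task based on the command line flag."""
    if disable_collection:
        return NullResourceMonitor()
    return ProcessResourceMonitor(task_id)


def parse_args(argv):
    """Parse only the (assumed) boolean flag used in this sketch."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--disable_task_resource_collection',
        action='store_true',
        default=False,
        help='Skip CPU, memory, and disk metric collection for observed tasks.')
    return parser.parse_args(argv)
```

With the flag set, every task gets the no-op monitor, so rendering a page that lists many tasks no longer pays for a process or disk scan per task; without it, each call to `sample()` does the (here simulated) collection work.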