-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66103/#review199680
-----------------------------------------------------------




src/main/python/apache/aurora/tools/thermos_observer.py
Lines 89 (patched)
<https://reviews.apache.org/r/66103/#comment280064>

    The agent's HTTP endpoints can have AuthN/AuthZ enabled. We can either add 
an option to specify the credentials file to be used while talking to the 
endpoint or we can call this out as limitation for now.


- Santhosh Kumar Shanmugham


On March 20, 2018, 10:20 a.m., Reza Motamedi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66103/
> -----------------------------------------------------------
> 
> (Updated March 20, 2018, 10:20 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Daniel Knightly, Jordan Ly, 
> Santhosh Kumar Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> When disk isolation is enabled in a Mesos agent it calculates the disk usage 
> for each container. 
> Thermos Observer also monitors disk usage using `twitter.common.dirutil`, 
> essentially repeating the work already done by the agent. In practice, we see 
> that disk monitoring is one of the most expensive resource monitoring tasks. 
> For instance, when there are deeply nested directories, the CPU utilization 
> of the observer process can easily reach 1.5 CPUs. It would be ideal if we 
> delegate the disk monitoring task to the agent and do it only once. With this 
> approach, when disk collection has improved in the agent (for instance by 
> implementing XFS isolation), we can simply benefit from it without any code 
> change. Some more information about the problem is provided in AURORA-1918.
> 
> This patch that introduces `MesosDiskCollector` which queries the agent's API 
> endpoint to lookup disk_used_bytes. Note that there is also resource 
> monitoring in thermos executor. Currently, I left the disk collector there to 
> use the `du` implementation. That can be changed in a later patch.
> 
> I modified some vagrant config files including `aurora-executor.service` and 
> `etc_mesos-slave/isolation` for testing. They can be left as is. I included 
> them in this patch to show how this would work e2e.
> 
> 
> Diffs
> -----
> 
>   3rdparty/python/requirements.txt 4ac242cfa2c1c19cb7447816ab86e748839d3d11 
>   examples/jobs/hello_world.aurora 5401bfebe753b5e53abd08baeac501144ced9b5a 
>   examples/vagrant/mesos_config/etc_mesos-slave/isolation 
> 1a7028ffc70116b104ef3ad22b7388f637707a0f 
>   examples/vagrant/systemd/thermos.service 
> 01925bcd2ae44f100df511f3c3951c3f5a1a72aa 
>   src/main/python/apache/aurora/tools/thermos_observer.py 
> dd9f0c46ceac9e939b1b763073314161de0ea614 
>   src/main/python/apache/thermos/monitoring/BUILD 
> 65ba7088f65e7baa5d30744736ba456b46a55e86 
>   src/main/python/apache/thermos/monitoring/disk.py 
> 986d33a5000f8d5db15cb639c81f8b1d756ffa05 
>   src/main/python/apache/thermos/monitoring/resource.py 
> adcdc751c03460dc801a18278faa96d6bd64722b 
>   src/main/python/apache/thermos/observer/task_observer.py 
> a6870d48bddf2a2ccede7bb68195f2baae1d0e47 
>   
> src/test/python/apache/aurora/executor/common/test_resource_manager_integration.py
>  fe74bd1d36666ecd89fca1b5b2251202cbbc0f24 
>   src/test/python/apache/thermos/monitoring/BUILD 
> 8f2b39336dce6c7b580e6ba0009f60afdcb89179 
>   src/test/python/apache/thermos/monitoring/test_disk.py 
> 362393bfd1facf3198e2d438d0596b16700b72b8 
>   src/test/python/apache/thermos/monitoring/test_resource.py 
> e577e552d4ee1807096a15401851bb9fd95fa426 
> 
> 
> Diff: https://reviews.apache.org/r/66103/diff/3/
> 
> 
> Testing
> -------
> 
> I added unit tests.
> Tested in vagrant and it works as intenced.
> I also built and deployed in our test enviroment. In order to measure 
> imporoved performance I created jobs with nested folders and noticed 
> reduction in CPU utilization of the Observer process, by at least 60%. (1.5 
> CPU cores to 0.4 CPU cores)
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>

Reply via email to