> On July 18, 2017, 12:46 a.m., Santhosh Kumar Shanmugham wrote:
> > src/main/python/apache/thermos/monitoring/resource.py
> > Line 54 (original), 53-60 (patched)
> > <https://reviews.apache.org/r/60376/diff/3/?file=1766459#file1766459line54>
> >
> >     Can you add some docstrings to these classes to explain what they will
> >     contain? Particularly the difference between the `AggregateResourceResult`
> >     and `(Proc|Full)ResourceResult`.

Done.


> On July 18, 2017, 12:46 a.m., Santhosh Kumar Shanmugham wrote:
> > src/main/python/apache/thermos/monitoring/resource.py
> > Line 61 (original), 67 (patched)
> > <https://reviews.apache.org/r/60376/diff/3/?file=1766459#file1766459line68>
> >
> >     Update docstring.

Done.


> On July 18, 2017, 12:46 a.m., Santhosh Kumar Shanmugham wrote:
> > src/main/python/apache/thermos/monitoring/resource.py
> > Line 68 (original), 74 (patched)
> > <https://reviews.apache.org/r/60376/diff/3/?file=1766459#file1766459line75>
> >
> >     Update docstring.

Done.


> On July 18, 2017, 12:46 a.m., Santhosh Kumar Shanmugham wrote:
> > src/main/python/apache/thermos/monitoring/resource.py
> > Lines 88 (patched)
> > <https://reviews.apache.org/r/60376/diff/3/?file=1766459#file1766459line89>
> >
> >     snake-case the field names.

Done.


> On July 18, 2017, 12:46 a.m., Santhosh Kumar Shanmugham wrote:
> > src/main/python/apache/thermos/monitoring/resource.py
> > Lines 175-179 (patched)
> > <https://reviews.apache.org/r/60376/diff/3/?file=1766459#file1766459line181>
> >
> >     Can we use `sum` and `zip` to sum the tuples column-wise?

Not sure if that makes it easier to read. I moved to something that matches what this part of the calculation looked like before:
```
aggregated_procs = sum(map(attrgetter('num_procs'), full_resources.proc_usage.values()))
aggregated_sample = sum(map(attrgetter('process_sample'), full_resources.proc_usage.values()), ProcessSample.empty())
```


> On July 18, 2017, 12:46 a.m., Santhosh Kumar Shanmugham wrote:
> > src/test/python/apache/thermos/monitoring/test_resource.py
> > Lines 104-105 (patched)
> > <https://reviews.apache.org/r/60376/diff/3/?file=1766461#file1766461line106>
> >
> >     Feed some random values.

done.


> On July 18, 2017, 12:46 a.m., Santhosh Kumar Shanmugham wrote:
> > src/test/python/apache/thermos/monitoring/test_resource.py
> > Lines 113 (patched)
> > <https://reviews.apache.org/r/60376/diff/3/?file=1766461#file1766461line115>
> >
> >     Assert disk_usage.

done.


- Reza


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60376/#review180775
-----------------------------------------------------------


On July 18, 2017, 6:26 a.m., Reza Motamedi wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60376/
> -----------------------------------------------------------
>
> (Updated July 18, 2017, 6:26 a.m.)
>
>
> Review request for Aurora, David McLaughlin, Joshua Cohen, Jordan Ly, and
> Santhosh Kumar Shanmugham.
>
>
> Repository: aurora
>
>
> Description
> -------
>
> # Observer task page to load consumption info from history
>
> The resource consumption of Thermos Processes is periodically calculated by
> TaskResourceMonitor threads (one thread per Thermos task). This information
> is used to display a (semi) fresh state of the tasks running on a host in the
> Observer host page, aka landing page. An aggregate history of the
> consumption is kept at the task level, although TaskResourceMonitor needs to
> first collect the resources at the Process level and then aggregate them.
>
> On the other hand, when an Observer _task page_ is visited, the resource
> consumption of the Thermos Processes within that task is calculated again and
> displayed without being aggregated. This can become very slow, since the time
> to complete the resource calculation is affected by the load on the host.
>
> By applying this patch we take advantage of the periodic work and serve the
> resource information requested in the Observer task page from the already
> collected resource consumption.
>
>
> Diffs
> -----
>
>   src/main/python/apache/thermos/monitoring/resource.py
>     434666696e600a0e6c19edd986c86575539976f2
>   src/test/python/apache/aurora/executor/common/test_resource_manager.py
>     a898e4d81d34d1e30e39db1be1a66bc9e0ab1a35
>   src/test/python/apache/thermos/monitoring/test_resource.py
>     d794a998f1d9fc52ba260cd31ac444aee7f8ed28
>
>
> Diff: https://reviews.apache.org/r/60376/diff/4/
>
>
> Testing
> -------
>
> I stress tested this patch on a host that had a slow Observer page.
> Interestingly, I did not need to do much to make the Observer slow. A few
> points need to be made clear first.
> - We at Twitter limit the resources allocated to the Observer using
> `systemd`. The Observer is allowed to use only 20% of a CPU core. The
> attached screenshots are from such a setup.
> - With 20% of a CPU core assigned to the Observer, starting only 8 `task`s,
> each with 3 `process`es, is enough to make the Observer slow: it takes
> 11 seconds to load the `task page`.
>
>
> File Attachments
> ----------------
>
> without the patch -- Screen Shot 2017-06-22 at 1.11.12 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/03968028-a2f5-4a99-ba57-b7a41c471436__without_the_patch_--_Screen_Shot_2017-06-22_at_1.11.12_PM.png
> with the patch -- Screen Shot 2017-06-22 at 1.07.41 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/5962c018-27d3-4463-a277-f6ad48b7f2d7__with_the_patch_--_Screen_Shot_2017-06-22_at_1.07.41_PM.png
>
>
> Thanks,
>
> Reza Motamedi
>
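
For anyone comparing the two aggregation styles discussed in this review (`sum` with a start value versus summing column-wise with `zip`), here is a minimal, self-contained sketch. The `ProcessSample` and `ProcResourceResult` classes below are simplified, hypothetical stand-ins written only for this illustration; the real classes in the patch have more fields, and the sketch only assumes that samples support field-wise `+` and provide an `empty()` value.

```python
from collections import namedtuple
from operator import attrgetter


class ProcessSample(namedtuple('ProcessSample', 'rate rss threads')):
  """Hypothetical, simplified stand-in for the real ProcessSample."""

  @classmethod
  def empty(cls):
    # Neutral element used as the start value for sum().
    return cls(rate=0.0, rss=0, threads=0)

  def __add__(self, other):
    # Field-wise addition instead of the default tuple concatenation.
    return ProcessSample(*(a + b for a, b in zip(self, other)))


# Hypothetical stand-in for the per-process result type.
ProcResourceResult = namedtuple('ProcResourceResult', 'process_sample num_procs')

proc_usage = {
  'proc_a': ProcResourceResult(ProcessSample(0.5, 1024, 2), 1),
  'proc_b': ProcResourceResult(ProcessSample(0.25, 2048, 3), 2),
}

# Style used in the reply: sum() over one attribute at a time.
aggregated_procs = sum(map(attrgetter('num_procs'), proc_usage.values()))
aggregated_sample = sum(
    map(attrgetter('process_sample'), proc_usage.values()),
    ProcessSample.empty())

# Style suggested in the review: sum the sample tuples column-wise with zip.
samples = [result.process_sample for result in proc_usage.values()]
column_sums = tuple(sum(column) for column in zip(*samples))
```

Both styles produce the same totals here. The first relies on the sample type defining `+`, while the `zip` variant works on plain tuples but returns an untyped tuple, and `zip(*samples)` yields nothing when there are no samples.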