[ https://issues.apache.org/jira/browse/AURORA-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093883#comment-16093883 ]
Reza Motamedi commented on AURORA-1939: --------------------------------------- I just started tailing my logs and as samples came in catted the proc/`<pid>`/stat. log snippets: {noformat} ~ $ tail -n 1000 -f thermos-observer.DEBUG | grep 62328 D0719 22:01:37.055901 24474 process_collector_psutil.py:105] Calculated rate for pid=62328 and children: -1.94176829079 (old: 1488.840000, new: 920.870000) {62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866: 1500007038.32} [{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, vms=11919360, nice=0, status='sleeping', threads=1), 866: ProcessSample(rate=0.0, user=526.98, system=961.84, rss=35430400, vms=1600204800, nice=0, status='sleeping', threads=20)}] [{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, vms=11919360, nice=0, status='sleeping', threads=1), 866: ProcessSample(rate=0.0, user=332.69, system=588.16, rss=35430400, vms=1600204800, nice=0, status='sleeping', threads=20)}] D0719 22:06:22.156851 24474 process_collector_psutil.py:105] Calculated rate for pid=62328 and children: -0.00277094847146 (old: 920.870000, new: 920.080000) {62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866: 1500007038.32} [{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, vms=11919360, nice=0, status='sleeping', threads=1), 866: ProcessSample(rate=0.0, user=332.69, system=588.16, rss=35430400, vms=1600204800, nice=0, status='sleeping', threads=20)}] [{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, vms=11919360, nice=0, status='sleeping', threads=1), 866: ProcessSample(rate=0.0, user=329.27, system=590.79, rss=34787328, vms=1600204800, nice=0, status='sleeping', threads=20)}] D0719 22:11:07.052711 24474 process_collector_psutil.py:105] Calculated rate for pid=62328 and children: -0.0636723871193 (old: 920.080000, new: 901.940000) {62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866: 1500007038.32} [{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, vms=11919360, nice=0, status='sleeping', threads=1), 866: ProcessSample(rate=0.0, user=329.27, system=590.79, rss=34787328, vms=1600204800, nice=0, status='sleeping', threads=20)}] [{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, vms=11919360, nice=0, status='sleeping', threads=1), 866: ProcessSample(rate=0.0, user=326.72, system=575.2, rss=35430400, vms=1600204800, nice=0, status='sleeping', threads=20)}] {noformat} checking values directly. Also added date so that we can compare times. {noformat} ~ $ date; cat /proc/866/stat | awk '{print $14, $15}' Wed Jul 19 22:01:41 UTC 2017 52733 96241 ~ $ date; cat /proc/866/stat | awk '{print $14, $15}' Wed Jul 19 22:06:27 UTC 2017 52766 96296 {noformat} > Thermos landing (host) page reports incorrect CPU rates when it is busy > ----------------------------------------------------------------------- > > Key: AURORA-1939 > URL: https://issues.apache.org/jira/browse/AURORA-1939 > Project: Aurora > Issue Type: Bug > Reporter: Reza Motamedi > Priority: Minor > > Thermos Observer uses `psutil` to monitor resource consumption of Thermos > Processes. On a busy machine, I have noticed negative CPU values when > visiting the Thermos landing page. > In my test I reproduced this by starting many processes that constantly > create short lived children. This indicates that in time between > `process_collector_psutil` looks up the Process children and the time it > calculates the CPU time the pid of the child is actually reused by another > much younger process, which leads to negative CPU times. -- This message was sent by Atlassian JIRA (v6.4.14#64029)