[jira] [Commented] (AURORA-1939) Thermos landing (host) page reports incorrect CPU rates when it is busy

Reza Motamedi (JIRA) Wed, 19 Jul 2017 15:15:23 -0700

    [ 
https://issues.apache.org/jira/browse/AURORA-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093883#comment-16093883
 ]


Reza Motamedi commented on AURORA-1939:
---------------------------------------

I just started tailing my logs and as samples came in catted the 
proc/`<pid>`/stat.

log snippets:
{noformat}
~ $ tail -n 1000 -f thermos-observer.DEBUG  | grep 62328
D0719 22:01:37.055901 24474 process_collector_psutil.py:105] Calculated rate 
for pid=62328 and children: -1.94176829079 (old: 1488.840000, new: 920.870000) 
{62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866: 
1500007038.32} [{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, 
rss=2609152, vms=11919360, nice=0, status='sleeping', threads=1), 866: 
ProcessSample(rate=0.0, user=526.98, system=961.84, rss=35430400, 
vms=1600204800, nice=0, status='sleeping', threads=20)}] [{62328: 
ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, vms=11919360, 
nice=0, status='sleeping', threads=1), 866: ProcessSample(rate=0.0, 
user=332.69, system=588.16, rss=35430400, vms=1600204800, nice=0, 
status='sleeping', threads=20)}]
D0719 22:06:22.156851 24474 process_collector_psutil.py:105] Calculated rate 
for pid=62328 and children: -0.00277094847146 (old: 920.870000, new: 
920.080000) {62328: 1500007014.22, 866: 1500007038.32} -> {62328: 
1500007014.22, 866: 1500007038.32} [{62328: ProcessSample(rate=0.0, user=0.0, 
system=0.02, rss=2609152, vms=11919360, nice=0, status='sleeping', threads=1), 
866: ProcessSample(rate=0.0, user=332.69, system=588.16, rss=35430400, 
vms=1600204800, nice=0, status='sleeping', threads=20)}] [{62328: 
ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, vms=11919360, 
nice=0, status='sleeping', threads=1), 866: ProcessSample(rate=0.0, 
user=329.27, system=590.79, rss=34787328, vms=1600204800, nice=0, 
status='sleeping', threads=20)}]
D0719 22:11:07.052711 24474 process_collector_psutil.py:105] Calculated rate 
for pid=62328 and children: -0.0636723871193 (old: 920.080000, new: 901.940000) 
{62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866: 
1500007038.32} [{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, 
rss=2609152, vms=11919360, nice=0, status='sleeping', threads=1), 866: 
ProcessSample(rate=0.0, user=329.27, system=590.79, rss=34787328, 
vms=1600204800, nice=0, status='sleeping', threads=20)}] [{62328: 
ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, vms=11919360, 
nice=0, status='sleeping', threads=1), 866: ProcessSample(rate=0.0, 
user=326.72, system=575.2, rss=35430400, vms=1600204800, nice=0, 
status='sleeping', threads=20)}]
{noformat}



checking values directly. Also added date so that we can compare times.
{noformat}
~ $ date; cat /proc/866/stat | awk '{print $14, $15}'
Wed Jul 19 22:01:41 UTC 2017
52733 96241
~ $ date; cat /proc/866/stat | awk '{print $14, $15}'
Wed Jul 19 22:06:27 UTC 2017
52766 96296
{noformat}



> Thermos landing (host) page reports incorrect CPU rates when it is busy
> -----------------------------------------------------------------------
>
>                 Key: AURORA-1939
>                 URL: https://issues.apache.org/jira/browse/AURORA-1939
>             Project: Aurora
>          Issue Type: Bug
>            Reporter: Reza Motamedi
>            Priority: Minor
>
> Thermos Observer uses `psutil` to monitor resource consumption of Thermos 
> Processes. On a busy machine, I have noticed negative CPU values when 
> visiting the Thermos landing page.
> In my test I reproduced this by starting many processes that constantly 
> create short lived children. This indicates that in time between 
> `process_collector_psutil` looks up the Process children and the time it 
> calculates the CPU time the pid of the child is actually reused by another 
> much younger process, which leads to negative CPU times.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (AURORA-1939) Thermos landing (host) page reports incorrect CPU rates when it is busy

Reply via email to