[ 
https://issues.apache.org/jira/browse/AURORA-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095046#comment-16095046
 ] 

Reza Motamedi commented on AURORA-1939:
---------------------------------------

Added more logging to check the value returned by process.cpu_times(). The 
numbers simply don't add up. What I see is the same pid, sampled by psutil, 
showing a different name and CPU samples that fluctuate (decrease).

Note that in this example pid 14712 was created on July 16th, and the log is 
from July 20th. So there is no chance that what we see in the log is caused by 
the pid being reused.

{noformat}
~ $ grep 'pid=14712' thermos-observer.DEBUG
D0720 17:01:04.148628 61755 process_collector_psutil.py:35] 
process:psutil.Process(pid=14712, name='mesos-slave') cpu times 
pcputimes(user=603.88, system=1104.31, children_user=0.0, children_system=0.0)
D0720 17:06:02.358989 61755 process_collector_psutil.py:35] 
process:psutil.Process(pid=14712, name='mesos-slave') cpu times 
pcputimes(user=591.16, system=1097.5, children_user=0.0, children_system=0.0)
D0720 17:10:55.258080 61755 process_collector_psutil.py:35] 
process:psutil.Process(pid=14712, name='python2.7') cpu times 
pcputimes(user=44.81, system=7.29, children_user=0.0, children_system=0.0)
D0720 17:16:23.156296 61755 process_collector_psutil.py:35] 
process:psutil.Process(pid=14712, name='python2.7') cpu times 
pcputimes(user=596.01, system=1104.11, children_user=0.0, children_system=0.0)
D0720 17:21:21.552978 61755 process_collector_psutil.py:35] 
process:psutil.Process(pid=14712, name='python2.7') cpu times 
pcputimes(user=44.9, system=7.3, children_user=0.0, children_system=0.0)

~ $ ps -o lstart= -p 14712
Sun Jul 16 11:39:19 2017
{noformat}
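
For illustration, samples like the ones above could be rejected before a rate 
is computed from them. This is only a minimal sketch, not the code in 
process_collector_psutil.py: the Sample record and the cpu_rate helper are 
hypothetical names, and create_time is assumed to hold the value of psutil's 
Process.create_time() recorded when each sample was taken.

```python
from collections import namedtuple

# Hypothetical record of one sample; the field names are illustrative and are
# not taken from process_collector_psutil.py.
Sample = namedtuple("Sample", ["timestamp", "create_time", "user", "system"])


def cpu_rate(prev, curr):
    """Return CPU seconds used per wall-clock second between two samples,
    or None if the samples cannot describe the same process."""
    # A different creation time means the pid now refers to another process.
    if prev.create_time != curr.create_time:
        return None
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        return None
    delta = (curr.user + curr.system) - (prev.user + prev.system)
    # Cumulative CPU time of a single process never decreases, so a negative
    # delta marks at least one of the two samples as bogus.
    if delta < 0:
        return None
    return delta / elapsed
```

With the first two samples from the log (about 298 seconds apart, same 
creation time), the delta is negative, so cpu_rate returns None instead of a 
negative rate. The creation-time check alone would not catch this case, since 
the pid was not reused here.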
 

> Thermos landing (host) page reports incorrect CPU rates when it is busy
> -----------------------------------------------------------------------
>
>                 Key: AURORA-1939
>                 URL: https://issues.apache.org/jira/browse/AURORA-1939
>             Project: Aurora
>          Issue Type: Bug
>            Reporter: Reza Motamedi
>            Priority: Minor
>
> Thermos Observer uses `psutil` to monitor resource consumption of Thermos 
> Processes. On a busy machine, I have noticed negative CPU values when 
> visiting the Thermos landing page.
> In my test I reproduced this by starting many processes that constantly 
> create short-lived children. This indicates that, between the time 
> `process_collector_psutil` looks up a process's children and the time it 
> calculates the CPU time, the pid of a child is reused by another, much 
> younger process, which leads to negative CPU times.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
