[ 
https://issues.apache.org/jira/browse/AURORA-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093761#comment-16093761
 ] 

Reza Motamedi commented on AURORA-1939:
---------------------------------------

I see this problem when the host is very busy and resource collection is 
backlogged. In this case I also see many more errors of this kind in the log:
{noformat}
D0719 20:18:28.064794 24474 process_collector_psutil.py:84] Error during 
process sampling: psutil.NoSuchProcess process no longer exists (pid=62193)
D0719 20:18:35.351458 24474 process_collector_psutil.py:84] Error during 
process sampling: psutil.NoSuchProcess process no longer exists (pid=14711)
D0719 20:18:35.552953 24474 process_collector_psutil.py:84] Error during 
process sampling: psutil.NoSuchProcess process no longer exists (pid=62331)
D0719 20:18:42.857400 24474 task_observer.py:142] TaskObserver: finished 
checkpoint refresh in 0.01s
D0719 20:18:43.753732 24474 process_collector_psutil.py:84] Error during 
process sampling: psutil.NoSuchProcess process no longer exists (pid=62338)
D0719 20:18:48.454077 24474 mesos_vars.py:384] Metrics collection took 6506.1ms
D0719 20:18:50.253031 24474 process_collector_psutil.py:84] Error during 
process sampling: psutil.NoSuchProcess process no longer exists (pid=62345)
D0719 20:18:57.861535 24474 task_observer.py:142] TaskObserver: finished 
checkpoint refresh in 0.00s
D0719 20:19:12.955235 24474 task_observer.py:142] TaskObserver: finished 
checkpoint refresh in 0.01s
D0719 20:19:14.959180 24474 process_collector_psutil.py:84] Error during 
process sampling: psutil.NoSuchProcess process no longer exists (pid=62361)
D0719 20:19:14.960768 24474 process_collector_psutil.py:84] Error during 
process sampling: psutil.NoSuchProcess process no longer exists (pid=62232)
D0719 20:19:18.056128 24474 mesos_vars.py:384] Metrics collection took 6008.0ms
D0719 20:19:22.856868 24474 process_collector_psutil.py:84] Error during 
process sampling: psutil.NoSuchProcess process no longer exists (pid=62366)
D0719 20:19:28.048165 24474 task_observer.py:142] TaskObserver: finished 
checkpoint refresh in 0.09s
D0719 20:19:28.660691 24474 process_collector_psutil.py:84] Error during 
process sampling: psutil.NoSuchProcess process no longer exists (pid=62374)
D0719 20:19:43.051047 24474 task_observer.py:142] TaskObserver: finished 
checkpoint refresh in 0.00s
D0719 20:19:48.355678 24474 mesos_vars.py:384] Metrics collection took 6299.8ms
D0719 20:19:58.148663 24474 task_observer.py:142] TaskObserver: finished 
checkpoint refresh in 0.10s
D0719 20:20:11.449485 24474 process_collector_psutil.py:84] Error during 
process sampling: psutil.NoSuchProcess process no longer exists (pid=62271)
D0719 20:20:13.155102 24474 task_observer.py:142] TaskObserver: finished 
checkpoint refresh in 0.01s
D0719 20:20:18.249528 24474 mesos_vars.py:384] Metrics collection took 6179.9ms
D0719 20:20:23.354832 24474 process_collector_psutil.py:84] Error during 
process sampling: psutil.NoSuchProcess process no longer exists (pid=11317)
D0719 20:20:27.060431 24474 process_collector_psutil.py:84] Error during 
process sampling: psutil.NoSuchProcess process no longer exists (pid=62281)
D0719 20:20:28.160298 24474 task_observer.py:142] TaskObserver: finished 
checkpoint refresh in 0.00s
D0719 20:20:35.452637 24474 process_collector_psutil.py:84] Error during 
process sampling: psutil.NoSuchProcess process no longer exists (pid=62289)
D0719 20:20:43.252589 24474 task_observer.py:142] TaskObserver: finished 
checkpoint refresh in 0.09s
D0719 20:20:48.151144 24474 mesos_vars.py:384] Metrics collection took 6058.3ms
D0719 20:20:55.254796 24474 process_collector_psutil.py:84] Error during 
process sampling: psutil.NoSuchProcess process no longer exists (pid=62428)
D0719 20:20:58.257311 24474 task_observer.py:142] TaskObserver: finished 
checkpoint refresh in 0.00s
D0719 20:21:13.352955 24474 task_observer.py:142] TaskObserver: finished 
checkpoint refresh in 0.10s
D0719 20:21:17.555244 24474 process_collector_psutil.py:84] Error during 
process sampling: psutil.NoSuchProcess process no longer exists (pid=62124)
{noformat}

> Thermos landing (host) page reports incorrect CPU rates when it is busy
> -----------------------------------------------------------------------
>
>                 Key: AURORA-1939
>                 URL: https://issues.apache.org/jira/browse/AURORA-1939
>             Project: Aurora
>          Issue Type: Bug
>            Reporter: Reza Motamedi
>            Priority: Minor
>
> Thermos Observer uses `psutil` to monitor resource consumption of Thermos 
> Processes. On a busy machine, I have noticed negative CPU values when 
> visiting the Thermos landing page.
> In my test I reproduced this by starting many processes that constantly 
> create short-lived children. This indicates that between the time 
> `process_collector_psutil` looks up a Process's children and the time it 
> calculates the CPU time, the pid of a child is reused by another, much 
> younger process, which leads to negative CPU times.
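
The pid-reuse effect described in the issue can be illustrated with a small, self-contained sketch. This is not the Thermos code: the `Sample` record and the `naive_cpu_rate` and `guarded_cpu_rate` helpers are hypothetical names made up for illustration, and the numbers are invented. The guard assumes the collector can record each process's creation time alongside its pid (psutil exposes this as `Process.create_time()`), so that a reused pid is recognized as a different process.

```python
from collections import namedtuple

# Hypothetical sample record, not the actual Thermos data structure.
Sample = namedtuple("Sample", ["pid", "create_time", "cpu_time", "timestamp"])


def naive_cpu_rate(prev, curr):
    """CPU rate from two samples that are matched by pid only."""
    return (curr.cpu_time - prev.cpu_time) / (curr.timestamp - prev.timestamp)


def guarded_cpu_rate(prev, curr):
    """CPU rate that treats a changed create_time as a different process."""
    if prev.pid != curr.pid or prev.create_time != curr.create_time:
        # The pid was reused: there is no valid baseline for this interval.
        return 0.0
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        return 0.0
    return max(0.0, (curr.cpu_time - prev.cpu_time) / elapsed)


# An older child with pid 62193 has accumulated 12.5s of CPU time.
before = Sample(pid=62193, create_time=1000.0, cpu_time=12.5, timestamp=2000.0)
# The pid is then reused by a much younger process with only 0.5s of CPU time.
after = Sample(pid=62193, create_time=2003.0, cpu_time=0.5, timestamp=2006.0)

print(naive_cpu_rate(before, after))    # negative rate, as seen on the page
print(guarded_cpu_rate(before, after))  # reused pid is not compared
```

Returning zero for a reused pid is only one possible policy; a collector could instead drop the sample or start a fresh baseline for the new process.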



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
