[ https://issues.apache.org/jira/browse/AURORA-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093761#comment-16093761 ]
Reza Motamedi commented on AURORA-1939:
---------------------------------------

I see this problem when the host is super busy and resource collection is backlogged. In this case I also see many more errors of this kind in the log:
{noformat}
D0719 20:18:28.064794 24474 process_collector_psutil.py:84] Error during process sampling: psutil.NoSuchProcess process no longer exists (pid=62193)
D0719 20:18:35.351458 24474 process_collector_psutil.py:84] Error during process sampling: psutil.NoSuchProcess process no longer exists (pid=14711)
D0719 20:18:35.552953 24474 process_collector_psutil.py:84] Error during process sampling: psutil.NoSuchProcess process no longer exists (pid=62331)
D0719 20:18:42.857400 24474 task_observer.py:142] TaskObserver: finished checkpoint refresh in 0.01s
D0719 20:18:43.753732 24474 process_collector_psutil.py:84] Error during process sampling: psutil.NoSuchProcess process no longer exists (pid=62338)
D0719 20:18:48.454077 24474 mesos_vars.py:384] Metrics collection took 6506.1ms
D0719 20:18:50.253031 24474 process_collector_psutil.py:84] Error during process sampling: psutil.NoSuchProcess process no longer exists (pid=62345)
D0719 20:18:57.861535 24474 task_observer.py:142] TaskObserver: finished checkpoint refresh in 0.00s
D0719 20:19:12.955235 24474 task_observer.py:142] TaskObserver: finished checkpoint refresh in 0.01s
D0719 20:19:14.959180 24474 process_collector_psutil.py:84] Error during process sampling: psutil.NoSuchProcess process no longer exists (pid=62361)
D0719 20:19:14.960768 24474 process_collector_psutil.py:84] Error during process sampling: psutil.NoSuchProcess process no longer exists (pid=62232)
D0719 20:19:18.056128 24474 mesos_vars.py:384] Metrics collection took 6008.0ms
D0719 20:19:22.856868 24474 process_collector_psutil.py:84] Error during process sampling: psutil.NoSuchProcess process no longer exists (pid=62366)
D0719 20:19:28.048165 24474 task_observer.py:142] TaskObserver: finished checkpoint refresh in 0.09s
D0719 20:19:28.660691 24474 process_collector_psutil.py:84] Error during process sampling: psutil.NoSuchProcess process no longer exists (pid=62374)
D0719 20:19:43.051047 24474 task_observer.py:142] TaskObserver: finished checkpoint refresh in 0.00s
D0719 20:19:48.355678 24474 mesos_vars.py:384] Metrics collection took 6299.8ms
D0719 20:19:58.148663 24474 task_observer.py:142] TaskObserver: finished checkpoint refresh in 0.10s
D0719 20:20:11.449485 24474 process_collector_psutil.py:84] Error during process sampling: psutil.NoSuchProcess process no longer exists (pid=62271)
D0719 20:20:13.155102 24474 task_observer.py:142] TaskObserver: finished checkpoint refresh in 0.01s
D0719 20:20:18.249528 24474 mesos_vars.py:384] Metrics collection took 6179.9ms
D0719 20:20:23.354832 24474 process_collector_psutil.py:84] Error during process sampling: psutil.NoSuchProcess process no longer exists (pid=11317)
D0719 20:20:27.060431 24474 process_collector_psutil.py:84] Error during process sampling: psutil.NoSuchProcess process no longer exists (pid=62281)
D0719 20:20:28.160298 24474 task_observer.py:142] TaskObserver: finished checkpoint refresh in 0.00s
D0719 20:20:35.452637 24474 process_collector_psutil.py:84] Error during process sampling: psutil.NoSuchProcess process no longer exists (pid=62289)
D0719 20:20:43.252589 24474 task_observer.py:142] TaskObserver: finished checkpoint refresh in 0.09s
D0719 20:20:48.151144 24474 mesos_vars.py:384] Metrics collection took 6058.3ms
D0719 20:20:55.254796 24474 process_collector_psutil.py:84] Error during process sampling: psutil.NoSuchProcess process no longer exists (pid=62428)
D0719 20:20:58.257311 24474 task_observer.py:142] TaskObserver: finished checkpoint refresh in 0.00s
D0719 20:21:13.352955 24474 task_observer.py:142] TaskObserver: finished checkpoint refresh in 0.10s
D0719 20:21:17.555244 24474 process_collector_psutil.py:84] Error during process sampling: psutil.NoSuchProcess process no longer exists (pid=62124)
{noformat}

> Thermos landing (host) page reports incorrect CPU rates when it is busy
> -----------------------------------------------------------------------
>
> Key: AURORA-1939
> URL: https://issues.apache.org/jira/browse/AURORA-1939
> Project: Aurora
> Issue Type: Bug
> Reporter: Reza Motamedi
> Priority: Minor
>
> Thermos Observer uses `psutil` to monitor resource consumption of Thermos
> Processes. On a busy machine, I have noticed negative CPU values when
> visiting the Thermos landing page.
> In my test I reproduced this by starting many processes that constantly
> create short-lived children. This suggests that between the time
> `process_collector_psutil` looks up a process's children and the time it
> calculates their CPU time, the pid of a child is reused by another,
> much younger process, which leads to negative CPU times.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
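
To illustrate the suspected pid-reuse race, here is a minimal sketch in plain Python. It is not the Thermos collector code. `Sample` and `cpu_rate` are hypothetical names made up for this example, and all numbers are invented. The assumption is that each sample also records the process creation time (psutil exposes this as `Process.create_time()`), so two samples that share a pid but differ in creation time can be treated as different processes instead of producing a negative delta.

```python
from collections import namedtuple

# Hypothetical record of one observation of a process (not a psutil type).
# cpu_seconds is cumulative CPU time; taken_at is the wall-clock sample time.
Sample = namedtuple("Sample", ["pid", "create_time", "cpu_seconds", "taken_at"])


def cpu_rate(previous, current):
    """Return CPU seconds used per wall-clock second between two samples.

    Returns None when the samples cannot belong to the same process,
    so the caller can skip the data point instead of reporting garbage.
    """
    if previous.pid != current.pid:
        return None
    # A recycled pid shows up as a different creation time.
    if previous.create_time != current.create_time:
        return None
    elapsed = current.taken_at - previous.taken_at
    if elapsed <= 0:
        return None
    return (current.cpu_seconds - previous.cpu_seconds) / elapsed


# Invented numbers: a long-running child, then a new process that got its pid.
old = Sample(pid=62193, create_time=1000.0, cpu_seconds=12.0, taken_at=1100.0)
reused = Sample(pid=62193, create_time=1104.0, cpu_seconds=0.5, taken_at=1106.0)
same = Sample(pid=62193, create_time=1000.0, cpu_seconds=15.0, taken_at=1106.0)

# Comparing by pid alone reproduces the negative rate from the report.
naive = (reused.cpu_seconds - old.cpu_seconds) / (reused.taken_at - old.taken_at)

print(naive < 0)              # the naive delta is negative
print(cpu_rate(old, reused))  # None: pid was reused, sample is dropped
print(cpu_rate(old, same))    # 0.5
```

Whether dropping the sample or restarting the baseline from the new process is the better behaviour for the observer is a separate design question; the sketch only shows how the negative value can arise and be detected.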