[ https://issues.apache.org/jira/browse/AURORA-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097798#comment-16097798 ]
Stephan Erb commented on AURORA-1939:
-------------------------------------

This is now on master. Thanks for the patch!

{code}
commit cdc5b8efd5bb86d38f73cca6d91903078b120333
Author: Reza Motamedi reza.motam...@gmail.com
Date:   Sat Jul 22 20:28:50 2017 +0200

    Remove psutil's oneshot

    After a lot of testing on busy machines, I realized that psutil's oneshot
    is not threadsafe. I contacted the developer but have not received a
    concrete fix. Please read https://issues.apache.org/jira/browse/AURORA-1939
    and https://github.com/giampaolo/psutil/issues/1110 for more information.
    These inconsistencies disappear after removing oneshot.

    Reviewed at https://reviews.apache.org/r/61016/

 src/main/python/apache/thermos/monitoring/process_collector_psutil.py | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)
{code}

> Thermos landing (host) page reports incorrect CPU rates when it is busy
> -----------------------------------------------------------------------
>
>                 Key: AURORA-1939
>                 URL: https://issues.apache.org/jira/browse/AURORA-1939
>             Project: Aurora
>          Issue Type: Bug
>            Reporter: Reza Motamedi
>            Assignee: Reza Motamedi
>            Priority: Minor
>
> Thermos Observer uses `psutil` to monitor the resource consumption of Thermos
> Processes. On a busy machine, I have noticed negative CPU values when
> visiting the Thermos landing page.
> In my test I reproduced this by starting many processes that constantly
> create short-lived children. This indicates that, between the time
> `process_collector_psutil` looks up the Process children and the time it
> calculates the CPU time, the pid of a child is reused by another, much
> younger process, which leads to negative CPU times.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
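
The pid-reuse hypothesis in the issue description can be illustrated with a small, self-contained sketch. It does not use psutil and is not the code from the commit (the commit only removes `oneshot`). The `Sample` record and both helper functions are hypothetical names introduced here. The sketch shows how subtracting cumulative CPU seconds of two different processes that share a pid yields a negative rate, and how comparing process creation times would detect the reuse.

```python
from collections import namedtuple

# Hypothetical sample record for illustration; not part of Thermos or psutil.
# cpu_seconds is the cumulative CPU time the process has used so far.
Sample = namedtuple("Sample", ["pid", "create_time", "cpu_seconds", "timestamp"])


def cpu_rate_by_pid(prev, curr):
    """Naive rate: assumes that the same pid always means the same process."""
    elapsed = curr.timestamp - prev.timestamp
    return (curr.cpu_seconds - prev.cpu_seconds) / elapsed


def cpu_rate_checked(prev, curr):
    """Rate that treats a changed create_time as a different process."""
    if prev.create_time != curr.create_time:
        # The pid was recycled, so the previous sample is not a valid baseline.
        return 0.0
    elapsed = curr.timestamp - prev.timestamp
    return (curr.cpu_seconds - prev.cpu_seconds) / elapsed


# An old child that had accumulated 30 s of CPU time exits, and a much
# younger process with only 0.2 s of CPU time is assigned the same pid.
old_child = Sample(pid=4242, create_time=1000.0, cpu_seconds=30.0, timestamp=1060.0)
new_child = Sample(pid=4242, create_time=1069.5, cpu_seconds=0.2, timestamp=1070.0)

naive_rate = cpu_rate_by_pid(old_child, new_child)      # negative
checked_rate = cpu_rate_checked(old_child, new_child)   # 0.0, reuse detected
```

Returning 0.0 for a recycled pid is one possible policy, chosen here only to keep the sketch short; a collector could equally drop the sample and wait for the next interval.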