[ https://issues.apache.org/jira/browse/AURORA-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095732#comment-16095732 ]
Reza Motamedi edited comment on AURORA-1939 at 7/21/17 3:42 AM:
----------------------------------------------------------------

Following [~StephanErb]'s suggestion, I tried to guard psutil's `oneshot` as a critical section. However, it does not seem to work:

{code}
...
from threading import Lock
...

oneshot_lock = Lock()


def process_to_sample(process):
  """ Given a psutil.Process, return a current ProcessSample """
  try:
    with oneshot_lock:
      with process.oneshot():
        # the nonblocking cpu_percent call is stateful on a particular Process object, and hence
        # at least 2 consecutive calls are required before it will return a non-zero value
        rate = process.cpu_percent(0.0) / 100.0
        cpu_times = process.cpu_times()
        ...
{code}

> Thermos landing (host) page reports incorrect CPU rates when it is busy
> -----------------------------------------------------------------------
>
>                 Key: AURORA-1939
>                 URL: https://issues.apache.org/jira/browse/AURORA-1939
>             Project: Aurora
>          Issue Type: Bug
>            Reporter: Reza Motamedi
>            Priority: Minor
>
> Thermos Observer uses `psutil` to monitor resource consumption of Thermos
> Processes. On a busy machine, I have noticed negative CPU values when
> visiting the Thermos landing page.
> In my test I reproduced this by starting many processes that constantly
> create short-lived children.
> This indicates that between the time `process_collector_psutil` looks up the
> Process children and the time it calculates their CPU time, the PID of a child
> has been reused by another, much younger process, which leads to negative CPU
> times.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
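One way to make the sampler robust to the PID reuse described in the issue is to identify a process by the pair (PID, creation time) rather than by PID alone; psutil exposes both as `Process.pid` and `Process.create_time()`. The following is a minimal, stdlib-only sketch under that assumption, not Aurora's actual fix: `Sample`, `identity`, `cpu_rate` and `tree_cpu_rate` are hypothetical names, and the samples are plain tuples rather than psutil objects.

```python
from collections import namedtuple

# Hypothetical sample record. In Thermos-like code the fields would be filled
# from psutil.Process.pid, Process.create_time() and Process.cpu_times().
Sample = namedtuple("Sample", "pid create_time cpu_seconds wall_time")


def identity(sample):
    """A PID alone is not unique over time; (pid, create_time) is."""
    return (sample.pid, sample.create_time)


def cpu_rate(prev, curr):
    """CPU rate (in cores) between two samples of the same process.

    Returns None when the PID was recycled between the samples, instead of
    subtracting an old process's CPU time from a younger process's.
    """
    if identity(prev) != identity(curr):
        return None
    elapsed = curr.wall_time - prev.wall_time
    if elapsed <= 0:
        return None
    return (curr.cpu_seconds - prev.cpu_seconds) / elapsed


def tree_cpu_rate(prev_samples, curr_samples):
    """Sum per-process rates, matching samples by identity rather than PID.

    Processes without a matching previous sample (new or recycled PIDs)
    contribute nothing until they have been sampled twice.
    """
    prev_by_id = {identity(s): s for s in prev_samples}
    total = 0.0
    for s in curr_samples:
        prev = prev_by_id.get(identity(s))
        if prev is not None:
            rate = cpu_rate(prev, s)
            if rate is not None:
                total += rate
    return total
```

For example, if PID 42 had used 120 CPU-seconds at t=1000 and, 10 s later, the same PID belongs to a process started at t=1009.5 with 0.2 CPU-seconds, a PID-keyed difference gives (0.2 - 120) / 10 = -11.98 cores; matching by (PID, creation time) skips the recycled PID instead.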