[ 
https://issues.apache.org/jira/browse/AURORA-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093763#comment-16093763
 ] 

Reza Motamedi edited comment on AURORA-1939 at 7/19/17 8:56 PM:
----------------------------------------------------------------

I added more debug logs to trace what is kept in the process tree:
{noformat}
D0719 20:23:05.758181 24474 process_collector_psutil.py:105] Calculated rate 
for pid=62328 and children: 0.000365496747142 (old: 885.570000, new: 885.670000)
{62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866: 
1500007038.32}
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, 
vms=11919360, nice=0, status='sleeping', threads=1), 866: 
ProcessSample(rate=0.0, user=315.39, system=570.16, rss=36704256, 
vms=1601019904, nice=0, status='sleeping', threads=20)}] 
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, 
vms=11919360, nice=0, status='sleeping', threads=1), 866: 
ProcessSample(rate=0.0, user=314.96, system=570.69, rss=36134912, 
vms=1601019904, nice=0, status='sleeping', threads=20)}]

D0719 20:27:44.454570 24474 process_collector_psutil.py:105] Calculated rate 
for pid=62328 and children: 2.10562464635 (old: 885.670000, new: 1472.500000)
{62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866: 
1500007038.32}
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, 
vms=11919360, nice=0, status='sleeping', threads=1), 866: 
ProcessSample(rate=0.0, user=314.96, system=570.69, rss=36134912, 
vms=1601019904, nice=0, status='sleeping', threads=20)}] 
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, 
vms=11919360, nice=0, status='sleeping', threads=1), 866: 
ProcessSample(rate=0.0, user=520.93, system=951.55, rss=35430400, 
vms=1600204800, nice=0, status='sleeping', threads=20)}]

D0719 20:32:29.451004 24474 process_collector_psutil.py:105] Calculated rate 
for pid=62328 and children: 0.00294740555899 (old: 1472.500000, new: 
1473.340000)
{62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866: 
1500007038.32}
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, 
vms=11919360, nice=0, status='sleeping', threads=1), 866: 
ProcessSample(rate=0.0, user=520.93, system=951.55, rss=35430400, 
vms=1600204800, nice=0, status='sleeping', threads=20)}] 
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, 
vms=11919360, nice=0, status='sleeping', threads=1), 866: 
ProcessSample(rate=0.0, user=521.23, system=952.09, rss=35430400, 
vms=1600204800, nice=0, status='sleeping', threads=20)}]

D0719 20:37:10.152525 24474 process_collector_psutil.py:105] Calculated rate 
for pid=62328 and children: 0.00299250250927 (old: 1473.340000, new: 
1474.180000)
{62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866: 
1500007038.32}
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, 
vms=11919360, nice=0, status='sleeping', threads=1), 866: 
ProcessSample(rate=0.0, user=521.23, system=952.09, rss=35430400, 
vms=1600204800, nice=0, status='sleeping', threads=20)}] 
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, 
vms=11919360, nice=0, status='sleeping', threads=1), 866: 
ProcessSample(rate=0.0, user=521.55, system=952.61, rss=35430400, 
vms=1600204800, nice=0, status='sleeping', threads=20)}]

D0719 20:42:58.449567 24474 process_collector_psutil.py:105] Calculated rate 
for pid=62328 and children: -1.64913818086 (old: 1474.180000, new: 899.790000)
{62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866: 
1500007038.32}
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, 
vms=11919360, nice=0, status='sleeping', threads=1), 866: 
ProcessSample(rate=0.0, user=521.55, system=952.61, rss=35430400, 
vms=1600204800, nice=0, status='sleeping', threads=20)}] 
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2732032, 
vms=11919360, nice=0, status='sleeping', threads=1), 866: 
ProcessSample(rate=0.0, user=322.65, system=577.12, rss=35430400, 
vms=1600204800, nice=0, status='sleeping', threads=20)}]
{noformat}

Note: 
{noformat}
{62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866: 
1500007038.32} 
{noformat}
means what processes where in the old and new sample, as in old -> new. The map 
also includes the pid creation time. Therefor, I can role out the problem with 
pid reuse.
The line after that is the ProcessSamples in the old pstree, kept in the object 
as "pid => ProcessSample", and the line after is the new sample.



was (Author: rezam):
I added more debug logs to trace what is kept in the process tree:
{noformat}
D0719 20:23:05.758181 24474 process_collector_psutil.py:105] Calculated rate 
for pid=62328 and children: 0.000365496747142 (old: 885.570000, new: 885.670000)
{62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866: 
1500007038.32}
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, 
vms=11919360, nice=0, status='sleeping', threads=1), 866: 
ProcessSample(rate=0.0, user=315.39, system=570.16, rss=36704256, 
vms=1601019904, nice=0, status='sleeping', threads=20)}] 
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, 
vms=11919360, nice=0, status='sleeping', threads=1), 866: 
ProcessSample(rate=0.0, user=314.96, system=570.69, rss=36134912, 
vms=1601019904, nice=0, status='sleeping', threads=20)}]

D0719 20:27:44.454570 24474 process_collector_psutil.py:105] Calculated rate 
for pid=62328 and children: 2.10562464635 (old: 885.670000, new: 1472.500000)
{62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866: 
1500007038.32}
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, 
vms=11919360, nice=0, status='sleeping', threads=1), 866: 
ProcessSample(rate=0.0, user=314.96, system=570.69, rss=36134912, 
vms=1601019904, nice=0, status='sleeping', threads=20)}] 
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, 
vms=11919360, nice=0, status='sleeping', threads=1), 866: 
ProcessSample(rate=0.0, user=520.93, system=951.55, rss=35430400, 
vms=1600204800, nice=0, status='sleeping', threads=20)}]

D0719 20:32:29.451004 24474 process_collector_psutil.py:105] Calculated rate 
for pid=62328 and children: 0.00294740555899 (old: 1472.500000, new: 
1473.340000)
{62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866: 
1500007038.32}
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, 
vms=11919360, nice=0, status='sleeping', threads=1), 866: 
ProcessSample(rate=0.0, user=520.93, system=951.55, rss=35430400, 
vms=1600204800, nice=0, status='sleeping', threads=20)}] 
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, 
vms=11919360, nice=0, status='sleeping', threads=1), 866: 
ProcessSample(rate=0.0, user=521.23, system=952.09, rss=35430400, 
vms=1600204800, nice=0, status='sleeping', threads=20)}]

D0719 20:37:10.152525 24474 process_collector_psutil.py:105] Calculated rate 
for pid=62328 and children: 0.00299250250927 (old: 1473.340000, new: 
1474.180000)
{62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866: 
1500007038.32}
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, 
vms=11919360, nice=0, status='sleeping', threads=1), 866: 
ProcessSample(rate=0.0, user=521.23, system=952.09, rss=35430400, 
vms=1600204800, nice=0, status='sleeping', threads=20)}] 
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, 
vms=11919360, nice=0, status='sleeping', threads=1), 866: 
ProcessSample(rate=0.0, user=521.55, system=952.61, rss=35430400, 
vms=1600204800, nice=0, status='sleeping', threads=20)}]

D0719 20:42:58.449567 24474 process_collector_psutil.py:105] Calculated rate 
for pid=62328 and children: -1.64913818086 (old: 1474.180000, new: 899.790000)
{62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866: 
1500007038.32}
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, 
vms=11919360, nice=0, status='sleeping', threads=1), 866: 
ProcessSample(rate=0.0, user=521.55, system=952.61, rss=35430400, 
vms=1600204800, nice=0, status='sleeping', threads=20)}] 
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2732032, 
vms=11919360, nice=0, status='sleeping', threads=1), 866: 
ProcessSample(rate=0.0, user=322.65, system=577.12, rss=35430400, 
vms=1600204800, nice=0, status='sleeping', threads=20)}]
{noformat}

Note: {62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866: 
1500007038.32} what processes where in the old and new sample, as in old -> 
new. The line after that is the ProcessSamples in the old pstree, kept in the 
object as "pid => ProcessSample", and the line after is the new sample.


> Thermos landing (host) page reports incorrect CPU rates when it is busy
> -----------------------------------------------------------------------
>
>                 Key: AURORA-1939
>                 URL: https://issues.apache.org/jira/browse/AURORA-1939
>             Project: Aurora
>          Issue Type: Bug
>            Reporter: Reza Motamedi
>            Priority: Minor
>
> Thermos Observer uses `psutil` to monitor resource consumption of Thermos 
> Processes. On a busy machine, I have noticed negative CPU values when 
> visiting the Thermos landing page.
> In my test I reproduced this by starting many processes that constantly 
> create short lived children. This indicates that in time between 
> `process_collector_psutil` looks up the Process children and the time it 
> calculates the CPU time the pid of the child is actually reused by another 
> much younger process, which leads to negative CPU times.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to