Re: top(1), ps(1): per-process CPU time accounting wrong?

2015-09-07 Thread Ted Unangst
Timo Buhrmester wrote:
> On Sun, Sep 06, 2015 at 06:34:39PM -0400, Michael McConville wrote:
> > When building software, you usually have a lot of compiler processes
> > coming and going. The CPU utilization stats (in the header) are more
> > averaged than the process list stats. So, when building you're likely to
> > see a lot of CPU utilization in the per-CPU stats but no offending
> > process in the list.
> 
> Hm, I guess that would explain the low apparent CPU usage *if* processes
> are shown in top(1)'s list, but most of the time, the list (of non-idle
> processes) was empty.  That still strikes me as odd because, regardless of
> averaging, there are definitely a few processes running or runnable at any
> given time while compiling.
> Or does top(1) do a second pass to verify a process is still alive before
> printing it (but after getting the statistics in the first place)?

The minutiae of process cpu% calculations are in sys/kern/sched_bsd.c.
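
The short version: a process's %CPU (p_pctcpu) is a digital decay filter
that schedcpu() updates once per second.  Roughly, the old value is decayed
by ccpu = exp(-1/20) and the statclock ticks charged during the last second
are mixed in.  A minimal standalone sketch of that calculation (simplified
from the 4.4BSD-style formula, not the literal OpenBSD code):

#include <stdio.h>

#define FSHIFT  11                      /* fixed-point shift, as in sys/param.h */
#define FSCALE  (1 << FSHIFT)

/* ccpu = exp(-1/20): ~95% of p_pctcpu decays away within 60 seconds */
static const unsigned int ccpu = 0.95122942450071400909 * FSCALE;

int
main(void)
{
        unsigned int pctcpu = 0;        /* fixed-point fraction of one CPU */
        int hz = 100, sec;

        /* a process that runs flat out (cpticks == hz) for five seconds */
        for (sec = 1; sec <= 5; sec++) {
                unsigned int cpticks = hz;      /* ticks charged this second */

                pctcpu = (pctcpu * ccpu) >> FSHIFT;     /* decay old estimate */
                pctcpu += ((FSCALE - ccpu) * (cpticks * FSCALE / hz)) >> FSHIFT;
                printf("after %ds: %.2f%%\n", sec, 100.0 * pctcpu / FSCALE);
        }
        return 0;
}

With that decay, a process running flat out still shows only ~5% after its
first second and ~22% after five, so a compiler process that lives for a
fraction of a second never accumulates a visible %CPU before it exits.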



Re: top(1), ps(1): per-process CPU time accounting wrong?

2015-09-07 Thread Timo Buhrmester
On Sun, Sep 06, 2015 at 06:34:39PM -0400, Michael McConville wrote:
> When building software, you usually have a lot of compiler processes
> coming and going. The CPU utilization stats (in the header) are more
> averaged than the process list stats. So, when building you're likely to
> see a lot of CPU utilization in the per-CPU stats but no offending
> process in the list.

Hm, I guess that would explain the low apparent CPU usage *if* processes are
shown in top(1)'s list, but most of the time, the list (of non-idle processes)
was empty.  That still strikes me as odd because, regardless of averaging,
there are definitely a few processes running or runnable at any given time
while compiling.
Or does top(1) do a second pass to verify a process is still alive before
printing it (but after getting the statistics in the first place)?

Thanks for your reply.



Re: top(1), ps(1): per-process CPU time accounting wrong?

2015-09-06 Thread Michael McConville
Timo Buhrmester wrote:
> On -current amd64 (GENERIC and GENERIC.MP), per-process CPU time
> accounting seems wrong to me, judging from watching top(1) and ps(1)
> while compiling stuff.
> 
> [...]
> 
> Occasionally the pertinent programs do show up, but with *very* little 
> apparent CPU usage:
> | load averages:  2.87,  2.00,  1.63    flap.localdomain 23:47:19
> | 38 processes: 36 idle, 2 on processor
> | CPU0 states: 60.0% user,  0.0% nice,  0.0% system,  0.0% interrupt, 40.0% idle
> | CPU1 states: 42.3% user,  0.0% nice,  0.0% system,  0.0% interrupt, 57.7% idle
> | Memory: Real: 49M/1139M act/tot Free: 2077M Cache: 643M Swap: 0K/2224M
> |
> |   PID USERNAME PRI NICE  SIZE   RES STATE    WAIT      TIME    CPU COMMAND
> | 11425 root      64    0   19M   20M onproc   -         0:00  0.29% cc1
> | 14278 root      -6    0 4380K 5172K sleep    piperd    0:00  0.10% as
> |  7894 root      18    0 2268K 4388K sleep    pause     0:01  0.05% make
> | 19935 root      10    0  528K 1492K sleep    wait      0:00  0.05% cc

When building software, you usually have a lot of compiler processes
coming and going. The CPU utilization stats (in the header) are more
averaged than the process list stats. So, when building you're likely to
see a lot of CPU utilization in the per-CPU stats but no offending
process in the list.
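
One detail worth spelling out: the header percentages come from the
aggregate per-CPU tick counters, which retain every tick ever charged,
including time spent by processes that have since exited, while the list
shows each surviving process's slowly-decayed average.  A minimal sketch of
deriving the header numbers, assuming OpenBSD's kern.cp_time2 sysctl
(KERN_CPTIME2) and with error handling kept to a minimum:

#include <sys/types.h>
#include <sys/sysctl.h>
#include <sys/sched.h>          /* CPUSTATES, CP_USER, ..., CP_IDLE */
#include <err.h>
#include <stdio.h>
#include <unistd.h>

/* read the accumulated tick counters for one CPU */
static void
sample(int cpu, u_int64_t t[CPUSTATES])
{
        int mib[3] = { CTL_KERN, KERN_CPTIME2, cpu };
        size_t len = CPUSTATES * sizeof(u_int64_t);

        if (sysctl(mib, 3, t, &len, NULL, 0) == -1)
                err(1, "sysctl kern.cp_time2");
}

int
main(void)
{
        u_int64_t a[CPUSTATES], b[CPUSTATES], total = 0;
        int i;

        sample(0, a);           /* CPU0, for illustration */
        sleep(1);               /* one top(1)-style refresh interval */
        sample(0, b);

        for (i = 0; i < CPUSTATES; i++)
                total += b[i] - a[i];
        for (i = 0; i < CPUSTATES; i++)
                printf("state %d: %5.1f%%\n", i,
                    total ? 100.0 * (b[i] - a[i]) / total : 0.0);
        return 0;
}

A cc1 that ran for half a second still leaves its ticks in those counters,
but by the next refresh the process itself is gone from the list.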



Re: top(1), ps(1): per-process CPU time accounting wrong?

2015-09-05 Thread Michael McConville
Timo Buhrmester wrote:
> On -current amd64 (GENERIC and GENERIC.MP), per-process CPU time
> accounting seems wrong to me, judging from watching top(1) and ps(1)
> while compiling stuff.
> 
> The system is under load, building an OpenBSD release, but top(1) and
> ps(1) look like there's not much going on: Most of the time, top(1)
> (with idle processes hidden) shows the load and CPU usage, but no
> processes that are actually consuming the CPU time:
> | load averages:  2.97,  2.06,  1.66    flap.localdomain 23:47:04
> | 38 processes: 36 idle, 2 on processor
> | CPU0 states: 50.7% user,  0.0% nice, 15.4% system,  0.2% interrupt, 33.7% idle
> | CPU1 states: 34.1% user,  0.0% nice,  9.4% system,  0.0% interrupt, 56.5% idle
> | Memory: Real: 36M/1127M act/tot Free: 2088M Cache: 643M Swap: 0K/2224M
> |
> |   PID USERNAME PRI NICE  SIZE   RES STATE    WAIT      TIME    CPU COMMAND

There's just over one core being consumed here. If there's one hungry
single-threaded process, its load can appear split between multiple
cores because it gets context switched a bunch of times in each sampling
interval.
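
As a rough check of the numbers above: CPU0 is 50.7 + 15.4 + 0.2 = 66.3%
busy and CPU1 is 34.1 + 9.4 = 43.5% busy, about 1.1 cores in total, which
is consistent with one compiler pipeline plus make/sh overhead migrating
between the two cores.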



top(1), ps(1): per-process CPU time accounting wrong?

2015-09-05 Thread Timo Buhrmester
On -current amd64 (GENERIC and GENERIC.MP), per-process CPU time accounting 
seems wrong to me, judging from watching top(1) and ps(1) while compiling stuff.

The system is under load, building an OpenBSD release, but top(1) and ps(1) 
look like there's not much going on:
Most of the time, top(1) (with idle processes hidden) shows the load and CPU 
usage, but no processes that are actually consuming the CPU time:
| load averages:  2.97,  2.06,  1.66    flap.localdomain 23:47:04
| 38 processes: 36 idle, 2 on processor
| CPU0 states: 50.7% user,  0.0% nice, 15.4% system,  0.2% interrupt, 33.7% idle
| CPU1 states: 34.1% user,  0.0% nice,  9.4% system,  0.0% interrupt, 56.5% idle
| Memory: Real: 36M/1127M act/tot Free: 2088M Cache: 643M Swap: 0K/2224M
| 
|   PID USERNAME PRI NICE  SIZE   RES STATE    WAIT      TIME    CPU COMMAND


Occasionally the pertinent programs do show up, but with *very* little apparent 
CPU usage:
| load averages:  2.87,  2.00,  1.63    flap.localdomain 23:47:19
| 38 processes: 36 idle, 2 on processor
| CPU0 states: 60.0% user,  0.0% nice,  0.0% system,  0.0% interrupt, 40.0% idle
| CPU1 states: 42.3% user,  0.0% nice,  0.0% system,  0.0% interrupt, 57.7% idle
| Memory: Real: 49M/1139M act/tot Free: 2077M Cache: 643M Swap: 0K/2224M
| 
|   PID USERNAME PRI NICE  SIZE   RES STATE    WAIT      TIME    CPU COMMAND
| 11425 root      64    0   19M   20M onproc   -         0:00  0.29% cc1
| 14278 root      -6    0 4380K 5172K sleep    piperd    0:00  0.10% as
|  7894 root      18    0 2268K 4388K sleep    pause     0:01  0.05% make
| 19935 root      10    0  528K 1492K sleep    wait      0:00  0.05% cc



ps aux seems to get it wrong, too:
| # ps aux
| USER   PID %CPU %MEM   VSZ   RSS TT  STAT  STARTED   TIME COMMAND
| [...]
| root 19561  0.0  0.0   836   904 p0  Is    11:03PM 0:00.18 -ksh (ksh)
| root 11801  0.0  0.1   764  1700 p0  I+    11:40PM 0:00.01 make release
| root  7894  0.1  0.1  2296       p0  S+    11:44PM 0:00.65 make
| root 22634  0.0  0.0   628   696 p0  S+    11:47PM 0:00.00 /bin/sh -ec cc [...]
| root  2566  0.8  0.5 18672 18556 p0  R+    11:47PM 0:00.37 /usr/lib/gcc-lib/amd64-unknown-openbsd5.8/4.2.1/cc1 [...]
| root 30930  0.0  0.0   524  1484 p0  S+    11:47PM 0:00.00 cc [...]
| root 16823  0.0  0.2  4384  5176 p0  S+    11:47PM 0:00.01 as -Qy -o trees.o -
| [...]


ps axl does show at least some `short term' usage:
| # ps axl
|   UID   PID  PPID CPU PRI  NI   VSZ   RSS WCHAN   STAT  TT   TIME COMMAND
| [...]
|     0 19561 14740   0  18   0   836   904 pause   Is    p0  0:00.18 -ksh (ksh)
|     0 11801 19561  27  18   0   764  1700 pause   I+    p0  0:00.01 make release
|     0  7894 11801  36  18   0  2340  4520 pause   S+    p0  0:00.89 make
|     0   621 21260  36  10   0   528  1476 wait    S+    p0  0:00.00 cc [...]
|     0 21260  7894  36  18   0   624   688 pause   S+    p0  0:00.00 /bin/sh -ec cc [...]
|     0  9291   621  36  64   0 11944 12180 -       R+    p0  0:00.07 /usr/lib/gcc-lib/amd64-unknown-openbsd5.8/4.2.1/cc1 [...]
|     0  5156   621  36  -6   0  4388  5140 piperd  S+    p0  0:00.01 as -Qy -o uhci_pci.o -
| [...]


Any idea as to what's happening here?  Or is this as expected and I'm
misinterpreting the values?


Cheers,
Timo Buhrmester


PS: Attached is the output of dmesg(8), just in case it matters.
# dmesg
OpenBSD 5.8-current (GENERIC.MP) #0: Sat Sep  5 23:00:31 CEST 2015
fstd@flap.localdomain:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 3472392192 (3311MB)
avail mem = 3363315712 (3207MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC MCFG OEMX ECDT DBGP BOOT OEMB HPET SSDT
acpi0: wakeup devices SBAZ(S4) P0PC(S4) OHC0(S4) OHC1(S4) EHC0(S4) OHC2(S4) OHC3(S4) EHC1(S4) OHC4(S4) PCE2(S4) PCE5(S4) GLAN(S4) PCE6(S4) PCE7(S4) PWRB(S4) SLPB(S4)
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Athlon(tm) II Dual-Core M300, 2001.03 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,PAGE1GB,LONG,3DNOW2,3DNOW,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,3DNOWP,OSVW,IBS,SKINIT,ITSC
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu0: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu0: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu0: AMD erratum 721 detected and fixed
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 200MHz
cpu0: mwait min=64, max=64, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: AMD Athlon(tm) II Dual-Core M300, 2000.06 MHz