Hello --

I'm running into an issue where sar, mpstat, top, and other tools report
lower CPU utilization than emon [1]. Sar uses /proc/stat as its
source and was configured to collect at 1s intervals. Emon reads
hardware counter MSRs in the PMU at timer intervals, 0.1s in this
scenario.
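
For reference, here is a minimal sketch of how a /proc/stat based tool
could derive per-CPU utilization from two samples. It assumes the
standard field order on a cpuN line (user, nice, system, idle, iowait,
irq, softirq, steal). The function names are hypothetical, and the exact
formula sar uses may differ in detail.

```python
import time


def parse_proc_stat(text):
    """Return {cpu_name: [user, nice, system, idle, iowait, irq, softirq, steal]} in ticks."""
    samples = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-CPU lines look like "cpu11 ..."; skip the aggregate "cpu" line.
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            samples[fields[0]] = [int(v) for v in fields[1:9]]
    return samples


def utilization(before, after):
    """Percent busy, irq, and softirq time per CPU between two samples."""
    result = {}
    for cpu, new in after.items():
        delta = [n - o for n, o in zip(new, before[cpu])]
        total = sum(delta)
        if total == 0:
            continue  # no ticks accounted in this interval
        idle = delta[3] + delta[4]  # idle + iowait (assumed to count as not busy)
        result[cpu] = {
            "util": 100.0 * (total - idle) / total,
            "irq": 100.0 * delta[5] / total,
            "softirq": 100.0 * delta[6] / total,
        }
    return result


def sample_live(interval=1.0):
    """Read /proc/stat twice, `interval` seconds apart, and return utilization."""
    with open("/proc/stat") as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = parse_proc_stat(f.read())
    return utilization(before, after)
```

Calling sample_live(1.0) mirrors the 1s collection interval used here.
The irq and softirq columns show how much interrupt time the kernel has
attributed to each CPU in /proc/stat.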

The platform is based on Xeon E5-2699 v3 (Haswell) 2.3GHz, 2 sockets,
18 cores/socket, 36 cores in total, running Ubuntu 16.04, Linux
4.4.0-128-generic. A network micro workload, ntttcp-for-linux [2],
sends packets from client to server over a 40GbE direct link.
The numbers below are from the server side.

                 total %util
           CPU11    CPU21    CPU22    CPU25
emon       99.99    15.90    36.22    36.82
sar        99.99     0.06     0.36     0.35

                 interrupts/sec
           CPU11    CPU21    CPU22    CPU25
intrs/sec    846    28923    12844     6304
    Contributors to /proc/interrupts:
    CPU11: Local timer interrupts and Rescheduling interrupts
    CPU21-CPU25: PCI MSI vector from network driver

                 softirqs/sec
           CPU11    CPU21    CPU22    CPU25
TIMER        198        1        2        1
NET_RX         1    28889    23553    18546
TASKLET        0    28889    11676     6249
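
For anyone who wants to reproduce the per-second softirq figures, here is
a minimal sketch that turns two snapshots of /proc/softirqs into per-CPU
rates. It assumes the usual layout (a header row of CPU names followed
by "NAME: count count ..." rows). The function names are hypothetical,
and this is not necessarily how the numbers above were collected.

```python
def parse_softirqs(text):
    """Return {softirq_name: {cpu_name: count}} from /proc/softirqs text."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        values = [int(v) for v in rest.split()]
        counts[name.strip()] = dict(zip(cpus, values))
    return counts


def softirq_rates(before, after, seconds):
    """Per-CPU softirqs/sec between two parsed snapshots taken `seconds` apart."""
    return {
        name: {
            cpu: (after[name][cpu] - before[name][cpu]) / seconds
            for cpu in per_cpu
        }
        for name, per_cpu in after.items()
    }
```

Reading /proc/softirqs at the start and end of a run and passing both
snapshots to softirq_rates() gives rates comparable to the table above.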


Somehow hardware irqs and softirqs do not seem to be reflected in the
per-core utilization that sar reports. Another observation is that as
more cores are used to process packets, the emon/sar gap increases.

The kernels used the default HZ=250. I also tried HZ=1000, which helped
improve throughput, but the difference in utilization is still there.
The same holds for newer kernels 4.13 and 4.15. I would appreciate
pointers on how to debug this, or insights as to what could cause this
behavior.

[1] https://software.intel.com/en-us/download/emon-users-guide
[2] https://github.com/simonxiaoss/ntttcp-for-linux

Thanks,
-Solio
