On 12/04/2013 02:10 AM, Willy Tarreau wrote:
>We happen to have another CPU we purchased to be good with highly
>threaded Java apps: Intel Xeon CPU E5-2670 0 @ 2.60GHz
>
>It also has a L2 cache per core. This CPU has performed significantly
>better in both "many" and "a few" threaded workloads. Somewhat
>surprisingly with a single haproxy I'm only able to get to around 23 k
>req/s (vs 20k with the much older Opteron).
Could you please run "top" during the test, and press "1" to have the
per-cpu measures ? Simply copy-paste it and send it here so that we can
check what to improve.
This is with the simple no-nbproc setup on the E5-2670. cpu0 handles the
interrupts, and haproxy is pinned to cpu1. SSL terminators and whatnot run
on the other cores. The top snapshot was taken at the point where the lb
could handle no more load and the queue was growing.
top - 11:51:42 up 1:55, 3 users, load average: 0.68, 0.55, 0.39
Tasks: 212 total, 2 running, 210 sleeping, 0 stopped, 0 zombie
Cpu0 : 19.7%us, 1.7%sy, 0.0%ni, 10.5%id, 0.0%wa, 0.0%hi, 68.1%si, 0.0%st
Cpu1 : 26.9%us, 72.1%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
Cpu2 : 24.7%us, 2.0%sy, 0.0%ni, 73.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 17.7%us, 1.3%sy, 0.0%ni, 80.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 8.4%us, 1.0%sy, 0.0%ni, 90.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 3.3%us, 1.0%sy, 0.0%ni, 95.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 2.7%us, 0.3%sy, 0.0%ni, 97.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 2.0%us, 0.3%sy, 0.0%ni, 97.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32838864k total, 5523920k used, 27314944k free, 26876k buffers
Swap: 67108856k total, 0k used, 67108856k free, 700888k cached
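
In case it helps when comparing several such snapshots, here is a minimal
sketch that parses the per-CPU lines and lists the cores with almost no
idle time. It is only an illustration: the function names and the 5% idle
threshold are my own assumptions, and the sample holds the first three
per-CPU lines of the snapshot above with the trailing "st" field on the
same line.

```python
import re

# First three per-CPU lines from the top snapshot (each on a single line).
SAMPLE = """\
Cpu0 : 19.7%us, 1.7%sy, 0.0%ni, 10.5%id, 0.0%wa, 0.0%hi, 68.1%si, 0.0%st
Cpu1 : 26.9%us, 72.1%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
Cpu2 : 24.7%us, 2.0%sy, 0.0%ni, 73.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
"""

# Matches one "<number>%<field>" pair, e.g. "72.1%sy".
FIELD = re.compile(r"(\d+(?:\.\d+)?)%(\w+)")


def parse_top_cpus(text):
    """Return {cpu_name: {field: percent}} for every 'CpuN :' line."""
    cpus = {}
    for line in text.splitlines():
        m = re.match(r"(Cpu\d+)\s*:", line)
        if m:
            cpus[m.group(1)] = {
                name: float(value) for value, name in FIELD.findall(line)
            }
    return cpus


def saturated(cpus, idle_below=5.0):
    """Names of CPUs whose idle share is under idle_below percent.

    The 5% default is an arbitrary, hypothetical threshold.
    """
    return sorted(name for name, f in cpus.items() if f["id"] < idle_below)


if __name__ == "__main__":
    cpus = parse_top_cpus(SAMPLE)
    for name in saturated(cpus):
        f = cpus[name]
        print(name, "us=%.1f sy=%.1f si=%.1f" % (f["us"], f["sy"], f["si"]))
```

On this sample only Cpu1 (the haproxy core) falls under the threshold,
with most of its time in sy; Cpu0 still has 10.5% idle but spends 68.1%
in si.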