Hi Emeric,
On 4/10/19 2:20 PM, Emeric Brun wrote:
On 4/10/19 1:02 PM, Marcin Deranek wrote:
Hi Emeric,
Our process limit in QAT configuration is quite high (128) and I was able to
run 100+ openssl processes without a problem. According to Joel from Intel
problem is in cleanup code - presumably when HAProxy exits and frees up QAT
resources. Will try to see if I can get more debug information.
I've just taken a look.
Engine deinit is called in:
haproxy/src/ssl_sock.c
#ifndef OPENSSL_NO_ENGINE
void ssl_free_engines(void) {
	struct ssl_engine_list *wl, *wlb;

	/* free up engine list */
	list_for_each_entry_safe(wl, wlb, &openssl_engines, list) {
		ENGINE_finish(wl->e);
		ENGINE_free(wl->e);
		LIST_DEL(&wl->list);
		free(wl);
	}
}
#endif
...
#ifndef OPENSSL_NO_ENGINE
hap_register_post_deinit(ssl_free_engines);
#endif
I don't know how many haproxy processes you are running, but if I describe the complete scenario of the processes, you may note that we reach a limit:
It's very unlikely it's the limit, as I lowered the number of HAProxy
processes (from 10 to 4) while keeping QAT NumProcesses equal to 32.
HAProxy would have a problem with this limit while spawning new instances,
not while tearing down old ones. In such a case QAT would not be
initialized for some HAProxy instances (you would see 1 thread vs 2
threads). About threads, read below.
- the master sends a signal to the older processes; those processes unbind and
stop accepting new conns, but continue to serve the remaining sessions until the end.
- new processes are started immediately; they init the engine and accept
new conns.
- when no more sessions remain on an old process, it calls the deinit function
of the engine before exiting.
What I noticed is that each HAProxy with QAT enabled has 2 threads (LWPs)
- looks like QAT adds an extra thread to the process itself. Could adding
an extra thread possibly mess up HAProxy's termination sequence?
Our setup runs HAProxy in multi-process mode - no threads (or 1
thread per process if you wish).
I also suspect that the old processes are stuck because some sessions
never ended. Perhaps I'm wrong, but an strace on an old process
could be interesting, to learn why those processes are stuck.
strace only shows these:
[pid 11392] 23:24:43.164619 epoll_wait(4, <unfinished ...>
[pid 11392] 23:24:43.164687 <... epoll_wait resumed> [], 200, 0) = 0
[pid 11392] 23:24:43.164761 epoll_wait(4, <unfinished ...>
[pid 11392] 23:24:43.953203 <... epoll_wait resumed> [], 200, 788) = 0
[pid 11392] 23:24:43.953286 epoll_wait(4, <unfinished ...>
[pid 11392] 23:24:43.953355 <... epoll_wait resumed> [], 200, 0) = 0
[pid 11392] 23:24:43.953419 epoll_wait(4, <unfinished ...>
[pid 11392] 23:24:44.010508 <... epoll_wait resumed> [], 200, 57) = 0
[pid 11392] 23:24:44.010589 epoll_wait(4, <unfinished ...>
There are no connections: a stuck process only has a UDP socket on a random
port:
[root@externallb-124 ~]# lsof -p 6307|fgrep IPv4
hapee-lb 6307 lbengine 83u IPv4 3598779351 0t0 UDP *:19573
You can also use the 'master CLI' via '-S', and check whether any sessions
remain on those older processes (doc is available in management.txt).
Before reload
* systemd
Main PID: 33515 (hapee-lb)
Memory: 1.6G
CGroup: /system.slice/hapee-1.8-lb.service
├─33515 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
├─34858 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
├─34859 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
├─34860 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
└─34861 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
* master CLI
show proc
#<PID> <type> <relative PID> <reloads> <uptime>
33515 master 0 0 0d 00h00m31s
# workers
34858 worker 1 0 0d 00h00m31s
34859 worker 2 0 0d 00h00m31s
34860 worker 3 0 0d 00h00m31s
34861 worker 4 0 0d 00h00m31s
After reload:
* systemd
Main PID: 33515 (hapee-lb)
Memory: 3.1G
CGroup: /system.slice/hapee-1.8-lb.service
├─33515 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234 -sf 34858 34859 34860 34861 -x /run/lb_engine/process-1.sock
├─34858 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
├─34859 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
├─34860 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
├─34861 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
├─41871 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234 -sf 34858 34859 34860 34861 -x /run/lb_engine/process-1.sock
├─41872 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234 -sf 34858 34859 34860 34861 -x /run/lb_engine/process-1.sock
├─41873 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234 -sf 34858 34859 34860 34861 -x /run/lb_engine/process-1.sock
└─41874 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234 -sf 34858 34859 34860 34861 -x /run/lb_engine/process-1.sock
* master CLI
show proc
#<PID> <type> <relative PID> <reloads> <uptime>
33515 master 0 1 0d 00h01m33s
# workers
41871 worker 1 0 0d 00h00m45s
41872 worker 2 0 0d 00h00m45s
41873 worker 3 0 0d 00h00m45s
41874 worker 4 0 0d 00h00m45s
# old workers
34858 worker [was: 1] 1 0d 00h01m33s
34859 worker [was: 2] 1 0d 00h01m33s
34860 worker [was: 3] 1 0d 00h01m33s
34861 worker [was: 4] 1 0d 00h01m33s
and
@!34858 show info
Name: HAProxy
Version: 1.8.0-2.0.0-195.793
Release_date: 2019/03/19
Nbthread: 1
Nbproc: 4
Process_num: 1
Pid: 34858
Uptime: 0d 0h03m24s
Uptime_sec: 204
Memmax_MB: 0
PoolAlloc_MB: 1
PoolUsed_MB: 1
PoolFailed: 0
Ulimit-n: 2006423
CurrConns: 0
CumConns: 354
CumReq: 342
CurrSslConns: 20
CumSslConns: 35928
Maxpipes: 0
PipesUsed: 0
PipesFree: 0
ConnRate: 0
ConnRateLimit: 0
MaxConnRate: 65
SessRate: 0
SessRateLimit: 0
MaxSessRate: 62
SslRate: 0
SslRateLimit: 0
MaxSslRate: 52
SslFrontendKeyRate: 0
SslFrontendMaxKeyRate: 52
SslFrontendSessionReuse_pct: 0
SslBackendKeyRate: 0
SslBackendMaxKeyRate: 2988
SslCacheLookups: 0
SslCacheMisses: 0
CompressBpsIn: 0
CompressBpsOut: 0
CompressBpsRateLim: 0
Tasks: 5849
Run_queue: 1
Idle_pct: 100
Stopping: 1
Jobs: 25
Unstoppable Jobs: 4
Listeners: 4
DroppedLogs: 0
Regards,
Marcin Deranek