Hi Emeric,

On 4/10/19 2:20 PM, Emeric Brun wrote:

On 4/10/19 1:02 PM, Marcin Deranek wrote:
Hi Emeric,

Our process limit in the QAT configuration is quite high (128), and I was able to 
run 100+ openssl processes without a problem. According to Joel from Intel, the 
problem is in the cleanup code, presumably when HAProxy exits and frees its QAT 
resources. I will try to get more debug information.

I've just taken a look.

Engine deinit functions are called:

haproxy/src/ssl_sock.c
#ifndef OPENSSL_NO_ENGINE
void ssl_free_engines(void) {
         struct ssl_engine_list *wl, *wlb;
         /* free up engine list */
         list_for_each_entry_safe(wl, wlb, &openssl_engines, list) {
                 ENGINE_finish(wl->e);
                 ENGINE_free(wl->e);
                 LIST_DEL(&wl->list);
                 free(wl);
         }
}
#endif
...
#ifndef OPENSSL_NO_ENGINE
         hap_register_post_deinit(ssl_free_engines);
#endif
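The registration above means engine cleanup only runs in the post-deinit phase, i.e. once a process actually reaches its exit path. A minimal standalone sketch of that pattern, assuming a simple fixed-size callback registry (the names `register_post_deinit`, `run_post_deinit` and `fake_free_engines` are hypothetical, not HAProxy's API):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical registry of post-deinit callbacks (not HAProxy's code). */
#define MAX_POST_DEINIT 8

typedef void (*post_deinit_fn)(void);

static post_deinit_fn post_deinit_list[MAX_POST_DEINIT];
static size_t post_deinit_count;

/* Counts how many times the fake engine cleanup ran. */
static int engines_freed;

/* Register a callback to run once the process has finished serving. */
int register_post_deinit(post_deinit_fn fn)
{
    assert(fn != NULL);
    if (post_deinit_count == MAX_POST_DEINIT)
        return -1; /* registry full */
    post_deinit_list[post_deinit_count++] = fn;
    return 0;
}

/* Run callbacks in registration order; only called on the exit path. */
void run_post_deinit(void)
{
    for (size_t i = 0; i < post_deinit_count; i++)
        post_deinit_list[i]();
}

/* Stand-in for ssl_free_engines(): real code would call
 * ENGINE_finish() and ENGINE_free() on each loaded engine. */
void fake_free_engines(void)
{
    engines_freed++;
}
```

In this model, registering the hook frees nothing by itself; if an old worker never reaches `run_post_deinit()`, the engine cleanup is simply never executed.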

I don't know how many HAProxy processes you are running, but if I describe the 
complete process lifecycle during a reload, you may notice that we could reach a limit:

It's very unlikely to be the limit, as I lowered the number of HAProxy processes (from 10 to 4) while keeping QAT NumProcesses at 32. HAProxy would run into this limit when spawning new instances without the old ones being torn down. In that case QAT would not be initialized for some HAProxy instances (you would see 1 thread instead of 2). More on threads below.

- The master sends a signal to the older processes; those processes unbind and 
stop accepting new connections but continue to serve the remaining sessions until they end.
- New processes are started, immediately initialize the engine, and accept 
new connections.
- When no sessions remain on an old process, it calls the engine's deinit 
function before exiting.
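A rough way to check the limit during reloads, assuming each HAProxy worker holds exactly one QAT process slot and the master holds none (an assumption, not something confirmed in this thread): the slots needed are nbproc times the number of worker generations alive at once. The helper names below are hypothetical.

```c
#include <assert.h>

/* Slots needed when `old_generations` sets of old workers are still
 * draining while one new set of `nbproc` workers starts.
 * Assumes one QAT process slot per worker (hypothetical model). */
int qat_slots_needed(int nbproc, int old_generations)
{
    assert(nbproc > 0 && old_generations >= 0);
    return nbproc * (old_generations + 1);
}

/* Maximum number of worker generations that fit under the limit. */
int max_generations(int qat_num_processes, int nbproc)
{
    assert(nbproc > 0 && qat_num_processes >= 0);
    return qat_num_processes / nbproc;
}
```

Under this model, with NumProcesses = 32 a single reload needs 8 slots at nbproc = 4 and 20 at nbproc = 10, and at most 8 (resp. 3) generations fit. Old workers that never exit keep their slots, so only repeated reloads with stuck workers would exhaust them.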

What I noticed is that each HAProxy process with QAT enabled has 2 threads (LWPs); it looks like QAT adds an extra thread to the process itself. Could this extra thread interfere with the HAProxy termination sequence? We run HAProxy in multi-process mode with no threads (or 1 thread per process, if you wish).
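To check the per-process thread count (1 without QAT vs 2 with it), Linux exposes a `Threads:` field in `/proc/<pid>/status`. A minimal sketch that reads and parses it; the helper names `status_thread_count` and `proc_thread_count` are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the value of the "Threads:" field in the text of
 * /proc/<pid>/status, or -1 if the field is missing. */
long status_thread_count(const char *status_text)
{
    assert(status_text != NULL);
    const char *p = status_text;
    while (p && *p) {
        if (strncmp(p, "Threads:", 8) == 0)
            return strtol(p + 8, NULL, 10); /* strtol skips the tab */
        p = strchr(p, '\n');
        if (p)
            p++; /* move to the start of the next line */
    }
    return -1;
}

/* Read /proc/<pid>/status and return its thread count, or -1. */
long proc_thread_count(int pid)
{
    char path[64], buf[8192];
    snprintf(path, sizeof(path), "/proc/%d/status", pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return status_thread_count(buf);
}
```

For example, `proc_thread_count(34858)` would be expected to return 2 for a QAT-enabled worker if the extra LWP is present.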

I also suppose that the old processes are stuck because some sessions 
never ended. Perhaps I'm wrong, but an strace of an old process
could help understand why those processes are stuck.

strace only shows these:

[pid 11392] 23:24:43.164619 epoll_wait(4,  <unfinished ...>
[pid 11392] 23:24:43.164687 <... epoll_wait resumed> [], 200, 0) = 0
[pid 11392] 23:24:43.164761 epoll_wait(4,  <unfinished ...>
[pid 11392] 23:24:43.953203 <... epoll_wait resumed> [], 200, 788) = 0
[pid 11392] 23:24:43.953286 epoll_wait(4,  <unfinished ...>
[pid 11392] 23:24:43.953355 <... epoll_wait resumed> [], 200, 0) = 0
[pid 11392] 23:24:43.953419 epoll_wait(4,  <unfinished ...>
[pid 11392] 23:24:44.010508 <... epoll_wait resumed> [], 200, 57) = 0
[pid 11392] 23:24:44.010589 epoll_wait(4,  <unfinished ...>
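The trace shows `epoll_wait` called with varying timeouts (0, 788 and 57 ms) and always returning 0 events, which is how a generic event loop looks when it only wakes up for timers. A minimal sketch of how such a loop typically derives its timeout from the next timer expiry; this is a generic pattern, not HAProxy's actual poller code, and `poll_timeout_ms` is a hypothetical name:

```c
#include <assert.h>

/* Compute the epoll_wait() timeout in milliseconds from the next timer
 * expiry. Returns 0 when a timer is already due (non-blocking poll)
 * and caps the wait at max_wait_ms so the loop wakes up periodically. */
int poll_timeout_ms(long now_ms, long next_expiry_ms, int has_timer,
                    int max_wait_ms)
{
    assert(max_wait_ms >= 0);
    if (!has_timer)
        return max_wait_ms;
    if (next_expiry_ms <= now_ms)
        return 0; /* a timer is already due: poll without blocking */
    long delta = next_expiry_ms - now_ms;
    return delta > max_wait_ms ? max_wait_ms : (int)delta;
}
```

Under this reading, the old process is not blocked in a syscall; it keeps waking up for timers while no I/O arrives.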

There are no connections: the stuck process only has a UDP socket on a random port:

[root@externallb-124 ~]# lsof -p 6307|fgrep IPv4
hapee-lb 6307 lbengine 83u IPv4 3598779351 0t0 UDP *:19573


You can also use the master CLI (enabled with '-S') to check whether sessions 
remain on those older processes (documented in management.txt).
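The master CLI accepts plain-text commands terminated by a newline, and prefixing a command with `@!<pid>` routes it to a specific worker, as in the `@!34858 show info` query further down. A minimal C client sketch, assuming the `-S 127.0.0.1:1234` address used in this setup and that the connection is closed after the reply (non-interactive mode); `build_cli_cmd` and `master_cli_query` are hypothetical helper names:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Build a master CLI command; pid > 0 targets one worker via "@!<pid>".
 * Returns the command length, or -1 if it does not fit in `out`. */
int build_cli_cmd(char *out, size_t len, int pid, const char *cmd)
{
    assert(out != NULL && cmd != NULL);
    int n = pid > 0 ? snprintf(out, len, "@!%d %s\n", pid, cmd)
                    : snprintf(out, len, "%s\n", cmd);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Send one command to the master CLI and copy the reply to stdout.
 * Assumes the server closes the connection after answering. */
int master_cli_query(const char *ip, int port, int pid, const char *cmd)
{
    char req[256], buf[4096];
    int n = build_cli_cmd(req, sizeof(req), pid, cmd);
    if (n < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons((unsigned short)port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        write(fd, req, (size_t)n) != n) {
        close(fd);
        return -1;
    }
    ssize_t r;
    while ((r = read(fd, buf, sizeof(buf))) > 0)
        fwrite(buf, 1, (size_t)r, stdout);
    close(fd);
    return r < 0 ? -1 : 0;
}
```

For example, `master_cli_query("127.0.0.1", 1234, 0, "show proc")` lists the processes, and `master_cli_query("127.0.0.1", 1234, 34858, "show info")` queries one old worker.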

Before reload:
* systemd
 Main PID: 33515 (hapee-lb)
   Memory: 1.6G
   CGroup: /system.slice/hapee-1.8-lb.service
           ├─33515 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
           ├─34858 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
           ├─34859 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
           ├─34860 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
           └─34861 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
* master CLI
show proc
#<PID>          <type>          <relative PID>  <reloads>       <uptime>
33515           master          0               0               0d 00h00m31s
# workers
34858           worker          1               0               0d 00h00m31s
34859           worker          2               0               0d 00h00m31s
34860           worker          3               0               0d 00h00m31s
34861           worker          4               0               0d 00h00m31s

After reload:
* systemd
 Main PID: 33515 (hapee-lb)
   Memory: 3.1G
   CGroup: /system.slice/hapee-1.8-lb.service
           ├─33515 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234 -sf 34858 34859 34860 34861 -x /run/lb_engine/process-1.sock
           ├─34858 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
           ├─34859 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
           ├─34860 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
           ├─34861 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234
           ├─41871 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234 -sf 34858 34859 34860 34861 -x /run/lb_engine/process-1.sock
           ├─41872 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234 -sf 34858 34859 34860 34861 -x /run/lb_engine/process-1.sock
           ├─41873 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234 -sf 34858 34859 34860 34861 -x /run/lb_engine/process-1.sock
           └─41874 /opt/hapee-1.8/sbin/hapee-lb -Ws -f /etc/lb_engine/haproxy.cfg -p /run/hapee-lb.pid -S 127.0.0.1:1234 -sf 34858 34859 34860 34861 -x /run/lb_engine/process-1.sock
* master CLI
show proc
#<PID>          <type>          <relative PID>  <reloads>       <uptime>
33515           master          0               1               0d 00h01m33s
# workers
41871           worker          1               0               0d 00h00m45s
41872           worker          2               0               0d 00h00m45s
41873           worker          3               0               0d 00h00m45s
41874           worker          4               0               0d 00h00m45s
# old workers
34858           worker          [was: 1]        1               0d 00h01m33s
34859           worker          [was: 2]        1               0d 00h01m33s
34860           worker          [was: 3]        1               0d 00h01m33s
34861           worker          [was: 4]        1               0d 00h01m33s
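When reloads pile up, the number of entries under `# old workers` is a quick indicator of stuck processes. A minimal sketch that counts them from `show proc` text, assuming the section layout shown here (one worker per line after the `# old workers` header, sections introduced by lines starting with `#`); `count_old_workers` is a hypothetical name:

```c
#include <assert.h>
#include <string.h>

/* Count non-empty lines after the "# old workers" header in the
 * output of the master CLI "show proc" command. Stops at the next
 * line starting with '#' (another section header). */
int count_old_workers(const char *show_proc)
{
    assert(show_proc != NULL);
    const char *p = strstr(show_proc, "# old workers");
    if (!p)
        return 0;
    int count = 0;
    p = strchr(p, '\n');
    while (p) {
        p++; /* start of the next line */
        if (*p == '#')
            break;
        if (*p != '\0' && *p != '\n')
            count++;
        p = strchr(p, '\n');
    }
    return count;
}
```

Applied to the output above, it would count the 4 old workers (34858 to 34861).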

and

@!34858 show info
Name: HAProxy
Version: 1.8.0-2.0.0-195.793
Release_date: 2019/03/19
Nbthread: 1
Nbproc: 4
Process_num: 1
Pid: 34858
Uptime: 0d 0h03m24s
Uptime_sec: 204
Memmax_MB: 0
PoolAlloc_MB: 1
PoolUsed_MB: 1
PoolFailed: 0
Ulimit-n: 2006423
CurrConns: 0
CumConns: 354
CumReq: 342
CurrSslConns: 20
CumSslConns: 35928
Maxpipes: 0
PipesUsed: 0
PipesFree: 0
ConnRate: 0
ConnRateLimit: 0
MaxConnRate: 65
SessRate: 0
SessRateLimit: 0
MaxSessRate: 62
SslRate: 0
SslRateLimit: 0
MaxSslRate: 52
SslFrontendKeyRate: 0
SslFrontendMaxKeyRate: 52
SslFrontendSessionReuse_pct: 0
SslBackendKeyRate: 0
SslBackendMaxKeyRate: 2988
SslCacheLookups: 0
SslCacheMisses: 0
CompressBpsIn: 0
CompressBpsOut: 0
CompressBpsRateLim: 0
Tasks: 5849
Run_queue: 1
Idle_pct: 100
Stopping: 1
Jobs: 25
Unstoppable Jobs: 4
Listeners: 4
DroppedLogs: 0
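The fields most relevant to a draining process here are `Stopping`, `Jobs`, `Unstoppable Jobs`, `Listeners` and the connection counters. A minimal sketch that extracts one field from `show info` output by exact key match; `show_info_field` is a hypothetical helper, and it assumes the `<key>: <value>` line format shown above, so keys containing spaces such as `Unstoppable Jobs` also work:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return the integer value of `key` in "show info" output, or -1 if
 * the key is absent. The key must match a whole line prefix followed
 * by ": ", so "SslRate" does not match "SslRateLimit". */
long show_info_field(const char *info, const char *key)
{
    assert(info != NULL && key != NULL);
    size_t klen = strlen(key);
    const char *line = info;
    while (line && *line) {
        if (strncmp(line, key, klen) == 0 &&
            line[klen] == ':' && line[klen + 1] == ' ')
            return strtol(line + klen + 2, NULL, 10);
        line = strchr(line, '\n');
        if (line)
            line++; /* move to the start of the next line */
    }
    return -1;
}
```

Run against the output above, it would return 25 for `Jobs` and 20 for `CurrSslConns`, while `CurrConns` is 0.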

Regards,

Marcin Deranek
