Hi Marcin,

On 3/6/19 3:23 PM, Marcin Deranek wrote:
> Hi,
> 
> In the process of evaluating the performance of Intel Quick Assist Technology
> in conjunction with HAProxy, I acquired an Intel C62x Chipset card for
> testing. I configured the QAT engine in the following manner:
> 
> * /etc/qat/c6xx_dev[012].conf
> 
> [GENERAL]
> ServicesEnabled = cy
> ConfigVersion = 2
> CyNumConcurrentSymRequests = 512
> CyNumConcurrentAsymRequests = 64
> statsGeneral = 1
> statsDh = 1
> statsDrbg = 1
> statsDsa = 1
> statsEcc = 1
> statsKeyGen = 1
> statsDc = 1
> statsLn = 1
> statsPrime = 1
> statsRsa = 1
> statsSym = 1
> KptEnabled = 0
> StorageEnabled = 0
> PkeServiceDisabled = 0
> DcIntermediateBufferSizeInKB = 64
> 
> [KERNEL]
> NumberCyInstances = 0
> NumberDcInstances = 0
> 
> [SHIM]
> NumberCyInstances = 1
> NumberDcInstances = 0
> NumProcesses = 16
> LimitDevAccess = 0
> 
> Cy0Name = "UserCY0"
> Cy0IsPolled = 1
> Cy0CoreAffinity = 0
> 
> OpenSSL produces good results without warnings / errors:
> 
> * No QAT involved
> 
> $ openssl speed -elapsed rsa2048
> You have chosen to measure elapsed time instead of user CPU time.
> Doing 2048 bits private rsa's for 10s: 10858 2048 bits private RSA's in 10.00s
> Doing 2048 bits public rsa's for 10s: 361207 2048 bits public RSA's in 10.00s
> OpenSSL 1.1.1a FIPS  20 Nov 2018
> built on: Tue Jan 22 20:43:41 2019 UTC
> options:bn(64,64) md2(char) rc4(16x,int) des(int) aes(partial) idea(int) 
> blowfish(ptr)
> compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -O2 -g -pipe 
> -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong 
> --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic 
> -Wa,--noexecstack -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC 
> -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT 
> -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM 
> -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM 
> -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM 
> -DPOLY1305_ASM -DZLIB -DNDEBUG -DPURIFY -DDEVRANDOM="\"/dev/urandom\"" 
> -DSYSTEM_CIPHERS_FILE="/opt/openssl/etc/crypto-policies/back-ends/openssl.config"
>                   sign    verify    sign/s verify/s
> rsa 2048 bits 0.000921s 0.000028s   1085.8  36120.7
> 
> * QAT enabled
> 
> $ openssl speed -elapsed -engine qat -async_jobs 32 rsa2048
> engine "qat" set.
> You have chosen to measure elapsed time instead of user CPU time.
> Doing 2048 bits private rsa's for 10s: 205425 2048 bits private RSA's in 
> 10.00s
> Doing 2048 bits public rsa's for 10s: 2150270 2048 bits public RSA's in 10.00s
> OpenSSL 1.1.1a FIPS  20 Nov 2018
> built on: Tue Jan 22 20:43:41 2019 UTC
> options:bn(64,64) md2(char) rc4(16x,int) des(int) aes(partial) idea(int) 
> blowfish(ptr)
> compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -O2 -g -pipe 
> -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong 
> --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic 
> -Wa,--noexecstack -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC 
> -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT 
> -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM 
> -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM 
> -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM 
> -DPOLY1305_ASM -DZLIB -DNDEBUG -DPURIFY -DDEVRANDOM="\"/dev/urandom\"" 
> -DSYSTEM_CIPHERS_FILE="/opt/openssl/etc/crypto-policies/back-ends/openssl.config"
>                   sign    verify    sign/s verify/s
> rsa 2048 bits 0.000049s 0.000005s  20542.5 215027.0
> 
> So far so good. Unfortunately, HAProxy 1.8 with the QAT engine enabled
> periodically fails SSL checks of backend servers. The simplest
> configuration I could find that reproduces it:
> 
> * /etc/haproxy/haproxy.cfg
> 
> global
>     user lbengine
>     group lbengine
>     daemon
>     ssl-mode-async
>     ssl-engine qat
>     ssl-server-verify none
>     stats   socket     /run/lb_engine/process-1.sock user lbengine group 
> lbengine mode 660 level admin expose-fd listeners process 1
> 
> defaults
>     mode http
>     timeout check 5s
>     timeout connect 4s
> 
> backend pool_all
>     default-server inter 5s
> 
>     server server1 ip1:443 check ssl
>     server server2 ip2:443 check ssl
>     ...
>     server serverN ipN:443 check ssl
> 
> Without QAT enabled everything works just fine: health checks do not flap.
> With the QAT engine enabled, random server health checks flap: they fail and
> then recover shortly after, e.g.
> 
> 2019-03-06T15:06:22+01:00 localhost hapee-lb[1832]: Server pool_all/server1 
> is DOWN, reason: Layer6 timeout, check duration: 4000ms. 110 active and 0 
> backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> 2019-03-06T15:06:32+01:00 localhost hapee-lb[1832]: Server pool_all/server1 
> is UP, reason: Layer6 check passed, check duration: 13ms. 117 active and 0 
> backup servers online. 0 sessions requeued, 0 total in queue.
> 
> Increasing the check frequency (lowering the check interval) makes the problem
> occur more frequently. Does anybody have a clue why this is happening? Has
> anybody seen such behavior?
> Regards,
> 
> Marcin Deranek
> 

According to the documentation:

ssl-mode-async
  Adds SSL_MODE_ASYNC mode to the SSL context. This enables asynchronous TLS
  I/O operations if asynchronous capable SSL engines are used. The current
  implementation supports a maximum of 32 engines. The Openssl ASYNC API
  doesn't support moving read/write buffers and is not compliant with
  haproxy's buffer management. So the asynchronous mode is disabled on
  read/write  operations (it is only enabled during initial and reneg
  handshakes).

In other words, asynchronous mode is disabled for read/write operations and is
only enabled during the handshake.

This means that for the ciphering process the engine is used in blocking
mode (not async), which can lead to unpredictable timer behavior because the
haproxy process is sporadically fully blocked while waiting for the engine.

To avoid this issue, make sure QAT is used only for asymmetric algorithms
(such as RSA, DSA, ECDSA) and not for ciphering ones (AES and everything
else).

The ssl-engine statement allows you to filter such algorithms:

ssl-engine <name> [algo <comma-separated list of algorithms>]
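
As an illustration of this syntax, a global section restricting QAT to
asymmetric operations might look like the sketch below. The algorithm names
(RSA, DSA, DH, EC) are an assumption taken from the list in the HAProxy
configuration manual, which passes them to OpenSSL's
ENGINE_set_default_string(); this has not been tested on the setup above.

    global
        ssl-mode-async
        # Offload only public-key operations to QAT; symmetric ciphers
        # and digests stay in software (algorithm names are assumed).
        ssl-engine qat algo RSA,DSA,DH,EC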

R,
Emeric
