Re: [External] Re: QAT intermittent healthcheck errors

Marcin Deranek Wed, 10 Apr 2019 04:03:28 -0700

Hi Emeric,

Our process limit in QAT configuration is quite high (128) and I wasable to run 100+ openssl processes without a problem. According to Joelfrom Intel problem is in cleanup code - presumably when HAProxy exitsand frees up QAT resources. Will try to see if I can get more debuginformation.

Regards,


Marcin Deranek

On 4/9/19 5:17 PM, Emeric Brun wrote:

Hi Marcin,

On 4/9/19 3:07 PM, Marcin Deranek wrote:

Hi Emeric,

I have followed all instructions and I got to the point where HAProxy starts and does the job using 
QAT (backend healthchecks work and I frontend can provide content over HTTPS). The problems starts 
when HAProxy gets reloaded. With our current configuration on reload old HAProxy processes do not 
exit, so after reload you end up with 2 generations of HAProxy processes: before reload and after 
reload. I tried to find out what are conditions in which HAProxy processes get "stuck" 
and I was not able to replicate it consistently. In one case it was related to amount of backend 
servers with 'ssl' on their line, but trying to add 'ssl' to some other servers in other place had 
no effect. Interestingly in some cases for example with simple configuration (1 frontend + 1 
backend) HAProxy produced errors on reload (see attachment) - in those cases processes rarely got 
"stuck" even though errors were present.
/dev/qat_adf_ctl is group writable for the group HAProxy runs on. Any help to 
get this fixed / resolved would be welcome.
Regards,

Marcin Deranek


I've checked the errors.txt and all the messages were written by the engine and 
are not part of the haproxy code. I can only do supposition for now but I think 
we face a first error due to a limitation of the amount of processes trying to 
access the engine: the reload will double the number of processes trying to 
attach the engine. Perhaps this issue can be bypassed tweaking the qat 
configuration file (some advise, from intel would be wellcome).

For the old stucked processes: I think the grow of processes also triggers 
errors on already attached ones in the qat engine but currently I ignore the 
way this errors are/should be raised to the application, it appears that they 
are currently not handled and that's why processes would be stuck (sessions may 
appear still valid for haproxy so the old process continues to wait for their 
end). We expected they were raised by the openssl API but it appears to not be 
the case. We have to check if we miss to handle an error  polling events on the 
file descriptor used to communicate with engine.


So we have to dig deeper and any help from Intel's guy or Qat aware devs will 
be appreciate.

Emeric

Re: [External] Re: QAT intermittent healthcheck errors

Reply via email to