Hi Marcin,

On 4/9/19 3:07 PM, Marcin Deranek wrote:
> Hi Emeric,
>
> I have followed all instructions and I got to the point where HAProxy starts
> and does the job using QAT (backend healthchecks work and I frontend can
> provide content over HTTPS). The problems starts when HAProxy gets reloaded.
> With our current configuration on reload old HAProxy processes do not exit,
> so after reload you end up with 2 generations of HAProxy processes: before
> reload and after reload. I tried to find out what are conditions in which
> HAProxy processes get "stuck" and I was not able to replicate it
> consistently. In one case it was related to amount of backend servers with
> 'ssl' on their line, but trying to add 'ssl' to some other servers in other
> place had no effect. Interestingly in some cases for example with simple
> configuration (1 frontend + 1 backend) HAProxy produced errors on reload (see
> attachment) - in those cases processes rarely got "stuck" even though errors
> were present.
> /dev/qat_adf_ctl is group writable for the group HAProxy runs on. Any help to
> get this fixed / resolved would be welcome.
> Regards,
>
> Marcin Deranek
I've checked errors.txt: all the messages were written by the engine and
are not part of the haproxy code.

I can only speculate for now, but I think the first error comes from a limit
on the number of processes allowed to access the engine: a reload doubles the
number of processes trying to attach to it. Perhaps this issue can be worked
around by tweaking the QAT configuration file (some advice from Intel would
be welcome).

For the old stuck processes: I think the growth in the number of processes
also triggers errors in the QAT engine for the processes that are already
attached, but I currently don't know how these errors are, or should be,
raised to the application. It appears that they are not handled at the
moment, which would explain why the processes get stuck (the sessions may
still appear valid to haproxy, so the old process keeps waiting for them to
end). We expected these errors to be raised through the OpenSSL API, but that
does not appear to be the case. We have to check whether we fail to handle an
error when polling events on the file descriptor used to communicate with the
engine.

So we have to dig deeper, and any help from Intel folks or QAT-aware devs
would be appreciated.

Emeric