Hi
On 05/10/2023 10:41, Aleš Rygl via dnsdist wrote:
Thanks for your response. After some in-depth reading of the
documentation and config tweaking, I am nearly back to the previous
values for CPU load. Latency is the exception: it is still higher
(1.3 ms -> 2.3 ms). I suspect the latency is now computed differently
(I noticed a new set of latency counters for TLS, TCP, etc.). The key
configuration parameter is setMaxTCPClientThreads(). Changing
anything else (cache shards, number of listeners, etc.) has almost no
impact. We had 256 with 1.7.4; now it is 16. Going higher means a
rapid increase in CPU load, while going below 16 means dropped TCP
connections: in showTCPStats(), Queued hits Max Queued. Insane
values like 1024 kill the CPU. We have a physical server with 16
physical cores; the OS sees 32 cores.
OK, this is clearly unexpected. Since 1.4.0 you should not need more
TCP worker threads than the number of cores, because a single worker
can handle a lot (easily thousands) of TCP connections. Still, a
larger value should not kill the CPU, so I'm wondering if we are
busy-looping somewhere. I have not been able to reproduce that so
far, so I would be really interested in seeing the perf output if
you can get it.
Update: after some testing I can say that dnsdist 1.7.4 on Bookworm has
the same issue as 1.8.1. The cause appears to be the one described here:
https://github.com/openssl/openssl/issues/17064. There is a safe
workaround: lowering setMaxTCPClientThreads(). Watch out for TCP
queueing while doing so, and check it with showTCPStats(). Improving
TLS performance by using a STEK (session ticket encryption key) file
can help as well.
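
For anyone finding this thread later, the workaround could look roughly
like the fragment below. This is only a sketch, not my actual
configuration: it assumes a DoT listener created with addTLSLocal(), and
the listen address, certificate paths and ticket key path are made-up
placeholders. Only the value 16 comes from the numbers quoted above.

```lua
-- Sketch of a dnsdist configuration fragment (dnsdist.conf is Lua).
-- Address and file paths are placeholders, not values from this thread.

-- Keep the number of TCP worker threads low. 16 is the value reported
-- above for a server with 16 physical cores; tune it for your hardware.
setMaxTCPClientThreads(16)

-- Assumed DoT listener that loads TLS session ticket keys (STEK) from a
-- file instead of letting dnsdist generate a random key.
addTLSLocal('192.0.2.1:853',
            '/etc/dnsdist/tls/fullchain.pem',
            '/etc/dnsdist/tls/privkey.pem',
            { ticketKeyFile = '/etc/dnsdist/tls/stek.key' })
```

After reloading, run showTCPStats() in the dnsdist console and compare
Queued with Max Queued. If Queued reaches Max Queued, connections are
being dropped and the thread count is too low for the load.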
I'd like to thank Remi for his excellent support.
Ales
_______________________________________________
dnsdist mailing list
dnsdist@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/dnsdist