September 22, 2021 3:03 PM, "Andrey Sedletsky via Pdns-users" <pdns-users@mailman.powerdns.com> wrote:
> Good afternoon!

Hi Andrey,

> After restarting the pdns-recursor process, the number of "outgoing
> query timeout" and "over capacity drops" sharply increases, which leads
> to serious degradation of the service.
> This behavior manifests itself at times of high load on the server (more
> than 400 thousand requests per second). With a lower load, restarting
> the process does not lead to such consequences.

Have you considered the possibility that 400 thousand queries per second is a load that pushes your server to the brink of resource exhaustion? That sure is a lot of queries. According to
https://pc.nanog.org/static/published/meetings/NANOG77/2142/20191029_Spacek_Lightning_Talk_Dns_v2.pdf
they were able to achieve a lot less than that in 2019.

> We are interested in what could be the reason for this behavior

On the hunch that your setup might be in an overload scenario, I followed 'over-capacity-drops' in the code and ended up at
https://github.com/PowerDNS/pdns/blob/97a4cff6fc7b3da1ff44d42b950cfc17d2fd95cf/pdns/pdns_recursor.cc#L3146
so it seems that you have exhausted your thread capacity when that happens.

See https://doc.powerdns.com/recursor/performance.html on how to tune the recursor. However, if that is not benchmark traffic but real-world traffic, I would strongly suggest getting more servers installed.

The SERVFAIL response is just what I would expect in such a case. See https://www.rfc-editor.org/rfc/rfc1035.html#section-4.1.1 .

Kind regards,
Stefan
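
As a rough illustration of the thread-capacity point, here is a minimal back-of-the-envelope sketch in Python. It is not taken from the PowerDNS code. It assumes that queries are dropped once the number of in-flight resolutions exceeds worker threads times a per-thread limit (the recursor's `max-mthreads` setting is the assumed analogue), and it uses Little's law to estimate concurrency. The cache hit ratios, the upstream latency and the thread count are hypothetical numbers chosen only for the example. A freshly restarted process starts with an empty cache, which is modelled here as a low hit ratio.

```python
def in_flight_queries(qps, cache_hit_ratio, mean_upstream_latency_s):
    """Estimate concurrent outgoing resolutions via Little's law.

    Only cache misses need upstream work, so the arrival rate of misses
    multiplied by the time each one stays in flight gives the concurrency.
    """
    miss_rate = qps * (1.0 - cache_hit_ratio)
    return miss_rate * mean_upstream_latency_s


def over_capacity(qps, cache_hit_ratio, mean_upstream_latency_s,
                  worker_threads, max_mthreads):
    """Return True if the estimated concurrency exceeds the assumed capacity.

    Assumption: capacity = worker_threads * max_mthreads (per-thread limit).
    """
    capacity = worker_threads * max_mthreads
    needed = in_flight_queries(qps, cache_hit_ratio, mean_upstream_latency_s)
    return needed > capacity


if __name__ == "__main__":
    # Hypothetical figures: 400k qps, 50 ms mean upstream latency,
    # 4 worker threads with a limit of 2048 concurrent tasks each.
    for label, hit_ratio in (("warm cache", 0.95), ("cold cache", 0.20)):
        needed = in_flight_queries(400_000, hit_ratio, 0.05)
        dropped = over_capacity(400_000, hit_ratio, 0.05, 4, 2048)
        print(f"{label}: ~{needed:.0f} in flight, over capacity: {dropped}")
```

With these made-up numbers the warm-cache case stays well under the assumed limit while the cold-cache case exceeds it, which matches the pattern of drops appearing only when a restart coincides with high load. Real figures would have to come from the recursor's own statistics.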