On 7 Aug 2025, at 20:53, brent saner via NANOG wrote:
> On Thu, Aug 7, 2025, 20:45 DurgaPrasad - DatasoftComnet via NANOG <
> [email protected]> wrote:
>
>> Hello all,
>> Do you have any recommendations for recursive DNS servers for a medium
>> sized (20-30k users) ISP.
>> We have used powerdns and unbound but sometimes find the caching times a
>> bit on upper side. Any suggestions between these two or anything new?
>> Also need points on how much we tune the settings
>> pros and cons if any.
>>
>> Thank you /DP
>
> <https://lists.nanog.org/archives/list/[email protected]/message/SUTKDISSISPWQY3YGF25FBQNN2JD5HDP/>
>
>
> It's surprising that you didn't get the performance you hoped for out of
> PowerDNS. You already tried the suggestions in their tuning guide[0], I'm
> assuming?
>
> You may also want to load in entire zones to the hot cache[1].
>
> And there's always horizontal scaling; sometimes you just plain hit limits
> on vertical scale.
>
> I haven't tried it yet, but dnsdist[2] should let you do this.
> (Or keepalived and/or HAproxy, or... etc. Any loadbalancer that can handle
> raw TCP and UDP.)
> Dnsdist in particular seems explicitly targeted towards a large set of
> untrusted clients with additional optional "safeguarding/consumer
> protection" features. Quad9 uses it in some fashion, if I recall correctly.
>
> [0] https://doc.powerdns.com/recursor/performance.html
> [1] https://docs.powerdns.com/recursor/lua-config/ztc.html
> [2] https://www.dnsdist.org/index.html

You beat me to it - dnsdist is an exceptionally robust solution for
front-ending recursive (or authoritative) servers. Quad9 is indeed using it for
all our recursive systems, and we split traffic on the "back-end" between
PowerDNS Recursor and Unbound. It (dnsdist) has a "packet cache" feature which
handles much of the load once warmed, and it answers on DoT/DoH as well as
providing a very rich set of tooling that allows management of unwanted
behaviors. The combination of dnsdist plus a good recursive resolver should
easily be able to handle 30k users on a single modest chassis, though of
course there are very good reasons to have several similarly configured
systems in fail-over models using ECMP or your favorite routing protocol.
(Hot caches work better - try not to spread load too much.) At this point, I
can't imagine running a recursive system that is open to anything other than a
tiny number of users without ensuring that dnsdist is in front of it - it's
exactly the right thing and has been sandblasted by a lot of trial-and-error to
make it fast and reliable with lots of features for ISP environments.
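
To make the "hot caches" remark concrete, here is a toy sketch (not dnsdist
itself, and not a measurement) of why spreading the same queries over more
independent caches lowers the hit rate. The cache is a plain set with no TTL
or eviction, the query stream is synthetic, and the names simulate() and
host*.example.com are invented for illustration.

```python
from itertools import cycle


def simulate(num_caches, names, rounds):
    """Spread queries round-robin over independent caches.

    Returns (hits, misses). Each cache is a bare set: no TTLs and no
    eviction, which is a deliberate simplification of a real packet cache.
    """
    caches = [set() for _ in range(num_caches)]
    picker = cycle(range(num_caches))
    hits = misses = 0
    for _ in range(rounds):
        for name in names:
            cache = caches[next(picker)]
            if name in cache:
                hits += 1
            else:
                # First time this cache sees the name: it has to be resolved.
                misses += 1
                cache.add(name)
    return hits, misses


# Synthetic workload: 101 names, each asked for 10 times.
names = [f"host{i}.example.com" for i in range(101)]
one = simulate(1, names, 10)
four = simulate(4, names, 10)
print("1 cache :", one)
print("4 caches:", four)
```

With one cache every name misses once; with four caches every name has to be
warmed four times, so the same traffic produces four times the misses.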

If a decent-sized system doesn't seem fast, there may be some other underlying
issue at the root of the perceived speed problem. There is useful data that
can be pulled out of dnsdist with Prometheus-style outputs - I would suggest
instrumenting things and seeing where the problems are.
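
As a rough sketch of that instrumentation step, the snippet below parses
Prometheus-style text output and derives a cache hit ratio. The metric names
dnsdist_cache_hits and dnsdist_cache_misses and the sample text are
assumptions for illustration; check the names your own dnsdist metrics
endpoint actually exposes. Labeled samples are skipped to keep the parser
short.

```python
def parse_metrics(text):
    """Parse unlabeled samples from Prometheus text format into a dict."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, HELP/TYPE comments, and labeled samples.
        if not line or line.startswith("#") or "{" in line:
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            metrics[parts[0]] = float(parts[1])
        except ValueError:
            continue
    return metrics


def cache_hit_ratio(metrics,
                    hits="dnsdist_cache_hits",       # assumed metric name
                    misses="dnsdist_cache_misses"):  # assumed metric name
    """Return hits / (hits + misses), or None if there is no data."""
    h = metrics.get(hits, 0.0)
    m = metrics.get(misses, 0.0)
    total = h + m
    return h / total if total else None


# Made-up sample output, not real data.
sample = """\
# TYPE dnsdist_cache_hits counter
dnsdist_cache_hits 900
dnsdist_cache_misses 100
dnsdist_server_queries{server="a"} 5
"""
metrics = parse_metrics(sample)
ratio = cache_hit_ratio(metrics)
print(f"cache hit ratio: {ratio:.2f}")
```

A low ratio on a warmed system would point at the cache (TTLs, size, load
spread too thin) rather than at the resolver behind it.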

Now, the original question of "points on how much we tune the settings" - that
is a much longer discussion, but honestly you can get to 80% goodput without
too much fiddling.

JT
_______________________________________________
NANOG mailing list
https://lists.nanog.org/archives/list/[email protected]/message/J4WSKWYCIV7KTCVWXDWT64IGHKQZHERB/