Hi Lucas,

On 6/26/25 13:02, Lucas Rolff via dnsdist wrote:
> dnsdist by default uses the leastOutstanding load balancing policy, which in
> certain cases takes the lowest measured latency into account, based on the
> last 128 queries answered by the downstream.

Correct: if several servers have the same number of outstanding queries, their latency is used to break the tie.
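
To make the tie-break concrete, here is a minimal Python model of the selection rule as described above. It is only a sketch, not dnsdist code: the `Server` class, its fields and `pick_least_outstanding` are hypothetical names, and any other criteria the real policy may apply are left out.

```python
from dataclasses import dataclass


@dataclass
class Server:
    # Hypothetical stand-in for a downstream server, not a dnsdist object.
    name: str
    outstanding: int      # queries sent but not yet answered
    latency_usec: float   # measured latency of recent answers


def pick_least_outstanding(servers):
    """Pick the server with the fewest outstanding queries.

    Ties on the outstanding count are broken by the lower latency.
    Returns None when the list is empty.
    """
    if not servers:
        return None
    # Tuples compare element by element, so latency only matters
    # when the outstanding counts are equal.
    return min(servers, key=lambda s: (s.outstanding, s.latency_usec))


servers = [
    Server("a", outstanding=2, latency_usec=900.0),
    Server("b", outstanding=1, latency_usec=4000.0),
    Server("c", outstanding=1, latency_usec=1500.0),
]
print(pick_least_outstanding(servers).name)  # prints "c"
```

Here "b" and "c" tie on outstanding queries, so the lower latency of "c" decides, even though "a" has the lowest latency overall.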

> My first (somewhat simple) question is: is there a way to make health checks
> count towards the latency measurements? Currently it doesn't seem to take the
> health check queries into account in the latency metric. While I understand
> not everyone may want this, I wonder if there's some way (even if custom Lua)
> to make that happen.

Not without patching the code, I'm afraid.

> My second question is more about a custom policy in Lua.
> Since latency-based load balancing isn't currently a thing, it can be
> implemented in Lua, so that the selected downstream server is the
> lowest-latency (online) server.
>
> This can be done by looping over the available servers, checking whether
> each server is up using :isUp(), and then using :getLatency() to find its
> latency. This works great most of the time, however:
>
> 1: If dnsdist restarts, the latency across all nodes will be super low,
> because it seems to use a fixed-size list where every "empty" value is `0`.
> As a result, when the average is calculated across 128 values (many of which
> are zero initially), this may cause some weird routing.

True, it takes a few queries for the value to become useful.
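
The warm-up effect is easy to see with a toy model. The Python sketch below assumes the behaviour described in the question (a fixed window of 128 slots that all start at 0 and are averaged as a whole). It is not taken from the dnsdist source, and the real computation may differ; a weighted running average that starts at zero, for instance, would warm up in a similar way. `windowed_average` and the 20 ms figure are made up for illustration.

```python
WINDOW = 128  # window size mentioned in the question


def windowed_average(samples_ms, window=WINDOW):
    """Average over a fixed-size window whose slots start at 0.

    Samples overwrite the slots in order, wrapping around once the
    window is full. Unfilled slots still count as 0 in the average.
    """
    slots = [0.0] * window
    for i, sample in enumerate(samples_ms):
        slots[i % window] = sample
    return sum(slots) / window


# A server that really answers in 20 ms looks much faster right after
# a restart, because most slots are still zero.
print(windowed_average([20.0] * 8))    # 1.25
print(windowed_average([20.0] * 128))  # 20.0
```

Under this model, a freshly restarted instance that has seen 8 answers reports 1.25 ms for a 20 ms server, which is why a lowest-latency policy can route oddly until enough answers have come in.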

> I wonder if there's a way to get (currently in Lua) the number of downstream
> queries (e.g. as exposed in `showServers()`) for each individual server. I
> see there's a :getDrops() method available, but seemingly no :getQueries().
> Is there another way we can somehow get these, while still being fast enough
> to execute on every upstream query (when the load balancing takes place)?

Not without patching the code. It would only take a few lines of C++ to make it available, though.

> 2: A bit related to the first question: if we then decide to select the
> lowest-latency server, the other downstreams no longer get queries, so we
> also don't get updated latency metrics for them. As you know, routing on the
> interwebs sometimes changes, and this may affect the latency. Thus, if we
> could e.g. take the health-check measurements into account, this would be
> resolved at the same time, since we'd effectively always have fresh data.

The reason I'm not very fond of this idea is that health-check queries are often not representative of actual traffic, and would thus skew the latency metrics in many cases. But I get your point: while most deployments get a lot of traffic and therefore don't really care about the short time it takes to get useful metrics, it might be different for low-traffic deployments or for backup servers. Do you think it might work if dnsdist were to update the latency from health-check queries if, and only if, no "regular" query was processed by the server within a fixed interval (let's say 60 seconds? I have not really thought about it). The first health-check query would then of course automatically update the latency, unless a "regular" query was processed before the health-check succeeded.
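
To make the proposed rule easier to discuss, here is a rough Python sketch of it. Nothing like this exists in dnsdist today: `LatencyTracker`, its method names and the 60-second constant are hypothetical, and for brevity the latency is simply overwritten by the newest sample instead of being averaged.

```python
from dataclasses import dataclass
from typing import Optional

FALLBACK_INTERVAL_S = 60.0  # tentative value floated above, not a dnsdist setting


@dataclass
class LatencyTracker:
    # Hypothetical per-server state, for illustration only.
    latency_ms: float = 0.0
    last_regular_at: Optional[float] = None  # time of the last regular answer

    def on_regular_response(self, now: float, sample_ms: float) -> None:
        """A regular query was answered: always record its latency."""
        self.last_regular_at = now
        self.latency_ms = sample_ms  # averaging omitted for brevity

    def on_health_check_response(self, now: float, sample_ms: float) -> bool:
        """A health check succeeded: record its latency only if no regular
        query was answered within the fallback interval.

        Returns True when the sample was used.
        """
        idle = (
            self.last_regular_at is None
            or now - self.last_regular_at >= FALLBACK_INTERVAL_S
        )
        if idle:
            self.latency_ms = sample_ms
        return idle


tracker = LatencyTracker()
tracker.on_health_check_response(now=0.0, sample_ms=5.0)   # used: no regular traffic yet
tracker.on_regular_response(now=10.0, sample_ms=20.0)      # regular traffic takes over
tracker.on_health_check_response(now=30.0, sample_ms=5.0)  # ignored: only 20 s idle
tracker.on_health_check_response(now=70.0, sample_ms=6.0)  # used: 60 s without traffic
print(tracker.latency_ms)  # 6.0
```

With this rule a busy server never sees its metrics skewed by health checks, while an idle backup server still gets a fresh sample at every successful check.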

Best regards,
--
Remi Gacogne
PowerDNS.COM BV - https://www.powerdns.com/
