Hi Damien!
On Wed, Oct 29, 2025 at 09:56:34AM +0000, Damien Claisse wrote:
> Previous fixes restored round robin iteration, but an imbalance remains
> when the response tree contains record types other than A or AAAA. Let's
> take the following example: the DNS answers two A records and a CNAME.
> The response "tree" (which is actually flat, more like a list) may look
> as follows, ordered by hash:
> - 1st item: first A record with IP 1
> - 2nd item: second A record with IP 2
> - 3rd item: CNAME record
> As a consequence, resolv_get_ip_from_response will iterate as follows,
> while the TTL is still valid:
> - 1st call: DNS request is done, response tree is created, iteration
> starts at the first item, IP 1 is returned.
> - 2nd call: cached response tree is used, iteration starts at the second
> item, IP 2 is returned.
> - 3rd call: cached response tree is used, iteration starts at the third
> item, but it's a CNAME, so we continue to the next item, which restarts
> iteration at the first item, and IP 1 is returned.
> - 4th call: cached response tree is used and iteration restarts at the
> beginning, returning IP 1 again.
> The 1-2-1-1-2-1-1-2 sequence will repeat, so IP 1 will be used twice as
> often as IP 2, creating a strong imbalance. Even with more IP addresses,
> the first one by hashing order in the tree will always receive twice the
> traffic of the others.
> To fix this, set the next iteration item to the one following the selected
> IP record, if any. This ensures we never use the same IP twice in a row.
>
> This commit should be backported where 3023e9819 ("BUG/MINOR: resolvers:
> Restore round-robin selection on records in DNS answers") is, so as far
> as 2.6.
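
For readers following along, the iteration described above can be reproduced with a small toy model. This is only a sketch under stated assumptions, not the actual resolvers code: the flat list `RECORDS`, the function `pick_sequence` and the flag `advance_from_selected` are hypothetical names made up for illustration. The old behaviour is modelled as "next start = previous start + 1, wrapping at the end", and the fixed behaviour as "next start = item following the selected IP record".

```python
# Toy model of the flat response "tree" from the example:
# two A records and one CNAME, in hash order. Not HAProxy code.
RECORDS = [("A", "IP 1"), ("A", "IP 2"), ("CNAME", None)]


def pick_sequence(records, calls, advance_from_selected):
    """Return the IPs picked over `calls` lookups on a cached response.

    advance_from_selected=False models the imbalance described above:
    the next start position follows the previous *start* position.
    advance_from_selected=True models the fix: the next start position
    follows the record that was actually *selected*.
    """
    n = len(records)
    start = 0
    picked = []
    for _ in range(calls):
        for offset in range(n):
            idx = (start + offset) % n
            rtype, ip = records[idx]
            if rtype not in ("A", "AAAA"):
                continue  # skip CNAME and other non-address records
            picked.append(ip)
            if advance_from_selected:
                start = (idx + 1) % n
            else:
                start = (start + 1) % n
            break
    return picked


if __name__ == "__main__":
    print("before:", pick_sequence(RECORDS, 8, advance_from_selected=False))
    print("after: ", pick_sequence(RECORDS, 8, advance_from_selected=True))
```

In this model the "before" run gives the 1-2-1-1-2-1-1-2 pattern from the commit message, while the "after" run alternates strictly between IP 1 and IP 2.
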
Nice analysis. Patch now applied, thank you!
Willy