Attached is my larger attempt at solving this problem, split into 7 separate patches.

Because in rare circumstances the servers might have changed since the TCP connection process started, I serialize the domain, the server address and its last position in the server array. I hacked in a redirection using a special -2 value in the cache-insert path, where it always reads just one server. The parent then tries the server at the reported position first, iterating over all servers for this domain.

In rare cases there might be a problem, because it does not send the source address or interface, which might be needed to identify the correct server. But I doubt those would make a difference in any real-world setup. We have no simple identification for changed servers. It works well in basic testing.

I also separated the last servers for TCP and UDP. It should be able to forward UDP to a server responding only over UDP, and TCP to a server responding only over TCP. That should be quite a rare case, more theoretical than real-world. Maybe a change of the UDP server should also change the TCP one, because the UDP test can be done in parallel.

I have also found an unwanted difference from UDP queries. Even a REFUSED response was accepted as a valid last_server response. Now it sets the TCP last_server only after a non-refused response, not just after a successful connection.
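
To illustrate the REFUSED check described above, here is a minimal sketch. It is not the code from the patches; reply_confirms_server() is a hypothetical helper that only looks at the RCODE in a raw DNS reply and decides whether that reply should promote its server to the preferred TCP server.

```c
#include <assert.h>
#include <stddef.h>

#define DNS_HEADER_LEN 12  /* fixed DNS header size (RFC 1035) */
#define RCODE_REFUSED   5  /* RCODE value for REFUSED (RFC 1035) */

/* Hypothetical helper, not part of dnsmasq: return 1 when a raw DNS reply
   (without the 2-byte TCP length prefix) should mark its server as the
   preferred TCP server, 0 otherwise. */
static int reply_confirms_server(const unsigned char *reply, size_t len)
{
  assert(reply != NULL);

  if (len < DNS_HEADER_LEN)
    return 0;                      /* header incomplete, not a usable answer */

  /* RCODE is the low 4 bits of the fourth header byte. */
  return (reply[3] & 0x0f) != RCODE_REFUSED;
}
```

With a check like this, a server that accepts the connection but answers REFUSED would never become last_server, which mirrors the UDP behaviour.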

I have tried to look into glibc, which does not seem to set any timeout for TCP (vc) queries. The default timeout in the dig tool is 10 seconds; it does not seem to tweak the number of SYN packets sent. I think it just measures the time before a reply arrives. Ideally, we should be able to spawn another TCP connection to another server if the first one didn't respond within a few seconds, and wait for the fastest response from any of them. But that would require quite a significant rework of the current code.

I did just basic testing, but these changes improve the tested situation.

What do you think about it?

Cheers,
Petr

On 26. 05. 23 18:19, Simon Kelley wrote:


On 25/05/2023 20:32, Petr Menšík wrote:
This problem is best tested by an example, taken from [2] but a bit modified.

Let's create a hypothetical network issue with one forwarder, which worked fine a while ago.

$ sudo iptables -I INPUT -i lo -d 127.0.0.255 -j DROP

Now start dnsmasq and send a TCP query to it:

$ dnsmasq -d --log-queries --port 2053 --no-resolv --conf-file=/dev/null --server=127.0.0.255 --server=127.0.0.1
$ dig +tcp @localhost -p 2053 test

;; communications error to ::1#2053: timed out
;; communications error to ::1#2053: timed out
;; communications error to ::1#2053: timed out
;; communications error to 127.0.0.1#2053: timed out

; <<>> DiG 9.18.15 <<>> +tcp @localhost -p 2053 test
; (2 servers found)
;; global options: +cmd
;; no servers could be reached

Because dig waits a much shorter time than dnsmasq does, it never receives any reply, even though the other server is responding just fine. That is the main advantage of having a local cache running, isn't it? It should improve things!

Now let's be persistent and keep trying:

$ time for TRY in {1..6}; do dig +tcp @localhost -p 2053 test; done

After a few timeouts, it will finally notice something is wrong and also try the second server, which answers fast. However, this works only with dnsmasq -d, which is not used in production. If I replace it with dnsmasq -k, it will not answer at all!

$ dnsmasq -k --log-queries --port 2053 --no-resolv --conf-file=/dev/null --server=127.0.0.255 --server=127.0.0.1
$ time for TRY in {1..8}; do dig +tcp @localhost -p 2053 test; done

...
;; communications error to ::1#2053: timed out
;; communications error to ::1#2053: timed out
;; communications error to ::1#2053: timed out
;; communications error to 127.0.0.1#2053: timed out

; <<>> DiG 9.18.15 <<>> +tcp @localhost -p 2053 test
; (2 servers found)
;; global options: +cmd
;; no servers could be reached


real    5m20,602s
user    0m0,094s
sys    0m0,115s

This is because with -k it spawns TCP workers, which always start with whatever last_server was set by the last UDP query. Until a UDP query arrives to save the day, it will stubbornly try the non-responding server first, even when the other one answers in milliseconds. Notice it had been trying for 5 minutes without success.

I think this has to be fixed somehow. This is a corner case, because TCP queries are usually caused by UDP queries with the TC bit set. But there are real-world examples where a TCP-only query makes sense, and dnsmasq does not handle them well. I summarized this at [3].

My proposal would be to send a UDP query with an EDNS0 header to the main process whenever sending the TCP query failed. The main process can then trigger the forwarder responsiveness check and change last_server to a working one, so subsequent attempts do not fall into the black hole again and again. The EDNS0 header would be there to increase the chance of a positive reply from upstream, which can be cached.

Would you have other ideas on how to solve this problem?

Cheers,
Petr



The long delay awaiting a connection from a non-responsive server may be improved by reducing the value of the TCP_SYNCNT socket option, at least on Linux.
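
As a rough illustration of the TCP_SYNCNT suggestion, a minimal sketch follows. limit_syn_retries() is a hypothetical helper, not existing dnsmasq code; it assumes Linux, where TCP_SYNCNT is available through <netinet/tcp.h>, and silently does nothing on platforms without the option.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: limit the number of SYN retransmissions for one
   outgoing TCP socket, so connect() to a silent server gives up sooner.
   Must be called before connect(). Returns 0 on success, -1 on error,
   like setsockopt(). */
static int limit_syn_retries(int fd, int syn_retries)
{
  assert(fd >= 0);
#ifdef TCP_SYNCNT
  return setsockopt(fd, IPPROTO_TCP, TCP_SYNCNT, &syn_retries, sizeof(syn_retries));
#else
  (void)syn_retries;   /* option not available here, leave the system default */
  return 0;
#endif
}
```

Since each SYN retransmission roughly doubles the wait, even a small reduction in the count shortens the total connection timeout considerably; the exact timing depends on the kernel.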


I think it's pretty easy to pass back the identity of a server which is responding to TCP connections to the main process using the same mechanism that passes back cache entries. The only wrinkle is if the list of servers changes between forking the child process and it sending back data about which server worked, for instance if the server list gets reconfigured. Detecting that just needs an "epoch" counter to be included. It's rare, so just rejecting a "use this server" update from a child that was spawned in a different epoch from the current one should avoid problems. Provided the epoch is the same, indices into the server[] array are valid to send across the pipe.
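
The epoch idea could look roughly like the sketch below. All names here (struct server_update, struct server_state, servers_reloaded(), apply_server_update()) are hypothetical and only model the logic; they are not dnsmasq structures, and the real message would travel over the pipe to the parent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message a TCP child sends back to the parent. */
struct server_update {
  unsigned int epoch;  /* server-list epoch the child inherited at fork time */
  int arrayposn;       /* index of the server that answered */
};

/* Hypothetical parent-side state. */
struct server_state {
  unsigned int epoch;  /* bumped every time the server list is rebuilt */
  int server_count;    /* number of entries in the server array */
  int last_server;     /* index of the currently preferred server */
};

/* Called by the parent whenever the server list is reconfigured. */
static void servers_reloaded(struct server_state *st, int new_count)
{
  assert(st != NULL);
  st->epoch++;
  st->server_count = new_count;
  st->last_server = 0;
}

/* Apply a child's report. Returns 1 if accepted, 0 if it was rejected
   because it is stale (different epoch) or the index is out of range. */
static int apply_server_update(struct server_state *st, const struct server_update *up)
{
  assert(st != NULL && up != NULL);
  if (up->epoch != st->epoch)
    return 0;
  if (up->arrayposn < 0 || up->arrayposn >= st->server_count)
    return 0;
  st->last_server = up->arrayposn;
  return 1;
}
```

A child forked before a reload still carries the old epoch, so its report is dropped and the parent keeps whatever the new configuration selected.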

I like the idea of using a different valid server for TCP and UDP.

Note that the TCP code does try to pick a good server. It's not currently much good with long connection delays, but it does cope with ignoring a server which accepts connections and then immediately closes them. I guess that must have been a real-world problem sometime.

Cheers,

Simon.
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2160466#c6
[3] https://bugzilla.redhat.com/show_bug.cgi?id=2160466#c13

On 19. 05. 23 13:40, Petr Menšík wrote:
When analysing report [1] for non-responding queries over TCP, I found that forwarded TCP connections have quite a high timeout. If for whatever reason the forwarder currently set as the last used forwarder is dropping packets without rejecting them, the TCP connection will only time out after about 120 seconds on my system. That is way too much; I think any TCP client will give up long before that. This is just a quick workaround to improve the situation, not a final fix.

...

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2160466


_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss

--
Petr Menšík
Software Engineer, RHEL
Red Hat, http://www.redhat.com/
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB
From c02cfcb0a358e04636ffd2bcc595860b25b3a440 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Petr=20Men=C5=A1=C3=ADk?= <pemen...@redhat.com>
Date: Wed, 17 May 2023 21:40:19 +0200
Subject: [PATCH 1/7] Add --dns-tcp-timeout option

Changes send timeout of outgoing TCP connections. Allows waiting just
short time for successful connection before trying another server.
Makes possible faster switch to working server if previous is not
responding. Default socket timeout seems too high for DNS connections.
---
 man/dnsmasq.8 |  4 ++++
 src/config.h  |  1 +
 src/dnsmasq.h |  1 +
 src/forward.c | 10 ++++++++--
 src/option.c  | 10 +++++++++-
 5 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/man/dnsmasq.8 b/man/dnsmasq.8
index 30429df..94fd5b1 100644
--- a/man/dnsmasq.8
+++ b/man/dnsmasq.8
@@ -860,6 +860,10 @@ where this needs to be increased is when using web-server log file
 resolvers, which can generate large numbers of concurrent queries. This
 parameter actually controls the number of concurrent queries per server group, where a server group is the set of server(s) associated with a single domain. So if a domain has it's own server via --server=/example.com/1.2.3.4 and 1.2.3.4 is not responding, but queries for *.example.com cannot go elsewhere, then other queries will not be affected. On configurations with many such server groups and tight resources, this value may need to be reduced.
 .TP
+.B --dns-tcp-timeout=<seconds>
+Sets send timeout for forwarded TCP connections. Can be used to reduce time of waiting
+for successful TCP connection. Default value 0 skips the change of it.
+.TP
 .B --dnssec
 Validate DNS replies and cache DNSSEC data. When forwarding DNS queries, dnsmasq requests the 
 DNSSEC records needed to validate the replies. The replies are validated and the result returned as 
diff --git a/src/config.h b/src/config.h
index 88cf72e..5fd5cdf 100644
--- a/src/config.h
+++ b/src/config.h
@@ -61,6 +61,7 @@
 #define LOOP_TEST_TYPE T_TXT
 #define DEFAULT_FAST_RETRY 1000 /* ms, default delay before fast retry */
 #define STALE_CACHE_EXPIRY 86400 /* 1 day in secs, default maximum expiry time for stale cache data */
+#define TCP_TIMEOUT 0 /* timeout of tcp outgoing connections */
  
 /* compile-time options: uncomment below to enable or do eg.
    make COPTS=-DHAVE_BROKEN_RTC
diff --git a/src/dnsmasq.h b/src/dnsmasq.h
index 2f95c12..4113ccb 100644
--- a/src/dnsmasq.h
+++ b/src/dnsmasq.h
@@ -1256,6 +1256,7 @@ extern struct daemon {
   int tcp_pipes[MAX_PROCS];
   int pipe_to_parent;
   int numrrand;
+  int tcp_timeout;
   struct randfd *randomsocks;
   struct randfd_list *rfl_spare, *rfl_poll;
   int v6pktinfo; 
diff --git a/src/forward.c b/src/forward.c
index ecfeebd..f323fee 100644
--- a/src/forward.c
+++ b/src/forward.c
@@ -1929,8 +1929,14 @@ static ssize_t tcp_talk(int first, int last, int start, unsigned char *packet,
 	  /* Copy connection mark of incoming query to outgoing connection. */
 	  if (have_mark)
 	    setsockopt(serv->tcpfd, SOL_SOCKET, SO_MARK, &mark, sizeof(unsigned int));
-#endif			  
-	  
+#endif
+
+	  if (daemon->tcp_timeout>0)
+	    {
+	      struct timeval tv = {daemon->tcp_timeout, 0};
+	      setsockopt(serv->tcpfd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
+	    }
+
 	  if ((!local_bind(serv->tcpfd,  &serv->source_addr, serv->interface, 0, 1)))
 	    {
 	      close(serv->tcpfd);
diff --git a/src/option.c b/src/option.c
index 8322725..b94a5ff 100644
--- a/src/option.c
+++ b/src/option.c
@@ -190,6 +190,7 @@ struct myoption {
 #define LOPT_FILTER_RR     381
 #define LOPT_NO_DHCP6      382
 #define LOPT_NO_DHCP4      383
+#define LOPT_TCP_TIMEOUT   384
 
 #ifdef HAVE_GETOPT_LONG
 static const struct option opts[] =  
@@ -279,6 +280,7 @@ static const struct myoption opts[] =
     { "leasefile-ro", 0, 0, '9' },
     { "script-on-renewal", 0, 0, LOPT_SCRIPT_TIME},
     { "dns-forward-max", 1, 0, '0' },
+    { "dns-tcp-timeout", 1, 0, LOPT_TCP_TIMEOUT },
     { "clear-on-reload", 0, 0, LOPT_RELOAD },
     { "dhcp-ignore-names", 2, 0, LOPT_NO_NAMES },
     { "enable-tftp", 2, 0, LOPT_TFTP },
@@ -3391,7 +3393,12 @@ static int one_opt(int option, char *arg, char *errstr, char *gen_err, int comma
     case '0':  /* --dns-forward-max */
       if (!atoi_check(arg, &daemon->ftabsize))
 	ret_err(gen_err);
-      break;  
+      break;
+
+    case LOPT_TCP_TIMEOUT: /* --dns-tcp-timeout */
+      if (!atoi_check(arg, &daemon->tcp_timeout))
+	 ret_err(gen_err);
+      break;
     
     case 'q': /* --log-queries */
       set_option_bool(OPT_LOG);
@@ -5833,6 +5840,7 @@ void read_opts(int argc, char **argv, char *compile_opts)
   daemon->soa_expiry = SOA_EXPIRY;
   daemon->randport_limit = 1;
   daemon->host_index = SRC_AH;
+  daemon->tcp_timeout = TCP_TIMEOUT;
   
   /* See comment above make_servers(). Optimises server-read code. */
   mark_servers(0);
-- 
2.40.1

From d9a8d33f195c6c406ff28a0084d1be8b46583b08 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Petr=20Men=C5=A1=C3=ADk?= <pemen...@redhat.com>
Date: Tue, 23 May 2023 21:31:05 +0200
Subject: [PATCH 2/7] Reduce few TCP related repeated code

Provide special function for repeated setting of ede code in
pseudoheader. Introduce read_tcp reusing query length and payload.
---
 src/dnsmasq.h |   2 +
 src/edns0.c   |   9 ++
 src/forward.c | 223 ++++++++++++++++++++------------------------------
 3 files changed, 100 insertions(+), 134 deletions(-)

diff --git a/src/dnsmasq.h b/src/dnsmasq.h
index 4113ccb..77296ed 100644
--- a/src/dnsmasq.h
+++ b/src/dnsmasq.h
@@ -1852,6 +1852,8 @@ unsigned char *find_pseudoheader(struct dns_header *header, size_t plen,
 				   size_t *len, unsigned char **p, int *is_sign, int *is_last);
 size_t add_pseudoheader(struct dns_header *header, size_t plen, unsigned char *limit, 
 			unsigned short udp_sz, int optno, unsigned char *opt, size_t optlen, int set_do, int replace);
+size_t add_pseudoheader_ede(struct dns_header *header, size_t plen, unsigned char *limit,
+			unsigned short udp_sz, int ede, int set_do, int replace);
 size_t add_do_bit(struct dns_header *header, size_t plen, unsigned char *limit);
 size_t add_edns0_config(struct dns_header *header, size_t plen, unsigned char *limit, 
 			union mysockaddr *source, time_t now, int *cacheable);
diff --git a/src/edns0.c b/src/edns0.c
index 800c51f..433796b 100644
--- a/src/edns0.c
+++ b/src/edns0.c
@@ -243,6 +243,15 @@ size_t add_pseudoheader(struct dns_header *header, size_t plen, unsigned char *l
   return p - (unsigned char *)header;
 }
 
+size_t add_pseudoheader_ede(struct dns_header *header, size_t plen, unsigned char *limit,
+			unsigned short udp_sz, int ede, int set_do, int replace)
+{
+  u16 swap = htons((u16)ede);
+
+  return add_pseudoheader(header, plen, limit, udp_sz,
+		  EDNS0_OPTION_EDE, (unsigned char *)&swap, 2, set_do, replace);
+}
+
 size_t add_do_bit(struct dns_header *header, size_t plen, unsigned char *limit)
 {
   return add_pseudoheader(header, plen, (unsigned char *)limit, PACKETSZ, 0, NULL, 0, 1, 0);
diff --git a/src/forward.c b/src/forward.c
index f323fee..37696c8 100644
--- a/src/forward.c
+++ b/src/forward.c
@@ -581,10 +581,8 @@ static int forward_query(int udpfd, union mysockaddr *udpaddr,
       
       if (oph)
 	{
-	  u16 swap = htons((u16)ede);
-
 	  if (ede != EDE_UNSET)
-	    plen = add_pseudoheader(header, plen, (unsigned char *)limit, daemon->edns_pktsz, EDNS0_OPTION_EDE, (unsigned char *)&swap, 2, do_bit, 0);
+	    plen = add_pseudoheader_ede(header, plen, (unsigned char *)limit, daemon->edns_pktsz, ede, do_bit, 0);
 	  else
 	    plen = add_pseudoheader(header, plen, (unsigned char *)limit, daemon->edns_pktsz, 0, NULL, 0, do_bit, 0);
 	}
@@ -873,10 +871,7 @@ static size_t process_reply(struct dns_header *header, time_t now, struct server
   n = resize_packet(header, n, pheader, plen);
 
   if (pheader && ede != EDE_UNSET)
-    {
-      u16 swap = htons((u16)ede);
-      n = add_pseudoheader(header, n, limit, daemon->edns_pktsz, EDNS0_OPTION_EDE, (unsigned char *)&swap, 2, do_bit, 1);
-    }
+    n = add_pseudoheader_ede(header, n, limit, daemon->edns_pktsz, ede, do_bit, 1);
 
   if (RCODE(header) == NXDOMAIN)
     server->nxdomain_replies++;
@@ -1440,6 +1435,57 @@ static size_t answer_disallowed(struct dns_header *header, size_t qlen, u32 mark
 }
 #endif
 
+/* We can be configured to only accept queries from at-most-one-hop-away addresses. */
+static int is_local_network_address(union mysockaddr *source_addr)
+{
+  struct addrlist *addr = daemon->interface_addrs;
+
+  if (source_addr->sa.sa_family == AF_INET6)
+    {
+      for (; addr; addr = addr->next)
+	if ((addr->flags & ADDRLIST_IPV6) &&
+	    is_same_net6(&addr->addr.addr6, &source_addr->in6.sin6_addr, addr->prefixlen))
+	  break;
+    }
+  else
+    {
+      struct in_addr netmask;
+      for (; addr; addr = addr->next)
+	{
+	  netmask.s_addr = htonl(~(in_addr_t)0 << (32 - addr->prefixlen));
+	  if (!(addr->flags & ADDRLIST_IPV6) &&
+	      is_same_net(addr->addr.addr4, source_addr->in.sin_addr, netmask))
+	    break;
+	}
+    }
+  if (!addr)
+    {
+      static int warned = 0;
+      if (!warned)
+	{
+	  prettyprint_addr(source_addr, daemon->addrbuff);
+	  my_syslog(LOG_WARNING, _("ignoring query from non-local network %s (logged only once)"), daemon->addrbuff);
+	  warned = 1;
+	}
+      return 0;
+    }
+  return 1;
+}
+
+#ifdef HAVE_AUTH
+/* find queries for zones we're authoritative for, and answer them directly */
+static int is_auth_zone(char *name)
+{
+  struct auth_zone *zone;
+
+  if (!option_bool(OPT_LOCALISE))
+    for (zone = daemon->auth_zones; zone; zone = zone->next)
+      if (in_zone(zone, name, NULL))
+	return 1;
+  return 0;
+}
+#endif
+
 void receive_query(struct listener *listen, time_t now)
 {
   struct dns_header *header = (struct dns_header *)daemon->packet;
@@ -1537,40 +1583,8 @@ void receive_query(struct listener *listen, time_t now)
     }
   
   /* We can be configured to only accept queries from at-most-one-hop-away addresses. */
-  if (option_bool(OPT_LOCAL_SERVICE))
-    {
-      struct addrlist *addr;
-
-      if (family == AF_INET6) 
-	{
-	  for (addr = daemon->interface_addrs; addr; addr = addr->next)
-	    if ((addr->flags & ADDRLIST_IPV6) &&
-		is_same_net6(&addr->addr.addr6, &source_addr.in6.sin6_addr, addr->prefixlen))
-	      break;
-	}
-      else
-	{
-	  struct in_addr netmask;
-	  for (addr = daemon->interface_addrs; addr; addr = addr->next)
-	    {
-	      netmask.s_addr = htonl(~(in_addr_t)0 << (32 - addr->prefixlen));
-	      if (!(addr->flags & ADDRLIST_IPV6) &&
-		  is_same_net(addr->addr.addr4, source_addr.in.sin_addr, netmask))
-		break;
-	    }
-	}
-      if (!addr)
-	{
-	  static int warned = 0;
-	  if (!warned)
-	    {
-	      prettyprint_addr(&source_addr, daemon->addrbuff);
-	      my_syslog(LOG_WARNING, _("ignoring query from non-local network %s (logged only once)"), daemon->addrbuff);
-	      warned = 1;
-	    }
-	  return;
-	}
-    }
+  if (option_bool(OPT_LOCAL_SERVICE) && !is_local_network_address(&source_addr))
+    return;
 		
   if (check_dst)
     {
@@ -1694,7 +1708,6 @@ void receive_query(struct listener *listen, time_t now)
   if (extract_request(header, (size_t)n, daemon->namebuff, &type))
     {
 #ifdef HAVE_AUTH
-      struct auth_zone *zone;
 #endif
       log_query_mysockaddr(F_QUERY | F_FORWARD, daemon->namebuff,
 			   &source_addr, auth_dns ? "auth" : "query", type);
@@ -1704,15 +1717,8 @@ void receive_query(struct listener *listen, time_t now)
 #endif
 
 #ifdef HAVE_AUTH
-      /* find queries for zones we're authoritative for, and answer them directly */
-      if (!auth_dns && !option_bool(OPT_LOCALISE))
-	for (zone = daemon->auth_zones; zone; zone = zone->next)
-	  if (in_zone(zone, daemon->namebuff, NULL))
-	    {
-	      auth_dns = 1;
-	      local_auth = 1;
-	      break;
-	    }
+      if (!auth_dns && is_auth_zone(daemon->namebuff))
+	auth_dns = local_auth = 1;
 #endif
       
 #ifdef HAVE_LOOP
@@ -1759,14 +1765,12 @@ void receive_query(struct listener *listen, time_t now)
 #ifdef HAVE_CONNTRACK
   else if (!allowed)
     {
-      u16 swap = htons(EDE_BLOCKED);
-
       m = answer_disallowed(header, (size_t)n, (u32)mark, is_single_query ? daemon->namebuff : NULL);
       
       if (have_pseudoheader && m != 0)
-	m = add_pseudoheader(header,  m,  ((unsigned char *) header) + udp_size, daemon->edns_pktsz,
-			     EDNS0_OPTION_EDE, (unsigned char *)&swap, 2, do_bit, 0);
-      
+	m = add_pseudoheader_ede(header,  m,  ((unsigned char *) header) + udp_size, daemon->edns_pktsz,
+				 EDE_BLOCKED, do_bit, 0);
+
       if (m >= 1)
 	{
 #ifdef HAVE_DUMPFILE
@@ -1781,7 +1785,7 @@ void receive_query(struct listener *listen, time_t now)
 #ifdef HAVE_AUTH
   else if (auth_dns)
     {
-      m = answer_auth(header, ((char *) header) + udp_size, (size_t)n, now, &source_addr, 
+      m = answer_auth(header, ((char *) header) + udp_size, (size_t)n, now, &source_addr,
 		      local_auth, do_bit, have_pseudoheader);
       if (m >= 1)
 	{
@@ -1814,7 +1818,7 @@ void receive_query(struct listener *listen, time_t now)
       if (header->hb4 & HB4_AD)
 	ad_reqd = 1;
 
-      m = answer_request(header, ((char *) header) + udp_size, (size_t)n, 
+      m = answer_request(header, ((char *) header) + udp_size, (size_t)n,
 			 dst_addr_4, netmask, now, ad_reqd, do_bit, have_pseudoheader, &stale, &filtered);
       
       if (m >= 1)
@@ -1829,12 +1833,8 @@ void receive_query(struct listener *listen, time_t now)
 		ede = EDE_STALE;
 
 	      if (ede != EDE_UNSET)
-		{
-		  u16 swap = htons(ede);
-		  
-		  m = add_pseudoheader(header,  m,  ((unsigned char *) header) + udp_size, daemon->edns_pktsz,
-				       EDNS0_OPTION_EDE, (unsigned char *)&swap, 2, do_bit, 0);
-		}
+		m = add_pseudoheader_ede(header,  m,  ((unsigned char *) header) + udp_size, daemon->edns_pktsz,
+				     ede, do_bit, 0);
 	    }
 	  
 #ifdef HAVE_DUMPFILE
@@ -1876,6 +1876,14 @@ void receive_query(struct listener *listen, time_t now)
     }
 }
 
+static int read_tcp(int tcpfd, unsigned char *payload, size_t *rsize)
+{
+  unsigned char c1, c2;
+  return (read_write(tcpfd, &c1, 1, 1) &&
+	  read_write(tcpfd, &c2, 1, 1) &&
+	  read_write(tcpfd, payload, (*rsize = (c1 << 8) | c2), 1));
+}
+
 /* Send query in packet, qsize to a server determined by first,last,start and
    get the reply. return reply size. */
 static ssize_t tcp_talk(int first, int last, int start, unsigned char *packet,  size_t qsize,
@@ -1885,9 +1893,8 @@ static ssize_t tcp_talk(int first, int last, int start, unsigned char *packet,
   u16 *length = (u16 *)packet;
   unsigned char *payload = &packet[2];
   struct dns_header *header = (struct dns_header *)payload;
-  unsigned char c1, c2;
   unsigned char hash[HASH_SIZE], *hashp;
-  unsigned int rsize;
+  size_t rsize;
   
   (void)mark;
   (void)have_mark;
@@ -1963,9 +1970,7 @@ static ssize_t tcp_talk(int first, int last, int start, unsigned char *packet,
 	}
       
       if ((!data_sent && !read_write(serv->tcpfd, packet, qsize + sizeof(u16), 0)) ||
-	  !read_write(serv->tcpfd, &c1, 1, 1) ||
-	  !read_write(serv->tcpfd, &c2, 1, 1) ||
-	  !read_write(serv->tcpfd, payload, (rsize = (c1 << 8) | c2), 1))
+	  !read_tcp(serv->tcpfd, payload, &rsize))
 	{
 	  close(serv->tcpfd);
 	  serv->tcpfd = -1;
@@ -2069,7 +2074,6 @@ static int tcp_key_recurse(time_t now, int status, struct dns_header *header, si
 }
 #endif
 
-
 /* The daemon forks before calling this: it should deal with one connection,
    blocking as necessary, and then return. Note, need to be a bit careful
    about resources for debug mode, when the fork is suppressed: that's
@@ -2094,9 +2098,9 @@ unsigned char *tcp_request(int confd, time_t now,
   /* Max TCP packet + slop + size */
   unsigned char *packet = whine_malloc(65536 + MAXDNAME + RRFIXEDSZ + sizeof(u16));
   unsigned char *payload = &packet[2];
-  unsigned char c1, c2;
   /* largest field in header is 16-bits, so this is still sufficiently aligned */
   struct dns_header *header = (struct dns_header *)payload;
+  unsigned char *hlimit = payload + 65536;
   u16 *length = (u16 *)packet;
   struct server *serv;
   struct in_addr dst_addr_4;
@@ -2128,35 +2132,8 @@ unsigned char *tcp_request(int confd, time_t now,
 #endif	
 
   /* We can be configured to only accept queries from at-most-one-hop-away addresses. */
-  if (option_bool(OPT_LOCAL_SERVICE))
-    {
-      struct addrlist *addr;
-
-      if (peer_addr.sa.sa_family == AF_INET6) 
-	{
-	  for (addr = daemon->interface_addrs; addr; addr = addr->next)
-	    if ((addr->flags & ADDRLIST_IPV6) &&
-		is_same_net6(&addr->addr.addr6, &peer_addr.in6.sin6_addr, addr->prefixlen))
-	      break;
-	}
-      else
-	{
-	  struct in_addr netmask;
-	  for (addr = daemon->interface_addrs; addr; addr = addr->next)
-	    {
-	      netmask.s_addr = htonl(~(in_addr_t)0 << (32 - addr->prefixlen));
-	      if (!(addr->flags & ADDRLIST_IPV6) && 
-		  is_same_net(addr->addr.addr4, peer_addr.in.sin_addr, netmask))
-		break;
-	    }
-	}
-      if (!addr)
-	{
-	  prettyprint_addr(&peer_addr, daemon->addrbuff);
-	  my_syslog(LOG_WARNING, _("ignoring query from non-local network %s"), daemon->addrbuff);
-	  return packet;
-	}
-    }
+  if (option_bool(OPT_LOCAL_SERVICE) && !is_local_network_address(&peer_addr))
+    return packet;
 
   while (1)
     {
@@ -2177,9 +2154,7 @@ unsigned char *tcp_request(int confd, time_t now,
 	  if (query_count == TCP_MAX_QUERIES)
 	    return packet;
 
-	  if (!read_write(confd, &c1, 1, 1) || !read_write(confd, &c2, 1, 1) ||
-	      !(size = c1 << 8 | c2) ||
-	      !read_write(confd, payload, size, 1))
+	  if (!read_tcp(confd, payload, &size) || size == 0)
 	    return packet;
 	}
       
@@ -2203,10 +2178,6 @@ unsigned char *tcp_request(int confd, time_t now,
        
       if ((gotname = extract_request(header, (unsigned int)size, daemon->namebuff, &qtype)))
 	{
-#ifdef HAVE_AUTH
-	  struct auth_zone *zone;
-#endif
-
 #ifdef HAVE_CONNTRACK
 	  is_single_query = 1;
 #endif
@@ -2217,15 +2188,8 @@ unsigned char *tcp_request(int confd, time_t now,
 				   &peer_addr, auth_dns ? "auth" : "query", qtype);
 	      
 #ifdef HAVE_AUTH
-	      /* find queries for zones we're authoritative for, and answer them directly */
-	      if (!auth_dns && !option_bool(OPT_LOCALISE))
-		for (zone = daemon->auth_zones; zone; zone = zone->next)
-		  if (in_zone(zone, daemon->namebuff, NULL))
-		    {
-		      auth_dns = 1;
-		      local_auth = 1;
-		      break;
-		    }
+	      if (!auth_dns && is_auth_zone(daemon->namebuff))
+		auth_dns = local_auth = 1;
 #endif
 	    }
 	}
@@ -2263,18 +2227,15 @@ unsigned char *tcp_request(int confd, time_t now,
 #ifdef HAVE_CONNTRACK
       else if (!allowed)
 	{
-	  u16 swap = htons(EDE_BLOCKED);
-
 	  m = answer_disallowed(header, size, (u32)mark, is_single_query ? daemon->namebuff : NULL);
 	  
 	  if (have_pseudoheader && m != 0)
-	    m = add_pseudoheader(header,  m, ((unsigned char *) header) + 65536, daemon->edns_pktsz,
-				 EDNS0_OPTION_EDE, (unsigned char *)&swap, 2, do_bit, 0);
+	    m = add_pseudoheader_ede(header,  m, hlimit, daemon->edns_pktsz, EDE_BLOCKED, do_bit, 0);
 	}
 #endif
 #ifdef HAVE_AUTH
       else if (auth_dns)
-	m = answer_auth(header, ((char *) header) + 65536, (size_t)size, now, &peer_addr, 
+	m = answer_auth(header, (char *) hlimit, (size_t)size, now, &peer_addr, 
 			local_auth, do_bit, have_pseudoheader);
 #endif
       else
@@ -2298,7 +2259,7 @@ unsigned char *tcp_request(int confd, time_t now,
 		 }
 	       
 	       /* m > 0 if answered from cache */
-	       m = answer_request(header, ((char *) header) + 65536, (size_t)size, 
+	       m = answer_request(header, (char *) hlimit, (size_t)size, 
 				  dst_addr_4, netmask, now, ad_reqd, do_bit, have_pseudoheader, &stale, &filtered);
 	     }
 	  /* Do this by steam now we're not in the select() loop */
@@ -2335,12 +2296,12 @@ unsigned char *tcp_request(int confd, time_t now,
 		  else
 		    start = master->last_server;
 		  
-		  size = add_edns0_config(header, size, ((unsigned char *) header) + 65536, &peer_addr, now, &cacheable);
+		  size = add_edns0_config(header, size, hlimit, &peer_addr, now, &cacheable);
 		  
 #ifdef HAVE_DNSSEC
 		  if (option_bool(OPT_DNSSEC_VALID) && (master->flags & SERV_DO_DNSSEC))
 		    {
-		      size = add_do_bit(header, size, ((unsigned char *) header) + 65536);
+		      size = add_do_bit(header, size, hlimit);
 		      
 		      /* For debugging, set Checking Disabled, otherwise, have the upstream check too,
 			 this allows it to select auth servers when one is returning bad data. */
@@ -2425,17 +2386,15 @@ unsigned char *tcp_request(int confd, time_t now,
       if (m == 0)
 	{
 	  if (!(m = make_local_answer(flags, gotname, size, header, daemon->namebuff,
-				      ((char *) header) + 65536, first, last, ede)))
+				      (char *) hlimit, first, last, ede)))
 	    break;
 	  
 	  if (have_pseudoheader)
 	    {
-	      u16 swap = htons((u16)ede);
-	      
 	      if (ede != EDE_UNSET)
-		m = add_pseudoheader(header, m, ((unsigned char *) header) + 65536, daemon->edns_pktsz, EDNS0_OPTION_EDE, (unsigned char *)&swap, 2, do_bit, 0);
+		m = add_pseudoheader_ede(header, m, hlimit, daemon->edns_pktsz, ede, do_bit, 0);
 	      else
-		m = add_pseudoheader(header, m, ((unsigned char *) header) + 65536, daemon->edns_pktsz, 0, NULL, 0, do_bit, 0);
+		m = add_pseudoheader(header, m, hlimit, daemon->edns_pktsz, 0, NULL, 0, do_bit, 0);
 	    }
 	}
       else if (have_pseudoheader)
@@ -2448,11 +2407,7 @@ unsigned char *tcp_request(int confd, time_t now,
 	    ede = EDE_STALE;
 	  
 	  if (ede != EDE_UNSET)
-	    {
-	      u16 swap = htons((u16)ede);
-	      
-	      m = add_pseudoheader(header, m, ((unsigned char *) header) + 65536, daemon->edns_pktsz, EDNS0_OPTION_EDE, (unsigned char *)&swap, 2, do_bit, 0);
-	    }
+	    m = add_pseudoheader_ede(header, m, hlimit, daemon->edns_pktsz, ede, do_bit, 0);
 	}
 	  
       check_log_writer(1);
-- 
2.40.1

From 3a2927467691eab2139fb1df305bebfe9c142698 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Petr=20Men=C5=A1=C3=ADk?= <pemen...@redhat.com>
Date: Wed, 24 May 2023 19:13:22 +0200
Subject: [PATCH 3/7] Report changed TCP servers to master process

Send last_server change to master process via daemon->pipe_to_parent,
just like inserting into the cache does. Reports changed last_server in case
the originally used one did not respond to query.

Abort the previous idea for sending UDP query to the server. It would
not do anything if the response is already cached.
---
 src/cache.c   |   3 ++
 src/dnsmasq.h |   1 +
 src/forward.c | 119 +++++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 116 insertions(+), 7 deletions(-)

diff --git a/src/cache.c b/src/cache.c
index ccbb9cd..36cb50a 100644
--- a/src/cache.c
+++ b/src/cache.c
@@ -864,6 +864,9 @@ int cache_recv_insert(time_t now, int fd)
 	  cache_end_insert();
 	  return 1;
 	}
+      else if (m == -2)
+	/* receive changed last_server from tcp children. */
+	return recv_server_on_parent(fd);
 
       if (!read_write(fd, (unsigned char *)daemon->namebuff, m, 1) ||
 	  !read_write(fd, (unsigned char *)&ttd, sizeof(ttd), 1) ||
diff --git a/src/dnsmasq.h b/src/dnsmasq.h
index 77296ed..69c5c53 100644
--- a/src/dnsmasq.h
+++ b/src/dnsmasq.h
@@ -1521,6 +1521,7 @@ void resend_query(void);
 int allocate_rfd(struct randfd_list **fdlp, struct server *serv);
 void free_rfds(struct randfd_list **fdlp);
 int fast_retry(time_t now);
+int recv_server_on_parent(int pipe_on_parent);
 
 /* network.c */
 int indextoname(int fd, int index, char *name);
diff --git a/src/forward.c b/src/forward.c
index 37696c8..6ea6f65 100644
--- a/src/forward.c
+++ b/src/forward.c
@@ -1887,7 +1887,7 @@ static int read_tcp(int tcpfd, unsigned char *payload, size_t *rsize)
 /* Send query in packet, qsize to a server determined by first,last,start and
    get the reply. return reply size. */
 static ssize_t tcp_talk(int first, int last, int start, unsigned char *packet,  size_t qsize,
-			int have_mark, unsigned int mark, struct server **servp)
+			int have_mark, unsigned int mark, struct server **servp, int *send_failed)
 {
   int firstsendto = -1;
   u16 *length = (u16 *)packet;
@@ -1899,6 +1899,8 @@ static ssize_t tcp_talk(int first, int last, int start, unsigned char *packet,
   (void)mark;
   (void)have_mark;
 
+  if (send_failed)
+    *send_failed = 0;
   if (!(hashp = hash_questions(header, (unsigned int)qsize, daemon->namebuff)))
     return 0;
 
@@ -1962,6 +1964,8 @@ static ssize_t tcp_talk(int first, int last, int start, unsigned char *packet,
 	    {
 	      close(serv->tcpfd);
 	      serv->tcpfd = -1;
+	      if (send_failed)
+		(*send_failed)++;
 	      continue;
 	    }
 	  
@@ -1974,6 +1978,8 @@ static ssize_t tcp_talk(int first, int last, int start, unsigned char *packet,
 	{
 	  close(serv->tcpfd);
 	  serv->tcpfd = -1;
+	  if (send_failed)
+	    (*send_failed)++;
 	  /* We get data then EOF, reopen connection to same server,
 	     else try next. This avoids DoS from a server which accepts
 	     connections and then closes them. */
@@ -2047,7 +2053,7 @@ static int tcp_key_recurse(time_t now, int status, struct dns_header *header, si
 				STAT_ISEQUAL(new_status, STAT_NEED_KEY) ? T_DNSKEY : T_DS, server->edns_pktsz);
       
       if ((start = dnssec_server(server, daemon->keyname, &first, &last)) == -1 ||
-	  (m = tcp_talk(first, last, start, packet, m, have_mark, mark, &server)) == 0)
+	  (m = tcp_talk(first, last, start, packet, m, have_mark, mark, &server, NULL)) == 0)
 	{
 	  new_status = STAT_ABANDONED;
 	  break;
@@ -2074,6 +2080,98 @@ static int tcp_key_recurse(time_t now, int status, struct dns_header *header, si
 }
 #endif
 
+/* Inspired by cache_end_insert in cache.c
+ * Serializes server information over pipe_to_parent required to switch current one.
+ * Sends always just one record. */
+static int send_server_to_parent(int pipe_to_parent, struct server *serv)
+{
+  ssize_t domainlen = serv->domain_len;
+  ssize_t m = -2;
+  size_t addrlen = sa_len(&serv->addr);
+
+  /* define this is a server feedback, not cache. */
+  if (!read_write(pipe_to_parent, (unsigned char *)&m, sizeof(m), 0) ||
+      !read_write(pipe_to_parent, (unsigned char *)&domainlen, sizeof(domainlen), 0))
+    return 0;
+  if (domainlen > 0 &&
+      !read_write(pipe_to_parent, (unsigned char *)serv->domain, domainlen, 0))
+    return 0;
+  if (!read_write(pipe_to_parent, (unsigned  char *)&serv->arrayposn, sizeof(serv->arrayposn), 0) ||
+      !read_write(pipe_to_parent, (unsigned char *)&addrlen, sizeof(addrlen), 0) ||
+      !read_write(pipe_to_parent, (unsigned char *)&serv->addr.sa, addrlen, 0))
+    return 0;
+  return 1;
+}
+
+/* limited server comparison, compares just domain and address. */
+static int server_addr_equal(struct server *s1, struct server *s2)
+{
+  return (s1->domain_len == s2->domain_len &&
+	 (s1->domain_len == 0 || strcmp(s1->domain, s2->domain) == 0) &&
+	 (s2->flags & (SERV_USE_RESOLV | SERV_LITERAL_ADDRESS)) == 0 &&
+	 sockaddr_isequal(&s1->addr, &s2->addr));
+}
+
+static struct server * server_next(struct server *s, int first, int last, int start)
+{
+  int i = s->arrayposn;
+  if (i == start || first == -1)
+    return NULL;
+  if (i >= last)
+    return daemon->serverarray[first];
+  return daemon->serverarray[i+1];
+}
+
+static int recv_one_server(int pipe_on_parent, struct server *s)
+{
+  ssize_t domainlen;
+  size_t addrlen;
+
+  if (!read_write(pipe_on_parent, (unsigned char *)&domainlen, sizeof(domainlen), 1) ||
+      domainlen < 0 || domainlen > MAXDNAME-1)
+    return 0;
+  if (domainlen > 0 &&
+      !read_write(pipe_on_parent, (unsigned char *)daemon->namebuff, domainlen, 1))
+    return 0;
+  daemon->namebuff[domainlen] = 0;
+  if (!read_write(pipe_on_parent, (unsigned char *)&s->arrayposn, sizeof(s->arrayposn), 1) ||
+      !read_write(pipe_on_parent, (unsigned char *)&addrlen, sizeof(addrlen), 1) ||
+      addrlen > sizeof(s->addr))
+    return 0;
+  
+  if (addrlen > 0 &&
+      !read_write(pipe_on_parent, (unsigned char *)&s->addr, addrlen, 1))
+    return 0;
+  s->domain_len = domainlen;
+  s->domain = daemon->namebuff;
+  return 1;
+}
+
+/* receive new TCP last_server from child. */
+int recv_server_on_parent(int pipe_on_parent)
+{
+  /* -2 were already read. */
+  int current = -1, first = -1, last = -1;
+  struct server *s = NULL;
+  struct server curs = {};
+
+  if (!recv_one_server(pipe_on_parent, &curs) ||
+      !lookup_domain(daemon->namebuff, F_SERVER, &first, &last))
+    return 0;
+
+  current = curs.arrayposn;
+  if (!(current >= first && current < daemon->serverarraysz && daemon->serverarray[current]))
+    current = first;
+
+  for (s = daemon->serverarray[current]; s; s = server_next(s, first, last, current))
+    if (server_addr_equal(&curs, s))
+      {
+	daemon->serverarray[first]->last_server = current;
+	return 1;
+      }
+  return 0;
+}
+
 /* The daemon forks before calling this: it should deal with one connection,
    blocking as necessary, and then return. Note, need to be a bit careful
    about resources for debug mode, when the fork is suppressed: that's
@@ -2102,7 +2200,7 @@ unsigned char *tcp_request(int confd, time_t now,
   struct dns_header *header = (struct dns_header *)payload;
   unsigned char *hlimit = payload + 65536;
   u16 *length = (u16 *)packet;
-  struct server *serv;
+  struct server *serv = NULL;
   struct in_addr dst_addr_4;
   union mysockaddr peer_addr;
   socklen_t peer_len = sizeof(union mysockaddr);
@@ -2137,7 +2235,7 @@ unsigned char *tcp_request(int confd, time_t now,
 
   while (1)
     {
-      int ede = EDE_UNSET;
+      int ede = EDE_UNSET, send_failed = 0;
 
       if (do_stale)
 	{
@@ -2268,7 +2366,6 @@ unsigned char *tcp_request(int confd, time_t now,
 	  if (m == 0)
 	    {
 	      struct server *master;
-	      int start;
 
 	      if (lookup_domain(daemon->namebuff, gotname, &first, &last))
 		flags = is_local_answer(now, first, daemon->namebuff);
@@ -2289,6 +2386,8 @@ unsigned char *tcp_request(int confd, time_t now,
 		
 	      if (!flags && ede != EDE_NOT_READY)
 		{
+		  int start;
+
 		  master = daemon->serverarray[first];
 		  
 		  if (option_bool(OPT_ORDER) || master->last_server == -1)
@@ -2314,9 +2413,15 @@ unsigned char *tcp_request(int confd, time_t now,
 		     strip it from the reply. */
 		  if (!have_pseudoheader && find_pseudoheader(header, size, NULL, NULL, NULL, NULL))
 		    added_pheader = 1;
-		  
+
 		  /* Loop round available servers until we succeed in connecting to one. */
-		  if ((m = tcp_talk(first, last, start, packet, size, have_mark, mark, &serv)) == 0)
+		  m = tcp_talk(first, last, start, packet, size, have_mark, mark, &serv, &send_failed);
+		  if (serv != NULL && serv->arrayposn != start && daemon->pipe_to_parent != -1)
+		    {
+			    /* used server has changed, forward that information to master process. */
+			    send_server_to_parent(daemon->pipe_to_parent, serv);
+		    }
+		  if (m == 0)
 		    {
 		      ede = EDE_NETERR;
 		      break;
-- 
2.40.1
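
For readers following the pipe protocol in the patch above, here is a minimal standalone sketch of the same framing idea: a sentinel marker, a length, the name bytes, then the array position, with every length validated on the receiving side. It is not dnsmasq code. The helpers write_all, read_all, send_hint and recv_hint are hypothetical names, and the sketch leaves out the server address that the real patch also sends.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MARK_SERVER ((ssize_t)-2) /* same sentinel value the patch uses */

/* Write exactly len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
  const unsigned char *p = buf;
  while (len > 0)
    {
      ssize_t n = write(fd, p, len);
      if (n == -1 && errno == EINTR)
        continue;
      if (n <= 0)
        return 0;
      p += n;
      len -= (size_t)n;
    }
  return 1;
}

/* Read exactly len bytes, retrying on short reads and EINTR. */
static int read_all(int fd, void *buf, size_t len)
{
  unsigned char *p = buf;
  while (len > 0)
    {
      ssize_t n = read(fd, p, len);
      if (n == -1 && errno == EINTR)
        continue;
      if (n <= 0)
        return 0;
      p += n;
      len -= (size_t)n;
    }
  return 1;
}

/* Child side: marker, name length, name bytes, array position. */
static int send_hint(int fd, const char *name, int posn)
{
  ssize_t mark = MARK_SERVER;
  ssize_t len;

  assert(name != NULL);
  len = (ssize_t)strlen(name);
  return write_all(fd, &mark, sizeof(mark)) &&
         write_all(fd, &len, sizeof(len)) &&
         (len == 0 || write_all(fd, name, (size_t)len)) &&
         write_all(fd, &posn, sizeof(posn));
}

/* Parent side: check the marker and the length before trusting them. */
static int recv_hint(int fd, char *name, size_t namesz, int *posn)
{
  ssize_t mark, len;

  if (!read_all(fd, &mark, sizeof(mark)) || mark != MARK_SERVER)
    return 0;
  if (!read_all(fd, &len, sizeof(len)) || len < 0 || (size_t)len >= namesz)
    return 0;
  if (len > 0 && !read_all(fd, name, (size_t)len))
    return 0;
  name[len] = '\0';
  return read_all(fd, posn, sizeof(*posn));
}
```

Both ends run on the same host, so the sketch sends native-width integers without byte-order conversion, as the patch does.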

From d5d1eab38fbe3eeaa51472bf543d0a094910a617 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Petr=20Men=C5=A1=C3=ADk?= <pemen...@redhat.com>
Date: Wed, 31 May 2023 11:51:08 +0200
Subject: [PATCH 4/7] Add logging of TCP server changes

Logging may slow the already poor TCP handling even further, so log
server changes only when built with the special TCP_DEBUG define. It is
helpful when watching those changes. Because it will not be used with
-d, the messages are always logged to syslog.
---
 src/forward.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/forward.c b/src/forward.c
index 6ea6f65..0b11785 100644
--- a/src/forward.c
+++ b/src/forward.c
@@ -2167,8 +2167,14 @@ int recv_server_on_parent(int pipe_on_parent)
     if (server_addr_equal(&curs, s))
       {
 	daemon->serverarray[first]->last_server = current;
+#ifdef TCP_DEBUG
+	log_query_mysockaddr(F_HOSTS, curs.domain, &s->addr, "TCP server for ", 0);
+#endif
 	return 1;
       }
+#ifdef TCP_DEBUG
+  log_query_mysockaddr(F_HOSTS, curs.domain, &curs.addr, "TCP server not found for ", 0);
+#endif
   return 0;
 }
 
-- 
2.40.1

From 6d308a5293dce42af0c155186870f87cf4b70b03 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Petr=20Men=C5=A1=C3=ADk?= <pemen...@redhat.com>
Date: Wed, 31 May 2023 11:53:14 +0200
Subject: [PATCH 5/7] Initialize last_server right after allocation

Do not rely on build_server_array() to initialize last_server properly;
do it right after the structure is allocated. Because the "unset" value
is -1 rather than 0, a zeroed allocation does not hold the correct value.
---
 src/domain-match.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/domain-match.c b/src/domain-match.c
index 2393f38..26ffe39 100644
--- a/src/domain-match.c
+++ b/src/domain-match.c
@@ -739,6 +739,7 @@ int add_update_server(int flags,
 	serv->addr = *addr;
       if (source_addr)
 	serv->source_addr = *source_addr;
+      serv->last_server = -1;
     }
     
   serv->flags = flags;
-- 
2.40.1

From ac219850e71e70389dfd16925db098ea9dd5af4e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Petr=20Men=C5=A1=C3=ADk?= <pemen...@redhat.com>
Date: Wed, 31 May 2023 11:57:18 +0200
Subject: [PATCH 6/7] Set last_server for TCP in a similar way to UDP

If an upstream server responded with REFUSED, try not to use that server
next time. It might help if that server is overloaded and is temporarily
unwilling to accept more queries. Change last_server only if we received
an actual reply, not always after a successful connection.
---
 src/forward.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/src/forward.c b/src/forward.c
index 0b11785..4e1b7cc 100644
--- a/src/forward.c
+++ b/src/forward.c
@@ -1072,6 +1072,15 @@ static void dnssec_validate(struct frec *forward, struct dns_header *header,
 }
 #endif
 
+/* set or reset last server used. */
+static void set_last_server(struct dns_header *header, int first, int current)
+{
+  if (RCODE(header) != REFUSED)
+    daemon->serverarray[first]->last_server = current;
+  else if (daemon->serverarray[first]->last_server == current)
+    daemon->serverarray[first]->last_server = -1;
+}
+
 /* sets new last_server */
 void reply_query(int fd, time_t now)
 {
@@ -1115,10 +1124,7 @@ void reply_query(int fd, time_t now)
 
   server = daemon->serverarray[c];
 
-  if (RCODE(header) != REFUSED)
-    daemon->serverarray[first]->last_server = c;
-  else if (daemon->serverarray[first]->last_server == c)
-    daemon->serverarray[first]->last_server = -1;
+  set_last_server(header, first, c);
 
   /* If sufficient time has elapsed, try and expand UDP buffer size again. */
   if (difftime(now, server->pktsz_reduced) > UDP_TEST_TIME)
@@ -1969,7 +1975,6 @@ static ssize_t tcp_talk(int first, int last, int start, unsigned char *packet,
 	      continue;
 	    }
 	  
-	  daemon->serverarray[first]->last_server = start;
 	  serv->flags &= ~SERV_GOT_TCP;
 	}
       
@@ -1995,7 +2000,8 @@ static ssize_t tcp_talk(int first, int last, int start, unsigned char *packet,
 	 Try another server, or give up */
       if (!(hashp = hash_questions(header, rsize, daemon->namebuff)) || memcmp(hash, hashp, HASH_SIZE) != 0)
 	continue;
-      
+
+      set_last_server(header, first, start);
       serv->flags |= SERV_GOT_TCP;
       
       *servp = serv;
-- 
2.40.1

From a259779e7718ce8f374a022eb13b6044ab51f557 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Petr=20Men=C5=A1=C3=ADk?= <pemen...@redhat.com>
Date: Wed, 31 May 2023 12:15:20 +0200
Subject: [PATCH 7/7] Make last used DNS server separate for UDP and TCP

Keep the last used server separately for each transport. This allows
different TCP and UDP servers to be chosen, for example when a firewall
is misconfigured on some of them.
---
 src/dnsmasq.h      |  3 ++-
 src/domain-match.c | 10 ++++++----
 src/forward.c      | 22 +++++++++++-----------
 3 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/src/dnsmasq.h b/src/dnsmasq.h
index 69c5c53..fda7487 100644
--- a/src/dnsmasq.h
+++ b/src/dnsmasq.h
@@ -591,7 +591,8 @@ struct server {
   char *domain;
   struct server *next;
   int serial, arrayposn;
-  int last_server;
+  int last_udp_server;
+  int last_tcp_server;
   union mysockaddr addr, source_addr;
   char interface[IF_NAMESIZE+1];
   unsigned int ifindex; /* corresponding to interface, above */
diff --git a/src/domain-match.c b/src/domain-match.c
index 26ffe39..936ef3d 100644
--- a/src/domain-match.c
+++ b/src/domain-match.c
@@ -72,7 +72,8 @@ void build_server_array(void)
       {
 	daemon->serverarray[count] = serv;
 	serv->serial = count;
-	serv->last_server = -1;
+	serv->last_udp_server = -1;
+	serv->last_tcp_server = -1;
 	count++;
       }
   
@@ -468,8 +469,8 @@ int dnssec_server(struct server *server, char *keyname, int *firstp, int *lastp)
   /* No match to server used for original query.
      Use newly looked up set. */
   if (index == last)
-    index =  daemon->serverarray[first]->last_server == -1 ?
-      first : daemon->serverarray[first]->last_server;
+    index =  daemon->serverarray[first]->last_udp_server == -1 ?
+      first : daemon->serverarray[first]->last_udp_server;
 
   if (firstp)
     *firstp = first;
@@ -739,7 +740,8 @@ int add_update_server(int flags,
 	serv->addr = *addr;
       if (source_addr)
 	serv->source_addr = *source_addr;
-      serv->last_server = -1;
+      serv->last_udp_server = -1;
+      serv->last_tcp_server = -1;
     }
     
   serv->flags = flags;
diff --git a/src/forward.c b/src/forward.c
index 4e1b7cc..4ee63f0 100644
--- a/src/forward.c
+++ b/src/forward.c
@@ -352,14 +352,14 @@ static int forward_query(int udpfd, union mysockaddr *udpaddr,
 	{
 	  if (master->forwardcount++ > FORWARD_TEST ||
 	      difftime(now, master->forwardtime) > FORWARD_TIME ||
-	      master->last_server == -1)
+	      master->last_udp_server == -1)
 	    {
 	      master->forwardtime = now;
 	      master->forwardcount = 0;
 	      forward->forwardall = 1;
 	    }
 	  else
-	    start = master->last_server;
+	    start = master->last_udp_server;
 	}
     }
   else
@@ -1073,12 +1073,12 @@ static void dnssec_validate(struct frec *forward, struct dns_header *header,
 #endif
 
 /* set or reset last server used. */
-static void set_last_server(struct dns_header *header, int first, int current)
+static void set_last_server(struct dns_header *header, int current, int *last_server)
 {
   if (RCODE(header) != REFUSED)
-    daemon->serverarray[first]->last_server = current;
-  else if (daemon->serverarray[first]->last_server == current)
-    daemon->serverarray[first]->last_server = -1;
+    *last_server = current;
+  else if (*last_server == current)
+    *last_server = -1;
 }
 
 /* sets new last_server */
@@ -1124,7 +1124,7 @@ void reply_query(int fd, time_t now)
 
   server = daemon->serverarray[c];
 
-  set_last_server(header, first, c);
+  set_last_server(header, c, &daemon->serverarray[first]->last_udp_server);
 
   /* If sufficient time has elapsed, try and expand UDP buffer size again. */
   if (difftime(now, server->pktsz_reduced) > UDP_TEST_TIME)
@@ -2001,7 +2001,7 @@ static ssize_t tcp_talk(int first, int last, int start, unsigned char *packet,
       if (!(hashp = hash_questions(header, rsize, daemon->namebuff)) || memcmp(hash, hashp, HASH_SIZE) != 0)
 	continue;
 
-      set_last_server(header, first, start);
+      set_last_server(header, start, &daemon->serverarray[first]->last_tcp_server);
       serv->flags |= SERV_GOT_TCP;
       
       *servp = serv;
@@ -2172,7 +2172,7 @@ int recv_server_on_parent(int pipe_on_parent)
   for (s = daemon->serverarray[current]; s; s = server_next(s, first, last, current))
     if (server_addr_equal(&curs, s))
       {
-	daemon->serverarray[first]->last_server = current;
+	daemon->serverarray[first]->last_tcp_server = current;
 #ifdef TCP_DEBUG
 	log_query_mysockaddr(F_HOSTS, curs.domain, &s->addr, "TCP server for ", 0);
 #endif
@@ -2402,10 +2402,10 @@ unsigned char *tcp_request(int confd, time_t now,
 
 		  master = daemon->serverarray[first];
 		  
-		  if (option_bool(OPT_ORDER) || master->last_server == -1)
+		  if (option_bool(OPT_ORDER) || master->last_tcp_server == -1)
 		    start = first;
 		  else
-		    start = master->last_server;
+		    start = master->last_tcp_server;
 		  
 		  size = add_edns0_config(header, size, hlimit, &peer_addr, now, &cacheable);
 		  
-- 
2.40.1

_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss
