[Dnsmasq-discuss] [PATCH] Create temporary leases for DHCPOFFER actions

2022-07-08 Thread Petr Menšík

Hi!

I have already described similar problem back in year 2021 [1]. There 
exists race condition when higher count of clients starts at similar 
time and requests DHCP(v4). First two patches were already sent. I think 
I have sent also following patches already, but were not able to find them.


Anyway, some DHCP clients tend to handle badly situation, where dnsmasq 
first offers them some address. Then later in the process it realizes 
such address were offered also to someone else and denies acking it. We 
already have a workaround merged for IPv6 case, where it will ack 
instead different address. I think it would be nice also for IPv6 the 
same way, but I think it is not as urgent as for IPv4.


I have created a kind of half-lease. When it offers someone address, it 
makes it reserved for that address. That should prevent unnecessary race 
during startup, where some clients tend to receive NACK. When client 
continues with DHCPREQUEST, it converts a temporary lease to a permanent 
lease, which is then stored into lease file.


It solves a race in case more clients requests address than addresses on 
such network are available. When there are the same amount of temporary 
leases and remaining addresses, it reuses random temporary lease. Of 
course in such situation a failures are unavoidable. But I think it 
should attempt do the best available.


I am attaching also setup.sh, which I used to emulate starting multiple 
clients at similar time. I configured dnsmasq listening on virbr1 device 
and offering addresses. Then run that script as root and record 
communication in wireshark. Unlike current situation, there should not 
be any NACKs present. Even though ISC dhclient handles NACK well and is 
able to retry, unlike some netbooting firmware, which fails the boot in 
such situation.


Comments or testing would be welcome.

This is tracked on redhat bug #2028704 [2]. I have pushed those commits 
also to dhcp-temp-leases branch in github [3].


1. 
https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2021q4/015982.html

2. https://bugzilla.redhat.com/show_bug.cgi?id=2028704
3. https://github.com/InfrastructureServices/dnsmasq/tree/dhcp-temp-leases
From 3930e69abd272ea49e026ced78f8660564d34858 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Petr=20Men=C5=A1=C3=ADk?= 
Date: Wed, 8 Dec 2021 00:39:30 +0100
Subject: [PATCH 1/6] Add icmp ping hash also when pinging requested address

Previously requested address were pinged always with hash 0, but pinged
from address_allocate with different hash. Try using the same hash for
the same client, regardless what source place it were used.
---
 src/dhcp.c| 26 +-
 src/dnsmasq.h |  1 +
 src/rfc2131.c |  3 ++-
 3 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/src/dhcp.c b/src/dhcp.c
index 8e9c606..1cf0b5b 100644
--- a/src/dhcp.c
+++ b/src/dhcp.c
@@ -715,6 +715,21 @@ struct dhcp_config *config_find_by_address(struct dhcp_config *configs, struct i
   return NULL;
 }
 
+unsigned int ping_hash(unsigned char *hwaddr, int hw_len)
+{
+  int i;
+  unsigned int j = 0;
+  /* hash hwaddr: use the SDBM hashing algorithm.  Seems to give good
+ dispersal even with similarly-valued "strings". */
+  for (i = 0; i < hw_len; i++)
+j = hwaddr[i] + (j << 6) + (j << 16) - j;
+
+  /* j == 0 is marker */
+  if (j == 0)
+j = 1;
+  return j;
+}
+
 /* Check if and address is in use by sending ICMP ping.
This wrapper handles a cache and load-limiting.
Return is NULL is address in use, or a pointer to a cache entry
@@ -785,18 +800,11 @@ int address_allocate(struct dhcp_context *context,
 
   struct in_addr start, addr;
   struct dhcp_context *c, *d;
-  int i, pass;
+  int pass;
   unsigned int j; 
 
-  /* hash hwaddr: use the SDBM hashing algorithm.  Seems to give good
- dispersal even with similarly-valued "strings". */ 
-  for (j = 0, i = 0; i < hw_len; i++)
-j = hwaddr[i] + (j << 6) + (j << 16) - j;
+  j = ping_hash(hwaddr, hw_len);
 
-  /* j == 0 is marker */
-  if (j == 0)
-j = 1;
-  
   for (pass = 0; pass <= 1; pass++)
 for (c = context; c; c = c->current)
   if (c->flags & (CONTEXT_STATIC | CONTEXT_PROXY))
diff --git a/src/dnsmasq.h b/src/dnsmasq.h
index a8937ce..88ff980 100644
--- a/src/dnsmasq.h
+++ b/src/dnsmasq.h
@@ -1499,6 +1499,7 @@ struct dhcp_context *address_available(struct dhcp_context *context,
 struct dhcp_context *narrow_context(struct dhcp_context *context, 
 struct in_addr taddr,
 struct dhcp_netid *netids);
+unsigned int ping_hash(unsigned char *hwaddr, int hw_len);
 struct ping_result *do_icmp_ping(time_t now, struct in_addr addr,
  unsigned int hash, int loopback);
 int address_allocate(struct dhcp_context *context,
diff --git a/src/rfc2131.c b/src/rfc2131.c
index ecda2d3..a7ce872 100644
--- a/src/rfc2131.c
+++ b/src/rfc2131.c
@@ -1133,7 +1133,8 @@ size_t dhcp_reply(struct dhcp_context *context, char *iface_name, int int_index,
 		   !config_find_by_address(daemon->dh

[Dnsmasq-discuss] malformed DNS packets with edns0 enabled

2022-07-08 Thread James Brown via Dnsmasq-discuss
Hello:

We use dnsmasq as a local caching resolver (binding to ::1) and are
currently upgrading some systems to EL8 (Rocky Linux 8 specifically, which
is a rebuild of Red Hat 8). We've noticed that a fairly significant
fraction of name resolutions fail when `option edns0` is enabled in
/etc/resolv.conf and dnsmasq is being used; that is to say, when
resolv.conf looks like

option edns0
nameserver ::1

These failures manifest for queries issued very close to a TTL expiry (that
is to say, if you request a name X with a TTL of 300 seconds, then wait
299.99 seconds, then request X again, it will fail about ½ of the time).

I've tried backporting dnsmasq 2.86, but it shows the same behavior.

I used tcpdump to capture the actual request issued and the Wireshark
protocol analyzer says that dnsmasq is emitting malformed DNS queries.

The query from libc to dnsmasq looks correct and the "additional records"
portion of the packet contains the following bytes:

00 00 29 04 b0  00 00 00 00 00 00

Based on my reading of EDNS0, this looks right (domain name is 0 bytes long
and is the root domain; packet type is 0x29 == 41).

However, on failed requests, the packet sent from dnsmasq to the upstream
DNS server ends with the following "additional records" section:

c0 0c 00 05 00 01 00 00 0c e4 00

This looks like a compressed label, since it starts with 0xc...? Which
doesn't make any sense to put in the OPT section?

The rest of the query looks fine.

Neither add-mac nor add-subnet is set, and edns-packet-max is set to 4096.

If I turn off dnsmasq and send queries directly to the upstream nameserver,
I don't ever see any of these "c00c" packets emitted, so I am pretty
confident that these bad bytes are coming from dnsmasq itself.

Has anyone ever seen anything like this? I'm glad to privately share pcaps
if that would help.

James Brown
Infrastructure Architect
___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss