Re: [Dnsmasq-discuss] DHCPv6 doesn't work on Linux interfaces enslaved to a VRF

Simon Kelley Mon, 09 Oct 2023 14:28:23 -0700

On 09/10/2023 11:40, Luci Stanescu wrote:

Hi Simon,
Thank you for your response and your openness to this issue! My thoughtsbelow, inline (and apologies for the rather long email).
On 9 Oct 2023, at 01:05, Simon Kelley <si...@thekelleys.org.uk> wrote:
1) Even if this is a kernel bug, kernel bugs fixes take a long time tospread, so working around them in dnsmasq is a good thing to do, aslong as it doesn't leave us with long-term technical debt. Thiswouldn't be the first time a kernel bug has been worked around.
I agree, that's why I've opened this discussion here.
2) https://docs.kernel.org/networking/vrf.html says:
Applications that are to work within a VRF need to bind their socketto the VRF device:
setsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, dev, strlen(dev)+1);
or to specify the output device using cmsg and IP_PKTINFO.
Which kind of implies that this might not be a kernel bug, ratherwe're just not doing what's required to work with VRF.
I'm not convinced this isn't a kernel bug. The VRF implementation hasbeen developed in stages over several years. It is indeed the case thatinitially the sockets had to be bound to the VRF device or to specify itvia IP_PKTINFO/IPV6_PKTINFO. But then came support fornet.ipv4.*_l3mdev_accept sysctls (which confusingly also affect AF_INET6sockets) as well as a series of patches in 2018 that allowed specifyinga VRF slave device for several operations. Before that series ofpatches, it certainly made sense for sin6_scope_id in msg_name forrecvmsg() to be the VRF device (it had to be) – but I'm not convinced itshouldn't have been changed after the rules for connect() and sendmsg()were relaxed. The thing is, as it stands, the kernel code works well foreverything except IPv6 link-local communication, so it wouldn't besurprising for this to be a simple oversight.
I had tracked this down while trying to figure out what's going on hereand detailed a bit in the kernel bug report, which you can find here:
https://lore.kernel.org/netdev/06798029-660d-454e-8628-3a9b9e1af...@safebits.tech/T/#u
 
<https://lore.kernel.org/netdev/06798029-660d-454e-8628-3a9b9e1af...@safebits.tech/T/#u>
Setting the device to send to using IP_PKTINFO rather than relying onthe flowinfo field in the destination address would be quite possible,and the above implies that it will work.
Apologies for being pernickety, but it's the scope_id field which isrelevant here, rather than flowinfo. And since we're talking AF_INET6,shouldn't it be IPv6_PKTINFO?


My bad. I meant scope_id. It was late :(

I *think* it should work. I have been unable to find a situation wherethe scope received in the IPV6_PKTINFO cmsg to recvmsg() cannot be usedto reliably send a response out the same interface (which I believe isexactly what DHCPv6 code will always want to do), but my word iscertainly no guarantee. More about this towards the end of the email.
However, it'll only work as long as you do NOT specify a scope in thedestination of the sendmsg() call or that scope specified is exactly thesame as in the IPV6_PKTINFO ancillary message. Specifically, you cannotspecify the VRF master device index. I've adapted my earlier scripts totest this and I've pasted them at the end of this email.
This brings us on to
3) IPv4. Does DHCPv4 work with VRF devices? It would be nice to test,and fix any similar problems in the same patch. Interestingly, theDHCPv4 code already sets the outgoing device via IP_PKTINFO (therebeing no flowinfo field in an IPv4 sockaddr) so it stands a chance ofjust working.
DHCPv4 works just fine. My dnsmasq configuration uses 'interface' tospecify the VRF slave interface (which in my case is a bridge) andDHCPv4 messages are sent out correctly.
Copying the inferface index into the flowinfo of the destination orsetting IP_PKTINFO are both easy patches to make and try. Thedifficult bit is being sure that they won't break existing installations.
My tests seem to imply that leaving the received scope_id field (whichis the VRF master device index) unchanged and setting IPV6_PKTINFO won'twork. Three options seem to work:1. Overwrite scope_id of source address from recvmsg() with theinterface index from the received IPV6_PKTINFO. 2. When performing the sendmsg(), set the scope_id of thedestination to 0 and add IPV6_PKTINFO with the the empty address (sincethe received IPV6_PKTINFO specifies the multicast address and that won'tdo as a source) and the interface index from the received IPV6_PKTINFO.3. If the socket is bound to an L3 interface (not the VRF masterdevice), just set the scope_id in the destination to 0 and IPV6_PKTINFOis not required. I'm not sure this'll work for dnsmasq, but I thought ofincluding it for the sake of completeness.


This is good information.

I've implemented option 1 here and it's currently running and dogfood onmy home network. There are no VRF interfaces there: this is a testmainly to check that nothing breaks. So far, so good.

The patch I used is attached. It would be interesting to see if itsolves the problem for you.

I've adapted my scripts slightly to allow easier testing of thebehaviour. The receiver socket now binds to the VRF device instead (youcan even not bind to any device and just set thenet.ipv4.udp_l3mdev_accept sysctl to 1). The interface configuration isas before:
ip link add myvrf type vrf table 42
ip link set myvrf up
ip link add veth1 type veth peer name veth2
ip link set veth1 master myvrf up
ip link set veth2 up

The receiver now echoes back any data that it gets:

import socket
import struct

s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_RECVPKTINFO, 1)
s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b'myvrf')
s.bind(('', 2000, 0, 0))
mreq = struct.pack('@16sI', socket.inet_pton(socket.AF_INET6,'ff02::1:2'), socket.if_nametoindex('veth1'))
s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_JOIN_GROUP, mreq)

while True:
     data, cmsg_list, flags, source = s.recvmsg(4096, 4096)
     dest_scope = None
     for level, type, cmsg_data in cmsg_list:
         if level == socket.IPPROTO_IPV6 and type == socket.IPV6_PKTINFO:
             dest_address, dest_scope = struct.unpack('@16sI', cmsg_data)
             dest_address = socket.inet_ntop(socket.AF_INET6, dest_address)
print("PKTINFO destination {}%{}".format(dest_address,socket.if_indextoname(dest_scope)))
     source_address, source_port, source_flow, source_scope = source
print("receved message from {}%{}".format(source_address,socket.if_indextoname(source_scope))) ipinfo_cmsg = struct.pack('@16sI',socket.inet_pton(socket.AF_INET6, '::'), dest_scope or 0)
     s.sendmsg([data],
               [(socket.IPPROTO_IPV6, socket.IPV6_PKTINFO, ipinfo_cmsg)],
               0,
               (source_address, source_port, source_flow, source_scope))

The sender now waits for a response:

import socket

s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
dest = ('ff02::1:2', 2000, 0, socket.if_nametoindex('veth2'))
s.sendto(b'foo', dest)
data, source = s.recvfrom(4096)
source_address, source_port, source_flow, source_scope = source
print("Received response from '{}%{}': {}".format(source_address,socket.if_indextoname(source_scope), data))
If you run this example, the sendmsg() call from the receiver will failwith EINVAL. That's because the IPV6_PKTINFO ancillary message specifiesthe correct interfaces as a scope (the veth1 interface index), butsendmsg() specifies the VRF master device as a scope for the destination(as received from recvmsg()). You can get it to work by either:1. Specify 'dest_scope' in the last argument to sendmsg(), whichcorresponds to the sin6_scope_id field of the msg_name field. I guessyou can even specify 'dest_scope or source_scope' to have a fallback if(for whatever reason) recvmsg() doesn't get a IPV6_PKTINFO ancillarymessage. You can also not include the IPv6_PKTINFO ancillary message inthe sendmsg() call here, since it's now redundant.2. Specify '0' as the last argument to sendmsg() instead ofsource_scope, but now the IPV6_PKTINFO ancillary message in sendmsg()becomes mandatory.3. Bind the socket to 'veth1' instead of 'myvrf', don't include aIPV6_PKTINFO ancillary message in sendmsg() and specify '0' as the lastargument to sendmsg() instead of source_scope.
These three changes correspond to my earlier three options, respectively.
The difficult bit is being sure that they won't break existinginstallations.
Indeed. I think this boils down to:
1. Knowing that DHCPv6 should always send a message on the same L3interface on which it received a request. I believe this holds true, butyou'll definitely know more about the spec and real-world scenarios thanI do.2. Finding authoritative information that the interface index fromIPV6_PKTINFO is always set to the L3 interface on which a datagram wasreceived. The kernel mailing list might be start? I'd certainly be happyto help think about and test various scenarios.


I'm happy that 1. is true. Please enquire about 2.


Cheers,

Simon.

Cheers,

Luci


--
Luci Stanescu
Information Security Consultant

diff --git a/src/dhcp6.c b/src/dhcp6.c
index 7eeef03..87b6e07 100644
--- a/src/dhcp6.c
+++ b/src/dhcp6.c
@@ -118,11 +118,6 @@ void dhcp6_packet(time_t now)
   if ((sz = recv_dhcp_packet(daemon->dhcp6fd, &msg)) == -1)
     return;
   
-#ifdef HAVE_DUMPFILE
-  dump_packet_udp(DUMP_DHCPV6, (void *)daemon->dhcp_packet.iov_base, sz,
-		  (union mysockaddr *)&from, NULL, daemon->dhcp6fd);
-#endif
-  
   for (cmptr = CMSG_FIRSTHDR(&msg); cmptr; cmptr = CMSG_NXTHDR(&msg, cmptr))
     if (cmptr->cmsg_level == IPPROTO_IPV6 && cmptr->cmsg_type == daemon->v6pktinfo)
       {
@@ -136,6 +131,20 @@ void dhcp6_packet(time_t now)
 	dst_addr = p.p->ipi6_addr;
       }
 
+ #ifdef HAVE_LINUX_NETWORK
+ /* This works around a possible Linux kernel bug when using interfaces
+    enslaved to a VRF. The scope_id in the source address gets set
+    to the index of the VRF interface, not the slave. Fortunately,
+    the interface index returned by packetinfo is correct so we use
+    instead. */
+  from.sin6_scope_id = if_index;
+#endif
+
+#ifdef HAVE_DUMPFILE
+  dump_packet_udp(DUMP_DHCPV6, (void *)daemon->dhcp_packet.iov_base, sz,
+		  (union mysockaddr *)&from, NULL, daemon->dhcp6fd);
+#endif
+
   if (!indextoname(daemon->dhcp6fd, if_index, ifr.ifr_name))
     return;

_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss

Re: [Dnsmasq-discuss] DHCPv6 doesn't work on Linux interfaces enslaved to a VRF

Reply via email to