On 09/10/2023 11:40, Luci Stanescu wrote:
Hi Simon,

Thank you for your response and your openness to this issue! My thoughts below, inline (and apologies for the rather long email).

On 9 Oct 2023, at 01:05, Simon Kelley <si...@thekelleys.org.uk> wrote:
1) Even if this is a kernel bug, kernel bugs fixes take a long time to spread, so working around them in dnsmasq is a good thing to do, as long as it doesn't leave us with long-term technical debt. This wouldn't be the first time a kernel bug has been worked around.

I agree, that's why I've opened this discussion here.

2) https://docs.kernel.org/networking/vrf.html says:

Applications that are to work within a VRF need to bind their socket to the VRF device:
setsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, dev, strlen(dev)+1);
or to specify the output device using cmsg and IP_PKTINFO.

Which kind of implies that this might not be a kernel bug, rather we're just not doing what's required to work with VRF.

I'm not convinced this isn't a kernel bug. The VRF implementation has been developed in stages over several years. It is indeed the case that initially the sockets had to be bound to the VRF device or to specify it via IP_PKTINFO/IPV6_PKTINFO. But then came support for net.ipv4.*_l3mdev_accept sysctls (which confusingly also affect AF_INET6 sockets) as well as a series of patches in 2018 that allowed specifying a VRF slave device for several operations. Before that series of patches, it certainly made sense for sin6_scope_id in msg_name for recvmsg() to be the VRF device (it had to be) – but I'm not convinced it shouldn't have been changed after the rules for connect() and sendmsg() were relaxed. The thing is, as it stands, the kernel code works well for everything except IPv6 link-local communication, so it wouldn't be surprising for this to be a simple oversight.

I had tracked this down while trying to figure out what's going on here and detailed a bit in the kernel bug report, which you can find here:

https://lore.kernel.org/netdev/06798029-660d-454e-8628-3a9b9e1af...@safebits.tech/T/#u
 
<https://lore.kernel.org/netdev/06798029-660d-454e-8628-3a9b9e1af...@safebits.tech/T/#u>

Setting the device to send to using IP_PKTINFO rather than relying on the flowinfo field in the destination address would be quite possible, and the above implies that it will work.

Apologies for being pernickety, but it's the scope_id field which is relevant here, rather than flowinfo. And since we're talking AF_INET6, shouldn't it be IPv6_PKTINFO?

My bad. I meant scope_id. It was late :(



I *think* it should work. I have been unable to find a situation where the scope received in the IPV6_PKTINFO cmsg to recvmsg() cannot be used to reliably send a response out the same interface (which I believe is exactly what DHCPv6 code will always want to do), but my word is certainly no guarantee. More about this towards the end of the email.

However, it'll only work as long as you do NOT specify a scope in the destination of the sendmsg() call or that scope specified is exactly the same as in the IPV6_PKTINFO ancillary message. Specifically, you cannot specify the VRF master device index. I've adapted my earlier scripts to test this and I've pasted them at the end of this email.

This brings us on to

3) IPv4. Does DHCPv4 work with VRF devices? It would be nice to test, and fix any similar problems in the same patch. Interestingly, the DHCPv4 code already sets the outgoing device via IP_PKTINFO (there being no flowinfo field in an IPv4 sockaddr) so it stands a chance of just working.

DHCPv4 works just fine. My dnsmasq configuration uses 'interface' to specify the VRF slave interface (which in my case is a bridge) and DHCPv4 messages are sent out correctly.

Copying the inferface index into the flowinfo of the destination or setting IP_PKTINFO are both easy patches to make and try. The difficult bit is being sure that they won't break existing installations.

My tests seem to imply that leaving the received scope_id field (which is the VRF master device index) unchanged and setting IPV6_PKTINFO won't work. Three options seem to work: 1. Overwrite scope_id of source address from recvmsg() with the interface index from the received IPV6_PKTINFO.         2. When performing the sendmsg(), set the scope_id of the destination to 0 and add IPV6_PKTINFO with the the empty address (since the received IPV6_PKTINFO specifies the multicast address and that won't do as a source) and the interface index from the received IPV6_PKTINFO. 3. If the socket is bound to an L3 interface (not the VRF master device), just set the scope_id in the destination to 0 and IPV6_PKTINFO is not required. I'm not sure this'll work for dnsmasq, but I thought of including it for the sake of completeness.


This is good information.

I've implemented option 1 here and it's currently running and dogfood on my home network. There are no VRF interfaces there: this is a test mainly to check that nothing breaks. So far, so good.

The patch I used is attached. It would be interesting to see if it solves the problem for you.



I've adapted my scripts slightly to allow easier testing of the behaviour. The receiver socket now binds to the VRF device instead (you can even not bind to any device and just set the net.ipv4.udp_l3mdev_accept sysctl to 1). The interface configuration is as before:

ip link add myvrf type vrf table 42
ip link set myvrf up
ip link add veth1 type veth peer name veth2
ip link set veth1 master myvrf up
ip link set veth2 up

The receiver now echoes back any data that it gets:

import socket
import struct

s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_RECVPKTINFO, 1)
s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b'myvrf')
s.bind(('', 2000, 0, 0))
mreq = struct.pack('@16sI', socket.inet_pton(socket.AF_INET6, 'ff02::1:2'), socket.if_nametoindex('veth1'))
s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_JOIN_GROUP, mreq)

while True:
     data, cmsg_list, flags, source = s.recvmsg(4096, 4096)
     dest_scope = None
     for level, type, cmsg_data in cmsg_list:
         if level == socket.IPPROTO_IPV6 and type == socket.IPV6_PKTINFO:
             dest_address, dest_scope = struct.unpack('@16sI', cmsg_data)
             dest_address = socket.inet_ntop(socket.AF_INET6, dest_address)
            print("PKTINFO destination {}%{}".format(dest_address, socket.if_indextoname(dest_scope)))
     source_address, source_port, source_flow, source_scope = source
    print("receved message from {}%{}".format(source_address, socket.if_indextoname(source_scope)))     ipinfo_cmsg = struct.pack('@16sI', socket.inet_pton(socket.AF_INET6, '::'), dest_scope or 0)
     s.sendmsg([data],
               [(socket.IPPROTO_IPV6, socket.IPV6_PKTINFO, ipinfo_cmsg)],
               0,
               (source_address, source_port, source_flow, source_scope))

The sender now waits for a response:

import socket

s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
dest = ('ff02::1:2', 2000, 0, socket.if_nametoindex('veth2'))
s.sendto(b'foo', dest)
data, source = s.recvfrom(4096)
source_address, source_port, source_flow, source_scope = source
print("Received response from '{}%{}': {}".format(source_address, socket.if_indextoname(source_scope), data))

If you run this example, the sendmsg() call from the receiver will fail with EINVAL. That's because the IPV6_PKTINFO ancillary message specifies the correct interfaces as a scope (the veth1 interface index), but sendmsg() specifies the VRF master device as a scope for the destination (as received from recvmsg()). You can get it to work by either: 1. Specify 'dest_scope' in the last argument to sendmsg(), which corresponds to the sin6_scope_id field of the msg_name field. I guess you can even specify 'dest_scope or source_scope' to have a fallback if (for whatever reason) recvmsg() doesn't get a IPV6_PKTINFO ancillary message. You can also not include the IPv6_PKTINFO ancillary message in the sendmsg() call here, since it's now redundant. 2. Specify '0' as the last argument to sendmsg() instead of source_scope, but now the IPV6_PKTINFO ancillary message in sendmsg() becomes mandatory. 3. Bind the socket to 'veth1' instead of 'myvrf', don't include a IPV6_PKTINFO ancillary message in sendmsg() and specify '0' as the last argument to sendmsg() instead of source_scope.

These three changes correspond to my earlier three options, respectively.

The difficult bit is being sure that they won't break existing installations.

Indeed. I think this boils down to:
1. Knowing that DHCPv6 should always send a message on the same L3 interface on which it received a request. I believe this holds true, but you'll definitely know more about the spec and real-world scenarios than I do. 2. Finding authoritative information that the interface index from IPV6_PKTINFO is always set to the L3 interface on which a datagram was received. The kernel mailing list might be start? I'd certainly be happy to help think about and test various scenarios.


I'm happy that 1. is true. Please enquire about 2.


Cheers,

Simon.

Cheers,

Luci


--
Luci Stanescu
Information Security Consultant
diff --git a/src/dhcp6.c b/src/dhcp6.c
index 7eeef03..87b6e07 100644
--- a/src/dhcp6.c
+++ b/src/dhcp6.c
@@ -118,11 +118,6 @@ void dhcp6_packet(time_t now)
   if ((sz = recv_dhcp_packet(daemon->dhcp6fd, &msg)) == -1)
     return;
   
-#ifdef HAVE_DUMPFILE
-  dump_packet_udp(DUMP_DHCPV6, (void *)daemon->dhcp_packet.iov_base, sz,
-		  (union mysockaddr *)&from, NULL, daemon->dhcp6fd);
-#endif
-  
   for (cmptr = CMSG_FIRSTHDR(&msg); cmptr; cmptr = CMSG_NXTHDR(&msg, cmptr))
     if (cmptr->cmsg_level == IPPROTO_IPV6 && cmptr->cmsg_type == daemon->v6pktinfo)
       {
@@ -136,6 +131,20 @@ void dhcp6_packet(time_t now)
 	dst_addr = p.p->ipi6_addr;
       }
 
+ #ifdef HAVE_LINUX_NETWORK
+ /* This works around a possible Linux kernel bug when using interfaces
+    enslaved to a VRF. The scope_id in the source address gets set
+    to the index of the VRF interface, not the slave. Fortunately,
+    the interface index returned by packetinfo is correct so we use
+    instead. */
+  from.sin6_scope_id = if_index;
+#endif
+
+#ifdef HAVE_DUMPFILE
+  dump_packet_udp(DUMP_DHCPV6, (void *)daemon->dhcp_packet.iov_base, sz,
+		  (union mysockaddr *)&from, NULL, daemon->dhcp6fd);
+#endif
+
   if (!indextoname(daemon->dhcp6fd, if_index, ifr.ifr_name))
     return;
 
_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss

Reply via email to