RE: Invalid transport_offset with AF_PACKET socket
> > > So it should return at least 18 and not 14. > > > > Yes, the function does its best to return at least 18, but it silently > expects > > skb_transport_offset to exceed 18. In normal conditions, it will be more > that > > 18, because it will be at least 14 + 20. But in my case, when I send a > packet > > via an AF_PACKET socket, skb_transport_offset returns 14 (which is > nonsense), > > and the driver uses this value, causing the hardware to fail, because it's > less > > than 18. > > > > Got it, so even if you copy 18 it is not sufficient ! if the packet is > ipv4 or ipv6 > and the inline mode is set to MLX5_INLINE_MODE_IP in the vport > context you must copy the IP headers as well ! Yes, I know that. That's why the driver uses skb_transport_offset - to include the network header, but as skb_transport_offset returns a wrong offset, the IP header doesn't get included. The thing is that with 14 bytes the driver even fails to send a packet, and with 18 bytes and missing L3 header it seems to send it, but it doesn't matter much, of course, we should pass the L3 header as well. > but what do you expect from AF_PACKET socket ? to parse each and every > packet and set skb_transport_offset ? Not exactly. First, AF_PACKET already parses each packet, and it even sets the transport offset correctly if I specify the protocol in the third parameter of the socket() call: socket(AF_PACKET, SOCK_RAW, htons(0x0800)); So, if called in a specific way, the AF_SOCKET does exactly that - it parses each packet and sets the transport offset, but if the protocol is not specified (0) in the third parameter, it fails to parse correctly. Second, I don't expect that. It's absolutely OK if AF_PACKET tells the driver that it failed to parse the packet, then the driver can guess the network protocol and restart parsing itself. The driver knows the format of its L2 header and can look at the necessary field. However, if AF_PACKET could ask the driver to guess the protocol, it would be a little bit faster, because this way skb_probe_transport_header is only called once, and the field lookup in the L2 header is O(1).
RE: Invalid transport_offset with AF_PACKET socket
Hi Saeed, > Can you elaborate more, what NIC? what configuration ? what do you mean > by confusion, anyway please see below ConnectX-4, after running `mlnx_qos -i eth1 --trust dscp`, which sets inline mode 2 (MLX5_INLINE_MODE_IP). I'll explain what I mean by confusion below. > in mlx5 with ConnectX4 or Connext4-LX there is a requirement to copy at > least the ethernet header to the tx descriptor otherwise this might > cause the packet to be dropped, and for RAW sockets the skb headers > offsets are not set, but the latest mlx5 upstream driver would know how > to handle this, and copy the minmum amount required > please see: > > static inline u16 mlx5e_calc_min_inline(enum mlx5_inline_modes mode, > struct sk_buff *skb) Yes, I know that, and what I do is debugging an issue with this function. > > it should default to: > > > case MLX5_INLINE_MODE_L2: > default: > hlen = mlx5e_skb_l2_header_offset(skb); The issue appears in MLX5_INLINE_MODE_IP. I haven't tested MLX5_INLINE_MODE_TCP_UDP yet, though. > So it should return at least 18 and not 14. Yes, the function does its best to return at least 18, but it silently expects skb_transport_offset to exceed 18. In normal conditions, it will be more that 18, because it will be at least 14 + 20. But in my case, when I send a packet via an AF_PACKET socket, skb_transport_offset returns 14 (which is nonsense), and the driver uses this value, causing the hardware to fail, because it's less than 18. > We had some issues with this in old driver such as kernels 4.14/15, and > it depends in the use case so i need some information first: No, it's not an old kernel. We actually have this bug in our internal bug tracking system, and I'm trying to resolve it. > 1. What Cards do you have ? (lspci) 03:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4] 03:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4] 81:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro] Testing with ConnectX-4. > 2. What kernel/driver version are you using ? I'm on net-next-mlx5, commit 66a4b5ef638a (the latest when I started the investigation). > 3. what is the current enum mlx5_inline_modes seen in > mlx5e_calc_min_inline or sq->min_inline_mode ? MLX5_INLINE_MODE_IP, as I said above. > 4. Firmware version ? (ethtool -i) 12.22.0238 (MT_2190110032) > can you share the packet format you are sending and seeing the bad > behavior with Here is the hexdump of the simplest packet that causes the problem when it's sent through AF_PACKET after `mlnx_qos -i eth1 --trust dscp`: : 11 22 33 44 55 66 77 88 99 aa bb cc 08 00 45 00 0010: 00 20 00 00 40 00 40 11 ae a5 c6 12 00 01 c6 12 0020: 00 02 00 00 4a 38 00 0c 29 82 61 62 63 64 (Please ignore the wrong UDP checksum and non-existing MACs, it doesn't matter at all, I tested it with completely valid packets as well. The wrong UDP checksum is due to a bug in our internal pypacket utility). Thanks, Max
RE: Invalid transport_offset with AF_PACKET socket
Hi Willem, > That is not what offset_hint 0 does. It also sets the transport header > to the same as the network header. No, it's not accurate. With offset_hint == 0 I have the transport header set to the offset of the mac header. It's impossible that both offset_hint == 0 and 14 set skb->transport_header to the same value. > This does leave the question whether the transport header would > be better of not being set if it cannot be probed. I have not checked > whether anything in the stack requires it to be set (i.e., not ~0U). Well, at least we can craft a packet without the transport header at all using AF_PACKET, and if some code expects transport_header to be set, it probably wants it to be some reasonable value, not just pointing to some random bytes. It's difficult to track down every usage of skb_transport_header, but if someone expects it to be set when it can be set to a wrong value, they are already having some trouble. To have it set to ~0U looks totally reasonable if offset == 0 is not really a special value, as I thought. However, it would be good to have two special values for the header offset: one would indicate it wasn't set, and the other would indicate that it has been probed, but no transport header exists. It would allow to avoid probing a second time, just because we don't know whether the transport header has been probed or not. Also, there are some places in kernel where skb_probe_transport_header is called with offset_hint == 0, probably meaning that there is no hint, but the offset is set anyway. Maybe it would be a good idea to allow probing without the offset hint, so that if probing fails, transport_header is not set to some buggy value? > Instrumenting the functions to see the offsets, I observe the > following offsets from skb->head: > > packet_snd, sock_raw: data=2, nh=16, th=16 > packet_snd, sock_dgram: data=2, nh=16, th=2 > tpacket_snd, sock_raw: data=2, nh=16, th=2 > > Recall that data has to point to the link layer header at this > point, so the ETH_HLEN difference between data and nh is correct. > > Of these, only the first looks remotely correct to me. IMO, in all of these cases, the transport offset shouldn't be set to the network or mac header. > Yet this > is the (only?) one that you observed causing problems for the > device. Perhaps because offset 0 is treated as a special valid > case in mlx5e_calc_min_inline. Yes, the second and third cases don't cause the bug for the device, exactly for this reason. Still, either some special value indicating "have probed, but no transport header" should be legalized, or ~0U should be used. Thanks, Max
Invalid transport_offset with AF_PACKET socket
Hi everyone, We are experiencing an issue with Mellanox mlx5 driver, and I tracked it down to the packet_snd function in net/packet/af_packet.c. Brief description: when a socket is created by calling `socket(AF_PACKET, SOCK_RAW, 0)`, the mlx5 driver receives an skb with wrong transport_offset, which can confuse the driver and cause the transmit to fail (depending on the configuration of the NIC). The flow is the following: 1. packet_snd is called. 2. dev->hard_header_len (which is 14) is assigned to reserve. 3. The value of the third parameter of the initial socket() call is assigned to skb->protocol. In our case, it's 0. 4. skb_probe_transport_header is called with offset_hint == reserve (which is 14). 5. __skb_flow_dissect fails, because skb->protocol is 0. 6. skb_probe_transport_header happily sets transport_header to 14. I find this behavior (defaulting to 14) strange, because network_header is also set to 14, and the transport_header value is just wrong. Moreover, there are two more calls to skb_probe_transport_header in this file with offset_hint == 0, which looks more reasonable (if we can't find the transport header, we indicate that there is none, instead of pointing to the network header). Does anyone know why offset_hint is set to 14 in this single place? Can it be replaced by 0 safely, and what can be the consequences? Also, what guarantees does kernel provide for the network and transport header offsets? Especially in raw sockets, where the headers are not generated by different stack layers. Thanks, Max