On Tue, Nov 27, 2018 at 1:41 PM Maxim Mikityanskiy <maxi...@mellanox.com> wrote:
>
> Hi everyone,
>
> We are experiencing an issue with the Mellanox mlx5 driver, and I tracked it
> down to the packet_snd function in net/packet/af_packet.c.
>
> Brief description: when a socket is created by calling `socket(AF_PACKET,
> SOCK_RAW, 0)`, the mlx5 driver receives an skb with wrong transport_offset,
> which can confuse the driver and cause the transmit to fail (depending on the
> configuration of the NIC).
>
> The flow is the following:
>
> 1. packet_snd is called.
>
> 2. dev->hard_header_len (which is 14) is assigned to reserve.
>
> 3. The value of the third parameter of the initial socket() call is assigned to
> skb->protocol. In our case, it's 0.
>
> 4. skb_probe_transport_header is called with offset_hint == reserve (which is
> 14).
>
> 5. __skb_flow_dissect fails, because skb->protocol is 0.
>
> 6. skb_probe_transport_header happily sets transport_header to 14.
>
> I find this behavior (defaulting to 14) strange, because network_header is also
> set to 14, and the transport_header value is just wrong. Moreover, there are two
> more calls to skb_probe_transport_header in this file with offset_hint == 0,
> which looks more reasonable (if we can't find the transport header, we indicate
> that there is none, instead of pointing to the network header).

That is not what offset_hint 0 does. It also sets the transport header
to the same offset as the network header.

The only difference from passing reserve is whether skb->data points at
the link-layer header or at the network header when the probe runs
(SOCK_RAW vs SOCK_DGRAM). offset_hint is relative to skb->data.

Indicating that the transport offset is not set would mean setting it
to ~0U. Perhaps that is indeed a better choice in these paths when
skb_flow_dissect_keys_basic fails to parse the headers.

> Does anyone know why offset_hint is set to 14 in this single place? Can it be
> replaced by 0 safely, and what can be the consequences?
>
> Also, what guarantees does the kernel provide for the network and transport
> header offsets? Especially in raw sockets, where the headers are not generated
> by different stack layers.

From the above, this appears to be best effort.

Note that the same logic is also used by tuntap and a few other callers.
