RE: Invalid transport_offset with AF_PACKET socket

2018-11-29 Thread Maxim Mikityanskiy
> > > So it should return at least 18 and not 14.
> >
> > Yes, the function does its best to return at least 18, but it silently
> expects
> > skb_transport_offset to exceed 18. In normal conditions, it will be more
> that
> > 18, because it will be at least 14 + 20. But in my case, when I send a
> packet
> > via an AF_PACKET socket, skb_transport_offset returns 14 (which is
> nonsense),
> > and the driver uses this value, causing the hardware to fail, because it's
> less
> > than 18.
> >
>
> Got it, so even if you copy 18 it is not sufficient ! if the packet is
> ipv4 or ipv6
> and the inline mode is set to  MLX5_INLINE_MODE_IP in the vport
> context you must copy the IP headers as well !

Yes, I know that. That's why the driver uses skb_transport_offset - to include
the network header, but as skb_transport_offset returns a wrong offset, the IP
header doesn't get included. The thing is that with 14 bytes the driver even
fails to send a packet, and with 18 bytes and missing L3 header it seems to send
it, but it doesn't matter much, of course, we should pass the L3 header as well.

> but what do you expect from AF_PACKET socket ? to parse each and every
> packet and set skb_transport_offset ?

Not exactly.

First, AF_PACKET already parses each packet, and it even sets the transport
offset correctly if I specify the protocol in the third parameter of the
socket() call:

socket(AF_PACKET, SOCK_RAW, htons(0x0800));

So, if called in a specific way, the AF_SOCKET does exactly that - it parses
each packet and sets the transport offset, but if the protocol is not specified
(0) in the third parameter, it fails to parse correctly.

Second, I don't expect that. It's absolutely OK if AF_PACKET tells the driver
that it failed to parse the packet, then the driver can guess the network
protocol and restart parsing itself. The driver knows the format of its L2
header and can look at the necessary field.

However, if AF_PACKET could ask the driver to guess the protocol, it would be a
little bit faster, because this way skb_probe_transport_header is only called
once, and the field lookup in the L2 header is O(1).


RE: Invalid transport_offset with AF_PACKET socket

2018-11-28 Thread Maxim Mikityanskiy
Hi Saeed,

> Can you elaborate more, what NIC? what configuration ? what do you mean
> by confusion, anyway please see below

ConnectX-4, after running `mlnx_qos -i eth1 --trust dscp`, which sets inline
mode 2 (MLX5_INLINE_MODE_IP). I'll explain what I mean by confusion below.

> in mlx5 with ConnectX4 or Connext4-LX there is a requirement to copy at
> least the ethernet header to the tx descriptor otherwise this might
> cause the packet to be dropped, and for RAW sockets the skb headers
> offsets are not set, but the latest mlx5 upstream driver would know how
> to handle this, and copy the minmum amount required
> please see:
>
> static inline u16 mlx5e_calc_min_inline(enum mlx5_inline_modes mode,
>   struct sk_buff *skb)

Yes, I know that, and what I do is debugging an issue with this function.

>
> it should default to:
>
>
> case MLX5_INLINE_MODE_L2:
>   default:
>   hlen = mlx5e_skb_l2_header_offset(skb);

The issue appears in MLX5_INLINE_MODE_IP. I haven't tested
MLX5_INLINE_MODE_TCP_UDP yet, though.

> So it should return at least 18 and not 14.

Yes, the function does its best to return at least 18, but it silently expects
skb_transport_offset to exceed 18. In normal conditions, it will be more that
18, because it will be at least 14 + 20. But in my case, when I send a packet
via an AF_PACKET socket, skb_transport_offset returns 14 (which is nonsense),
and the driver uses this value, causing the hardware to fail, because it's less
than 18.

> We had some issues with this in old driver such as kernels 4.14/15, and
> it depends in the use case so i need some information first:

No, it's not an old kernel. We actually have this bug in our internal bug
tracking system, and I'm trying to resolve it.

> 1. What Cards do you have ? (lspci)

03:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
03:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
81:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 
Pro]

Testing with ConnectX-4.

> 2. What kernel/driver version are you using ?

I'm on net-next-mlx5, commit 66a4b5ef638a (the latest when I started the
investigation).

> 3. what is the current enum mlx5_inline_modes seen in
> mlx5e_calc_min_inline or sq->min_inline_mode ?

MLX5_INLINE_MODE_IP, as I said above.

> 4. Firmware version ? (ethtool -i)

12.22.0238 (MT_2190110032)

> can you share the packet format you are sending and seeing the bad
> behavior with

Here is the hexdump of the simplest packet that causes the problem when it's
sent through AF_PACKET after `mlnx_qos -i eth1 --trust dscp`:

: 11 22 33 44 55 66 77 88 99 aa bb cc 08 00 45 00
0010: 00 20 00 00 40 00 40 11 ae a5 c6 12 00 01 c6 12
0020: 00 02 00 00 4a 38 00 0c 29 82 61 62 63 64

(Please ignore the wrong UDP checksum and non-existing MACs, it doesn't matter
at all, I tested it with completely valid packets as well. The wrong UDP
checksum is due to a bug in our internal pypacket utility).

Thanks,
Max


RE: Invalid transport_offset with AF_PACKET socket

2018-11-28 Thread Maxim Mikityanskiy
Hi Willem,

> That is not what offset_hint 0 does. It also sets the transport header
> to the same as the network header.

No, it's not accurate. With offset_hint == 0 I have the transport header set to
the offset of the mac header. It's impossible that both offset_hint == 0 and 14
set skb->transport_header to the same value.

> This does leave the question whether the transport header would
> be better of not being set if it cannot be probed. I have not checked
> whether anything in the stack requires it to be set (i.e., not ~0U).

Well, at least we can craft a packet without the transport header at all using
AF_PACKET, and if some code expects transport_header to be set, it probably
wants it to be some reasonable value, not just pointing to some random bytes.
It's difficult to track down every usage of skb_transport_header, but if someone
expects it to be set when it can be set to a wrong value, they are already
having some trouble.

To have it set to ~0U looks totally reasonable if offset == 0 is not really a
special value, as I thought.

However, it would be good to have two special values for the header offset: one
would indicate it wasn't set, and the other would indicate that it has been
probed, but no transport header exists. It would allow to avoid probing a second
time, just because we don't know whether the transport header has been probed
or not.

Also, there are some places in kernel where skb_probe_transport_header is called
with offset_hint == 0, probably meaning that there is no hint, but the offset is
set anyway. Maybe it would be a good idea to allow probing without the offset
hint, so that if probing fails, transport_header is not set to some buggy value?

> Instrumenting the functions to see the offsets, I observe the
> following offsets from skb->head:
>
> packet_snd, sock_raw:   data=2, nh=16, th=16
> packet_snd, sock_dgram:   data=2, nh=16, th=2
> tpacket_snd, sock_raw:  data=2, nh=16, th=2
>
> Recall that data has to point to the link layer header at this
> point, so the ETH_HLEN difference between data and nh is correct.
>
> Of these, only the first looks remotely correct to me.

IMO, in all of these cases, the transport offset shouldn't be set to the network
or mac header.

> Yet this
> is the (only?) one that you observed causing problems for the
> device. Perhaps because offset 0 is treated as a special valid
> case in mlx5e_calc_min_inline.

Yes, the second and third cases don't cause the bug for the device, exactly for
this reason. Still, either some special value indicating "have probed, but no
transport header" should be legalized, or ~0U should be used.

Thanks,
Max


Invalid transport_offset with AF_PACKET socket

2018-11-27 Thread Maxim Mikityanskiy
Hi everyone,

We are experiencing an issue with Mellanox mlx5 driver, and I tracked it down to
the packet_snd function in net/packet/af_packet.c.

Brief description: when a socket is created by calling `socket(AF_PACKET,
SOCK_RAW, 0)`, the mlx5 driver receives an skb with wrong transport_offset,
which can confuse the driver and cause the transmit to fail (depending on the
configuration of the NIC).

The flow is the following:

1. packet_snd is called.

2. dev->hard_header_len (which is 14) is assigned to reserve.

3. The value of the third parameter of the initial socket() call is assigned to
skb->protocol. In our case, it's 0.

4. skb_probe_transport_header is called with offset_hint == reserve (which is
14).

5. __skb_flow_dissect fails, because skb->protocol is 0.

6. skb_probe_transport_header happily sets transport_header to 14.

I find this behavior (defaulting to 14) strange, because network_header is also
set to 14, and the transport_header value is just wrong. Moreover, there are two
more calls to skb_probe_transport_header in this file with offset_hint == 0,
which looks more reasonable (if we can't find the transport header, we indicate
that there is none, instead of pointing to the network header).

Does anyone know why offset_hint is set to 14 in this single place? Can it be
replaced by 0 safely, and what can be the consequences?

Also, what guarantees does kernel provide for the network and transport header
offsets? Especially in raw sockets, where the headers are not generated by
different stack layers.

Thanks,
Max