RE: Invalid transport_offset with AF_PACKET socket

2018-11-29 Thread Maxim Mikityanskiy
> > > So it should return at least 18 and not 14.
> >
> > Yes, the function does its best to return at least 18, but it silently
> expects
> > skb_transport_offset to exceed 18. In normal conditions, it will be more
> that
> > 18, because it will be at least 14 + 20. But in my case, when I send a
> packet
> > via an AF_PACKET socket, skb_transport_offset returns 14 (which is
> nonsense),
> > and the driver uses this value, causing the hardware to fail, because it's
> less
> > than 18.
> >
>
> Got it, so even if you copy 18 it is not sufficient ! if the packet is
> ipv4 or ipv6
> and the inline mode is set to  MLX5_INLINE_MODE_IP in the vport
> context you must copy the IP headers as well !

Yes, I know that. That's why the driver uses skb_transport_offset - to include
the network header, but as skb_transport_offset returns a wrong offset, the IP
header doesn't get included. The thing is that with 14 bytes the driver even
fails to send a packet, and with 18 bytes and missing L3 header it seems to send
it, but it doesn't matter much, of course, we should pass the L3 header as well.

> but what do you expect from AF_PACKET socket ? to parse each and every
> packet and set skb_transport_offset ?

Not exactly.

First, AF_PACKET already parses each packet, and it even sets the transport
offset correctly if I specify the protocol in the third parameter of the
socket() call:

socket(AF_PACKET, SOCK_RAW, htons(0x0800));

So, if called in a specific way, the AF_SOCKET does exactly that - it parses
each packet and sets the transport offset, but if the protocol is not specified
(0) in the third parameter, it fails to parse correctly.

Second, I don't expect that. It's absolutely OK if AF_PACKET tells the driver
that it failed to parse the packet, then the driver can guess the network
protocol and restart parsing itself. The driver knows the format of its L2
header and can look at the necessary field.

However, if AF_PACKET could ask the driver to guess the protocol, it would be a
little bit faster, because this way skb_probe_transport_header is only called
once, and the field lookup in the L2 header is O(1).


Re: Invalid transport_offset with AF_PACKET socket

2018-11-29 Thread Saeed Mahameed
On Wed, Nov 28, 2018 at 3:10 AM Maxim Mikityanskiy  wrote:
>
> Hi Saeed,
>
> > Can you elaborate more, what NIC? what configuration ? what do you mean
> > by confusion, anyway please see below
>
> ConnectX-4, after running `mlnx_qos -i eth1 --trust dscp`, which sets inline
> mode 2 (MLX5_INLINE_MODE_IP). I'll explain what I mean by confusion below.
>
> > in mlx5 with ConnectX4 or Connext4-LX there is a requirement to copy at
> > least the ethernet header to the tx descriptor otherwise this might
> > cause the packet to be dropped, and for RAW sockets the skb headers
> > offsets are not set, but the latest mlx5 upstream driver would know how
> > to handle this, and copy the minmum amount required
> > please see:
> >
> > static inline u16 mlx5e_calc_min_inline(enum mlx5_inline_modes mode,
> >   struct sk_buff *skb)
>
> Yes, I know that, and what I do is debugging an issue with this function.
>
> >
> > it should default to:
> >
> >
> > case MLX5_INLINE_MODE_L2:
> >   default:
> >   hlen = mlx5e_skb_l2_header_offset(skb);
>
> The issue appears in MLX5_INLINE_MODE_IP. I haven't tested
> MLX5_INLINE_MODE_TCP_UDP yet, though.
>
> > So it should return at least 18 and not 14.
>
> Yes, the function does its best to return at least 18, but it silently expects
> skb_transport_offset to exceed 18. In normal conditions, it will be more that
> 18, because it will be at least 14 + 20. But in my case, when I send a packet
> via an AF_PACKET socket, skb_transport_offset returns 14 (which is nonsense),
> and the driver uses this value, causing the hardware to fail, because it's 
> less
> than 18.
>

Got it, so even if you copy 18 it is not sufficient ! if the packet is
ipv4 or ipv6
and the inline mode is set to  MLX5_INLINE_MODE_IP in the vport
context you must copy the IP headers as well !

but what do you expect from AF_PACKET socket ? to parse each and every
packet and set skb_transport_offset ?

> > We had some issues with this in old driver such as kernels 4.14/15, and
> > it depends in the use case so i need some information first:
>
> No, it's not an old kernel. We actually have this bug in our internal bug
> tracking system, and I'm trying to resolve it.
>
> > 1. What Cards do you have ? (lspci)
>
> 03:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
> 03:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
> 81:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 
> Pro]
>
> Testing with ConnectX-4.
>
> > 2. What kernel/driver version are you using ?
>
> I'm on net-next-mlx5, commit 66a4b5ef638a (the latest when I started the
> investigation).
>
> > 3. what is the current enum mlx5_inline_modes seen in
> > mlx5e_calc_min_inline or sq->min_inline_mode ?
>
> MLX5_INLINE_MODE_IP, as I said above.
>
> > 4. Firmware version ? (ethtool -i)
>
> 12.22.0238 (MT_2190110032)
>
> > can you share the packet format you are sending and seeing the bad
> > behavior with
>
> Here is the hexdump of the simplest packet that causes the problem when it's
> sent through AF_PACKET after `mlnx_qos -i eth1 --trust dscp`:
>
> : 11 22 33 44 55 66 77 88 99 aa bb cc 08 00 45 00
> 0010: 00 20 00 00 40 00 40 11 ae a5 c6 12 00 01 c6 12
> 0020: 00 02 00 00 4a 38 00 0c 29 82 61 62 63 64
>
> (Please ignore the wrong UDP checksum and non-existing MACs, it doesn't matter
> at all, I tested it with completely valid packets as well. The wrong UDP
> checksum is due to a bug in our internal pypacket utility).
>
> Thanks,
> Max


RE: Invalid transport_offset with AF_PACKET socket

2018-11-28 Thread Maxim Mikityanskiy
Hi Saeed,

> Can you elaborate more, what NIC? what configuration ? what do you mean
> by confusion, anyway please see below

ConnectX-4, after running `mlnx_qos -i eth1 --trust dscp`, which sets inline
mode 2 (MLX5_INLINE_MODE_IP). I'll explain what I mean by confusion below.

> in mlx5 with ConnectX4 or Connext4-LX there is a requirement to copy at
> least the ethernet header to the tx descriptor otherwise this might
> cause the packet to be dropped, and for RAW sockets the skb headers
> offsets are not set, but the latest mlx5 upstream driver would know how
> to handle this, and copy the minmum amount required
> please see:
>
> static inline u16 mlx5e_calc_min_inline(enum mlx5_inline_modes mode,
>   struct sk_buff *skb)

Yes, I know that, and what I do is debugging an issue with this function.

>
> it should default to:
>
>
> case MLX5_INLINE_MODE_L2:
>   default:
>   hlen = mlx5e_skb_l2_header_offset(skb);

The issue appears in MLX5_INLINE_MODE_IP. I haven't tested
MLX5_INLINE_MODE_TCP_UDP yet, though.

> So it should return at least 18 and not 14.

Yes, the function does its best to return at least 18, but it silently expects
skb_transport_offset to exceed 18. In normal conditions, it will be more that
18, because it will be at least 14 + 20. But in my case, when I send a packet
via an AF_PACKET socket, skb_transport_offset returns 14 (which is nonsense),
and the driver uses this value, causing the hardware to fail, because it's less
than 18.

> We had some issues with this in old driver such as kernels 4.14/15, and
> it depends in the use case so i need some information first:

No, it's not an old kernel. We actually have this bug in our internal bug
tracking system, and I'm trying to resolve it.

> 1. What Cards do you have ? (lspci)

03:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
03:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
81:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 
Pro]

Testing with ConnectX-4.

> 2. What kernel/driver version are you using ?

I'm on net-next-mlx5, commit 66a4b5ef638a (the latest when I started the
investigation).

> 3. what is the current enum mlx5_inline_modes seen in
> mlx5e_calc_min_inline or sq->min_inline_mode ?

MLX5_INLINE_MODE_IP, as I said above.

> 4. Firmware version ? (ethtool -i)

12.22.0238 (MT_2190110032)

> can you share the packet format you are sending and seeing the bad
> behavior with

Here is the hexdump of the simplest packet that causes the problem when it's
sent through AF_PACKET after `mlnx_qos -i eth1 --trust dscp`:

: 11 22 33 44 55 66 77 88 99 aa bb cc 08 00 45 00
0010: 00 20 00 00 40 00 40 11 ae a5 c6 12 00 01 c6 12
0020: 00 02 00 00 4a 38 00 0c 29 82 61 62 63 64

(Please ignore the wrong UDP checksum and non-existing MACs, it doesn't matter
at all, I tested it with completely valid packets as well. The wrong UDP
checksum is due to a bug in our internal pypacket utility).

Thanks,
Max


RE: Invalid transport_offset with AF_PACKET socket

2018-11-28 Thread Maxim Mikityanskiy
Hi Willem,

> That is not what offset_hint 0 does. It also sets the transport header
> to the same as the network header.

No, it's not accurate. With offset_hint == 0 I have the transport header set to
the offset of the mac header. It's impossible that both offset_hint == 0 and 14
set skb->transport_header to the same value.

> This does leave the question whether the transport header would
> be better of not being set if it cannot be probed. I have not checked
> whether anything in the stack requires it to be set (i.e., not ~0U).

Well, at least we can craft a packet without the transport header at all using
AF_PACKET, and if some code expects transport_header to be set, it probably
wants it to be some reasonable value, not just pointing to some random bytes.
It's difficult to track down every usage of skb_transport_header, but if someone
expects it to be set when it can be set to a wrong value, they are already
having some trouble.

To have it set to ~0U looks totally reasonable if offset == 0 is not really a
special value, as I thought.

However, it would be good to have two special values for the header offset: one
would indicate it wasn't set, and the other would indicate that it has been
probed, but no transport header exists. It would allow to avoid probing a second
time, just because we don't know whether the transport header has been probed
or not.

Also, there are some places in kernel where skb_probe_transport_header is called
with offset_hint == 0, probably meaning that there is no hint, but the offset is
set anyway. Maybe it would be a good idea to allow probing without the offset
hint, so that if probing fails, transport_header is not set to some buggy value?

> Instrumenting the functions to see the offsets, I observe the
> following offsets from skb->head:
>
> packet_snd, sock_raw:   data=2, nh=16, th=16
> packet_snd, sock_dgram:   data=2, nh=16, th=2
> tpacket_snd, sock_raw:  data=2, nh=16, th=2
>
> Recall that data has to point to the link layer header at this
> point, so the ETH_HLEN difference between data and nh is correct.
>
> Of these, only the first looks remotely correct to me.

IMO, in all of these cases, the transport offset shouldn't be set to the network
or mac header.

> Yet this
> is the (only?) one that you observed causing problems for the
> device. Perhaps because offset 0 is treated as a special valid
> case in mlx5e_calc_min_inline.

Yes, the second and third cases don't cause the bug for the device, exactly for
this reason. Still, either some special value indicating "have probed, but no
transport header" should be legalized, or ~0U should be used.

Thanks,
Max


Re: Invalid transport_offset with AF_PACKET socket

2018-11-27 Thread Saeed Mahameed
On Tue, 2018-11-27 at 10:00 +, Maxim Mikityanskiy wrote:
> Hi everyone,
> 
> We are experiencing an issue with Mellanox mlx5 driver, and I tracked
> it down to
> the packet_snd function in net/packet/af_packet.c.
> 
> Brief description: when a socket is created by calling
> `socket(AF_PACKET,
> SOCK_RAW, 0)`, the mlx5 driver receives an skb with wrong
> transport_offset,
> which can confuse the driver and cause the transmit to fail
> (depending on the
> configuration of the NIC).

Hi Max, 
Can you elaborate more, what NIC? what configuration ? what do you mean
by confusion, anyway please see below

> 
> The flow is the following:
> 
> 1. packet_snd is called.
> 
> 2. dev->hard_header_len (which is 14) is assigned to reserve.
> 
> 3. The value of the third parameter of the initial socket() call is
> assigned to
> skb->protocol. In our case, it's 0.
> 
> 4. skb_probe_transport_header is called with offset_hint == reserve
> (which is
> 14).
> 
> 5. __skb_flow_dissect fails, because skb->protocol is 0.
> 
> 6. skb_probe_transport_header happily sets transport_header to 14.
> 
> I find this behavior (defaulting to 14) strange, because
> network_header is also
> set to 14, and the transport_header value is just wrong. Moreover,
> there are two
> more calls to skb_probe_transport_header in this file with
> offset_hint == 0,
> which looks more reasonable (if we can't find the transport header,
> we indicate
> that there is none, instead of pointing to the network header).
> 
> Does anyone know why offset_hint is set to 14 in this single place?
> Can it be
> replaced by 0 safely, and what can be the consequences?
> 
> Also, what guarantees does kernel provide for the network and
> transport header
> offsets? Especially in raw sockets, where the headers are not
> generated by
> different stack layers.
> 

in mlx5 with ConnectX4 or Connext4-LX there is a requirement to copy at
least the ethernet header to the tx descriptor otherwise this might
cause the packet to be dropped, and for RAW sockets the skb headers
offsets are not set, but the latest mlx5 upstream driver would know how
to handle this, and copy the minmum amount required
please see:

static inline u16 mlx5e_calc_min_inline(enum mlx5_inline_modes mode,
struct sk_buff *skb)


it should default to:


case MLX5_INLINE_MODE_L2:
default:
hlen = mlx5e_skb_l2_header_offset(skb);


static inline int mlx5e_skb_l2_header_offset(struct sk_buff *skb)
{
#define MLX5E_MIN_INLINE (ETH_HLEN + VLAN_HLEN)

return max(skb_network_offset(skb), MLX5E_MIN_INLINE);
}

So it should return at least 18 and not 14.

We had some issues with this in old driver such as kernels 4.14/15, and
it depends in the use case so i need some information first:

1. What Cards do you have ? (lspci)

2. What kernel/driver version are you using ?

3. what is the current enum mlx5_inline_modes seen in
mlx5e_calc_min_inline or sq->min_inline_mode ?

4. Firmware version ? (ethtool -i)

can you share the packet format you are sending and seeing the bad
behavior with 

> Thanks,
> Max


Re: Invalid transport_offset with AF_PACKET socket

2018-11-27 Thread Willem de Bruijn
On Tue, Nov 27, 2018 at 6:08 PM Willem de Bruijn
 wrote:
>
> On Tue, Nov 27, 2018 at 3:32 PM Willem de Bruijn
>  wrote:
> >
> > On Tue, Nov 27, 2018 at 2:58 PM Willem de Bruijn
> >  wrote:
> > >
> > > On Tue, Nov 27, 2018 at 1:41 PM Maxim Mikityanskiy  
> > > wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > We are experiencing an issue with Mellanox mlx5 driver, and I tracked 
> > > > it down to
> > > > the packet_snd function in net/packet/af_packet.c.
> > > >
> > > > Brief description: when a socket is created by calling 
> > > > `socket(AF_PACKET,
> > > > SOCK_RAW, 0)`, the mlx5 driver receives an skb with wrong 
> > > > transport_offset,
> > > > which can confuse the driver and cause the transmit to fail (depending 
> > > > on the
> > > > configuration of the NIC).
> > > >
> > > > The flow is the following:
> > > >
> > > > 1. packet_snd is called.
> > > >
> > > > 2. dev->hard_header_len (which is 14) is assigned to reserve.
> > > >
> > > > 3. The value of the third parameter of the initial socket() call is 
> > > > assigned to
> > > > skb->protocol. In our case, it's 0.
> > > >
> > > > 4. skb_probe_transport_header is called with offset_hint == reserve 
> > > > (which is
> > > > 14).
> > > >
> > > > 5. __skb_flow_dissect fails, because skb->protocol is 0.
> > > >
> > > > 6. skb_probe_transport_header happily sets transport_header to 14.
> > > >
> > > > I find this behavior (defaulting to 14) strange, because network_header 
> > > > is also
> > > > set to 14, and the transport_header value is just wrong. Moreover, 
> > > > there are two
> > > > more calls to skb_probe_transport_header in this file with offset_hint 
> > > > == 0,
> > > > which looks more reasonable (if we can't find the transport header, we 
> > > > indicate
> > > > that there is none, instead of pointing to the network header).
> > >
> > > That is not what offset_hint 0 does. It also sets the transport header
> > > to the same as the network header.
> >
> > Actually, what you observe may be due to commit  b84bbaf7a6c8cc
> > ("packet: in packet_snd start writing at link layer allocation"). This
> > updated skb_set_network_header, but not the fall-back value for
> > skb_probe_transport_header. Let me take a closer look.
>
> No, before and after the transport header is set to the same offset
> as the network header.
>
> This does leave the question whether the transport header would
> be better of not being set if it cannot be probed. I have not checked
> whether anything in the stack requires it to be set (i.e., not ~0U).
>
> The result with SOCK_DGRAM is actually probably buggy. In that
> case reserve is 0. But as in the case of SOCK_RAW skb->data is
> pointing to the link layer header when passing to po->xmit. So,
> nh_off is ETH_HLEN, but th_off is 0.

The same happens with SOCK_RAW using PACKET_TX_RING.

Instrumenting the functions to see the offsets, I observe the
following offsets from skb->head:

packet_snd, sock_raw:   data=2, nh=16, th=16
packet_snd, sock_dgram:   data=2, nh=16, th=2
tpacket_snd, sock_raw:  data=2, nh=16, th=2

Recall that data has to point to the link layer header at this
point, so the ETH_HLEN difference between data and nh is correct.

Of these, only the first looks remotely correct to me. Yet this
is the (only?) one that you observed causing problems for the
device. Perhaps because offset 0 is treated as a special valid
case in mlx5e_calc_min_inline.


Re: Invalid transport_offset with AF_PACKET socket

2018-11-27 Thread Willem de Bruijn
On Tue, Nov 27, 2018 at 3:32 PM Willem de Bruijn
 wrote:
>
> On Tue, Nov 27, 2018 at 2:58 PM Willem de Bruijn
>  wrote:
> >
> > On Tue, Nov 27, 2018 at 1:41 PM Maxim Mikityanskiy  
> > wrote:
> > >
> > > Hi everyone,
> > >
> > > We are experiencing an issue with Mellanox mlx5 driver, and I tracked it 
> > > down to
> > > the packet_snd function in net/packet/af_packet.c.
> > >
> > > Brief description: when a socket is created by calling `socket(AF_PACKET,
> > > SOCK_RAW, 0)`, the mlx5 driver receives an skb with wrong 
> > > transport_offset,
> > > which can confuse the driver and cause the transmit to fail (depending on 
> > > the
> > > configuration of the NIC).
> > >
> > > The flow is the following:
> > >
> > > 1. packet_snd is called.
> > >
> > > 2. dev->hard_header_len (which is 14) is assigned to reserve.
> > >
> > > 3. The value of the third parameter of the initial socket() call is 
> > > assigned to
> > > skb->protocol. In our case, it's 0.
> > >
> > > 4. skb_probe_transport_header is called with offset_hint == reserve 
> > > (which is
> > > 14).
> > >
> > > 5. __skb_flow_dissect fails, because skb->protocol is 0.
> > >
> > > 6. skb_probe_transport_header happily sets transport_header to 14.
> > >
> > > I find this behavior (defaulting to 14) strange, because network_header 
> > > is also
> > > set to 14, and the transport_header value is just wrong. Moreover, there 
> > > are two
> > > more calls to skb_probe_transport_header in this file with offset_hint == 
> > > 0,
> > > which looks more reasonable (if we can't find the transport header, we 
> > > indicate
> > > that there is none, instead of pointing to the network header).
> >
> > That is not what offset_hint 0 does. It also sets the transport header
> > to the same as the network header.
>
> Actually, what you observe may be due to commit  b84bbaf7a6c8cc
> ("packet: in packet_snd start writing at link layer allocation"). This
> updated skb_set_network_header, but not the fall-back value for
> skb_probe_transport_header. Let me take a closer look.

No, before and after the transport header is set to the same offset
as the network header.

This does leave the question whether the transport header would
be better of not being set if it cannot be probed. I have not checked
whether anything in the stack requires it to be set (i.e., not ~0U).

The result with SOCK_DGRAM is actually probably buggy. In that
case reserve is 0. But as in the case of SOCK_RAW skb->data is
pointing to the link layer header when passing to po->xmit. So,
nh_off is ETH_HLEN, but th_off is 0.

Again, same behavior before and after the above patch, so
ignore that red herring.


Re: Invalid transport_offset with AF_PACKET socket

2018-11-27 Thread Willem de Bruijn
On Tue, Nov 27, 2018 at 2:58 PM Willem de Bruijn
 wrote:
>
> On Tue, Nov 27, 2018 at 1:41 PM Maxim Mikityanskiy  
> wrote:
> >
> > Hi everyone,
> >
> > We are experiencing an issue with Mellanox mlx5 driver, and I tracked it 
> > down to
> > the packet_snd function in net/packet/af_packet.c.
> >
> > Brief description: when a socket is created by calling `socket(AF_PACKET,
> > SOCK_RAW, 0)`, the mlx5 driver receives an skb with wrong transport_offset,
> > which can confuse the driver and cause the transmit to fail (depending on 
> > the
> > configuration of the NIC).
> >
> > The flow is the following:
> >
> > 1. packet_snd is called.
> >
> > 2. dev->hard_header_len (which is 14) is assigned to reserve.
> >
> > 3. The value of the third parameter of the initial socket() call is 
> > assigned to
> > skb->protocol. In our case, it's 0.
> >
> > 4. skb_probe_transport_header is called with offset_hint == reserve (which 
> > is
> > 14).
> >
> > 5. __skb_flow_dissect fails, because skb->protocol is 0.
> >
> > 6. skb_probe_transport_header happily sets transport_header to 14.
> >
> > I find this behavior (defaulting to 14) strange, because network_header is 
> > also
> > set to 14, and the transport_header value is just wrong. Moreover, there 
> > are two
> > more calls to skb_probe_transport_header in this file with offset_hint == 0,
> > which looks more reasonable (if we can't find the transport header, we 
> > indicate
> > that there is none, instead of pointing to the network header).
>
> That is not what offset_hint 0 does. It also sets the transport header
> to the same as the network header.

Actually, what you observe may be due to commit  b84bbaf7a6c8cc
("packet: in packet_snd start writing at link layer allocation"). This
updated skb_set_network_header, but not the fall-back value for
skb_probe_transport_header. Let me take a closer look.


Re: Invalid transport_offset with AF_PACKET socket

2018-11-27 Thread Willem de Bruijn
On Tue, Nov 27, 2018 at 1:41 PM Maxim Mikityanskiy  wrote:
>
> Hi everyone,
>
> We are experiencing an issue with Mellanox mlx5 driver, and I tracked it down 
> to
> the packet_snd function in net/packet/af_packet.c.
>
> Brief description: when a socket is created by calling `socket(AF_PACKET,
> SOCK_RAW, 0)`, the mlx5 driver receives an skb with wrong transport_offset,
> which can confuse the driver and cause the transmit to fail (depending on the
> configuration of the NIC).
>
> The flow is the following:
>
> 1. packet_snd is called.
>
> 2. dev->hard_header_len (which is 14) is assigned to reserve.
>
> 3. The value of the third parameter of the initial socket() call is assigned 
> to
> skb->protocol. In our case, it's 0.
>
> 4. skb_probe_transport_header is called with offset_hint == reserve (which is
> 14).
>
> 5. __skb_flow_dissect fails, because skb->protocol is 0.
>
> 6. skb_probe_transport_header happily sets transport_header to 14.
>
> I find this behavior (defaulting to 14) strange, because network_header is 
> also
> set to 14, and the transport_header value is just wrong. Moreover, there are 
> two
> more calls to skb_probe_transport_header in this file with offset_hint == 0,
> which looks more reasonable (if we can't find the transport header, we 
> indicate
> that there is none, instead of pointing to the network header).

That is not what offset_hint 0 does. It also sets the transport header
to the same as the network header.

The difference with reserve is whether skb->data is pointing at the
link layer or network header at the time (SOCK_RAW vs SOCK_DGRAM).

Indicating that transport offset is not set would be setting it to
~0U. Perhaps that is indeed a better choice in these paths when
skb_flow_dissect_keys_basic fails to parse the headers.

> Does anyone know why offset_hint is set to 14 in this single place? Can it be
> replaced by 0 safely, and what can be the consequences?
>
> Also, what guarantees does kernel provide for the network and transport header
> offsets? Especially in raw sockets, where the headers are not generated by
> different stack layers.

>From the above, this appears to be best effort.

Note that the same is also used by tuntap and a few others.


Invalid transport_offset with AF_PACKET socket

2018-11-27 Thread Maxim Mikityanskiy
Hi everyone,

We are experiencing an issue with Mellanox mlx5 driver, and I tracked it down to
the packet_snd function in net/packet/af_packet.c.

Brief description: when a socket is created by calling `socket(AF_PACKET,
SOCK_RAW, 0)`, the mlx5 driver receives an skb with wrong transport_offset,
which can confuse the driver and cause the transmit to fail (depending on the
configuration of the NIC).

The flow is the following:

1. packet_snd is called.

2. dev->hard_header_len (which is 14) is assigned to reserve.

3. The value of the third parameter of the initial socket() call is assigned to
skb->protocol. In our case, it's 0.

4. skb_probe_transport_header is called with offset_hint == reserve (which is
14).

5. __skb_flow_dissect fails, because skb->protocol is 0.

6. skb_probe_transport_header happily sets transport_header to 14.

I find this behavior (defaulting to 14) strange, because network_header is also
set to 14, and the transport_header value is just wrong. Moreover, there are two
more calls to skb_probe_transport_header in this file with offset_hint == 0,
which looks more reasonable (if we can't find the transport header, we indicate
that there is none, instead of pointing to the network header).

Does anyone know why offset_hint is set to 14 in this single place? Can it be
replaced by 0 safely, and what can be the consequences?

Also, what guarantees does kernel provide for the network and transport header
offsets? Especially in raw sockets, where the headers are not generated by
different stack layers.

Thanks,
Max