Re: [PATCH net] net: ipv6: Validate GSO SKB before finish IPv6 processing

2020-12-31 Thread Daniel Axtens
Hi Aya,

> Daniel Axtens, does this solve the issue referred in your commit?
> 8914a595110a bnx2x: disable GSO where gso_size is too big for hardware

No, because:
> Note: These cases are handled in the same manner in IPv4 output finish.
> This patch aligns the behavior of IPv6 and IPv4.
and the issue my patch addresses was hit on IPv4, not IPv6.

In my case, the issue was that a packet came in on an ibmveth interface,
GSO but with a gso_size that was far too large. Open vSwitch then
transferred it to the bnx2x interface, where the firmware promptly
panicked.

There's a nice long description at
https://patchwork.ozlabs.org/project/netdev/patch/20180111235905.10110-1-...@axtens.net/
however the description wasn't included in the commit message.

Kind regards,
Daniel

>
> Thanks,
> Aya
>
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 749ad72386b2..36466f2211bd 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -125,8 +125,43 @@ static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff *
>   return -EINVAL;
>  }
>  
> +static int
> +ip6_finish_output_gso_slowpath_drop(struct net *net, struct sock *sk,
> + struct sk_buff *skb, unsigned int mtu)
> +{
> + struct sk_buff *segs, *nskb;
> + netdev_features_t features;
> + int ret = 0;
> +
> + /* Please see corresponding comment in ip_finish_output_gso
> +  * describing the cases where GSO segment length exceeds the
> +  * egress MTU.
> +  */
> + features = netif_skb_features(skb);
> + segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
> + if (IS_ERR_OR_NULL(segs)) {
> + kfree_skb(skb);
> + return -ENOMEM;
> + }
> +
> + consume_skb(skb);
> +
> + skb_list_walk_safe(segs, segs, nskb) {
> + int err;
> +
> + skb_mark_not_on_list(segs);
> + err = ip6_fragment(net, sk, segs, ip6_finish_output2);
> + if (err && ret == 0)
> + ret = err;
> + }
> +
> + return ret;
> +}
> +
>  static int __ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff *skb)
>  {
> + unsigned int mtu;
> +
>  #if defined(CONFIG_NETFILTER) && defined(CONFIG_XFRM)
>   /* Policy lookup after SNAT yielded a new policy */
>   if (skb_dst(skb)->xfrm) {
> @@ -135,7 +170,11 @@ static int __ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff
>   }
>  #endif
>  
> - if ((skb->len > ip6_skb_dst_mtu(skb) && !skb_is_gso(skb)) ||
> + mtu = ip6_skb_dst_mtu(skb);
> + if (skb_is_gso(skb) && !skb_gso_validate_network_len(skb, mtu))
> + return ip6_finish_output_gso_slowpath_drop(net, sk, skb, mtu);
> +
> + if ((skb->len > mtu && !skb_is_gso(skb)) ||
>   dst_allfrag(skb_dst(skb)) ||
>   (IP6CB(skb)->frag_max_size && skb->len > IP6CB(skb)->frag_max_size))
>   return ip6_fragment(net, sk, skb, ip6_finish_output2);
> -- 
> 2.14.1


Re: GSO where gso_size is too big for hardware

2019-01-22 Thread Daniel Axtens
Willem de Bruijn  writes:

> On Tue, Jan 22, 2019 at 11:24 AM Ben Hutchings wrote:
>>
>> Last year you applied these fixes for a potential denial-of-service in
>> the bnx2x driver:
>>
>> commit 2b16f048729bf35e6c28a40cbfad07239f9dcd90
>> Author: Daniel Axtens 
>> Date:   Wed Jan 31 14:15:33 2018 +1100
>>
>> net: create skb_gso_validate_mac_len()
>>
>> commit 8914a595110a6eca69a5e275b323f5d09e18f4f9
>> Author: Daniel Axtens 
>> Date:   Wed Jan 31 14:15:34 2018 +1100
>>
>> bnx2x: disable GSO where gso_size is too big for hardware
>>
>> However I don't understand why the check is done only in the bnx2x
>> driver.  Shouldn't the networking core ensure that gso_size + L3/L4
>> headers is <= the device MTU?  If not, is every driver that does TSO
>> expected to check this?
>>
>> Also, should these fixes go to stable?  I'm not sure whether you're
>> still handling stable patches for any of the unfixed versions (< 4.16)
>> now.
>>
>> Ben.
>
> Irrespective of the GSO issue, this sounds relevant to this other thread
>
>   Stack sends oversize UDP packet to the driver
>   https://www.mail-archive.com/netdev@vger.kernel.org/msg279006.html
>
> which discusses a specific cause of larger than MTU packets and its
> effect on the bnxt.
>
> Perhaps these patches were initially applied to the bnx2x driver only,
> because at the time that was the only nic known to lock up on such
> packets? Either way, a device independent validation is indeed
> probably preferable (independent of fixing that lo flapping root
> cause).

I did try to propose a more generic approach:

 1) "[PATCH 0/3] Check gso_size of packets when forwarding"
(https://www.spinics.net/lists/netdev/msg478634.html)

In this series I looked just at forwarding, where there is a check
against the regular MTU but not against the GSO size. I added checks to
is_skb_forwardable and the Open vSwitch forwarding path, but in feedback
it was pointed out that I had missed other ways a packet could be
forwarded, such as tc-mirred and bpf. A more generic approach was
desired, so I proposed:

 2) "[PATCH v2 0/4] Check size of packets before sending"
(https://www.spinics.net/lists/netdev/msg480847.html)

This added a check to is_skb_forwardable and another check to
validate_xmit_skb. It was not considered desirable to include this as a
primary fix because it would be very hard to backport, so we included
the fix for the bnx2x card specifically instead. I think the idea was to
come back to this fix later.

I do remember then spending quite a bit of time trying to get the
generic fix sorted out after that. I remember working on the intricacies
of verifying various GSO stuff - I sent some fixes to GSO_BY_FRAGS
handling, and I know I got sidetracked by GSO_DODGY somehow as well. I
think the problem was that without dealing with dodgy, you end up with
edge cases where untrusted providers of dodgy packets can bypass a naive
length check. Anyway, then I got busy - my job at that point was mostly
support-driven and customers kept having new urgent issues! - so I
didn't get to finish it. This was about a year ago, so my recollection
may be wrong and I may have misunderstood things.

Regards,
Daniel


Re: Stack sends oversize UDP packet to the driver

2019-01-21 Thread Daniel Axtens
Hi Michael,

> I've received a bug report of oversized UDP packets sent to the
> bnxt_en driver for transmission.  There is no check for illegal length
> in the driver and it will send a corrupted BD to the NIC if the
> non-TSO length exceeds the maximum MTU supported by the driver.  This
> ultimately causes the driver to hang.
>
> Looking a little deeper, it looks like the route of the SKB was
> initially to "lo" and therefore no fragmentation was done.  And it
> looks like the route later got changed to the bnxt_en dev before
> transmission.  The user was doing multiple VM reboots and the bad
> length was happening on the Linux host.
>
> I can add a length check in the driver to prevent this.  But is there
> a better way to prevent this in the stack?  Thanks.

I hit a similar sounding issue on a bnx2x - see commit
8914a595110a6eca69a5e275b323f5d09e18f4f9

In that case, a GSO packet with gso_size too large for the firmware was
coming to the bnx2x driver from an ibmveth device via Open vSwitch. I
also toyed with a fix in the stack and ended up fixing just the driver.

I was hoping to get a generic fix in to the stack afterwards, but didn't
get anything finished. Looking back at old branches, it looks like I
considered adding MTU validation to validate_xmit_skb, but I never got
that upstream. My vague recollection is that I ended up caught by edge
cases: GSO_DODGY allows an untrusted source to set gso parameters, so
that needed to be validated first - and that was complex and potentially
slow, and I just got overtaken by more urgent work. (Note that this was
a year ago and was in many ways my introduction to TSO/GSO, so I could
be completely wrong.) Anyway, I can send you my partial work if it would
be helpful.

Regards,
Daniel


Re: bnxt: card intermittently hanging and dropping link

2018-08-19 Thread Daniel Axtens
Hi Michael,

>>> The main issue is the TX timeout.
>>> .
>>>
 [ 2682.911693] bnxt_en :3b:00.0 eth4: TX timeout detected, starting reset task!
 [ 2683.782496] bnxt_en :3b:00.0 eth4: Resp cmpl intr err msg: 0x51
 [ 2683.783061] bnxt_en :3b:00.0 eth4: hwrm_ring_free tx failed. rc:-1
 [ 2684.634557] bnxt_en :3b:00.0 eth4: Resp cmpl intr err msg: 0x51
 [ 2684.635120] bnxt_en :3b:00.0 eth4: hwrm_ring_free tx failed. rc:-1
>>>
>>> and it is not recovering.
>>>
>>> Please provide ethtool -i eth4 which will show the firmware version on
>>> the NIC.  Let's see if the firmware is too old.
>>
>> driver: bnxt_en
>> version: 1.8.0
>> firmware-version: 20.6.151.0/pkg 20.06.05.11
>
> I believe the firmware should be updated.  My colleague will contact
> you on how to proceed.

Thank you very much, I'll follow up with them off-list.

Regards,
Daniel


Re: bnxt: card intermittently hanging and dropping link

2018-08-16 Thread Daniel Axtens
Hi Michael,

> The main issue is the TX timeout.
> .
>
>> [ 2682.911693] bnxt_en :3b:00.0 eth4: TX timeout detected, starting reset task!
>> [ 2683.782496] bnxt_en :3b:00.0 eth4: Resp cmpl intr err msg: 0x51
>> [ 2683.783061] bnxt_en :3b:00.0 eth4: hwrm_ring_free tx failed. rc:-1
>> [ 2684.634557] bnxt_en :3b:00.0 eth4: Resp cmpl intr err msg: 0x51
>> [ 2684.635120] bnxt_en :3b:00.0 eth4: hwrm_ring_free tx failed. rc:-1
>
> and it is not recovering.
>
> Please provide ethtool -i eth4 which will show the firmware version on
> the NIC.  Let's see if the firmware is too old.

driver: bnxt_en
version: 1.8.0
firmware-version: 20.6.151.0/pkg 20.06.05.11
expansion-rom-version: 
bus-info: :3b:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: no
supports-priv-flags: no

Thanks!

Regards,
Daniel


bnxt: card intermittently hanging and dropping link

2018-08-15 Thread Daniel Axtens
Hi Michael,

I have some user reports of issues with a Broadcom 57412 card with the
card intermittently hanging and dropping the link.

The problem has been observed on a Dell server with an Ubuntu 4.13
kernel (bnxt_en version 1.7.0) and with an Ubuntu 4.15 kernel (bnxt_en
version 1.8.0). It seems to occur while mounting an XFS volume over
iSCSI, although running blkid on the partition succeeds, and it seems to
be a different volume each time.

I've included an excerpt from the kernel log below.

It seems that other people have reported this with a RHEL7 kernel - I
have no idea what driver version that would be running, or what workload
they were running, but the warnings and messages look the same:
https://www.dell.com/community/PowerEdge-Hardware-General/Critical-network-bnxt-en-module-crashes-on-14G-servers/td-p/6031769/highlight/true
The forum poster reported that disconnecting the card from power for 5
minutes was sufficient to get things working again, and I have asked our
user to test that.

Is this a known issue?

Regards,
Daniel

[ 2662.526151] scsi host14: iSCSI Initiator over TCP/IP
[ 2662.538110] scsi host15: iSCSI Initiator over TCP/IP
[ 2662.547350] scsi host16: iSCSI Initiator over TCP/IP
[ 2662.554660] scsi host17: iSCSI Initiator over TCP/IP
[ 2662.813860] scsi 15:0:0:1: Direct-Access PURE FlashArray PQ: 0 ANSI: 6
[ 2662.813888] scsi 14:0:0:1: Direct-Access PURE FlashArray PQ: 0 ANSI: 6
[ 2662.813972] scsi 16:0:0:1: Direct-Access PURE FlashArray PQ: 0 ANSI: 6
[ 2662.814322] sd 15:0:0:1: Attached scsi generic sg1 type 0
[ 2662.814553] sd 14:0:0:1: Attached scsi generic sg2 type 0
[ 2662.814554] scsi 17:0:0:1: Direct-Access PURE FlashArray PQ: 0 ANSI: 6
[ 2662.814612] sd 16:0:0:1: Attached scsi generic sg3 type 0
[ 2662.815081] sd 15:0:0:1: [sdb] 10737418240 512-byte logical blocks: (5.50 TB/5.00 TiB)
[ 2662.815195] sd 15:0:0:1: [sdb] Write Protect is off
[ 2662.815197] sd 15:0:0:1: [sdb] Mode Sense: 43 00 00 08
[ 2662.815229] sd 14:0:0:1: [sdc] 10737418240 512-byte logical blocks: (5.50 TB/5.00 TiB)
[ 2662.815292] sd 17:0:0:1: Attached scsi generic sg4 type 0
[ 2662.815342] sd 14:0:0:1: [sdc] Write Protect is off
[ 2662.815343] sd 14:0:0:1: [sdc] Mode Sense: 43 00 00 08
[ 2662.815419] sd 15:0:0:1: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 2662.815447] sd 16:0:0:1: [sdd] 10737418240 512-byte logical blocks: (5.50 TB/5.00 TiB)
[ 2662.815544] sd 14:0:0:1: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 2662.815614] sd 16:0:0:1: [sdd] Write Protect is off
[ 2662.815615] sd 16:0:0:1: [sdd] Mode Sense: 43 00 00 08
[ 2662.815882] sd 16:0:0:1: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 2662.816188] sd 17:0:0:1: [sde] 10737418240 512-byte logical blocks: (5.50 TB/5.00 TiB)
[ 2662.816298] sd 17:0:0:1: [sde] Write Protect is off
[ 2662.816300] sd 17:0:0:1: [sde] Mode Sense: 43 00 00 08
[ 2662.816502] sd 17:0:0:1: [sde] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 2662.820080] sd 15:0:0:1: [sdb] Attached SCSI disk
[ 2662.820594] sd 14:0:0:1: [sdc] Attached SCSI disk
[ 2662.820995] sd 17:0:0:1: [sde] Attached SCSI disk
[ 2662.821176] sd 16:0:0:1: [sdd] Attached SCSI disk
[ 2662.913642] device-mapper: multipath round-robin: version 1.2.0 loaded
[ 2663.954001] XFS (dm-2): Mounting V5 Filesystem
[ 2673.186083]  connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295558209, last ping 4295559460, now 4295560768
[ 2673.186135]  connection3:0: detected conn error (1022)
[ 2673.186137]  connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295558209, last ping 4295559460, now 4295560768
[ 2673.186168]  connection2:0: detected conn error (1022)
[ 2673.186170]  connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295558209, last ping 4295559460, now 4295560768
[ 2673.186211]  connection1:0: detected conn error (1022)
[ 2674.209870]  connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295558463, last ping 4295559720, now 4295561024
[ 2674.209924]  connection4:0: detected conn error (1022)
[ 2678.560630]  session1: session recovery timed out after 5 secs
[ 2678.560641]  session2: session recovery timed out after 5 secs
[ 2678.560647]  session3: session recovery timed out after 5 secs
[ 2678.951453] device-mapper: multipath: Failing path 8:32.
[ 2678.951509] device-mapper: multipath: Failing path 8:48.
[ 2678.951548] device-mapper: multipath: Failing path 8:16.
[ 2679.584302]  session4: session recovery timed out after 5 secs
[ 2679.584313] sd 17:0:0:1: rejecting I/O to offline device
[ 2679.584356] sd 17:0:0:1: [sde] killing request
[ 2679.584362] sd 17:0:0:1: rejecting I/O to offline device
[ 2679.584392] sd 17:0:0:1: [sde] killing request
[ 2679.584401] sd 17:0:0:1: [sde] FAILED Result: hostbyte=DID_NO_CONNECT drive

[PATCH] net: use skb_is_gso_sctp() instead of open-coding

2018-03-08 Thread Daniel Axtens
As well as the basic conversion, I noticed that a lot of the
SCTP code checks gso_type without first checking skb_is_gso(),
so I have added that where appropriate.

Also, document the helper.

Cc: Daniel Borkmann 
Cc: Marcelo Ricardo Leitner 
Signed-off-by: Daniel Axtens 

---

This depends on d02f51cbcf12 ("bpf: fix bpf_skb_adjust_net/bpf_skb_proto_xlat
to deal with gso sctp skbs") which introduces the required helper.
That is in the bpf tree at the moment.
---
 Documentation/networking/segmentation-offloads.txt | 5 -
 net/core/skbuff.c  | 2 +-
 net/sched/act_csum.c   | 2 +-
 net/sctp/input.c   | 8 
 net/sctp/inqueue.c | 2 +-
 net/sctp/offload.c | 2 +-
 6 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/Documentation/networking/segmentation-offloads.txt b/Documentation/networking/segmentation-offloads.txt
index 23a8dd91a9ec..36bb931b35e0 100644
--- a/Documentation/networking/segmentation-offloads.txt
+++ b/Documentation/networking/segmentation-offloads.txt
@@ -155,7 +155,10 @@ Therefore, any code in the core networking stack must be aware of the
 possibility that gso_size will be GSO_BY_FRAGS and handle that case
 appropriately.
 
-There are a couple of helpers to make this easier:
+There are some helpers to make this easier:
+
+ - skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if
+   an skb is an SCTP GSO skb.
 
  - For size checks, the skb_gso_validate_*_len family of helpers correctly
considers GSO_BY_FRAGS.
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 0bb0d8877954..baf990528943 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4904,7 +4904,7 @@ static unsigned int skb_gso_transport_seglen(const struct sk_buff *skb)
thlen += inner_tcp_hdrlen(skb);
} else if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))) {
thlen = tcp_hdrlen(skb);
-   } else if (unlikely(shinfo->gso_type & SKB_GSO_SCTP)) {
+   } else if (unlikely(skb_is_gso_sctp(skb))) {
thlen = sizeof(struct sctphdr);
}
/* UFO sets gso_size to the size of the fragmentation
diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
index b7ba9b06b147..24b2e8e681cf 100644
--- a/net/sched/act_csum.c
+++ b/net/sched/act_csum.c
@@ -350,7 +350,7 @@ static int tcf_csum_sctp(struct sk_buff *skb, unsigned int ihl,
 {
struct sctphdr *sctph;
 
-   if (skb_is_gso(skb) && skb_shinfo(skb)->gso_type & SKB_GSO_SCTP)
+   if (skb_is_gso(skb) && skb_is_gso_sctp(skb))
return 1;
 
sctph = tcf_csum_skb_nextlayer(skb, ihl, ipl, sizeof(*sctph));
diff --git a/net/sctp/input.c b/net/sctp/input.c
index 0247cc432e02..b381d78548ac 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -106,6 +106,7 @@ int sctp_rcv(struct sk_buff *skb)
int family;
struct sctp_af *af;
struct net *net = dev_net(skb->dev);
+   bool is_gso = skb_is_gso(skb) && skb_is_gso_sctp(skb);
 
if (skb->pkt_type != PACKET_HOST)
goto discard_it;
@@ -123,8 +124,7 @@ int sctp_rcv(struct sk_buff *skb)
 * it's better to just linearize it otherwise crc computing
 * takes longer.
 */
-   if ((!(skb_shinfo(skb)->gso_type & SKB_GSO_SCTP) &&
-skb_linearize(skb)) ||
+   if ((!is_gso && skb_linearize(skb)) ||
!pskb_may_pull(skb, sizeof(struct sctphdr)))
goto discard_it;
 
@@ -135,7 +135,7 @@ int sctp_rcv(struct sk_buff *skb)
if (skb_csum_unnecessary(skb))
__skb_decr_checksum_unnecessary(skb);
else if (!sctp_checksum_disable &&
-!(skb_shinfo(skb)->gso_type & SKB_GSO_SCTP) &&
+!is_gso &&
 sctp_rcv_checksum(net, skb) < 0)
goto discard_it;
skb->csum_valid = 1;
@@ -1218,7 +1218,7 @@ static struct sctp_association *__sctp_rcv_lookup_harder(struct net *net,
 * issue as packets hitting this are mostly INIT or INIT-ACK and
 * those cannot be on GSO-style anyway.
 */
-   if ((skb_shinfo(skb)->gso_type & SKB_GSO_SCTP) == SKB_GSO_SCTP)
+   if (skb_is_gso(skb) && skb_is_gso_sctp(skb))
return NULL;
 
ch = (struct sctp_chunkhdr *)skb->data;
diff --git a/net/sctp/inqueue.c b/net/sctp/inqueue.c
index 48392552ee7c..23ebc5318edc 100644
--- a/net/sctp/inqueue.c
+++ b/net/sctp/inqueue.c
@@ -170,7 +170,7 @@ struct sctp_chunk *sctp_inq_pop(struct sctp_inq *queue)
 
chunk = list_entry(entry, struct sctp_chunk, list);
 
-   if ((skb_shinfo(chunk->skb)->gso_type & SKB_GSO_SCTP) == SKB_GSO_SCTP) {
+   if (skb_

[PATCH] docs: segmentation-offloads.txt: Correct TCP gso_types

2018-03-08 Thread Daniel Axtens
Pretty minor: just SKB_GSO_TCP  -> SKB_GSO_TCPV4 and
   SKB_GSO_TCP6 -> SKB_GSO_TCPV6.

Signed-off-by: Daniel Axtens 
---
 Documentation/networking/segmentation-offloads.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/segmentation-offloads.txt b/Documentation/networking/segmentation-offloads.txt
index d47480b61ac6..e879b009c75d 100644
--- a/Documentation/networking/segmentation-offloads.txt
+++ b/Documentation/networking/segmentation-offloads.txt
@@ -20,8 +20,8 @@ TCP Segmentation Offload
 
 TCP segmentation allows a device to segment a single frame into multiple
 frames with a data payload size specified in skb_shinfo()->gso_size.
-When TCP segmentation requested the bit for either SKB_GSO_TCP or
-SKB_GSO_TCP6 should be set in skb_shinfo()->gso_type and
+When TCP segmentation requested the bit for either SKB_GSO_TCPV4 or
+SKB_GSO_TCPV6 should be set in skb_shinfo()->gso_type and
 skb_shinfo()->gso_size should be set to a non-zero value.
 
 TCP segmentation is dependent on support for the use of partial checksum
-- 
2.14.1



Re: [PATCH bpf] bpf: fix bpf_skb_adjust_net/bpf_skb_proto_xlat to deal with gso sctp skbs

2018-03-03 Thread Daniel Axtens
Hi Daniel,

> From: Daniel Axtens 
>
> SCTP GSO skbs have a gso_size of GSO_BY_FRAGS, so any sort of
> unconditional mangling of that will result in a nonsense value
> and would corrupt the skb later on.
>
> Therefore, i) add two helpers skb_increase_gso_size() and
> skb_decrease_gso_size() that would throw a one time warning and
> bail out for such skbs and ii) refuse and return early with an
> error in those BPF helpers that are affected. We do need to bail
> out as early as possible from there before any changes on the
> skb have been performed.
>
> Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
> Co-authored-by: Daniel Borkmann 
> Signed-off-by: Daniel Axtens 
> Cc: Marcelo Ricardo Leitner 
> Acked-by: Alexei Starovoitov 

I've looked over your changes and they all look good to me.

> +/* Note: Should be called only if skb_is_gso(skb) is true */
> +static inline bool skb_is_gso_sctp(const struct sk_buff *skb)
> +{
> + return skb_shinfo(skb)->gso_type & SKB_GSO_SCTP;
> +}
> +

This helper is a fantastic idea and I will send a docs update to
highlight it.

Regards,
Daniel


[PATCH v2 4/4] net: make skb_gso_*_seglen functions private

2018-02-28 Thread Daniel Axtens
They're very hard to use properly as they do not consider the
GSO_BY_FRAGS case. Code should use skb_gso_validate_network_len
and skb_gso_validate_mac_len as they do consider this case.

Make the seglen functions static, which stops people using them
outside of skbuff.c

Signed-off-by: Daniel Axtens 

---

v2: drop inline from functions.
---
 include/linux/skbuff.h | 33 -
 net/core/skbuff.c  | 37 +++--
 2 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a057dd1a75c7..ddf77cf4ff2d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3285,7 +3285,6 @@ int skb_zerocopy(struct sk_buff *to, struct sk_buff *from,
 void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len);
 int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen);
 void skb_scrub_packet(struct sk_buff *skb, bool xnet);
-unsigned int skb_gso_transport_seglen(const struct sk_buff *skb);
 bool skb_gso_validate_network_len(const struct sk_buff *skb, unsigned int mtu);
 bool skb_gso_validate_mac_len(const struct sk_buff *skb, unsigned int len);
 struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features);
@@ -4104,38 +4103,6 @@ static inline bool skb_head_is_locked(const struct sk_buff *skb)
return !skb->head_frag || skb_cloned(skb);
 }
 
-/**
- * skb_gso_network_seglen - Return length of individual segments of a gso packet
- *
- * @skb: GSO skb
- *
- * skb_gso_network_seglen is used to determine the real size of the
- * individual segments, including Layer3 (IP, IPv6) and L4 headers (TCP/UDP).
- *
- * The MAC/L2 header is not accounted for.
- */
-static inline unsigned int skb_gso_network_seglen(const struct sk_buff *skb)
-{
-   unsigned int hdr_len = skb_transport_header(skb) -
-  skb_network_header(skb);
-   return hdr_len + skb_gso_transport_seglen(skb);
-}
-
-/**
- * skb_gso_mac_seglen - Return length of individual segments of a gso packet
- *
- * @skb: GSO skb
- *
- * skb_gso_mac_seglen is used to determine the real size of the
- * individual segments, including MAC/L2, Layer3 (IP, IPv6) and L4
- * headers (TCP/UDP).
- */
-static inline unsigned int skb_gso_mac_seglen(const struct sk_buff *skb)
-{
-   unsigned int hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
-   return hdr_len + skb_gso_transport_seglen(skb);
-}
-
 /* Local Checksum Offload.
  * Compute outer checksum based on the assumption that the
  * inner checksum will be offloaded later.
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b63767008824..0bb0d8877954 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4891,7 +4891,7 @@ EXPORT_SYMBOL_GPL(skb_scrub_packet);
  *
  * The MAC/L2 or network (IP, IPv6) headers are not accounted for.
  */
-unsigned int skb_gso_transport_seglen(const struct sk_buff *skb)
+static unsigned int skb_gso_transport_seglen(const struct sk_buff *skb)
 {
const struct skb_shared_info *shinfo = skb_shinfo(skb);
unsigned int thlen = 0;
@@ -4913,7 +4913,40 @@ unsigned int skb_gso_transport_seglen(const struct sk_buff *skb)
 */
return thlen + shinfo->gso_size;
 }
-EXPORT_SYMBOL_GPL(skb_gso_transport_seglen);
+
+/**
+ * skb_gso_network_seglen - Return length of individual segments of a gso packet
+ *
+ * @skb: GSO skb
+ *
+ * skb_gso_network_seglen is used to determine the real size of the
+ * individual segments, including Layer3 (IP, IPv6) and L4 headers (TCP/UDP).
+ *
+ * The MAC/L2 header is not accounted for.
+ */
+static unsigned int skb_gso_network_seglen(const struct sk_buff *skb)
+{
+   unsigned int hdr_len = skb_transport_header(skb) -
+  skb_network_header(skb);
+
+   return hdr_len + skb_gso_transport_seglen(skb);
+}
+
+/**
+ * skb_gso_mac_seglen - Return length of individual segments of a gso packet
+ *
+ * @skb: GSO skb
+ *
+ * skb_gso_mac_seglen is used to determine the real size of the
+ * individual segments, including MAC/L2, Layer3 (IP, IPv6) and L4
+ * headers (TCP/UDP).
+ */
+static unsigned int skb_gso_mac_seglen(const struct sk_buff *skb)
+{
+   unsigned int hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
+
+   return hdr_len + skb_gso_transport_seglen(skb);
+}
 
 /**
  * skb_gso_size_check - check the skb size, considering GSO_BY_FRAGS
-- 
2.14.1



[PATCH v2 1/4] net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len

2018-02-28 Thread Daniel Axtens
If you take a GSO skb, and split it into packets, will the network
length (L3 headers + L4 headers + payload) of those packets be small
enough to fit within a given MTU?

skb_gso_validate_mtu gives you the answer to that question. However,
we recently added a way to validate the MAC length of a split GSO
skb (L2+L3+L4+payload), and the names get confusing, so rename
skb_gso_validate_mtu to skb_gso_validate_network_len.

Signed-off-by: Daniel Axtens 
---
 include/linux/skbuff.h  |  2 +-
 net/core/skbuff.c   | 11 ++-
 net/ipv4/ip_forward.c   |  2 +-
 net/ipv4/ip_output.c|  2 +-
 net/ipv4/netfilter/nf_flow_table_ipv4.c |  2 +-
 net/ipv6/ip6_output.c   |  2 +-
 net/ipv6/netfilter/nf_flow_table_ipv6.c |  2 +-
 net/mpls/af_mpls.c  |  2 +-
 net/xfrm/xfrm_device.c  |  2 +-
 9 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c1e66bdcf583..a057dd1a75c7 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3286,7 +3286,7 @@ void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len);
 int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen);
 void skb_scrub_packet(struct sk_buff *skb, bool xnet);
 unsigned int skb_gso_transport_seglen(const struct sk_buff *skb);
-bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu);
+bool skb_gso_validate_network_len(const struct sk_buff *skb, unsigned int mtu);
 bool skb_gso_validate_mac_len(const struct sk_buff *skb, unsigned int len);
 struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features);
 struct sk_buff *skb_vlan_untag(struct sk_buff *skb);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 09bd89c90a71..b63767008824 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4955,19 +4955,20 @@ static inline bool skb_gso_size_check(const struct sk_buff *skb,
 }
 
 /**
- * skb_gso_validate_mtu - Return in case such skb fits a given MTU
+ * skb_gso_validate_network_len - Will a split GSO skb fit into a given MTU?
  *
  * @skb: GSO skb
  * @mtu: MTU to validate against
  *
- * skb_gso_validate_mtu validates if a given skb will fit a wanted MTU
- * once split.
+ * skb_gso_validate_network_len validates if a given skb will fit a
+ * wanted MTU once split. It considers L3 headers, L4 headers, and the
+ * payload.
  */
-bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu)
+bool skb_gso_validate_network_len(const struct sk_buff *skb, unsigned int mtu)
 {
return skb_gso_size_check(skb, skb_gso_network_seglen(skb), mtu);
 }
-EXPORT_SYMBOL_GPL(skb_gso_validate_mtu);
+EXPORT_SYMBOL_GPL(skb_gso_validate_network_len);
 
 /**
  * skb_gso_validate_mac_len - Will a split GSO skb fit in a given length?
diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index 2dd21c3281a1..b54b948b0596 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -55,7 +55,7 @@ static bool ip_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu)
if (skb->ignore_df)
return false;
 
-   if (skb_is_gso(skb) && skb_gso_validate_mtu(skb, mtu))
+   if (skb_is_gso(skb) && skb_gso_validate_network_len(skb, mtu))
return false;
 
return true;
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index e8e675be60ec..66340ab750e6 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -248,7 +248,7 @@ static int ip_finish_output_gso(struct net *net, struct sock *sk,
 
/* common case: seglen is <= mtu
 */
-   if (skb_gso_validate_mtu(skb, mtu))
+   if (skb_gso_validate_network_len(skb, mtu))
return ip_finish_output2(net, sk, skb);
 
/* Slowpath -  GSO segment length exceeds the egress MTU.
diff --git a/net/ipv4/netfilter/nf_flow_table_ipv4.c b/net/ipv4/netfilter/nf_flow_table_ipv4.c
index 25d2975da156..2447077d163d 100644
--- a/net/ipv4/netfilter/nf_flow_table_ipv4.c
+++ b/net/ipv4/netfilter/nf_flow_table_ipv4.c
@@ -185,7 +185,7 @@ static bool __nf_flow_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu)
if ((ip_hdr(skb)->frag_off & htons(IP_DF)) == 0)
return false;
 
-   if (skb_is_gso(skb) && skb_gso_validate_mtu(skb, mtu))
+   if (skb_is_gso(skb) && skb_gso_validate_network_len(skb, mtu))
return false;
 
return true;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 997c7f19ad62..a8a919520090 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -412,7 +412,7 @@ static bool ip6_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
if (skb->ignore_df)
return false;
 
-   if (skb_is_gso(skb) && skb_gso_validate_mtu(skb, mtu))
+   if (skb_is_gso(skb) && skb_gso_validate_network_len(skb, mtu))
 

[PATCH v2 2/4] net: sched: tbf: handle GSO_BY_FRAGS case in enqueue

2018-02-28 Thread Daniel Axtens
tbf_enqueue() checks the size of a packet before enqueuing it.
However, the GSO size check does not consider the GSO_BY_FRAGS
case, and so will drop GSO SCTP packets, causing a massive drop
in throughput.

Use skb_gso_validate_mac_len() instead, as it does consider that
case.

Signed-off-by: Daniel Axtens 
---
 net/sched/sch_tbf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 229172d509cc..03225a8df973 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -188,7 +188,8 @@ static int tbf_enqueue(struct sk_buff *skb, struct Qdisc *sch,
int ret;
 
if (qdisc_pkt_len(skb) > q->max_size) {
-   if (skb_is_gso(skb) && skb_gso_mac_seglen(skb) <= q->max_size)
+   if (skb_is_gso(skb) &&
+   skb_gso_validate_mac_len(skb, q->max_size))
return tbf_segment(skb, sch, to_free);
return qdisc_drop(skb, sch, to_free);
}
-- 
2.14.1



[PATCH v2 3/4] net: xfrm: use skb_gso_validate_network_len() to check gso sizes

2018-02-28 Thread Daniel Axtens
Replace skb_gso_network_seglen() with
skb_gso_validate_network_len(), as it considers the GSO_BY_FRAGS
case.

Signed-off-by: Daniel Axtens 
---
 net/ipv4/xfrm4_output.c | 3 ++-
 net/ipv6/xfrm6_output.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
index 94b8702603bc..be980c195fc5 100644
--- a/net/ipv4/xfrm4_output.c
+++ b/net/ipv4/xfrm4_output.c
@@ -30,7 +30,8 @@ static int xfrm4_tunnel_check_size(struct sk_buff *skb)
 
mtu = dst_mtu(skb_dst(skb));
if ((!skb_is_gso(skb) && skb->len > mtu) ||
-   (skb_is_gso(skb) && skb_gso_network_seglen(skb) > ip_skb_dst_mtu(skb->sk, skb))) {
+   (skb_is_gso(skb) &&
+    !skb_gso_validate_network_len(skb, ip_skb_dst_mtu(skb->sk, skb)))) {
skb->protocol = htons(ETH_P_IP);
 
if (skb->sk)
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index 8ae87d4ec5ff..5959ce9620eb 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -82,7 +82,7 @@ static int xfrm6_tunnel_check_size(struct sk_buff *skb)
 
if ((!skb_is_gso(skb) && skb->len > mtu) ||
(skb_is_gso(skb) &&
-skb_gso_network_seglen(skb) > ip6_skb_dst_mtu(skb))) {
+    !skb_gso_validate_network_len(skb, ip6_skb_dst_mtu(skb)))) {
skb->dev = dst->dev;
skb->protocol = htons(ETH_P_IPV6);
 
-- 
2.14.1



[PATCH v2 0/4] GSO_BY_FRAGS correctness improvements

2018-02-28 Thread Daniel Axtens
As requested [1], I went through and had a look at users of gso_size to
see if there were things that need to be fixed to consider
GSO_BY_FRAGS, and I have tried to improve our helper functions to deal
with this case.

I found a few. This fixes bugs relating to the use of
skb_gso_*_seglen() where GSO_BY_FRAGS is not considered.

Patch 1 renames skb_gso_validate_mtu to skb_gso_validate_network_len.
This is follow-up to my earlier patch 2b16f048729b ("net: create
skb_gso_validate_mac_len()"), and just makes everything a bit clearer.

Patches 2 and 3 replace the final users of skb_gso_network_seglen() -
which doesn't consider GSO_BY_FRAGS - with
skb_gso_validate_network_len(), which does. This allows me to make the
skb_gso_*_seglen functions private in patch 4 - now future users won't
accidentally do the wrong comparison.
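The "wrong comparison" is easy to make concrete: a bare `seglen(skb) <= mtu` test is meaningless when gso_size holds the GSO_BY_FRAGS sentinel, while the validate helpers special-case it. A rough userspace Python model of the difference (not kernel code — the fragment walk and helper names are only illustrative):

```python
GSO_BY_FRAGS = 0xFFFF  # sentinel gso_size value, as in the kernel

def gso_network_seglen(hdr_len, gso_size):
    # seglen-style helper: produces a bogus length when gso_size
    # is the GSO_BY_FRAGS sentinel
    return hdr_len + gso_size

def gso_validate_network_len(hdr_len, gso_size, frag_lens, mtu):
    # validate-style helper: checks each fragment individually
    # when GSO_BY_FRAGS is set, instead of trusting gso_size
    if gso_size == GSO_BY_FRAGS:
        return all(hdr_len + l <= mtu for l in frag_lens)
    return hdr_len + gso_size <= mtu

# An SCTP-style GSO_BY_FRAGS skb whose fragments all fit a 1500 MTU:
hdr, frags, mtu = 40, [1000, 1200], 1500
naive_fits = gso_network_seglen(hdr, GSO_BY_FRAGS) <= mtu            # False: bogus
safe_fits = gso_validate_network_len(hdr, GSO_BY_FRAGS, frags, mtu)  # True
```

In the naive version the sentinel makes the packet look enormous and it gets dropped, which is exactly the tbf throughput bug patch 2 fixes.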

Two things remain. One is qdisc_pkt_len_init, which is discussed at
[2] - it's caught up in the GSO_DODGY mess. I don't have any expertise
in GSO_DODGY, and it looks like a good clean fix will involve
unpicking the whole validation mess, so I have left it for now.

Secondly, there are 3 eBPF opcodes that change the gso_size of an SKB
and don't consider GSO_BY_FRAGS. This is going through the bpf tree.

Regards,
Daniel

[1] https://patchwork.ozlabs.org/comment/1852414/
[2] https://www.spinics.net/lists/netdev/msg482397.html

PS: This is all in the core networking stack. For a driver to be
affected by this it would need to support NETIF_F_GSO_SCTP /
NETIF_F_GSO_SOFTWARE and then either use gso_size or not be a purely
virtual device. (Many drivers look at gso_size, but do not support
SCTP segmentation, so the core network will segment an SCTP gso before
it hits them.) Based on that, the only driver that may be affected is
sunvnet, but I have no way of testing it, so I haven't looked at it.

v2: split out bpf stuff
    fix review comments from Dave Miller

Daniel Axtens (4):
  net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len
  net: sched: tbf: handle GSO_BY_FRAGS case in enqueue
  net: xfrm: use skb_gso_validate_network_len() to check gso sizes
  net: make skb_gso_*_seglen functions private

 include/linux/skbuff.h  | 35 +---
 net/core/skbuff.c   | 48 -
 net/ipv4/ip_forward.c   |  2 +-
 net/ipv4/ip_output.c|  2 +-
 net/ipv4/netfilter/nf_flow_table_ipv4.c |  2 +-
 net/ipv4/xfrm4_output.c |  3 ++-
 net/ipv6/ip6_output.c   |  2 +-
 net/ipv6/netfilter/nf_flow_table_ipv6.c |  2 +-
 net/ipv6/xfrm6_output.c |  2 +-
 net/mpls/af_mpls.c  |  2 +-
 net/sched/sch_tbf.c |  3 ++-
 net/xfrm/xfrm_device.c  |  2 +-
 12 files changed, 54 insertions(+), 51 deletions(-)

-- 
2.14.1



Re: [PATCH 5/6] net: add and use helpers when adjusting gso_size

2018-02-28 Thread Daniel Axtens
Hi Daniel,

>> It means that a user loaded eBPF program can trigger logs full of
>> warnings merely by using this eBPF helper and generating GSO'd SCTP
>> traffic.
>> 
>> Daniel and Alexei, this is a serious problem.  The eBPF helpers
>> mentioned here cannot handle SCTP GSO packets properly, and in fact
>> corrupt them if they adjust the gso_size.
>> 
>> SCTP GSO packets use the GSO_BY_FRAGS scheme and cannot be treated
>> the same way we treat normal GSO packets.
>
> Thanks for Cc, I would have missed it. This patch in combination with
> patch 6/6 seems okay since we reject it in these cases, which is fine
> (although it could be a warn_once), but patch 6/6 is definitely buggy
> in that we leave the skb in a half edited state when leaving the helper.
> Daniel, want me to fix this up from bpf side and route 5/6 and actual
> fix via bpf tree? Otherwise please respin with fixed 6/6.

I'm happy for you to take these two via the bpf tree with whatever
fixes/changes are appropriate - I'm no eBPF expert so my fix was a 'best
guess' only. Let me know if you would like anything respun on my end.

Regards,
Daniel

PS. apologies for missing you on Cc - will make sure you're cced on any
future bpf-related work!
>
> Thanks,
> Daniel


[PATCH 3/6] net: xfrm: use skb_gso_validate_network_len() to check gso sizes

2018-02-27 Thread Daniel Axtens
Replace skb_gso_network_seglen() with
skb_gso_validate_network_len(), as it considers the GSO_BY_FRAGS
case.

Signed-off-by: Daniel Axtens 
---
 net/ipv4/xfrm4_output.c | 3 ++-
 net/ipv6/xfrm6_output.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
index 94b8702603bc..be980c195fc5 100644
--- a/net/ipv4/xfrm4_output.c
+++ b/net/ipv4/xfrm4_output.c
@@ -30,7 +30,8 @@ static int xfrm4_tunnel_check_size(struct sk_buff *skb)
 
mtu = dst_mtu(skb_dst(skb));
if ((!skb_is_gso(skb) && skb->len > mtu) ||
-   (skb_is_gso(skb) && skb_gso_network_seglen(skb) > ip_skb_dst_mtu(skb->sk, skb))) {
+   (skb_is_gso(skb) &&
+    !skb_gso_validate_network_len(skb, ip_skb_dst_mtu(skb->sk, skb)))) {
skb->protocol = htons(ETH_P_IP);
 
if (skb->sk)
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index 8ae87d4ec5ff..5959ce9620eb 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -82,7 +82,7 @@ static int xfrm6_tunnel_check_size(struct sk_buff *skb)
 
if ((!skb_is_gso(skb) && skb->len > mtu) ||
(skb_is_gso(skb) &&
-skb_gso_network_seglen(skb) > ip6_skb_dst_mtu(skb))) {
+    !skb_gso_validate_network_len(skb, ip6_skb_dst_mtu(skb)))) {
skb->dev = dst->dev;
skb->protocol = htons(ETH_P_IPV6);
 
-- 
2.14.1



[PATCH 6/6] net: filter: refuse to change gso_size of SCTP packets

2018-02-27 Thread Daniel Axtens
An eBPF tc action on a veth pair can be passed an SCTP GSO skb, which
has gso_size of GSO_BY_FRAGS.

If that action calls bpf_skb_change_proto(), bpf_skb_net_grow()
or bpf_skb_net_shrink(), the code will unconditionally attempt to
increment or decrement the gso_size to some nonsense value.

It's not clear how this could be done correctly, so simply refuse
to operate on SCTP GSO skbs.

Cc: Marcelo Ricardo Leitner 
Signed-off-by: Daniel Axtens 

---

Marcelo - I am not an SCTP expert by any means, so if this code should
be doing something else, please let me know.
---
 net/core/filter.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index d9684d8a3ac7..f189c988962e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2096,6 +2096,10 @@ static int bpf_skb_proto_4_to_6(struct sk_buff *skb)
return ret;
 
if (skb_is_gso(skb)) {
+   /* SCTP uses GSO_BY_FRAGS, cannot adjust */
+   if (unlikely(skb_shinfo(skb)->gso_type & SKB_GSO_SCTP))
+   return -ENOTSUPP;
+
/* SKB_GSO_TCPV4 needs to be changed into
 * SKB_GSO_TCPV6.
 */
@@ -2132,6 +2136,10 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
return ret;
 
if (skb_is_gso(skb)) {
+   /* SCTP uses GSO_BY_FRAGS, cannot adjust */
+   if (unlikely(skb_shinfo(skb)->gso_type & SKB_GSO_SCTP))
+   return -ENOTSUPP;
+
/* SKB_GSO_TCPV6 needs to be changed into
 * SKB_GSO_TCPV4.
 */
@@ -2252,6 +2260,10 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 len_diff)
return ret;
 
if (skb_is_gso(skb)) {
+   /* SCTP uses GSO_BY_FRAGS, cannot adjust */
+   if (unlikely(skb_shinfo(skb)->gso_type & SKB_GSO_SCTP))
+   return -ENOTSUPP;
+
/* Due to header grow, MSS needs to be downgraded. */
skb_decrease_gso_size(skb, len_diff);
/* Header must be checked, and gso_segs recomputed. */
@@ -2276,6 +2288,10 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 len_diff)
return ret;
 
if (skb_is_gso(skb)) {
+   /* SCTP uses GSO_BY_FRAGS, cannot adjust */
+   if (unlikely(skb_shinfo(skb)->gso_type & SKB_GSO_SCTP))
+   return -ENOTSUPP;
+
/* Due to header shrink, MSS can be upgraded. */
skb_increase_gso_size(skb, len_diff);
/* Header must be checked, and gso_segs recomputed. */
-- 
2.14.1



[PATCH 5/6] net: add and use helpers when adjusting gso_size

2018-02-27 Thread Daniel Axtens
An audit of users of gso_size reveals that an eBPF tc action on a
veth pair can be passed an SCTP GSO skb, which has gso_size of
GSO_BY_FRAGS.

If that action calls bpf_skb_change_proto(), bpf_skb_net_grow()
or bpf_skb_net_shrink(), the gso_size will be unconditionally
incremented or decremented to some nonsense value.

Add helpers that WARN if attempting to change a gso_size of a
GSO_BY_FRAGS skb (and leave the value unchanged).

Signed-off-by: Daniel Axtens 
---
 Documentation/networking/segmentation-offloads.txt | 11 +--
 include/linux/skbuff.h | 16 
 net/core/filter.c  |  8 
 3 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/Documentation/networking/segmentation-offloads.txt b/Documentation/networking/segmentation-offloads.txt
index d47480b61ac6..23a8dd91a9ec 100644
--- a/Documentation/networking/segmentation-offloads.txt
+++ b/Documentation/networking/segmentation-offloads.txt
@@ -153,8 +153,15 @@ To signal this, gso_size is set to the special value GSO_BY_FRAGS.
 
 Therefore, any code in the core networking stack must be aware of the
 possibility that gso_size will be GSO_BY_FRAGS and handle that case
-appropriately. (For size checks, the skb_gso_validate_*_len family of
-helpers do this automatically.)
+appropriately.
+
+There are a couple of helpers to make this easier:
+
+ - For size checks, the skb_gso_validate_*_len family of helpers correctly
+   considers GSO_BY_FRAGS.
+
+ - For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size
+   will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs.
 
 This also affects drivers with the NETIF_F_FRAGLIST & NETIF_F_GSO_SCTP bits
 set. Note also that NETIF_F_GSO_SCTP is included in NETIF_F_GSO_SOFTWARE.
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index d8340e6e8814..157a91864e16 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4047,6 +4047,22 @@ static inline void skb_gso_reset(struct sk_buff *skb)
skb_shinfo(skb)->gso_type = 0;
 }
 
+static inline void skb_increase_gso_size(struct sk_buff *skb, u16 increment)
+{
+   if (WARN_ON_ONCE(skb_shinfo(skb)->gso_size == GSO_BY_FRAGS))
+   return;
+
+   skb_shinfo(skb)->gso_size += increment;
+}
+
+static inline void skb_decrease_gso_size(struct sk_buff *skb, u16 decrement)
+{
+   if (WARN_ON_ONCE(skb_shinfo(skb)->gso_size == GSO_BY_FRAGS))
+   return;
+
+   skb_shinfo(skb)->gso_size -= decrement;
+}
+
 void __skb_warn_lro_forwarding(const struct sk_buff *skb);
 
 static inline bool skb_warn_if_lro(const struct sk_buff *skb)
diff --git a/net/core/filter.c b/net/core/filter.c
index 0c121adbdbaa..d9684d8a3ac7 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2105,7 +2105,7 @@ static int bpf_skb_proto_4_to_6(struct sk_buff *skb)
}
 
/* Due to IPv6 header, MSS needs to be downgraded. */
-   skb_shinfo(skb)->gso_size -= len_diff;
+   skb_decrease_gso_size(skb, len_diff);
/* Header must be checked, and gso_segs recomputed. */
skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY;
skb_shinfo(skb)->gso_segs = 0;
@@ -2141,7 +2141,7 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
}
 
/* Due to IPv4 header, MSS can be upgraded. */
-   skb_shinfo(skb)->gso_size += len_diff;
+   skb_increase_gso_size(skb, len_diff);
/* Header must be checked, and gso_segs recomputed. */
skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY;
skb_shinfo(skb)->gso_segs = 0;
@@ -2253,7 +2253,7 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 len_diff)
 
if (skb_is_gso(skb)) {
/* Due to header grow, MSS needs to be downgraded. */
-   skb_shinfo(skb)->gso_size -= len_diff;
+   skb_decrease_gso_size(skb, len_diff);
/* Header must be checked, and gso_segs recomputed. */
skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY;
skb_shinfo(skb)->gso_segs = 0;
@@ -2277,7 +2277,7 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 len_diff)
 
if (skb_is_gso(skb)) {
/* Due to header shrink, MSS can be upgraded. */
-   skb_shinfo(skb)->gso_size += len_diff;
+   skb_increase_gso_size(skb, len_diff);
/* Header must be checked, and gso_segs recomputed. */
skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY;
skb_shinfo(skb)->gso_segs = 0;
-- 
2.14.1



[PATCH 4/6] net: make skb_gso_*_seglen functions private

2018-02-27 Thread Daniel Axtens
They're very hard to use properly as they do not consider the
GSO_BY_FRAGS case. Code should use skb_gso_validate_network_len
and skb_gso_validate_mac_len as they do consider this case.

Make the seglen functions static inline, which stops people
using them, and also means the validate functions don't have
to make any out of line calls.

Signed-off-by: Daniel Axtens 
---
 include/linux/skbuff.h | 33 -
 net/core/skbuff.c  | 37 +++--
 2 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index eeb76bb7730a..d8340e6e8814 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3288,7 +3288,6 @@ int skb_zerocopy(struct sk_buff *to, struct sk_buff *from,
 void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len);
 int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen);
 void skb_scrub_packet(struct sk_buff *skb, bool xnet);
-unsigned int skb_gso_transport_seglen(const struct sk_buff *skb);
 bool skb_gso_validate_network_len(const struct sk_buff *skb, unsigned int mtu);
 bool skb_gso_validate_mac_len(const struct sk_buff *skb, unsigned int len);
 struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features);
@@ -4107,38 +4106,6 @@ static inline bool skb_head_is_locked(const struct sk_buff *skb)
return !skb->head_frag || skb_cloned(skb);
 }
 
-/**
- * skb_gso_network_seglen - Return length of individual segments of a gso packet
- *
- * @skb: GSO skb
- *
- * skb_gso_network_seglen is used to determine the real size of the
- * individual segments, including Layer3 (IP, IPv6) and L4 headers (TCP/UDP).
- *
- * The MAC/L2 header is not accounted for.
- */
-static inline unsigned int skb_gso_network_seglen(const struct sk_buff *skb)
-{
-   unsigned int hdr_len = skb_transport_header(skb) -
-  skb_network_header(skb);
-   return hdr_len + skb_gso_transport_seglen(skb);
-}
-
-/**
- * skb_gso_mac_seglen - Return length of individual segments of a gso packet
- *
- * @skb: GSO skb
- *
- * skb_gso_mac_seglen is used to determine the real size of the
- * individual segments, including MAC/L2, Layer3 (IP, IPv6) and L4
- * headers (TCP/UDP).
- */
-static inline unsigned int skb_gso_mac_seglen(const struct sk_buff *skb)
-{
-   unsigned int hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
-   return hdr_len + skb_gso_transport_seglen(skb);
-}
-
 /* Local Checksum Offload.
  * Compute outer checksum based on the assumption that the
  * inner checksum will be offloaded later.
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 0c0c1d6f28ef..a664a3ae507e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4893,7 +4893,7 @@ EXPORT_SYMBOL_GPL(skb_scrub_packet);
  *
  * The MAC/L2 or network (IP, IPv6) headers are not accounted for.
  */
-unsigned int skb_gso_transport_seglen(const struct sk_buff *skb)
+static inline unsigned int skb_gso_transport_seglen(const struct sk_buff *skb)
 {
const struct skb_shared_info *shinfo = skb_shinfo(skb);
unsigned int thlen = 0;
@@ -4915,7 +4915,40 @@ unsigned int skb_gso_transport_seglen(const struct sk_buff *skb)
 */
return thlen + shinfo->gso_size;
 }
-EXPORT_SYMBOL_GPL(skb_gso_transport_seglen);
+
+/**
+ * skb_gso_network_seglen - Return length of individual segments of a gso packet
+ *
+ * @skb: GSO skb
+ *
+ * skb_gso_network_seglen is used to determine the real size of the
+ * individual segments, including Layer3 (IP, IPv6) and L4 headers (TCP/UDP).
+ *
+ * The MAC/L2 header is not accounted for.
+ */
+static inline unsigned int skb_gso_network_seglen(const struct sk_buff *skb)
+{
+   unsigned int hdr_len = skb_transport_header(skb) -
+  skb_network_header(skb);
+
+   return hdr_len + skb_gso_transport_seglen(skb);
+}
+
+/**
+ * skb_gso_mac_seglen - Return length of individual segments of a gso packet
+ *
+ * @skb: GSO skb
+ *
+ * skb_gso_mac_seglen is used to determine the real size of the
+ * individual segments, including MAC/L2, Layer3 (IP, IPv6) and L4
+ * headers (TCP/UDP).
+ */
+static inline unsigned int skb_gso_mac_seglen(const struct sk_buff *skb)
+{
+   unsigned int hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
+
+   return hdr_len + skb_gso_transport_seglen(skb);
+}
 
 /**
  * skb_gso_size_check - check the skb size, considering GSO_BY_FRAGS
-- 
2.14.1



[PATCH 2/6] net: sched: tbf: handle GSO_BY_FRAGS case in enqueue

2018-02-27 Thread Daniel Axtens
tbf_enqueue() checks the size of a packet before enqueuing it.
However, the GSO size check does not consider the GSO_BY_FRAGS
case, and so will drop GSO SCTP packets, causing a massive drop
in throughput.

Use skb_gso_validate_mac_len() instead, as it does consider that
case.

Signed-off-by: Daniel Axtens 
---
 net/sched/sch_tbf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 229172d509cc..03225a8df973 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -188,7 +188,8 @@ static int tbf_enqueue(struct sk_buff *skb, struct Qdisc *sch,
int ret;
 
if (qdisc_pkt_len(skb) > q->max_size) {
-   if (skb_is_gso(skb) && skb_gso_mac_seglen(skb) <= q->max_size)
+   if (skb_is_gso(skb) &&
+   skb_gso_validate_mac_len(skb, q->max_size))
return tbf_segment(skb, sch, to_free);
return qdisc_drop(skb, sch, to_free);
}
-- 
2.14.1



[PATCH 0/6] GSO_BY_FRAGS correctness improvements

2018-02-27 Thread Daniel Axtens
As requested [1], I went through and had a look at users of gso_size to
see if there were things that need to be fixed to consider
GSO_BY_FRAGS, and I have tried to improve our helper functions to deal
with this case.

I found a few, and this fixes all but one of them. The one thing that
remains is qdisc_pkt_len_init which is discussed at [2] - it's caught
up in the GSO_DODGY mess. I don't have any expertise in GSO_DODGY, and
it looks like a good clean fix will involve unpicking the whole
validation mess, so I have left it for now.

Patch 1 renames skb_gso_validate_mtu to skb_gso_validate_network_len.
This is follow-up to my earlier patch 2b16f048729b ("net: create
skb_gso_validate_mac_len()"), and just makes everything a bit clearer.

Patches 2 and 3 replace the final users of skb_gso_network_seglen() -
which doesn't consider GSO_BY_FRAGS - with
skb_gso_validate_network_len(), which does. This allows me to make the
skb_gso_*_seglen functions private in patch 4 - now future users won't
accidentally do the wrong comparison.

Lastly, there are 3 eBPF opcodes that change the gso_size of an SKB
and don't consider GSO_BY_FRAGS. To address this, patch 5 adds a
couple of helpers for changing gso_size that throw a WARN if you use
them on a GSO_BY_FRAGS packet. To return a useful error to eBPF, patch
6 goes further and prevents any change of gso_size for SCTP GSO skbs.

Regards,
Daniel

[1] https://patchwork.ozlabs.org/comment/1852414/
[2] https://www.spinics.net/lists/netdev/msg482397.html

PS: This is all in the core networking stack. For a driver to be
affected by this it would need to support NETIF_F_GSO_SCTP /
NETIF_F_GSO_SOFTWARE and then either use gso_size or not be a purely
virtual device. (Many drivers look at gso_size, but do not support
SCTP segmentation, so the core network will segment an SCTP gso before
it hits them.) Based on that, the only driver that may be affected is
sunvnet, but I have no way of testing it, so I haven't looked at it.

Daniel Axtens (6):
  net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len
  net: sched: tbf: handle GSO_BY_FRAGS case in enqueue
  net: xfrm: use skb_gso_validate_network_len() to check gso sizes
  net: make skb_gso_*_seglen functions private
  net: add and use helpers when adjusting gso_size
  net: filter: refuse to change gso_size of SCTP packets

 Documentation/networking/segmentation-offloads.txt | 11 -
 include/linux/skbuff.h | 51 --
 net/core/filter.c  | 24 --
 net/core/skbuff.c  | 48 +---
 net/ipv4/ip_forward.c  |  2 +-
 net/ipv4/ip_output.c   |  2 +-
 net/ipv4/netfilter/nf_flow_table_ipv4.c|  2 +-
 net/ipv4/xfrm4_output.c|  3 +-
 net/ipv6/ip6_output.c  |  2 +-
 net/ipv6/netfilter/nf_flow_table_ipv6.c|  2 +-
 net/ipv6/xfrm6_output.c|  2 +-
 net/mpls/af_mpls.c |  2 +-
 net/sched/sch_tbf.c|  3 +-
 net/xfrm/xfrm_device.c |  2 +-
 14 files changed, 99 insertions(+), 57 deletions(-)

-- 
2.14.1



[PATCH 1/6] net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len

2018-02-27 Thread Daniel Axtens
If you take a GSO skb, and split it into packets, will the network
length (L3 headers + L4 headers + payload) of those packets be small
enough to fit within a given MTU?

skb_gso_validate_mtu gives you the answer to that question. However,
we recently added a way to validate the MAC length of a split GSO
skb (L2+L3+L4+payload), and the names get confusing, so rename
skb_gso_validate_mtu to skb_gso_validate_network_len.
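The network/MAC distinction is simply whether the L2 header is counted. A toy Python model (illustrative only — these are not the kernel helpers, and the header sizes below are made up):

```python
def network_len(l2, l3, l4, payload):
    # L3 + L4 + payload: what skb_gso_validate_network_len
    # compares against an MTU (L2 is excluded)
    return l3 + l4 + payload

def mac_len(l2, l3, l4, payload):
    # L2 + L3 + L4 + payload: what skb_gso_validate_mac_len checks
    return l2 + l3 + l4 + payload

# e.g. Ethernet (14) + IPv4 (20) + TCP (20) + 1400 bytes of payload:
assert mac_len(14, 20, 20, 1400) - network_len(14, 20, 20, 1400) == 14
```

So an MTU check uses the network length, while a qdisc max_size check (as in the tbf patch) uses the MAC length.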

Signed-off-by: Daniel Axtens 
---
 include/linux/skbuff.h  |  2 +-
 net/core/skbuff.c   | 11 ++-
 net/ipv4/ip_forward.c   |  2 +-
 net/ipv4/ip_output.c|  2 +-
 net/ipv4/netfilter/nf_flow_table_ipv4.c |  2 +-
 net/ipv6/ip6_output.c   |  2 +-
 net/ipv6/netfilter/nf_flow_table_ipv6.c |  2 +-
 net/mpls/af_mpls.c  |  2 +-
 net/xfrm/xfrm_device.c  |  2 +-
 9 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 9bc1750ca3d3..eeb76bb7730a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3289,7 +3289,7 @@ void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len);
 int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen);
 void skb_scrub_packet(struct sk_buff *skb, bool xnet);
 unsigned int skb_gso_transport_seglen(const struct sk_buff *skb);
-bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu);
+bool skb_gso_validate_network_len(const struct sk_buff *skb, unsigned int mtu);
 bool skb_gso_validate_mac_len(const struct sk_buff *skb, unsigned int len);
 struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features);
 struct sk_buff *skb_vlan_untag(struct sk_buff *skb);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 96d36b81a3a5..0c0c1d6f28ef 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4957,19 +4957,20 @@ static inline bool skb_gso_size_check(const struct sk_buff *skb,
 }
 
 /**
- * skb_gso_validate_mtu - Return in case such skb fits a given MTU
+ * skb_gso_validate_network_len - Will a split GSO skb fit into a given MTU?
  *
  * @skb: GSO skb
  * @mtu: MTU to validate against
  *
- * skb_gso_validate_mtu validates if a given skb will fit a wanted MTU
- * once split.
+ * skb_gso_validate_network_len validates if a given skb will fit a
+ * wanted MTU once split. It considers L3 headers, L4 headers, and the
+ * payload.
  */
-bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu)
+bool skb_gso_validate_network_len(const struct sk_buff *skb, unsigned int mtu)
 {
return skb_gso_size_check(skb, skb_gso_network_seglen(skb), mtu);
 }
-EXPORT_SYMBOL_GPL(skb_gso_validate_mtu);
+EXPORT_SYMBOL_GPL(skb_gso_validate_network_len);
 
 /**
  * skb_gso_validate_mac_len - Will a split GSO skb fit in a given length?
diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index 2dd21c3281a1..b54b948b0596 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -55,7 +55,7 @@ static bool ip_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu)
if (skb->ignore_df)
return false;
 
-   if (skb_is_gso(skb) && skb_gso_validate_mtu(skb, mtu))
+   if (skb_is_gso(skb) && skb_gso_validate_network_len(skb, mtu))
return false;
 
return true;
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index e8e675be60ec..66340ab750e6 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -248,7 +248,7 @@ static int ip_finish_output_gso(struct net *net, struct sock *sk,
 
/* common case: seglen is <= mtu
 */
-   if (skb_gso_validate_mtu(skb, mtu))
+   if (skb_gso_validate_network_len(skb, mtu))
return ip_finish_output2(net, sk, skb);
 
/* Slowpath -  GSO segment length exceeds the egress MTU.
diff --git a/net/ipv4/netfilter/nf_flow_table_ipv4.c b/net/ipv4/netfilter/nf_flow_table_ipv4.c
index 25d2975da156..2447077d163d 100644
--- a/net/ipv4/netfilter/nf_flow_table_ipv4.c
+++ b/net/ipv4/netfilter/nf_flow_table_ipv4.c
@@ -185,7 +185,7 @@ static bool __nf_flow_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu)
if ((ip_hdr(skb)->frag_off & htons(IP_DF)) == 0)
return false;
 
-   if (skb_is_gso(skb) && skb_gso_validate_mtu(skb, mtu))
+   if (skb_is_gso(skb) && skb_gso_validate_network_len(skb, mtu))
return false;
 
return true;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 997c7f19ad62..a8a919520090 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -412,7 +412,7 @@ static bool ip6_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
if (skb->ignore_df)
return false;
 
-   if (skb_is_gso(skb) && skb_gso_validate_mtu(skb, mtu))
+   if (skb_is_gso(skb) && skb_gso_validate_network_len(skb, mtu))
 

Re: syzcaller patch postings...

2018-02-22 Thread Daniel Axtens
Dmitry Vyukov  writes:

> On Thu, Feb 22, 2018 at 3:35 PM, David Miller  wrote:
>> From: Dmitry Vyukov 
>> Date: Thu, 22 Feb 2018 10:58:07 +0100
>>
>>> Do I understand it correctly that if syzbot replies to the CC list
>>> that was in the testing request, it will resolve the problem? So if
>>> netdev wasn't in CC, it will not be added to CC.
>>>
>>> I will go and fix it now.
>>
>> I don't want syzbot to send the patch to netdev, even if it
>> was in the CC: list.
>>
>> And again this goes for netfilter-devel and linux-wireless as
>> well.
>>
>> There is no reason whatsoever for syzbot to ever post an already
>> posted patch back to the list again, even if it was on the CC:
>> list.
>>
>> In fact netdev will be on that CC: list most of the time.
>
> But if the list on CC the first time, then the patch is already on
> Patchwork, right? When syzbot replies it will add In-Reply-To, so this
> should be treated as a comment to the existing patch? Will it still
> cause problems?

You would think that it would treat it as a comment, but it doesn't.  We
treat something as a comment if and only if the subject begins with some
variant of Re:.  An I-R-T is not enough: I think the reasoning was that
people sometimes post their v2 series as replies to their v1 series,
which is bad; but we erred on the side of not losing patches.

There's not really a solid conceptual framework for this - in part
because parsing the sheer variety of mail people post is really, really
hard. Suggestions on a better algorithm are of course welcome.
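A minimal sketch of that subject rule (hedged: Patchwork's real parser handles more prefixes and edge cases than this approximation):

```python
import re

# Rough approximation of "subject begins with some variant of Re:",
# allowing list tags like [PATCH ...] before the prefix and
# numbered forms like Re[2]:
RE_PREFIX = re.compile(r'^\s*(\[[^\]]*\]\s*)*(re|aw|sv|fwd?)[\[\d\]]*:', re.I)

def looks_like_comment(subject: str) -> bool:
    return bool(RE_PREFIX.match(subject))
```

Under this rule, "Re: [PATCH] ..." is a comment, but a reply whose subject keeps the bare "[PATCH] ..." form is parsed as a new patch — which is why an In-Reply-To header alone is not enough.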

> How does Patchwork understand that an email contains a patch? Perhaps
> if we make these emails appear as they don't contain a patch, it will
> solve the problem.

We have a pretty sophisticated parser that looks for things that look
like the output of diff. [FWIW, the algorithm is at
https://github.com/getpatchwork/patchwork/blob/master/patchwork/parser.py#L688 ]

You *could* try to trick it: looking at the code, probably the simplest
way would be to replace '---' in '--- a/foo/bar.c' with something else,
but I'm really quite uncomfortable with that.

Regards,
Daniel


Re: syzcaller patch postings...

2018-02-22 Thread Daniel Axtens
Dmitry Vyukov  writes:

> On Thu, Feb 22, 2018 at 2:31 PM, Daniel Axtens  wrote:
>> Dmitry Vyukov  writes:
>>
>>> On Thu, Feb 22, 2018 at 9:26 AM, Paolo Abeni  wrote:
>>>> On Wed, 2018-02-21 at 16:47 -0500, David Miller wrote:
>>>>> I have to mention this now before it gets out of control.
>>>>>
>>>>> I would like to ask that syzkaller stop posting the patch it is
>>>>> testing when it posts to netdev.
>>>>
>>>> There is an open issue on this topic:
>>>>
>>>> https://github.com/google/syzkaller/issues/526
>>>>
>>>> The current behaviour is that syzbot replies to all get_maintainer.pl
>>>> recipients after testing a patch, regardless of the test submission
>>>> recipient list, the idea was instead to respect such list.
>>>
>>>
>>> Hi David, Florian, Paolo,
>>>
>>> Didn't realize it triggers patchwork. This wasn't intentional, sorry.
>>
>> A little-publicised and incorrectly-documented(!) feature of Patchwork
>> is that it supports some email headers. In particular, if you include an
>> "X-Patchwork-Hint: ignore" header, the mail will not be parsed by
>> Patchwork.
>>
>> This will stop it being recorded as a patch. Unfortunately it will also
>> stop it being recorded as a comment - I don't know if that's an issue in
>> this case. Maybe we can set you up with Patchwork 2's new checks
>> infrastructure instead.
>
> Nice. But unfortunately the current mailing technology we use allows
> very limited set of headers and no custom headers:
> https://cloud.google.com/appengine/docs/standard/go/mail/mail-with-headers-attachments
> So while possible, it would require very significant rework...

Ah, oh well, nevermind.

> What's the Patchwork 2's new checks infrastructure?


It's probably more a long-term thing than an immediate fix, but...
The checks API is designed to integrate reporting of CI/testing results
into Patchwork. It allows - through a REST API - an arbitrary process
(like your checking) to report success/warning/failure against a
patch. In your case you could report success = patch prevents bug, and
failure = bug still exists with patch. It's still slightly a
work-in-progress: at the moment you need an API key from a maintainer to
post checks. But it does look pretty in the web frontend:
e.g. https://patchwork.ozlabs.org/patch/871346/ - and the number of
successful/warning/failed tests shows up on the patch list page.
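As a sketch of what a reporter would send (the endpoint path and field names here are my recollection of the Patchwork 2 REST API — treat them as assumptions and verify against the API docs before building on them):

```python
import json

def build_check(patch_id, state, context, description, target_url=None):
    """Build the URL and JSON body for posting a check against a patch.

    Assumed endpoint: POST /api/patches/<patch_id>/checks/ with an
    'Authorization: Token <api key>' header (key from a maintainer).
    """
    assert state in ('pending', 'success', 'warning', 'fail')
    payload = {'state': state, 'context': context, 'description': description}
    if target_url:
        payload['target_url'] = target_url
    return f'/api/patches/{patch_id}/checks/', json.dumps(payload)

# e.g. a bot reporting that a tested patch prevents the crash:
url, body = build_check(871346, 'success', 'syzbot', 'patch prevents crash')
```

The reporting process only needs to map its own result (bug fixed / bug still present) onto the success/fail states.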

There's currently only one project (that I know of) out there that uses
the checks API - Snowpatch: https://github.com/ruscur/snowpatch

If, at any point in the future, you want to explore this, let me know as
I'd be *very* happy to help with the implementation and if needed push
features into Patchwork that make it easier/better.


> If it will still remain a problem (hopefully not), then maybe it's
> possible to blacklist syzbot address from creating new patches. syzbot
> can do a lot, but so far does not also generate fixes for the bugs it
> discovers :)

In immediate practical terms, that might be the easiest. They all come
from the same email address, right?

Regards,
Daniel


Re: syzcaller patch postings...

2018-02-22 Thread Daniel Axtens
Dmitry Vyukov  writes:

> On Thu, Feb 22, 2018 at 9:26 AM, Paolo Abeni  wrote:
>> On Wed, 2018-02-21 at 16:47 -0500, David Miller wrote:
>>> I have to mention this now before it gets out of control.
>>>
>>> I would like to ask that syzkaller stop posting the patch it is
>>> testing when it posts to netdev.
>>
>> There is an open issue on this topic:
>>
>> https://github.com/google/syzkaller/issues/526
>>
>> The current behaviour is that syzbot replies to all get_maintainer.pl
>> recipients after testing a patch, regardless of the test submission
>> recipient list, the idea was instead to respect such list.
>
>
> Hi David, Florian, Paolo,
>
> Didn't realize it triggers patchwork. This wasn't intentional, sorry.

A little-publicised and incorrectly-documented(!) feature of Patchwork
is that it supports some email headers. In particular, if you include an
"X-Patchwork-Hint: ignore" header, the mail will not be parsed by
Patchwork.
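As a sketch of what setting that header looks like in practice (Python's standard email library; the addresses and body text are made up):

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "bot@example.com"        # made-up addresses for illustration
msg["To"] = "netdev@vger.kernel.org"
msg["Subject"] = "Re: testing report"
# The header Patchwork looks for: the mail will not be parsed at all,
# neither as a patch nor as a comment.
msg["X-Patchwork-Hint"] = "ignore"
msg.set_content("Test result: patch prevents the reported crash.")

print(msg["X-Patchwork-Hint"])  # -> ignore
```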

This will stop it being recorded as a patch. Unfortunately it will also
stop it being recorded as a comment - I don't know if that's an issue in
this case. Maybe we can set you up with Patchwork 2's new checks
infrastructure instead.

>
> Do I understand it correctly that if syzbot replies to the CC list
> that was in the testing request, it will resolve the problem? So if
> netdev wasn't in CC, it will not be added to CC.
>
> I will go and fix it now.


Re: [PATCH] net: sched: tbf: handle GSO_BY_FRAGS case in enqueue

2018-02-15 Thread Daniel Axtens
Hi Marcelo,

>
> If this block was meant to be an out-of-band/changelog comment, your
> SOB line should be above the first --- marker.
> Anyhow,
> Reviewed-by: Marcelo Ricardo Leitner 

Thanks - I did a v2 with this around the right way [1], but DaveM asked
me to be a bit more thorough and look for other GSO_BY_FRAGS issues. [2]

I did find a couple, including the one I asked you about with
qdisk_pkt_len_init [3] (thanks for your answer, btw). Currently I'm
trying to wrap my head around GSO_DODGY as Eric mentioned on that
thread, and then I'll do a series with this fix and the other
GSO_BY_FRAGS fixes I find.

Regards,
Daniel

[1] https://patchwork.ozlabs.org/patch/869145/
[2] https://patchwork.ozlabs.org/comment/1852414/
[3] https://www.spinics.net/lists/netdev/msg482397.html
>
>> ---
>>  net/sched/sch_tbf.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
>> index 229172d509cc..03225a8df973 100644
>> --- a/net/sched/sch_tbf.c
>> +++ b/net/sched/sch_tbf.c
>> @@ -188,7 +188,8 @@ static int tbf_enqueue(struct sk_buff *skb, struct Qdisc 
>> *sch,
>>  int ret;
>>  
>>  if (qdisc_pkt_len(skb) > q->max_size) {
>> -if (skb_is_gso(skb) && skb_gso_mac_seglen(skb) <= q->max_size)
>> +if (skb_is_gso(skb) &&
>> +skb_gso_validate_mac_len(skb, q->max_size))
>>  return tbf_segment(skb, sch, to_free);
>>  return qdisc_drop(skb, sch, to_free);
>>  }
>> -- 
>> 2.14.1
>> 


[PATCH 0/3] Updates to segmentation-offloads.txt

2018-02-13 Thread Daniel Axtens
I've been trying to wrap my head around GSO for a while now. This is a
set of small changes to the docs that would probably have been helpful
when I was starting out.

I realise that GSO_DODGY is still a notable omission - I'm hesitant to
write too much on it just yet as I don't understand it well and I
think it's in the process of changing.

Daniel Axtens (3):
  docs: segmentation-offloads.txt: update for UFO deprecation
  docs: segmentation-offloads.txt: Fix ref to SKB_GSO_TUNNEL_REMCSUM
  docs: segmentation-offloads.txt: add SCTP info

 Documentation/networking/segmentation-offloads.txt | 38 +++---
 1 file changed, 34 insertions(+), 4 deletions(-)

-- 
2.14.1



[PATCH 3/3] docs: segmentation-offloads.txt: add SCTP info

2018-02-13 Thread Daniel Axtens
Most of this is extracted from 90017accff61 ("sctp: Add GSO support"),
with some extra text about GSO_BY_FRAGS and the need to check for it.

Cc: Marcelo Ricardo Leitner 
Signed-off-by: Daniel Axtens 
---
 Documentation/networking/segmentation-offloads.txt | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/Documentation/networking/segmentation-offloads.txt 
b/Documentation/networking/segmentation-offloads.txt
index b247471a183c..d47480b61ac6 100644
--- a/Documentation/networking/segmentation-offloads.txt
+++ b/Documentation/networking/segmentation-offloads.txt
@@ -13,6 +13,7 @@ The following technologies are described:
  * Generic Segmentation Offload - GSO
  * Generic Receive Offload - GRO
  * Partial Generic Segmentation Offload - GSO_PARTIAL
+ * SCTP acceleration with GSO - GSO_BY_FRAGS
 
 TCP Segmentation Offload
 
@@ -132,3 +133,28 @@ values for if the header was simply duplicated.  The one 
exception to this
 is the outer IPv4 ID field.  It is up to the device drivers to guarantee
 that the IPv4 ID field is incremented in the case that a given header does
 not have the DF bit set.
+
+SCTP acceleration with GSO
+===
+
+SCTP - despite the lack of hardware support - can still take advantage of
+GSO to pass one large packet through the network stack, rather than
+multiple small packets.
+
+This requires a different approach to other offloads, as SCTP packets
+cannot be just segmented to (P)MTU. Rather, the chunks must be contained in
+IP segments, padding respected. So unlike regular GSO, SCTP can't just
+generate a big skb, set gso_size to the fragmentation point and deliver it
+to IP layer.
+
+Instead, the SCTP protocol layer builds an skb with the segments correctly
+padded and stored as chained skbs, and skb_segment() splits based on those.
+To signal this, gso_size is set to the special value GSO_BY_FRAGS.
+
+Therefore, any code in the core networking stack must be aware of the
+possibility that gso_size will be GSO_BY_FRAGS and handle that case
+appropriately. (For size checks, the skb_gso_validate_*_len family of
+helpers do this automatically.)
+
+This also affects drivers with the NETIF_F_FRAGLIST & NETIF_F_GSO_SCTP bits
+set. Note also that NETIF_F_GSO_SCTP is included in NETIF_F_GSO_SOFTWARE.
-- 
2.14.1
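The GSO_BY_FRAGS handling this documentation describes can be modelled in userspace. Below is a hedged Python sketch of the logic the kernel's skb_gso_size_check() implements; the real code walks struct sk_buff frag lists, and GSO_BY_FRAGS is the kernel's reserved sentinel value of gso_size (0xFFFF in skbuff.h):

```python
GSO_BY_FRAGS = 0xFFFF  # sentinel gso_size value, matching the kernel define


def gso_size_check(hdr_len, gso_size, frag_lens, max_len):
    """Model of the kernel's skb_gso_size_check() logic.

    hdr_len:   header bytes prepended to every segment
    gso_size:  shinfo->gso_size, possibly the GSO_BY_FRAGS sentinel
    frag_lens: payload length of each chained skb (the SCTP case)
    max_len:   the limit (MTU or MAC length) to validate against
    """
    if gso_size != GSO_BY_FRAGS:
        # Regular GSO: every segment is headers + gso_size.
        return hdr_len + gso_size <= max_len
    # GSO_BY_FRAGS: segment boundaries come from the chained skbs,
    # so each fragment must fit once headers are added.
    return all(hdr_len + flen <= max_len for flen in frag_lens)


# A regular GSO skb with 1400-byte segments fits a 1500-byte limit...
assert gso_size_check(40, 1400, [], 1500)
# ...while an SCTP GSO_BY_FRAGS skb is checked fragment by fragment:
assert gso_size_check(40, GSO_BY_FRAGS, [1000, 1200], 1500)
assert not gso_size_check(40, GSO_BY_FRAGS, [1000, 1600], 1500)
```

This is why a naive `hdr_len + gso_size <= max_len` comparison, as the tbf patches below fix, wrongly rejects SCTP skbs: GSO_BY_FRAGS is a huge value, not a real segment size.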



[PATCH 1/3] docs: segmentation-offloads.txt: update for UFO deprecation

2018-02-13 Thread Daniel Axtens
UFO is deprecated except for tuntap and packet per 0c19f846d582,
("net: accept UFO datagrams from tuntap and packet"). Update UFO
docs to reflect this.

Signed-off-by: Daniel Axtens 
---
 Documentation/networking/segmentation-offloads.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/networking/segmentation-offloads.txt 
b/Documentation/networking/segmentation-offloads.txt
index 2f09455a993a..2cda12ab7075 100644
--- a/Documentation/networking/segmentation-offloads.txt
+++ b/Documentation/networking/segmentation-offloads.txt
@@ -49,6 +49,10 @@ datagram into multiple IPv4 fragments.  Many of the 
requirements for UDP
 fragmentation offload are the same as TSO.  However the IPv4 ID for
 fragments should not increment as a single IPv4 datagram is fragmented.
 
+UFO is deprecated: modern kernels will no longer generate UFO skbs, but can
+still receive them from tuntap and similar devices. Offload of UDP-based
+tunnel protocols is still supported.
+
 IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
 
 
-- 
2.14.1



[PATCH 2/3] docs: segmentation-offloads.txt: Fix ref to SKB_GSO_TUNNEL_REMCSUM

2018-02-13 Thread Daniel Axtens
The doc originally called it SKB_GSO_REMCSUM. Fix it.

Fixes: f7a6272bf3cb ("Documentation: Add documentation for TSO and GSO 
features")
Signed-off-by: Daniel Axtens 
--
The only text change is s/SKB_GSO_REMCSUM/SKB_GSO_TUNNEL_REMCSUM/, but
I rewrapped the text which is why the whole para changed.

---
 Documentation/networking/segmentation-offloads.txt | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/segmentation-offloads.txt 
b/Documentation/networking/segmentation-offloads.txt
index 2cda12ab7075..b247471a183c 100644
--- a/Documentation/networking/segmentation-offloads.txt
+++ b/Documentation/networking/segmentation-offloads.txt
@@ -87,10 +87,10 @@ SKB_GSO_UDP_TUNNEL_CSUM.  These two additional tunnel types 
reflect the
 fact that the outer header also requests to have a non-zero checksum
 included in the outer header.
 
-Finally there is SKB_GSO_REMCSUM which indicates that a given tunnel header
-has requested a remote checksum offload.  In this case the inner headers
-will be left with a partial checksum and only the outer header checksum
-will be computed.
+Finally there is SKB_GSO_TUNNEL_REMCSUM which indicates that a given tunnel
+header has requested a remote checksum offload.  In this case the inner
+headers will be left with a partial checksum and only the outer header
+checksum will be computed.
 
 Generic Segmentation Offload
 
-- 
2.14.1



qdisc_pkt_len_init: SCTP/GSO_BY_FRAGS and robustness questions

2018-02-06 Thread Daniel Axtens
Hi Marcelo and Eric,

I'm working on checking code that might be impacted by GSO_BY_FRAGS -
after finding that the token bucket filter qdisc code doesn't handle it
properly, DaveM said I should look for other places where this might be
an issue [0].

I'm currently looking at qdisc_pkt_len_init in net/core/dev.c. This is
called by __dev_queue_xmit, before validate_xmit_skb, so before an SCTP
skb would be segmented if the hardware doesn't support SCTP offload.

There are two things I was hoping you two could offer some advice on:

1) Eric, in 7c68d1a6b4db ("net: qdisc_pkt_len_init() should be more
   robust") you replaced a chunk of code that is similar to the code
   found in skb_gso_transport_seglen() and replaced it with more robust
   code. Do we need to change skb_gso_transport_seglen() in a similar way?

2) Marcelo, unlike skb_gso_transport_seglen(), where you added a case
   for SCTP in 90017accff61 ("sctp: Add GSO support"), there doesn't
   seem to be a GSO_BY_FRAGS or SCTP check in qdisc_pkt_len_init, so I
   think the accounting is probably wrong for SCTP. I'm not 100% sure
   how to fix this as it's now quite different from the calculations in
   skb_gso_transport_seglen() - so I was hoping that you might have an
   idea.

Thanks in advance!

[0]: https://patchwork.ozlabs.org/patch/869145/#1852414

Regards,
Daniel




[PATCH v2] net: sched: tbf: handle GSO_BY_FRAGS case in enqueue

2018-02-04 Thread Daniel Axtens
tbf_enqueue() checks the size of a packet before enqueuing it.
However, the GSO size check does not consider the GSO_BY_FRAGS
case, and so will drop GSO SCTP packets, causing a massive drop
in throughput.

Use skb_gso_validate_mac_len() instead, as it does consider that
case.

Signed-off-by: Daniel Axtens 

---

skb_gso_validate_mac_len() is an out-of-line call, but so is
skb_gso_mac_seglen(), so this is slower but not much slower. I
will send a patch to make the skb_gso_validate_* functions
inline-able shortly.

Also, GSO_BY_FRAGS considered harmful - I'm pretty sure this is
not the only place it causes issues.

v2: put S-o-b in the right spot, thanks Andrew Donnellan

---
 net/sched/sch_tbf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 229172d509cc..03225a8df973 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -188,7 +188,8 @@ static int tbf_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
int ret;
 
if (qdisc_pkt_len(skb) > q->max_size) {
-   if (skb_is_gso(skb) && skb_gso_mac_seglen(skb) <= q->max_size)
+   if (skb_is_gso(skb) &&
+   skb_gso_validate_mac_len(skb, q->max_size))
return tbf_segment(skb, sch, to_free);
return qdisc_drop(skb, sch, to_free);
}
-- 
2.14.1



[PATCH] net: sched: tbf: handle GSO_BY_FRAGS case in enqueue

2018-02-04 Thread Daniel Axtens
tbf_enqueue() checks the size of a packet before enqueuing it.
However, the GSO size check does not consider the GSO_BY_FRAGS
case, and so will drop GSO SCTP packets, causing a massive drop
in throughput.

Use skb_gso_validate_mac_len() instead, as it does consider that
case.

---

skb_gso_validate_mac_len() is an out-of-line call, but so is
skb_gso_mac_seglen(), so this is slower but not much slower. I
will send a patch to make the skb_gso_validate_* functions
inline-able shortly.

Also, GSO_BY_FRAGS considered harmful - I'm pretty sure this is
not the only place it causes issues.

Signed-off-by: Daniel Axtens 
---
 net/sched/sch_tbf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 229172d509cc..03225a8df973 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -188,7 +188,8 @@ static int tbf_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
int ret;
 
if (qdisc_pkt_len(skb) > q->max_size) {
-   if (skb_is_gso(skb) && skb_gso_mac_seglen(skb) <= q->max_size)
+   if (skb_is_gso(skb) &&
+   skb_gso_validate_mac_len(skb, q->max_size))
return tbf_segment(skb, sch, to_free);
return qdisc_drop(skb, sch, to_free);
}
-- 
2.14.1



RE: [PATCH v4 2/2] bnx2x: disable GSO where gso_size is too big for hardware

2018-01-31 Thread Daniel Axtens
"Chopra, Manish"  writes:

>> -Original Message-
>> From: Daniel Axtens [mailto:d...@axtens.net]
>> Sent: Wednesday, January 31, 2018 8:46 AM
>> To: netdev@vger.kernel.org
>> Cc: Daniel Axtens ; Eric Dumazet ;
>> Chopra, Manish ; Jason Wang
>> ; Pravin Shelar ; Marcelo Ricardo
>> Leitner 
>> Subject: [PATCH v4 2/2] bnx2x: disable GSO where gso_size is too big for
>> hardware
>> 
>> If a bnx2x card is passed a GSO packet with a gso_size larger than
>> ~9700 bytes, it will cause a firmware error that will bring the card
>> down:
>> 
>> bnx2x: [bnx2x_attn_int_deasserted3:4323(enP24p1s0f0)]MC assert!
>> bnx2x: [bnx2x_mc_assert:720(enP24p1s0f0)]XSTORM_ASSERT_LIST_INDEX 0x2
>> bnx2x: [bnx2x_mc_assert:736(enP24p1s0f0)]XSTORM_ASSERT_INDEX 0x0 =
>> 0x 0x25e43e47 0x00463e01 0x00010052
>> bnx2x: [bnx2x_mc_assert:750(enP24p1s0f0)]Chip Revision: everest3, FW
>> Version: 7_13_1 ... (dump of values continues) ...
>> 
>> Detect when the mac length of a GSO packet is greater than the maximum
>> packet size (9700 bytes) and disable GSO.
>> 
>> Signed-off-by: Daniel Axtens 
>> 
>> ---
>> 
>> v4: Only call the slow check if the gso_size is large.
>> Eric - I think this is what you had in mind?
>> Manish - do you think this is an acceptable performance trade-off?
>>  GSO will work for any packet size, and only jumbo frames will
>>   have to do the slower test.
>> 
>> Again, only build-tested.
>> ---
>>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 18
>> ++
>>  1 file changed, 18 insertions(+)
>> 
>> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
>> b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
>> index 7b08323e3f3d..74fc9af4aadb 100644
>> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
>> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
>> @@ -12934,6 +12934,24 @@ static netdev_features_t
>> bnx2x_features_check(struct sk_buff *skb,
>>struct net_device *dev,
>>netdev_features_t features)
>>  {
>> +/*
>> + * A skb with gso_size + header length > 9700 will cause a
>> + * firmware panic. Drop GSO support.
>> + *
>> + * Eventually the upper layer should not pass these packets down.
>> + *
>> + * For speed, if the gso_size is <= 9000, assume there will
>> + * not be 700 bytes of headers and pass it through. Only do a
>> + * full (slow) validation if the gso_size is > 9000.
>> + *
>> + * (Due to the way GSO_BY_FRAGS works this will also do a full
>> + * validation in that case.)
>> + */
>> +if (unlikely(skb_is_gso(skb) &&
>> + (skb_shinfo(skb)->gso_size > 9000) &&
>> + !skb_gso_validate_mac_len(skb, 9700)))
>> +features &= ~NETIF_F_GSO_MASK;
>
> Hi Daniel,
>
> Obviously, it could be bad from a performance perspective since every gso 
> packet has to do these checks.
> When running iperf/netperf performance benchmark, where GSO is likely to 
> occur.
>
> Why do you have to put two checks for skb_is_gso() and gso_size ? Isn't 
> gso_size > anything means GSO skb ?

Yes, the check is redundant. I do it to make my logic clearer.

The compiler will optimise it into one check. I compiled with gcc-7.2
and gcc-4.8, both targeting amd64, and in both cases I could only find
one relevant cmp:

skb_is_gso():
/home/dja/dev/linux/linux/include/linux/skbuff.h:4032
3686:   48 8b 8f d0 00 00 00mov0xd0(%rdi),%rcx
bnx2x_features_check():
/home/dja/dev/linux/linux/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c:12950
368d:   66 81 7c 01 04 28 23cmpw   $0x2328,0x4(%rcx,%rax,1)
3694:   0f 87 ea 01 00 00   ja 3884 

0x2328 is decimal 9000.

> I assume it won't cause disabling the offload if "headers [L2 + L3 + L4] + 
> gso_size" is still <= 9700. ?

That is correct. The flow is:

If the gso_size is less than or equal to 9000, offload is enabled with
no further checks. This is the fast path.

If the gso_size is greater than 9000, then the "headers [L2 + L3 + L4] +
gso_size" is checked. This is an out-of-line check done once per GSO skb.

If headers + gso_size is <= 9700, offload is enabled.

If headers + gso_size > 9700, offload is disabled.
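That decision flow can be sketched as a small Python model. The 9000/9700 constants are from the patch; mac_seg_len stands in for the result of the out-of-line skb_gso_validate_mac_len() walk:

```python
FAST_PATH_LIMIT = 9000   # gso_size at or below this skips the slow check
HW_MAX_PACKET = 9700     # bnx2x firmware limit, per the patch


def gso_offload_allowed(gso_size, mac_seg_len):
    """Model of the bnx2x_features_check() decision described above.

    gso_size:    shinfo->gso_size of the GSO skb
    mac_seg_len: L2+L3+L4 headers + gso_size, i.e. what
                 skb_gso_validate_mac_len() compares against the limit
    """
    if gso_size <= FAST_PATH_LIMIT:
        # Fast path: assume headers never amount to 700 bytes.
        return True
    # Slow path, taken once per jumbo GSO skb.
    return mac_seg_len <= HW_MAX_PACKET


assert gso_offload_allowed(8000, 8054)       # fast path, no further checks
assert gso_offload_allowed(9200, 9254)       # jumbo, but fits within 9700
assert not gso_offload_allowed(9800, 9854)   # would have paniced the firmware
```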

Regards,
Daniel
>
> Thanks,
> Manish



[PATCH v4 2/2] bnx2x: disable GSO where gso_size is too big for hardware

2018-01-30 Thread Daniel Axtens
If a bnx2x card is passed a GSO packet with a gso_size larger than
~9700 bytes, it will cause a firmware error that will bring the card
down:

bnx2x: [bnx2x_attn_int_deasserted3:4323(enP24p1s0f0)]MC assert!
bnx2x: [bnx2x_mc_assert:720(enP24p1s0f0)]XSTORM_ASSERT_LIST_INDEX 0x2
bnx2x: [bnx2x_mc_assert:736(enP24p1s0f0)]XSTORM_ASSERT_INDEX 0x0 = 0x 
0x25e43e47 0x00463e01 0x00010052
bnx2x: [bnx2x_mc_assert:750(enP24p1s0f0)]Chip Revision: everest3, FW Version: 
7_13_1
... (dump of values continues) ...

Detect when the mac length of a GSO packet is greater than the maximum
packet size (9700 bytes) and disable GSO.

Signed-off-by: Daniel Axtens 

---

v4: Only call the slow check if the gso_size is large.
Eric - I think this is what you had in mind?
Manish - do you think this is an acceptable performance trade-off?
 GSO will work for any packet size, and only jumbo frames will
 have to do the slower test.

Again, only build-tested.
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 7b08323e3f3d..74fc9af4aadb 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -12934,6 +12934,24 @@ static netdev_features_t bnx2x_features_check(struct 
sk_buff *skb,
  struct net_device *dev,
  netdev_features_t features)
 {
+   /*
+* A skb with gso_size + header length > 9700 will cause a
+* firmware panic. Drop GSO support.
+*
+* Eventually the upper layer should not pass these packets down.
+*
+* For speed, if the gso_size is <= 9000, assume there will
+* not be 700 bytes of headers and pass it through. Only do a
+* full (slow) validation if the gso_size is > 9000.
+*
+* (Due to the way GSO_BY_FRAGS works this will also do a full
+* validation in that case.)
+*/
+   if (unlikely(skb_is_gso(skb) &&
+(skb_shinfo(skb)->gso_size > 9000) &&
+!skb_gso_validate_mac_len(skb, 9700)))
+   features &= ~NETIF_F_GSO_MASK;
+
features = vlan_features_check(skb, features);
return vxlan_features_check(skb, features);
 }
-- 
2.14.1



[PATCH v4 0/2] bnx2x: disable GSO on too-large packets

2018-01-30 Thread Daniel Axtens
We observed a case where a packet received on an ibmveth device had a
GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
device, where it caused a firmware assert. This is described in detail
at [0].

Ultimately we want a fix in the core, but that is very tricky to
backport. So for now, just stop the bnx2x driver from crashing.

When net-next re-opens I will send the fix to the core and a revert
for this.

v4 changes:
  - fix compilation error with EXPORTs (patch 1)
  - only do slow test if gso_size is greater than 9000 bytes (patch 2)

Thanks,
Daniel
   
[0]: https://patchwork.ozlabs.org/patch/859410/

Cc: Eric Dumazet 
Cc: manish.cho...@cavium.com
Cc: Jason Wang 
Cc: Pravin Shelar 
Cc: Marcelo Ricardo Leitner 

Daniel Axtens (2):
  net: create skb_gso_validate_mac_len()
  bnx2x: disable GSO where gso_size is too big for hardware

 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 18 +++
 include/linux/skbuff.h   | 16 ++
 net/core/skbuff.c| 63 +++-
 net/sched/sch_tbf.c  | 10 
 4 files changed, 84 insertions(+), 23 deletions(-)

-- 
2.14.1



[PATCH v4 1/2] net: create skb_gso_validate_mac_len()

2018-01-30 Thread Daniel Axtens
If you take a GSO skb, and split it into packets, will the MAC
length (L2 + L3 + L4 headers + payload) of those packets be small
enough to fit within a given length?

Move skb_gso_mac_seglen() to skbuff.h with other related functions
like skb_gso_network_seglen() so we can use it, and then create
skb_gso_validate_mac_len to do the full calculation.

Signed-off-by: Daniel Axtens 

---

v4: fix mistake in EXPORT name
---
 include/linux/skbuff.h | 16 +
 net/core/skbuff.c  | 63 +++---
 net/sched/sch_tbf.c| 10 
 3 files changed, 66 insertions(+), 23 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b8e0da6c27d6..242d6773c7c2 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3287,6 +3287,7 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, 
int shiftlen);
 void skb_scrub_packet(struct sk_buff *skb, bool xnet);
 unsigned int skb_gso_transport_seglen(const struct sk_buff *skb);
 bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu);
+bool skb_gso_validate_mac_len(const struct sk_buff *skb, unsigned int len);
 struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features);
 struct sk_buff *skb_vlan_untag(struct sk_buff *skb);
 int skb_ensure_writable(struct sk_buff *skb, int write_len);
@@ -4120,6 +4121,21 @@ static inline unsigned int skb_gso_network_seglen(const 
struct sk_buff *skb)
return hdr_len + skb_gso_transport_seglen(skb);
 }
 
+/**
+ * skb_gso_mac_seglen - Return length of individual segments of a gso packet
+ *
+ * @skb: GSO skb
+ *
+ * skb_gso_mac_seglen is used to determine the real size of the
+ * individual segments, including MAC/L2, Layer3 (IP, IPv6) and L4
+ * headers (TCP/UDP).
+ */
+static inline unsigned int skb_gso_mac_seglen(const struct sk_buff *skb)
+{
+   unsigned int hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
+   return hdr_len + skb_gso_transport_seglen(skb);
+}
+
 /* Local Checksum Offload.
  * Compute outer checksum based on the assumption that the
  * inner checksum will be offloaded later.
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 01e8285aea73..8c61c27c1b28 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4914,37 +4914,74 @@ unsigned int skb_gso_transport_seglen(const struct 
sk_buff *skb)
 EXPORT_SYMBOL_GPL(skb_gso_transport_seglen);
 
 /**
- * skb_gso_validate_mtu - Return in case such skb fits a given MTU
+ * skb_gso_size_check - check the skb size, considering GSO_BY_FRAGS
  *
- * @skb: GSO skb
- * @mtu: MTU to validate against
+ * There are a couple of instances where we have a GSO skb, and we
+ * want to determine what size it would be after it is segmented.
  *
- * skb_gso_validate_mtu validates if a given skb will fit a wanted MTU
- * once split.
+ * We might want to check:
+ * - L3+L4+payload size (e.g. IP forwarding)
+ * - L2+L3+L4+payload size (e.g. sanity check before passing to driver)
+ *
+ * This is a helper to do that correctly considering GSO_BY_FRAGS.
+ *
+ * @seg_len: The segmented length (from skb_gso_*_seglen). In the
+ *   GSO_BY_FRAGS case this will be [header sizes + GSO_BY_FRAGS].
+ *
+ * @max_len: The maximum permissible length.
+ *
+ * Returns true if the segmented length <= max length.
  */
-bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu)
-{
+static inline bool skb_gso_size_check(const struct sk_buff *skb,
+ unsigned int seg_len,
+ unsigned int max_len) {
const struct skb_shared_info *shinfo = skb_shinfo(skb);
const struct sk_buff *iter;
-   unsigned int hlen;
-
-   hlen = skb_gso_network_seglen(skb);
 
if (shinfo->gso_size != GSO_BY_FRAGS)
-   return hlen <= mtu;
+   return seg_len <= max_len;
 
/* Undo this so we can re-use header sizes */
-   hlen -= GSO_BY_FRAGS;
+   seg_len -= GSO_BY_FRAGS;
 
skb_walk_frags(skb, iter) {
-   if (hlen + skb_headlen(iter) > mtu)
+   if (seg_len + skb_headlen(iter) > max_len)
return false;
}
 
return true;
 }
+
+/**
+ * skb_gso_validate_mtu - Return in case such skb fits a given MTU
+ *
+ * @skb: GSO skb
+ * @mtu: MTU to validate against
+ *
+ * skb_gso_validate_mtu validates if a given skb will fit a wanted MTU
+ * once split.
+ */
+bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu)
+{
+   return skb_gso_size_check(skb, skb_gso_network_seglen(skb), mtu);
+}
 EXPORT_SYMBOL_GPL(skb_gso_validate_mtu);
 
+/**
+ * skb_gso_validate_mac_len - Will a split GSO skb fit in a given length?
+ *
+ * @skb: GSO skb
+ * @len: length to validate against
+ *
+ * skb_gso_validate_mac_len validates if a given skb will fit a wanted
+ * length once split, including L2, L3 and L4 headers and the payload.
+ */
+bool skb_gso_validate_mac_len(const struct sk_buff *skb, unsigned int len)
+{
+	return skb_gso_size_check(skb, skb_gso_mac_seglen(skb), len);
+}
+EXPORT_SYMBOL_GPL(skb_gso_validate_mac_len);

Re: [PATCH v3 1/2] net: create skb_gso_validate_mac_len()

2018-01-30 Thread Daniel Axtens
Marcelo Ricardo Leitner  writes:

> Hi,
>
> On Tue, Jan 30, 2018 at 12:14:46PM +1100, Daniel Axtens wrote:
>> If you take a GSO skb, and split it into packets, will the MAC
>> length (L2 + L3 + L4 headers + payload) of those packets be small
>> enough to fit within a given length?
>> 
>> Move skb_gso_mac_seglen() to skbuff.h with other related functions
>> like skb_gso_network_seglen() so we can use it, and then create
>> skb_gso_validate_mac_len to do the full calculation.
>> 
>> Signed-off-by: Daniel Axtens 
>> ---
>>  include/linux/skbuff.h | 16 +
>>  net/core/skbuff.c  | 65 
>> +++---
>>  net/sched/sch_tbf.c| 10 
>>  3 files changed, 67 insertions(+), 24 deletions(-)
>> 
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index b8e0da6c27d6..242d6773c7c2 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -3287,6 +3287,7 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff 
>> *skb, int shiftlen);
>>  void skb_scrub_packet(struct sk_buff *skb, bool xnet);
>>  unsigned int skb_gso_transport_seglen(const struct sk_buff *skb);
>>  bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu);
>> +bool skb_gso_validate_mac_len(const struct sk_buff *skb, unsigned int len);
>>  struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t 
>> features);
>>  struct sk_buff *skb_vlan_untag(struct sk_buff *skb);
>>  int skb_ensure_writable(struct sk_buff *skb, int write_len);
>> @@ -4120,6 +4121,21 @@ static inline unsigned int 
>> skb_gso_network_seglen(const struct sk_buff *skb)
>>  return hdr_len + skb_gso_transport_seglen(skb);
>>  }
>>  
>> +/**
>> + * skb_gso_mac_seglen - Return length of individual segments of a gso packet
>> + *
>> + * @skb: GSO skb
>> + *
>> + * skb_gso_mac_seglen is used to determine the real size of the
>> + * individual segments, including MAC/L2, Layer3 (IP, IPv6) and L4
>> + * headers (TCP/UDP).
>> + */
>> +static inline unsigned int skb_gso_mac_seglen(const struct sk_buff *skb)
>> +{
>> +unsigned int hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
>> +return hdr_len + skb_gso_transport_seglen(skb);
>> +}
>> +
>>  /* Local Checksum Offload.
>>   * Compute outer checksum based on the assumption that the
>>   * inner checksum will be offloaded later.
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 01e8285aea73..55d84ab7d093 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -4914,36 +4914,73 @@ unsigned int skb_gso_transport_seglen(const struct 
>> sk_buff *skb)
>>  EXPORT_SYMBOL_GPL(skb_gso_transport_seglen);
>>  
>>  /**
>> - * skb_gso_validate_mtu - Return in case such skb fits a given MTU
>> + * skb_gso_size_check - check the skb size, considering GSO_BY_FRAGS
>>   *
>> - * @skb: GSO skb
>> - * @mtu: MTU to validate against
>> + * There are a couple of instances where we have a GSO skb, and we
>> + * want to determine what size it would be after it is segmented.
>>   *
>> - * skb_gso_validate_mtu validates if a given skb will fit a wanted MTU
>> - * once split.
>> + * We might want to check:
>> + * - L3+L4+payload size (e.g. IP forwarding)
>> + * - L2+L3+L4+payload size (e.g. sanity check before passing to driver)
>> + *
>> + * This is a helper to do that correctly considering GSO_BY_FRAGS.
>> + *
>> + * @seg_len: The segmented length (from skb_gso_*_seglen). In the
>> + *   GSO_BY_FRAGS case this will be [header sizes + GSO_BY_FRAGS].
>> + *
>> + * @max_len: The maximum permissible length.
>> + *
>> + * Returns true if the segmented length <= max length.
>>   */
>> -bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu)
>> -{
>> +static inline bool skb_gso_size_check(const struct sk_buff *skb,
>> +  unsigned int seg_len,
>> +  unsigned int max_len) {
>>  const struct skb_shared_info *shinfo = skb_shinfo(skb);
>>  const struct sk_buff *iter;
>> -unsigned int hlen;
>> -
>> -hlen = skb_gso_network_seglen(skb);
>>  
>>  if (shinfo->gso_size != GSO_BY_FRAGS)
>> -return hlen <= mtu;
>> +return seg_len <= max_len;
>>  
>>  /* Undo this so we can re-use header sizes */
>> -hlen -= GSO_BY_FRAGS;
>> +seg_len -= GSO_BY_FRAGS;
>>  
>

Re: [PATCH v3 1/2] net: create skb_gso_validate_mac_len()

2018-01-29 Thread Daniel Axtens
Hi Eric,

>> skb_gso_transport_seglen(skb) is quite expensive (out of line)
>> 
>> It is unfortunate bnx2x seems to support 9600 MTU (
>> ETH_MAX_JUMBO_PACKET_SIZE ), because 100 bytes of headers can be too
>> small in some cases.
>> 
>> Presumably we could avoid calling the function for standard MTU <=
>> 9000

Yes, it is an expensive call. I will send another version where we only
call the expensive path in the jumbo frame case. I'll wait until
tomorrow in case there is any more feedback in the next 24 hours or so.

> Also are we sure about these bnx2x crashes being limited to TSO ?
>
> Maybe it will crash the same after GSO has segmented the packet and we
> provide a big (like 10,000 bytes) packet ? I do not believe upper stack
> will prevent this.

We did test with TSO off and that was sufficient to prevent the crash.

You are right that the upper stack sends down the 10k packet, so I don't
know why it prevents the crash. But we did definitely test it! I don't
have access to the firmware details but I imagine it's an issue with the
offloaded segmentation.

Thanks for your patience with the many versions of this patch set.

Regards,
Daniel


[PATCH v3 1/2] net: create skb_gso_validate_mac_len()

2018-01-29 Thread Daniel Axtens
If you take a GSO skb, and split it into packets, will the MAC
length (L2 + L3 + L4 headers + payload) of those packets be small
enough to fit within a given length?

Move skb_gso_mac_seglen() to skbuff.h with other related functions
like skb_gso_network_seglen() so we can use it, and then create
skb_gso_validate_mac_len to do the full calculation.

Signed-off-by: Daniel Axtens 
---
 include/linux/skbuff.h | 16 +
 net/core/skbuff.c  | 65 +++---
 net/sched/sch_tbf.c| 10 
 3 files changed, 67 insertions(+), 24 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b8e0da6c27d6..242d6773c7c2 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3287,6 +3287,7 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, 
int shiftlen);
 void skb_scrub_packet(struct sk_buff *skb, bool xnet);
 unsigned int skb_gso_transport_seglen(const struct sk_buff *skb);
 bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu);
+bool skb_gso_validate_mac_len(const struct sk_buff *skb, unsigned int len);
 struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features);
 struct sk_buff *skb_vlan_untag(struct sk_buff *skb);
 int skb_ensure_writable(struct sk_buff *skb, int write_len);
@@ -4120,6 +4121,21 @@ static inline unsigned int skb_gso_network_seglen(const 
struct sk_buff *skb)
return hdr_len + skb_gso_transport_seglen(skb);
 }
 
+/**
+ * skb_gso_mac_seglen - Return length of individual segments of a gso packet
+ *
+ * @skb: GSO skb
+ *
+ * skb_gso_mac_seglen is used to determine the real size of the
+ * individual segments, including MAC/L2, Layer3 (IP, IPv6) and L4
+ * headers (TCP/UDP).
+ */
+static inline unsigned int skb_gso_mac_seglen(const struct sk_buff *skb)
+{
+   unsigned int hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
+   return hdr_len + skb_gso_transport_seglen(skb);
+}
+
 /* Local Checksum Offload.
  * Compute outer checksum based on the assumption that the
  * inner checksum will be offloaded later.
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 01e8285aea73..55d84ab7d093 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4914,36 +4914,73 @@ unsigned int skb_gso_transport_seglen(const struct 
sk_buff *skb)
 EXPORT_SYMBOL_GPL(skb_gso_transport_seglen);
 
 /**
- * skb_gso_validate_mtu - Return in case such skb fits a given MTU
+ * skb_gso_size_check - check the skb size, considering GSO_BY_FRAGS
  *
- * @skb: GSO skb
- * @mtu: MTU to validate against
+ * There are a couple of instances where we have a GSO skb, and we
+ * want to determine what size it would be after it is segmented.
  *
- * skb_gso_validate_mtu validates if a given skb will fit a wanted MTU
- * once split.
+ * We might want to check:
+ * - L3+L4+payload size (e.g. IP forwarding)
+ * - L2+L3+L4+payload size (e.g. sanity check before passing to driver)
+ *
+ * This is a helper to do that correctly considering GSO_BY_FRAGS.
+ *
+ * @seg_len: The segmented length (from skb_gso_*_seglen). In the
+ *   GSO_BY_FRAGS case this will be [header sizes + GSO_BY_FRAGS].
+ *
+ * @max_len: The maximum permissible length.
+ *
+ * Returns true if the segmented length <= max length.
  */
-bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu)
-{
+static inline bool skb_gso_size_check(const struct sk_buff *skb,
+ unsigned int seg_len,
+ unsigned int max_len) {
const struct skb_shared_info *shinfo = skb_shinfo(skb);
const struct sk_buff *iter;
-   unsigned int hlen;
-
-   hlen = skb_gso_network_seglen(skb);
 
if (shinfo->gso_size != GSO_BY_FRAGS)
-   return hlen <= mtu;
+   return seg_len <= max_len;
 
/* Undo this so we can re-use header sizes */
-   hlen -= GSO_BY_FRAGS;
+   seg_len -= GSO_BY_FRAGS;
 
skb_walk_frags(skb, iter) {
-   if (hlen + skb_headlen(iter) > mtu)
+   if (seg_len + skb_headlen(iter) > max_len)
return false;
}
 
return true;
 }
-EXPORT_SYMBOL_GPL(skb_gso_validate_mtu);
+
+/**
+ * skb_gso_validate_mtu - Return in case such skb fits a given MTU
+ *
+ * @skb: GSO skb
+ * @mtu: MTU to validate against
+ *
+ * skb_gso_validate_mtu validates if a given skb will fit a wanted MTU
+ * once split.
+ */
+bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu)
+{
+   return skb_gso_size_check(skb, skb_gso_network_seglen(skb), mtu);
+}
+EXPORT_SYMBOL_GPL(skb_gso_validate_mtu);
+
+/**
+ * skb_gso_validate_mac_len - Will a split GSO skb fit in a given length?
+ *
+ * @skb: GSO skb
+ * @len: length to validate against
+ *
+ * skb_gso_validate_mac_len validates if a given skb will fit a wanted
+ * length once split, including L2, L3 and L4 headers and the payload.
+ */
+bool skb_gso_vali

[PATCH v3 0/2] bnx2x: disable GSO on too-large packets

2018-01-29 Thread Daniel Axtens
We observed a case where a packet received on an ibmveth device had a
GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
device, where it caused a firmware assert. This is described in detail
at [0].

Ultimately we want a fix in the core, but that is very tricky to
backport. So for now, just stop the bnx2x driver from crashing.

When net-next re-opens I will send the fix to the core and a revert
for this.

Marcelo: I have left renaming skb_gso_validate_mtu() for the
next series.

Thanks,
Daniel
   
[0]: https://patchwork.ozlabs.org/patch/859410/

Cc: manish.cho...@cavium.com
Cc: Jason Wang 
Cc: Pravin Shelar 
Cc: Marcelo Ricardo Leitner 

Daniel Axtens (2):
  net: create skb_gso_validate_mac_len()
  bnx2x: disable GSO where gso_size is too big for hardware

 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |  9 
 include/linux/skbuff.h   | 16 ++
 net/core/skbuff.c| 65 +++-
 net/sched/sch_tbf.c  | 10 
 4 files changed, 76 insertions(+), 24 deletions(-)

-- 
2.14.1



[PATCH v3 2/2] bnx2x: disable GSO where gso_size is too big for hardware

2018-01-29 Thread Daniel Axtens
If a bnx2x card is passed a GSO packet with a gso_size larger than
~9700 bytes, it will cause a firmware error that will bring the card
down:

bnx2x: [bnx2x_attn_int_deasserted3:4323(enP24p1s0f0)]MC assert!
bnx2x: [bnx2x_mc_assert:720(enP24p1s0f0)]XSTORM_ASSERT_LIST_INDEX 0x2
bnx2x: [bnx2x_mc_assert:736(enP24p1s0f0)]XSTORM_ASSERT_INDEX 0x0 = 0x 
0x25e43e47 0x00463e01 0x00010052
bnx2x: [bnx2x_mc_assert:750(enP24p1s0f0)]Chip Revision: everest3, FW Version: 
7_13_1
... (dump of values continues) ...

Detect when the mac length of a GSO packet is greater than the maximum
packet size (9700 bytes) and disable GSO.

Signed-off-by: Daniel Axtens 
---

v3: use skb_gso_validate_mac_len to get the actual size, to allow
jumbo frames to work.

I don't have local access to a bnx2x card and sending this off to the
user who does would add about a month of round-trip time, so this
particular spin is build-tested only. (Earlier spins were tested on
real hardware.)

---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 7b08323e3f3d..fbb08e360625 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -12934,6 +12934,15 @@ static netdev_features_t bnx2x_features_check(struct 
sk_buff *skb,
  struct net_device *dev,
  netdev_features_t features)
 {
+   /*
+* A skb with gso_size + header length > 9700 will cause a
+* firmware panic. Drop GSO support.
+*
+* Eventually the upper layer should not pass these packets down.
+*/
+   if (unlikely(skb_is_gso(skb) && !skb_gso_validate_mac_len(skb, 9700)))
+   features &= ~NETIF_F_GSO_MASK;
+
features = vlan_features_check(skb, features);
return vxlan_features_check(skb, features);
 }
-- 
2.14.1



Re: [PATCH v2 0/4] Check size of packets before sending

2018-01-28 Thread Daniel Axtens
Eric Dumazet  writes:

> On Fri, 2018-01-26 at 00:44 +1100, Daniel Axtens wrote:
>> Hi Eric,
>> 
>> > May I ask which tree are you targeting ?
>> > 
>> > ( Documentation/networking/netdev-FAQ.txt )
>> 
>> I have been targeting net-next, but I haven't pulled for about two
>> weeks. I will rebase and if there are conflicts I will resend early next
>> week.
>> 
>> > Anything touching GSO is very risky and should target net-next,
>> > especially considering 4.15 is released this week end.
>> > 
>> > Are we really willing to backport this intrusive series in stable
>> > trees, or do we have a smaller fix for bnx2x ?
>> 
>> I do actually have a smaller fix for bnx2x, although it would need more work:
>> https://patchwork.ozlabs.org/patch/859410/
>> 
>> It leaves open the possibility of too-large packets causing issues on
>> other drivers. DaveM wasn't a fan: 
>> https://patchwork.ozlabs.org/patch/859410/#1839429
>
> Yes, I know he prefers a generic solution, but I am pragmatic here.
> Old kernels are very far from current GSO stack in net-next.
>
> Backporting all the dependencies is going to be very boring/risky.

OK, so how about:

 - first, a series that introduces skb_gso_validate_mac_len and uses it
   in bnx2x. This should be backportable without difficulty.

 - then, a series that wires skb_gso_validate_mac_len into the core -
   validate_xmit_skb for instance, and reverts the bnx2x fix. This would
   not need to be backported.

If that's an approach we can agree on, I am ready to send it when
net-next opens again.

Regards,
Daniel


Re: [PATCH v2 0/4] Check size of packets before sending

2018-01-25 Thread Daniel Axtens
Hi Eric,

> May I ask which tree are you targeting ?
>
> ( Documentation/networking/netdev-FAQ.txt )

I have been targeting net-next, but I haven't pulled for about two
weeks. I will rebase and if there are conflicts I will resend early next
week.

> Anything touching GSO is very risky and should target net-next,
> especially considering 4.15 is released this week end.
>
> Are we really willing to backport this intrusive series in stable
> trees, or do we have a smaller fix for bnx2x ?

I do actually have a smaller fix for bnx2x, although it would need more work:
https://patchwork.ozlabs.org/patch/859410/

It leaves open the possibility of too-large packets causing issues on
other drivers. DaveM wasn't a fan: 
https://patchwork.ozlabs.org/patch/859410/#1839429

Regards,
Daniel
>
> Thanks.


[PATCH v2 4/4] net: check the size of a packet in validate_xmit_skb

2018-01-24 Thread Daniel Axtens
There are a number of paths where an oversize skb could be sent to
a driver. The driver should not be required to check for this - the
core layer should do it instead.

Add a check to validate_xmit_skb that checks both GSO and non-GSO
packets and drops them if they are too large.

Signed-off-by: Daniel Axtens 
---
 net/core/dev.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 6c96c26aadbf..f09eece2cd21 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1830,13 +1830,11 @@ static inline void net_timestamp_set(struct sk_buff 
*skb)
__net_timestamp(SKB);   \
}   \
 
-bool is_skb_forwardable(const struct net_device *dev, const struct sk_buff 
*skb)
+static inline bool skb_mac_len_fits_dev(const struct net_device *dev,
+   const struct sk_buff *skb)
 {
unsigned int len;
 
-   if (!(dev->flags & IFF_UP))
-   return false;
-
len = dev->mtu + dev->hard_header_len + VLAN_HLEN;
if (skb->len <= len)
return true;
@@ -1850,6 +1848,14 @@ bool is_skb_forwardable(const struct net_device *dev, 
const struct sk_buff *skb)
 
return false;
 }
+
+bool is_skb_forwardable(const struct net_device *dev, const struct sk_buff 
*skb)
+{
+   if (!(dev->flags & IFF_UP))
+   return false;
+
+   return skb_mac_len_fits_dev(dev, skb);
+}
 EXPORT_SYMBOL_GPL(is_skb_forwardable);
 
 int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
@@ -3081,6 +3087,9 @@ static struct sk_buff *validate_xmit_skb(struct sk_buff 
*skb, struct net_device
if (unlikely(!skb))
goto out_null;
 
+   if (unlikely(!skb_mac_len_fits_dev(dev, skb)))
+   goto out_kfree_skb;
+
if (netif_needs_gso(skb, features)) {
struct sk_buff *segs;
 
-- 
2.14.1



[PATCH v2 1/4] net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len

2018-01-24 Thread Daniel Axtens
If you take a GSO skb, and split it into packets, will the network
length (L3 headers + L4 headers + payload) of those packets be small
enough to fit within a given MTU?

skb_gso_validate_mtu gives you the answer to that question. However,
we're about to add a way to validate the MAC length of a split GSO
skb (L2+L3+L4+payload), and the names get confusing, so rename
skb_gso_validate_mtu to skb_gso_validate_network_len.

Signed-off-by: Daniel Axtens 
---
 include/linux/skbuff.h  | 2 +-
 net/core/skbuff.c   | 9 +
 net/ipv4/ip_forward.c   | 2 +-
 net/ipv4/ip_output.c| 2 +-
 net/ipv4/netfilter/nf_flow_table_ipv4.c | 2 +-
 net/ipv6/ip6_output.c   | 2 +-
 net/ipv6/netfilter/nf_flow_table_ipv6.c | 2 +-
 net/mpls/af_mpls.c  | 2 +-
 net/xfrm/xfrm_device.c  | 2 +-
 9 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b8e0da6c27d6..b137c79bf88d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3286,7 +3286,7 @@ void skb_split(struct sk_buff *skb, struct sk_buff *skb1, 
const u32 len);
 int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen);
 void skb_scrub_packet(struct sk_buff *skb, bool xnet);
 unsigned int skb_gso_transport_seglen(const struct sk_buff *skb);
-bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu);
+bool skb_gso_validate_network_len(const struct sk_buff *skb, unsigned int mtu);
 struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features);
 struct sk_buff *skb_vlan_untag(struct sk_buff *skb);
 int skb_ensure_writable(struct sk_buff *skb, int write_len);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 01e8285aea73..a93e5c7aa5b2 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4914,15 +4914,16 @@ unsigned int skb_gso_transport_seglen(const struct 
sk_buff *skb)
 EXPORT_SYMBOL_GPL(skb_gso_transport_seglen);
 
 /**
- * skb_gso_validate_mtu - Return in case such skb fits a given MTU
+ * skb_gso_validate_network_len - Will a split GSO skb fit into a given MTU?
  *
  * @skb: GSO skb
  * @mtu: MTU to validate against
  *
- * skb_gso_validate_mtu validates if a given skb will fit a wanted MTU
- * once split.
+ * skb_gso_validate_network_len validates if a given skb will fit a
+ * wanted MTU once split. It considers L3 headers, L4 headers, and the
+ * payload.
  */
-bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu)
+bool skb_gso_validate_network_len(const struct sk_buff *skb, unsigned int mtu)
 {
const struct skb_shared_info *shinfo = skb_shinfo(skb);
const struct sk_buff *iter;
diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index 2dd21c3281a1..b54b948b0596 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -55,7 +55,7 @@ static bool ip_exceeds_mtu(const struct sk_buff *skb, 
unsigned int mtu)
if (skb->ignore_df)
return false;
 
-   if (skb_is_gso(skb) && skb_gso_validate_mtu(skb, mtu))
+   if (skb_is_gso(skb) && skb_gso_validate_network_len(skb, mtu))
return false;
 
return true;
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index e8e675be60ec..66340ab750e6 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -248,7 +248,7 @@ static int ip_finish_output_gso(struct net *net, struct 
sock *sk,
 
/* common case: seglen is <= mtu
 */
-   if (skb_gso_validate_mtu(skb, mtu))
+   if (skb_gso_validate_network_len(skb, mtu))
return ip_finish_output2(net, sk, skb);
 
/* Slowpath -  GSO segment length exceeds the egress MTU.
diff --git a/net/ipv4/netfilter/nf_flow_table_ipv4.c 
b/net/ipv4/netfilter/nf_flow_table_ipv4.c
index b2d01eb25f2c..cdf2625dc277 100644
--- a/net/ipv4/netfilter/nf_flow_table_ipv4.c
+++ b/net/ipv4/netfilter/nf_flow_table_ipv4.c
@@ -185,7 +185,7 @@ static bool __nf_flow_exceeds_mtu(const struct sk_buff 
*skb, unsigned int mtu)
if ((ip_hdr(skb)->frag_off & htons(IP_DF)) == 0)
return false;
 
-   if (skb_is_gso(skb) && skb_gso_validate_mtu(skb, mtu))
+   if (skb_is_gso(skb) && skb_gso_validate_network_len(skb, mtu))
return false;
 
return true;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 18547a44bdaf..4e888328d4dd 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -412,7 +412,7 @@ static bool ip6_pkt_too_big(const struct sk_buff *skb, 
unsigned int mtu)
if (skb->ignore_df)
return false;
 
-   if (skb_is_gso(skb) && skb_gso_validate_mtu(skb, mtu))
+   if (skb_is_gso(skb) && skb_gso_validate_network_len(skb, mtu))
return false;
 
return true;
diff --git a/net/ipv6/netfilter/nf_flow_table_ipv6.c 
b/net/ipv6/netfilter/nf_flo

[PATCH v2 2/4] net: move skb_gso_mac_seglen to skbuff.h

2018-01-24 Thread Daniel Axtens
We're about to use this elsewhere, so move it into the header with
the other related functions like skb_gso_network_seglen().

Signed-off-by: Daniel Axtens 
---
 include/linux/skbuff.h | 15 +++
 net/sched/sch_tbf.c| 10 --
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b137c79bf88d..4b3ca6a5ec0a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4120,6 +4120,21 @@ static inline unsigned int skb_gso_network_seglen(const 
struct sk_buff *skb)
return hdr_len + skb_gso_transport_seglen(skb);
 }
 
+/**
+ * skb_gso_mac_seglen - Return length of individual segments of a gso packet
+ *
+ * @skb: GSO skb
+ *
+ * skb_gso_mac_seglen is used to determine the real size of the
+ * individual segments, including MAC/L2, Layer3 (IP, IPv6) and L4
+ * headers (TCP/UDP).
+ */
+static inline unsigned int skb_gso_mac_seglen(const struct sk_buff *skb)
+{
+   unsigned int hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
+   return hdr_len + skb_gso_transport_seglen(skb);
+}
+
 /* Local Checksum Offload.
  * Compute outer checksum based on the assumption that the
  * inner checksum will be offloaded later.
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 83e76d046993..229172d509cc 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -142,16 +142,6 @@ static u64 psched_ns_t2l(const struct psched_ratecfg *r,
return len;
 }
 
-/*
- * Return length of individual segments of a gso packet,
- * including all headers (MAC, IP, TCP/UDP)
- */
-static unsigned int skb_gso_mac_seglen(const struct sk_buff *skb)
-{
-   unsigned int hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
-   return hdr_len + skb_gso_transport_seglen(skb);
-}
-
 /* GSO packet is too big, segment it so that tbf can transmit
  * each segment in time
  */
-- 
2.14.1



[PATCH v2 3/4] net: is_skb_forwardable: check the size of GSO segments

2018-01-24 Thread Daniel Axtens
is_skb_forwardable attempts to detect if a packet is too large to
be sent to the destination device. However, this test does not
consider GSO skbs, and it is possible that a GSO skb, when
segmented, will be larger than the device can transmit.

Create skb_gso_validate_mac_len, and use that to check.

Signed-off-by: Daniel Axtens 
---
 include/linux/skbuff.h |  1 +
 net/core/dev.c |  7 +++---
 net/core/skbuff.c  | 67 +++---
 3 files changed, 57 insertions(+), 18 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4b3ca6a5ec0a..ec9c47b5a1c8 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3287,6 +3287,7 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, 
int shiftlen);
 void skb_scrub_packet(struct sk_buff *skb, bool xnet);
 unsigned int skb_gso_transport_seglen(const struct sk_buff *skb);
 bool skb_gso_validate_network_len(const struct sk_buff *skb, unsigned int mtu);
+bool skb_gso_validate_mac_len(const struct sk_buff *skb, unsigned int len);
 struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features);
 struct sk_buff *skb_vlan_untag(struct sk_buff *skb);
 int skb_ensure_writable(struct sk_buff *skb, int write_len);
diff --git a/net/core/dev.c b/net/core/dev.c
index 94435cd09072..6c96c26aadbf 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1841,11 +1841,12 @@ bool is_skb_forwardable(const struct net_device *dev, 
const struct sk_buff *skb)
if (skb->len <= len)
return true;
 
-   /* if TSO is enabled, we don't care about the length as the packet
-* could be forwarded without being segmented before
+   /*
+* if TSO is enabled, we need to check the size of the
+* segmented packets
 */
if (skb_is_gso(skb))
-   return true;
+   return skb_gso_validate_mac_len(skb, len);
 
return false;
 }
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a93e5c7aa5b2..93f66725c32d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4914,37 +4914,74 @@ unsigned int skb_gso_transport_seglen(const struct 
sk_buff *skb)
 EXPORT_SYMBOL_GPL(skb_gso_transport_seglen);
 
 /**
- * skb_gso_validate_network_len - Will a split GSO skb fit into a given MTU?
+ * skb_gso_size_check - check the skb size, considering GSO_BY_FRAGS
  *
- * @skb: GSO skb
- * @mtu: MTU to validate against
+ * There are a couple of instances where we have a GSO skb, and we
+ * want to determine what size it would be after it is segmented.
  *
- * skb_gso_validate_network_len validates if a given skb will fit a
- * wanted MTU once split. It considers L3 headers, L4 headers, and the
- * payload.
+ * We might want to check:
+ * - L3+L4+payload size (e.g. IP forwarding)
+ * - L2+L3+L4+payload size (e.g. sanity check before passing to driver)
+ *
+ * This is a helper to do that correctly considering GSO_BY_FRAGS.
+ *
+ * @seg_len: The segmented length (from skb_gso_*_seglen). In the
+ *   GSO_BY_FRAGS case this will be [header sizes + GSO_BY_FRAGS].
+ *
+ * @max_len: The maximum permissible length.
+ *
+ * Returns true if the segmented length <= max length.
  */
-bool skb_gso_validate_network_len(const struct sk_buff *skb, unsigned int mtu)
-{
+static inline bool skb_gso_size_check(const struct sk_buff *skb,
+ unsigned int seg_len,
+ unsigned int max_len) {
const struct skb_shared_info *shinfo = skb_shinfo(skb);
const struct sk_buff *iter;
-   unsigned int hlen;
-
-   hlen = skb_gso_network_seglen(skb);
 
if (shinfo->gso_size != GSO_BY_FRAGS)
-   return hlen <= mtu;
+   return seg_len <= max_len;
 
/* Undo this so we can re-use header sizes */
-   hlen -= GSO_BY_FRAGS;
+   seg_len -= GSO_BY_FRAGS;
 
skb_walk_frags(skb, iter) {
-   if (hlen + skb_headlen(iter) > mtu)
+   if (seg_len + skb_headlen(iter) > max_len)
return false;
}
 
return true;
 }
-EXPORT_SYMBOL_GPL(skb_gso_validate_mtu);
+
+/**
+ * skb_gso_validate_network_len - Does an skb fit a given MTU?
+ *
+ * @skb: GSO skb
+ * @mtu: MTU to validate against
+ *
+ * skb_gso_validate_network_len validates if a given skb will fit a
+ * wanted MTU once split. It considers L3 headers, L4 headers, and the
+ * payload.
+ */
+bool skb_gso_validate_network_len(const struct sk_buff *skb, unsigned int mtu)
+{
+   return skb_gso_size_check(skb, skb_gso_network_seglen(skb), mtu);
+}
+EXPORT_SYMBOL_GPL(skb_gso_validate_network_len);
+
+/**
+ * skb_gso_validate_mac_len - Will a split GSO skb fit in a given length?
+ *
+ * @skb: GSO skb
+ * @len: length to validate against
+ *
+ * skb_gso_validate_mac_len validates if a given skb will fit a wanted
+ * length once split, including L2, L3 and L4 headers and the payload.
+ */
+

[PATCH v2 0/4] Check size of packets before sending

2018-01-24 Thread Daniel Axtens
There are a few ways we can send packets that are too large to a
network driver.

When non-GSO packets are forwarded, we validate their size, based on
the MTU of the destination device. However, when GSO packets are
forwarded, we do not validate their size. We implicitly assume that
when they are segmented, the resultant packets will be correctly
sized.

This is not always the case.

We observed a case where a packet received on an ibmveth device had a
GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
device, where it caused a firmware assert. This is described in detail
at [0] and was the genesis of this series.

Rather than fixing this in the driver, this series fixes the
core path. It does it in 2 steps:

 1) make is_skb_forwardable check GSO packets - this catches bridges
 
 2) make validate_xmit_skb check the size of all packets, so as to
catch everything else (e.g. macvlan, tc-mirred, OVS)

I am a bit nervous about how this series will interact with nested
VLANs, as the existing code only allows for one VLAN_HLEN. (Previously
these packets would sail past unchecked.) But I thought it would be
prudent to get more eyes on this sooner rather than later.

Thanks,
Daniel

v1: https://www.spinics.net/lists/netdev/msg478634.html
Changes in v2:

 - improve names, thanks Marcelo Ricardo Leitner

 - add check to xmit_validate_skb; thanks to everyone who participated
   in the discussion.

 - drop extra check in Open vSwitch. Bad packets will be caught by
   validate_xmit_skb for now and we can come back and add it later if
   OVS people would like the extra logging.
   
[0]: https://patchwork.ozlabs.org/patch/859410/

Cc: Jason Wang 
Cc: Pravin Shelar 
Cc: Marcelo Ricardo Leitner 
Cc: manish.cho...@cavium.com
Cc: d...@openvswitch.org

Daniel Axtens (4):
  net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len
  net: move skb_gso_mac_seglen to skbuff.h
  net: is_skb_forwardable: check the size of GSO segments
  net: check the size of a packet in validate_xmit_skb

 include/linux/skbuff.h  | 18 -
 net/core/dev.c  | 24 
 net/core/skbuff.c   | 66 ++---
 net/ipv4/ip_forward.c   |  2 +-
 net/ipv4/ip_output.c|  2 +-
 net/ipv4/netfilter/nf_flow_table_ipv4.c |  2 +-
 net/ipv6/ip6_output.c   |  2 +-
 net/ipv6/netfilter/nf_flow_table_ipv6.c |  2 +-
 net/mpls/af_mpls.c  |  2 +-
 net/sched/sch_tbf.c | 10 -
 net/xfrm/xfrm_device.c  |  2 +-
 11 files changed, 93 insertions(+), 39 deletions(-)

-- 
2.14.1



Re: [PATCH 0/3] Check gso_size of packets when forwarding

2018-01-19 Thread Daniel Axtens
Pravin Shelar  writes:

> On Thu, Jan 18, 2018 at 5:28 PM, Daniel Axtens  wrote:
>> Pravin Shelar  writes:
>>
>>> On Thu, Jan 18, 2018 at 5:08 AM, Daniel Axtens  wrote:
>>>> Pravin Shelar  writes:
>>>>
>>>>> On Mon, Jan 15, 2018 at 6:09 PM, Daniel Axtens  wrote:
>>>>>> When regular packets are forwarded, we validate their size against the
>>>>>> MTU of the destination device. However, when GSO packets are
>>>>>> forwarded, we do not validate their size against the MTU. We
>>>>>> implicitly assume that when they are segmented, the resultant packets
>>>>>> will be correctly sized.
>>>>>>
>>>>>> This is not always the case.
>>>>>>
>>>>>> We observed a case where a packet received on an ibmveth device had a
>>>>>> GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
>>>>>> device, where it caused a firmware assert. This is described in detail
>>>>>> at [0] and was the genesis of this series. Rather than fixing it in
>>>>>> the driver, this series fixes the forwarding path.
>>>>>>
>>>>> Are there any other possible forwarding path in networking stack? TC
>>>>> is one subsystem that could forward such a packet to the bnx2x device,
>>>>> how is that handled ?
>>>>
>>>> So far I have only looked at bridges, openvswitch and macvlan. In
>>>> general, if the code uses dev_forward_skb() it should automatically be
>>>> fine as that invokes is_skb_forwardable(), which we patch.
>>>>
>>> But there are other ways to forward packets, e.g tc-mirred or bpf
>>> redirect. We need to handle all these cases rather than fixing one at
>>> a time. As Jason suggested netif_needs_gso() looks like good function
>>> to validate if a device is capable of handling given GSO packet.
>>
>> I am not entirely sure this is a better solution.
>>
>> The biggest reason I am uncomfortable with this is that if
>> netif_needs_gso() returns true, the skb will be segmented. The segment
>> sizes will be based on gso_size. Since gso_size is greater than the MTU,
>> the resulting segments will themselves be over-MTU. Those over-MTU
>> segments will then be passed to the network card. I think we should not
>> be creating over-MTU segments; we should instead be dropping the packet
>> and logging.
>>
>
> Would this oversized segment cause the firmware assert?
> If this solves the assert issue then I do not see much value in adding
> checks in fast-path just for logging.

No - I tested this (or rather, as I don't have direct access to a bnx2x
card, this was tested on my behalf): as long as the packet is not a GSO
packet, it doesn't cause the crash. So we *could* segment them, I just
think that knowingly creating invalid segments is not particularly
pleasant.

Do you think my approach in a later email which checks and drops without
logging is sufficient? It is the same GSO check, and also checks non-GSO
packets against the MTU.

Regards,
Daniel


Re: [PATCH 0/3] Check gso_size of packets when forwarding

2018-01-18 Thread Daniel Axtens
Daniel Axtens  writes:

> Pravin Shelar  writes:
>
>> On Thu, Jan 18, 2018 at 5:08 AM, Daniel Axtens  wrote:
>>> Pravin Shelar  writes:
>>>
>>>> On Mon, Jan 15, 2018 at 6:09 PM, Daniel Axtens  wrote:
>>>>> When regular packets are forwarded, we validate their size against the
>>>>> MTU of the destination device. However, when GSO packets are
>>>>> forwarded, we do not validate their size against the MTU. We
>>>>> implicitly assume that when they are segmented, the resultant packets
>>>>> will be correctly sized.
>>>>>
>>>>> This is not always the case.
>>>>>
>>>>> We observed a case where a packet received on an ibmveth device had a
>>>>> GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
>>>>> device, where it caused a firmware assert. This is described in detail
>>>>> at [0] and was the genesis of this series. Rather than fixing it in
>>>>> the driver, this series fixes the forwarding path.
>>>>>
>>>> Are there any other possible forwarding path in networking stack? TC
>>>> is one subsystem that could forward such a packet to the bnx2x device,
>>>> how is that handled ?
>>>
>>> So far I have only looked at bridges, openvswitch and macvlan. In
>>> general, if the code uses dev_forward_skb() it should automatically be
>>> fine as that invokes is_skb_forwardable(), which we patch.
>>>
>> But there are other ways to forward packets, e.g tc-mirred or bpf
>> redirect. We need to handle all these cases rather than fixing one at
>> a time. As Jason suggested netif_needs_gso() looks like good function
>> to validate if a device is capable of handling given GSO packet.
>
> I am not entirely sure this is a better solution.
>
> The biggest reason I am uncomfortable with this is that if
> netif_needs_gso() returns true, the skb will be segmented. The segment
> sizes will be based on gso_size. Since gso_size is greater than the MTU,
> the resulting segments will themselves be over-MTU. Those over-MTU
> segments will then be passed to the network card. I think we should not
> be creating over-MTU segments; we should instead be dropping the packet
> and logging.
>
> I do take the point that you and Jason are making: a more generic
> fix would  be good. I'm just not sure where to put it.

How about this? It puts the check in validate_xmit_skb(). (completely untested)

Ultimately I think we need to make a broader policy decision about who
is responsible for ensuring that packets are correctly sized - is it the
caller of dev_queue_xmit (as in bridges and openswitch) or is it lower
down the stack, such as validate_xmit_skb()? Making the caller check is
more efficient - packets created on the system are always going to fit
due to the checks in the protocol code, so rechecking is inefficient and
unnecessary. But checking later is more reliable as we don't have to
identify all forwarding paths.

Regards,
Daniel

diff --git a/net/core/dev.c b/net/core/dev.c
index 5af04be128c1..b8c3f33ece19 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1830,13 +1830,11 @@ static inline void net_timestamp_set(struct sk_buff 
*skb)
__net_timestamp(SKB);   \
}   \
 
-bool is_skb_forwardable(const struct net_device *dev, const struct sk_buff 
*skb)
+static inline bool does_skb_fit_mtu(const struct net_device *dev,
+   const struct sk_buff *skb)
 {
unsigned int len;
 
-   if (!(dev->flags & IFF_UP))
-   return false;
-
len = dev->mtu + dev->hard_header_len + VLAN_HLEN;
if (skb->len <= len)
return true;
@@ -1850,6 +1848,14 @@ bool is_skb_forwardable(const struct net_device *dev, 
const struct sk_buff *skb)
 
return false;
 }
+
+bool is_skb_forwardable(const struct net_device *dev, const struct sk_buff 
*skb)
+{
+   if (!(dev->flags & IFF_UP))
+   return false;
+
+   return does_skb_fit_mtu(dev, skb);
+}
 EXPORT_SYMBOL_GPL(is_skb_forwardable);
 
 int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
@@ -3081,6 +3087,9 @@ static struct sk_buff *validate_xmit_skb(struct sk_buff 
*skb, struct net_device
if (unlikely(!skb))
goto out_null;
 
+   if (unlikely(!does_skb_fit_mtu(dev, skb)))
+   goto out_kfree_skb;
+
if (netif_needs_gso(skb, features)) {
struct sk_buff *segs;
 



>
>
> Regards,
> Daniel


Re: [PATCH 0/3] Check gso_size of packets when forwarding

2018-01-18 Thread Daniel Axtens
Pravin Shelar  writes:

> On Thu, Jan 18, 2018 at 5:08 AM, Daniel Axtens  wrote:
>> Pravin Shelar  writes:
>>
>>> On Mon, Jan 15, 2018 at 6:09 PM, Daniel Axtens  wrote:
>>>> When regular packets are forwarded, we validate their size against the
>>>> MTU of the destination device. However, when GSO packets are
>>>> forwarded, we do not validate their size against the MTU. We
>>>> implicitly assume that when they are segmented, the resultant packets
>>>> will be correctly sized.
>>>>
>>>> This is not always the case.
>>>>
>>>> We observed a case where a packet received on an ibmveth device had a
>>>> GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
>>>> device, where it caused a firmware assert. This is described in detail
>>>> at [0] and was the genesis of this series. Rather than fixing it in
>>>> the driver, this series fixes the forwarding path.
>>>>
>>> Are there any other possible forwarding path in networking stack? TC
>>> is one subsystem that could forward such a packet to the bnx2x device,
>>> how is that handled ?
>>
>> So far I have only looked at bridges, openvswitch and macvlan. In
>> general, if the code uses dev_forward_skb() it should automatically be
>> fine as that invokes is_skb_forwardable(), which we patch.
>>
> But there are other ways to forward packets, e.g tc-mirred or bpf
> redirect. We need to handle all these cases rather than fixing one at
> a time. As Jason suggested netif_needs_gso() looks like good function
> to validate if a device is capable of handling given GSO packet.

I am not entirely sure this is a better solution.

The biggest reason I am uncomfortable with this is that if
netif_needs_gso() returns true, the skb will be segmented. The segment
sizes will be based on gso_size. Since gso_size is greater than the MTU,
the resulting segments will themselves be over-MTU. Those over-MTU
segments will then be passed to the network card. I think we should not
be creating over-MTU segments; we should instead be dropping the packet
and logging.

I do take the point that you and Jason are making: a more generic
fix would be good. I'm just not sure where to put it.


Regards,
Daniel


Re: [PATCH 0/3] Check gso_size of packets when forwarding

2018-01-18 Thread Daniel Axtens
Daniel Axtens  writes:

> Jason Wang  writes:
>
>> On 2018年01月18日 16:28, Pravin Shelar wrote:
>>> On Mon, Jan 15, 2018 at 6:09 PM, Daniel Axtens  wrote:
>>>> When regular packets are forwarded, we validate their size against the
>>>> MTU of the destination device. However, when GSO packets are
>>>> forwarded, we do not validate their size against the MTU. We
>>>> implicitly assume that when they are segmented, the resultant packets
>>>> will be correctly sized.
>>>>
>>>> This is not always the case.
>>>>
>>>> We observed a case where a packet received on an ibmveth device had a
>>>> GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
>>>> device, where it caused a firmware assert. This is described in detail
>>>> at [0] and was the genesis of this series. Rather than fixing it in
>>>> the driver, this series fixes the forwarding path.
>>>>
>>> Are there any other possible forwarding path in networking stack? TC
>>> is one subsystem that could forward such a packet to the bnx2x device,
>>> how is that handled ?
>>
>> Yes, so it looks to me we should do the check in e.g netif_needs_gso() 
>> before passing it to hardware. And bnx2x needs to set its gso_max_size 
>> correctly.
>
> I don't think gso_max_size is quite the same. If I understand
> net/ipv4/tcp.c correctly, gso_max_size is supposed to cap the total
> length of the skb, not the length of each segmented packet. The problem
> for the bnx2x card is the length of the segment, not the overall length.
>
>>
>> Btw, looks like this could be triggered from a guest which is a DOS. We 
>> need request a CVE for this.
>>
>
> You are correct about how this can be triggered: in fact it came to my
> attention because a network packet from one LPAR (PowerVM virtual
> machine) brought down the card attached to a different LPAR. It didn't
> occur to me that it was potentially a security issue. I am talking with
> the security team at Canonical regarding this.

I have requested a CVE from the Distributed Weakness Filing.

Regards,
Daniel

>
> Regards,
> Daniel
>
>> Thanks
>>
>>>
>>>> To fix this:
>>>>
>>>>   - Move a helper in patch 1.
>>>>
>>>>   - Validate GSO segment lengths in is_skb_forwardable() in the GSO
>>>> case, rather than assuming all will be well. This fixes bridges.
>>>> This is patch 2.
>>>>
>>>>   - Open vSwitch uses its own slightly specialised algorithm for
>>>> checking lengths. Wire up checking for that in patch 3.
>>>>
>>>> [0]: https://patchwork.ozlabs.org/patch/859410/
>>>>
>>>> Cc: manish.cho...@cavium.com
>>>> Cc: d...@openvswitch.org
>>>>
>>>> Daniel Axtens (3):
>>>>net: move skb_gso_mac_seglen to skbuff.h
>>>>net: is_skb_forwardable: validate length of GSO packet segments
>>>>openvswitch: drop GSO packets that are too large
>>>>
>>>>   include/linux/skbuff.h  | 16 
>>>>   net/core/dev.c  |  7 ---
>>>>   net/core/skbuff.c   | 34 ++
>>>>   net/openvswitch/vport.c | 37 ++---
>>>>   net/sched/sch_tbf.c | 10 --
>>>>   5 files changed, 84 insertions(+), 20 deletions(-)
>>>>
>>>> --
>>>> 2.14.1
>>>>


Re: [PATCH 0/3] Check gso_size of packets when forwarding

2018-01-18 Thread Daniel Axtens
Jason Wang  writes:

> On 2018年01月18日 16:28, Pravin Shelar wrote:
>> On Mon, Jan 15, 2018 at 6:09 PM, Daniel Axtens  wrote:
>>> When regular packets are forwarded, we validate their size against the
>>> MTU of the destination device. However, when GSO packets are
>>> forwarded, we do not validate their size against the MTU. We
>>> implicitly assume that when they are segmented, the resultant packets
>>> will be correctly sized.
>>>
>>> This is not always the case.
>>>
>>> We observed a case where a packet received on an ibmveth device had a
>>> GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
>>> device, where it caused a firmware assert. This is described in detail
>>> at [0] and was the genesis of this series. Rather than fixing it in
>>> the driver, this series fixes the forwarding path.
>>>
>> Are there any other possible forwarding path in networking stack? TC
>> is one subsystem that could forward such a packet to the bnx2x device,
>> how is that handled ?
>
> Yes, so it looks to me we should do the check in e.g netif_needs_gso() 
> before passing it to hardware. And bnx2x needs to set its gso_max_size 
> correctly.

I don't think gso_max_size is quite the same. If I understand
net/ipv4/tcp.c correctly, gso_max_size is supposed to cap the total
length of the skb, not the length of each segmented packet. The problem
for the bnx2x card is the length of the segment, not the overall length.

>
> Btw, looks like this could be triggered from a guest which is a DOS. We 
> need request a CVE for this.
>

You are correct about how this can be triggered: in fact it came to my
attention because a network packet from one LPAR (PowerVM virtual
machine) brought down the card attached to a different LPAR. It didn't
occur to me that it was potentially a security issue. I am talking with
the security team at Canonical regarding this.

Regards,
Daniel

> Thanks
>
>>
>>> To fix this:
>>>
>>>   - Move a helper in patch 1.
>>>
>>>   - Validate GSO segment lengths in is_skb_forwardable() in the GSO
>>> case, rather than assuming all will be well. This fixes bridges.
>>> This is patch 2.
>>>
>>>   - Open vSwitch uses its own slightly specialised algorithm for
>>> checking lengths. Wire up checking for that in patch 3.
>>>
>>> [0]: https://patchwork.ozlabs.org/patch/859410/
>>>
>>> Cc: manish.cho...@cavium.com
>>> Cc: d...@openvswitch.org
>>>
>>> Daniel Axtens (3):
>>>net: move skb_gso_mac_seglen to skbuff.h
>>>net: is_skb_forwardable: validate length of GSO packet segments
>>>openvswitch: drop GSO packets that are too large
>>>
>>>   include/linux/skbuff.h  | 16 
>>>   net/core/dev.c  |  7 ---
>>>   net/core/skbuff.c   | 34 ++
>>>   net/openvswitch/vport.c | 37 ++---
>>>   net/sched/sch_tbf.c | 10 --
>>>   5 files changed, 84 insertions(+), 20 deletions(-)
>>>
>>> --
>>> 2.14.1
>>>


Re: [PATCH 0/3] Check gso_size of packets when forwarding

2018-01-18 Thread Daniel Axtens
Pravin Shelar  writes:

> On Mon, Jan 15, 2018 at 6:09 PM, Daniel Axtens  wrote:
>> When regular packets are forwarded, we validate their size against the
>> MTU of the destination device. However, when GSO packets are
>> forwarded, we do not validate their size against the MTU. We
>> implicitly assume that when they are segmented, the resultant packets
>> will be correctly sized.
>>
>> This is not always the case.
>>
>> We observed a case where a packet received on an ibmveth device had a
>> GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
>> device, where it caused a firmware assert. This is described in detail
>> at [0] and was the genesis of this series. Rather than fixing it in
>> the driver, this series fixes the forwarding path.
>>
> Are there any other possible forwarding path in networking stack? TC
> is one subsystem that could forward such a packet to the bnx2x device,
> how is that handled ?

So far I have only looked at bridges, openvswitch and macvlan. In
general, if the code uses dev_forward_skb() it should automatically be
fine as that invokes is_skb_forwardable(), which we patch.

There is potentially a forwarding path in macvlan that doesn't pass
through dev_forward_skb() (which calls is_skb_forwardable). I did post a
patch to this earlier, but it got somewhat derailed by how you would get
such a packet in to a macvlan device and whether that should be fixed
instead. Once we have some consensus on Jason's questions and the
overall approach is solid, I'm happy to work on that again.

To answer your specific question, I'm not aware of how tc can cause a
packet to be forwarded - if you can point me to the right general area I
can have a look.

Regards,
Daniel

>
>> To fix this:
>>
>>  - Move a helper in patch 1.
>>
>>  - Validate GSO segment lengths in is_skb_forwardable() in the GSO
>>case, rather than assuming all will be well. This fixes bridges.
>>This is patch 2.
>>
>>  - Open vSwitch uses its own slightly specialised algorithm for
>>checking lengths. Wire up checking for that in patch 3.
>>
>> [0]: https://patchwork.ozlabs.org/patch/859410/
>>
>> Cc: manish.cho...@cavium.com
>> Cc: d...@openvswitch.org
>>
>> Daniel Axtens (3):
>>   net: move skb_gso_mac_seglen to skbuff.h
>>   net: is_skb_forwardable: validate length of GSO packet segments
>>   openvswitch: drop GSO packets that are too large
>>
>>  include/linux/skbuff.h  | 16 
>>  net/core/dev.c  |  7 ---
>>  net/core/skbuff.c   | 34 ++
>>  net/openvswitch/vport.c | 37 ++---
>>  net/sched/sch_tbf.c | 10 --
>>  5 files changed, 84 insertions(+), 20 deletions(-)
>>
>> --
>> 2.14.1
>>


[PATCH 3/3] openvswitch: drop GSO packets that are too large

2018-01-15 Thread Daniel Axtens
Open vSwitch attempts to detect if a packet is too large to be
sent to the destination device. However, this test does not
consider GSO packets, and it is possible that a GSO packet, when
resegmented, will be larger than the device can send.

Add detection for packets which are too large. We use
skb_gso_validate_len, reusing the length calculation in the existing
checks - see 738314a084aa
("openvswitch: use hard_header_len instead of hardcoded ETH_HLEN")
This is different from the is_skb_forwardable logic in that it only
allows for the length of a VLAN tag if one is actually present.

Signed-off-by: Daniel Axtens 
---
 net/openvswitch/vport.c | 37 ++---
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index b6c8524032a0..290eeaa82344 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -464,16 +464,16 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
return 0;
 }
 
-static unsigned int packet_length(const struct sk_buff *skb,
- struct net_device *dev)
+static unsigned int packet_length_offset(const struct sk_buff *skb,
+const struct net_device *dev)
 {
-   unsigned int length = skb->len - dev->hard_header_len;
+   unsigned int length = dev->hard_header_len;
 
if (!skb_vlan_tag_present(skb) &&
eth_type_vlan(skb->protocol))
-   length -= VLAN_HLEN;
+   length += VLAN_HLEN;
 
-   /* Don't subtract for multiple VLAN tags. Most (all?) drivers allow
+   /* Don't adjust for multiple VLAN tags. Most (all?) drivers allow
 * (ETH_LEN + VLAN_HLEN) in addition to the mtu value, but almost none
 * account for 802.1ad. e.g. is_skb_forwardable().
 */
@@ -481,6 +481,21 @@ static unsigned int packet_length(const struct sk_buff *skb,
return length;
 }
 
+static inline unsigned int packet_length(const struct sk_buff *skb,
+const struct net_device *dev)
+{
+   return skb->len - packet_length_offset(skb, dev);
+}
+
+static inline bool vport_gso_validate_len(const struct sk_buff *skb,
+ const struct net_device *dev,
+ unsigned int mtu)
+{
+   unsigned int len = mtu + packet_length_offset(skb, dev);
+   
+   return skb_gso_validate_len(skb, len);
+}
+
 void ovs_vport_send(struct vport *vport, struct sk_buff *skb, u8 mac_proto)
 {
int mtu = vport->dev->mtu;
@@ -504,13 +519,21 @@ void ovs_vport_send(struct vport *vport, struct sk_buff *skb, u8 mac_proto)
goto drop;
}
 
-   if (unlikely(packet_length(skb, vport->dev) > mtu &&
-!skb_is_gso(skb))) {
+   if (!skb_is_gso(skb) &&
+   unlikely(packet_length(skb, vport->dev) > mtu)) {
net_warn_ratelimited("%s: dropped over-mtu packet: %d > %d\n",
 vport->dev->name,
 packet_length(skb, vport->dev), mtu);
vport->dev->stats.tx_errors++;
goto drop;
+   } else if (skb_is_gso(skb) &&
+  unlikely(!vport_gso_validate_len(skb, vport->dev, mtu))) {
+   net_warn_ratelimited("%s: dropped over-mtu GSO packet: "
+"gso_size = %d, mtu = %d\n",
+vport->dev->name,
+skb_shinfo(skb)->gso_size, mtu);
+   vport->dev->stats.tx_errors++;
+   goto drop;
}
 
skb->dev = vport->dev;
-- 
2.14.1



[PATCH 0/3] Check gso_size of packets when forwarding

2018-01-15 Thread Daniel Axtens
When regular packets are forwarded, we validate their size against the
MTU of the destination device. However, when GSO packets are
forwarded, we do not validate their size against the MTU. We
implicitly assume that when they are segmented, the resultant packets
will be correctly sized.

This is not always the case.

We observed a case where a packet received on an ibmveth device had a
GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
device, where it caused a firmware assert. This is described in detail
at [0] and was the genesis of this series. Rather than fixing it in
the driver, this series fixes the forwarding path.

To fix this:

 - Move a helper in patch 1.

 - Validate GSO segment lengths in is_skb_forwardable() in the GSO
   case, rather than assuming all will be well. This fixes bridges.
   This is patch 2.

 - Open vSwitch uses its own slightly specialised algorithm for
   checking lengths. Wire up checking for that in patch 3.

[0]: https://patchwork.ozlabs.org/patch/859410/

Cc: manish.cho...@cavium.com
Cc: d...@openvswitch.org

Daniel Axtens (3):
  net: move skb_gso_mac_seglen to skbuff.h
  net: is_skb_forwardable: validate length of GSO packet segments
  openvswitch: drop GSO packets that are too large

 include/linux/skbuff.h  | 16 
 net/core/dev.c  |  7 ---
 net/core/skbuff.c   | 34 ++
 net/openvswitch/vport.c | 37 ++---
 net/sched/sch_tbf.c | 10 --
 5 files changed, 84 insertions(+), 20 deletions(-)

-- 
2.14.1



[PATCH 1/3] net: move skb_gso_mac_seglen to skbuff.h

2018-01-15 Thread Daniel Axtens
We're about to use this elsewhere, so move it into the header with
the other related functions like skb_gso_network_seglen().

Signed-off-by: Daniel Axtens 
---
 include/linux/skbuff.h | 15 +++
 net/sched/sch_tbf.c| 10 --
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b8e0da6c27d6..b8acafd73272 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4120,6 +4120,21 @@ static inline unsigned int skb_gso_network_seglen(const struct sk_buff *skb)
return hdr_len + skb_gso_transport_seglen(skb);
 }
 
+/**
+ * skb_gso_mac_seglen - Return length of individual segments of a gso packet
+ *
+ * @skb: GSO skb
+ *
+ * skb_gso_mac_seglen is used to determine the real size of the
+ * individual segments, including MAC/L2, Layer3 (IP, IPv6) and L4
+ * headers (TCP/UDP).
+ */
+static inline unsigned int skb_gso_mac_seglen(const struct sk_buff *skb)
+{
+   unsigned int hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
+   return hdr_len + skb_gso_transport_seglen(skb);
+}
+
 /* Local Checksum Offload.
  * Compute outer checksum based on the assumption that the
  * inner checksum will be offloaded later.
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 83e76d046993..229172d509cc 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -142,16 +142,6 @@ static u64 psched_ns_t2l(const struct psched_ratecfg *r,
return len;
 }
 
-/*
- * Return length of individual segments of a gso packet,
- * including all headers (MAC, IP, TCP/UDP)
- */
-static unsigned int skb_gso_mac_seglen(const struct sk_buff *skb)
-{
-   unsigned int hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
-   return hdr_len + skb_gso_transport_seglen(skb);
-}
-
 /* GSO packet is too big, segment it so that tbf can transmit
  * each segment in time
  */
-- 
2.14.1



[PATCH 2/3] net: is_skb_forwardable: validate length of GSO packet segments

2018-01-15 Thread Daniel Axtens
is_skb_forwardable attempts to detect if a packet is too large to
be sent to the destination device. However, this test does not
consider GSO packets, and it is possible that a GSO packet, when
resegmented, will be larger than the device can transmit.

Add detection for packets which will be too large by creating a
skb_gso_validate_len() routine which is similar to
skb_gso_validate_mtu(), but which considers L2 headers, and wire
it up in is_skb_forwardable().

Signed-off-by: Daniel Axtens 
---
 include/linux/skbuff.h |  1 +
 net/core/dev.c |  7 ---
 net/core/skbuff.c  | 34 ++
 3 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b8acafd73272..6a9e4b6f80a7 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3287,6 +3287,7 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen);
 void skb_scrub_packet(struct sk_buff *skb, bool xnet);
 unsigned int skb_gso_transport_seglen(const struct sk_buff *skb);
 bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu);
+bool skb_gso_validate_len(const struct sk_buff *skb, unsigned int len);
 struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features);
 struct sk_buff *skb_vlan_untag(struct sk_buff *skb);
 int skb_ensure_writable(struct sk_buff *skb, int write_len);
diff --git a/net/core/dev.c b/net/core/dev.c
index 94435cd09072..5af04be128c1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1841,11 +1841,12 @@ bool is_skb_forwardable(const struct net_device *dev, const struct sk_buff *skb)
if (skb->len <= len)
return true;
 
-   /* if TSO is enabled, we don't care about the length as the packet
-* could be forwarded without being segmented before
+   /*
+* if TSO is enabled, we need to check the size of the
+* segmented packets
 */
if (skb_is_gso(skb))
-   return true;
+   return skb_gso_validate_len(skb, len);
 
return false;
 }
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 01e8285aea73..aefe049e4b0c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4945,6 +4945,40 @@ bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu)
 }
 EXPORT_SYMBOL_GPL(skb_gso_validate_mtu);
 
+/**
+ * skb_gso_validate_len - Will a split GSO skb fit in a given length?
+ *
+ * @skb: GSO skb
+ * @len: length to validate against
+ *
+ * skb_gso_validate_len validates if a given skb will fit a wanted
+ * length once split, including L2, L3 and L4 headers.
+ *
+ * Similar to skb_gso_validate_mtu, but inclusive of L2 headers.
+ */
+bool skb_gso_validate_len(const struct sk_buff *skb, unsigned int len)
+{
+   const struct skb_shared_info *shinfo = skb_shinfo(skb);
+   const struct sk_buff *iter;
+   unsigned int hlen;
+
+   hlen = skb_gso_mac_seglen(skb);
+
+   if (shinfo->gso_size != GSO_BY_FRAGS)
+   return hlen <= len;
+
+   /* Undo this so we can re-use header sizes */
+   hlen -= GSO_BY_FRAGS;
+
+   skb_walk_frags(skb, iter) {
+   if (hlen + skb_headlen(iter) > len)
+   return false;
+   }
+
+   return true;
+}
+EXPORT_SYMBOL_GPL(skb_gso_validate_len);
+
 static struct sk_buff *skb_reorder_vlan_header(struct sk_buff *skb)
 {
if (skb_cow(skb, skb_headroom(skb)) < 0) {
-- 
2.14.1



Re: [PATCH v2] bnx2x: disable GSO where gso_size is too big for hardware

2018-01-15 Thread Daniel Axtens
David Miller  writes:

> From: Daniel Axtens 
> Date: Fri, 12 Jan 2018 10:59:05 +1100
>
>> If a bnx2x card is passed a GSO packet with a gso_size larger than
>> ~9700 bytes, it will cause a firmware error that will bring the card
>> down:
>> 
>> bnx2x: [bnx2x_attn_int_deasserted3:4323(enP24p1s0f0)]MC assert!
>> bnx2x: [bnx2x_mc_assert:720(enP24p1s0f0)]XSTORM_ASSERT_LIST_INDEX 0x2
>> bnx2x: [bnx2x_mc_assert:736(enP24p1s0f0)]XSTORM_ASSERT_INDEX 0x0 = 0x 0x25e43e47 0x00463e01 0x00010052
>> bnx2x: [bnx2x_mc_assert:750(enP24p1s0f0)]Chip Revision: everest3, FW Version: 7_13_1
>> ... (dump of values continues) ...
>> 
>> Detect when gso_size + header length is greater than the maximum
>> packet size (9700 bytes) and disable GSO. For simplicity and speed
>> this is approximated by comparing gso_size against 9200 and assuming
>> no-one will have more than 500 bytes of headers.
>
> What is the MTU size configured on the bnx2x device when these 9700
> byte packets are seen?
>
> If it's less than 9700, whatever is allowing your device (openvswitch,
> ibmveth, whatever) needs to be fixed.

Sure, I had an approach that checks the gso_size in is_skb_forwardable
and the equivalent openvswitch path - I'll send that.

Regards,
Daniel

>
> I don't like this at all, quite frankly.  We'll have one device now that
> has this special check, probably many others can run into this situation
> as well but they won't be used on these kinds of powerpc boxes and
> therefore nobody is going to notice.
>
> I'm not applying this without more information or better justification,
> sorry.


[PATCH v2] bnx2x: disable GSO where gso_size is too big for hardware

2018-01-11 Thread Daniel Axtens
If a bnx2x card is passed a GSO packet with a gso_size larger than
~9700 bytes, it will cause a firmware error that will bring the card
down:

bnx2x: [bnx2x_attn_int_deasserted3:4323(enP24p1s0f0)]MC assert!
bnx2x: [bnx2x_mc_assert:720(enP24p1s0f0)]XSTORM_ASSERT_LIST_INDEX 0x2
bnx2x: [bnx2x_mc_assert:736(enP24p1s0f0)]XSTORM_ASSERT_INDEX 0x0 = 0x 0x25e43e47 0x00463e01 0x00010052
bnx2x: [bnx2x_mc_assert:750(enP24p1s0f0)]Chip Revision: everest3, FW Version: 7_13_1
... (dump of values continues) ...

Detect when gso_size + header length is greater than the maximum
packet size (9700 bytes) and disable GSO. For simplicity and speed
this is approximated by comparing gso_size against 9200 and assuming
no-one will have more than 500 bytes of headers.

This raises the obvious question - how do we end up with a packet with
a gso_size that's greater than 9700? This has been observed on an
powerpc system when Open vSwitch is forwarding a packet from an
ibmveth device.

ibmveth is a bit special. It's the driver for communication between
virtual machines (aka 'partitions'/LPARs) running under IBM's
proprietary hypervisor on ppc machines. It allows sending very large
packets (up to 64kB) between LPARs. This involves some quite
'interesting' things: for example, when talking TCP, the MSS is stored
in the checksum field (see ibmveth_rx_mss_helper() in ibmveth.c).

Normally on a box like this, there would be a Virtual I/O Server
(VIOS) partition that owns the physical network card. VIOS lets the
AIX partitions know when they're talking to a real network and that
they should drop their MSS. This works fine if VIOS owns the physical
network card.

However, in this case, a Linux partition owns the card (this is known
as a NovaLink setup). The negotiation between VIOS and AIX uses a
non-standard TCP option, so Linux has never supported that.  Instead,
Linux just supports receiving large packets. It doesn't support any
form of messaging/MSS negotiation back to other LPARs.

To get some clarity about where the large MSS was coming from, I asked
Thomas Falcon, the maintainer of ibmveth, for some background:

"In most cases, large segments are an aggregation of smaller packets
by the Virtual I/O Server (VIOS) partition and then are forwarded to
the Linux LPAR / ibmveth driver. These segments can be as large as
64KB. In this case, since the customer is using Novalink, I believe
what is happening is pretty straightforward: the large segments are
created by the AIX partition and then forwarded to the Linux
partition, ... The ibmveth driver doesn't do any aggregation itself
but just ensures the proper bits are set before sending the frame up
to avoid giving the upper layers indigestion."

It is possible to stop AIX from sending these large segments, but it
requires configuration on each LPAR. While ibmveth's behaviour is
admittedly weird, we should fix this here: it shouldn't be possible
for it to cause a firmware panic on another card.

Cc: Thomas Falcon  # ibmveth
Cc: Yuval Mintz  # bnx2x
Thanks-to: Jay Vosburgh  # veth info
Signed-off-by: Daniel Axtens 

---
v2: change to a feature check as suggested by Eric Dumazet.

---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 7b08323e3f3d..bab909b5d7a2 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -12934,6 +12934,17 @@ static netdev_features_t bnx2x_features_check(struct sk_buff *skb,
  struct net_device *dev,
  netdev_features_t features)
 {
+   /*
+* A skb with gso_size + header length > 9700 will cause a
+* firmware panic. Drop GSO support.
+*
+* To avoid costly calculations on all packets (and because
+* super-jumbo frames are rare), allow 500 bytes of headers
+* and just disable GSO if gso_size is greater than 9200.
+*/
+   if (unlikely(skb_is_gso(skb) && skb_shinfo(skb)->gso_size > 9200))
+   features &= ~NETIF_F_GSO_MASK;
+
features = vlan_features_check(skb, features);
return vxlan_features_check(skb, features);
 }
-- 
2.14.1



Re: [PATCH] macvlan: verify MTU before lowerdev xmit

2017-11-17 Thread Daniel Axtens
Hi Dave,

> So how exactly do the oversized packets get to the macvlan device from
> the VM in this scenerio?  That detail seems to be missing from the
> diagrams you provided earlier.  The VM and the macvlan boxes are just
> connected with a line.

Inside the VM I'm using netperf talking on an interface which the guest
believes to have a MTU of 1500.

I'm setting up the VM using libvirt - my understanding is that libvirt
creates a macvtap device in private mode, and qemu opens that tap device
and writes data from the emulated network card (I see the same behaviour
with a emulated rtl8139, e1000, and with virtio).

I think I could replicate this with any userspace program - qemu is just
the easiest for me at the moment.

Hopefully that's what you had in mind? Let me know if you wanted
different info.

Regards,
Daniel

[The gory details, in case it matters: The VM has a network adaptor with
the following XML:


  
  
  
  

]

>
> Thank you.


Re: [PATCH] macvlan: verify MTU before lowerdev xmit

2017-11-17 Thread Daniel Axtens
Hi Dave,

> This is an area where we really haven't set down some clear rules
> for behavior.
>
> If an interface has a particular MTU, it must be able to successfully
> send MTU sized packets on that link be it virtual or physical.
>
> Only a "next hop" can have a different MTU and thus drop packets.
> This requirement is absolutely necessary in order for proper
> signalling (path MTU messages) to make their way back to the sending
> host.
>
> In this VM-->macvlan case it's more like a point to point connection
> and there lacks a "next hop" to serve and the provider of proper
> signalling.
>
> This whole situation seems to be handled quite poorly in virtualized
> setups.  Allowing one end of the virtual networking "link" into the
> guest have a different MTU from the other end is a HUGE mistake.

I agree, but users do it, so I'm just trying to figure out the best way
to handle it. Currently the bridge code and openvswitch will detect when
a large packet arrives and drop the packet* - the bridge code with
is_skb_forwardable() and openvswitch with its own approach in
vport.c. This patch tries to bring macvlan in line with those.

*except for GSO packets - they are assumed to be ok, which isn't always
 true. I am working on some patches for this - but my approach may need
 to be changed in light of what you're saying.

> There needs to be control path signalling between the guest and the
> provider of the virtual link so that they can synchronize their MTU
> settings.

We have these sorts of problems on probably every type of virtual
interface, some of which are easier to deal with than others. I'm
particularly worried about interfaces like ibmveth where the 'other end'
is usually an AIX partition on a big powerpc system. AIX tends to bring
up these interfaces with MTUs of around 64k as well. (This is what
originially got me down the rabbit hole of caring about mismatched-MTU
handling!)

> Yes this is hard, but what is happening now doesn't fly in the long
> term.

I'm at a bit of a loss about how we would do a proper fix. The only
thing that comes to mind would be for the receive routines of the
virtual network interfaces to check the size of incoming packets, but -
if I understand correctly - we would need to know what the maximum size
for a received packet should be.

Regards,
Daniel


Re: [PATCH] macvlan: verify MTU before lowerdev xmit

2017-11-14 Thread Daniel Axtens
Hi Shannon,

> Now that I think about this a little more, why is this not already 
> getting handled by the NETDEV_CHANGEMTU notifier?  In what case are you 
> running into this, and why is it not triggering the notifier?

My commit message was probably not super clear here - apologies for any
mangling of terminology/concepts. The changed MTU is getting propagated
from the lowerdev to the macvtap interface just fine, it's just not
getting enforced.

Hopefully the following diagrams make it clearer.

This is the current behaviour: a VM sends a packet of 1514 bytes and it
makes it through the macvtap/macvlan and lowerdev, to the destination,
even though the intermediate MTUs are lower.

*---*
|VM |
|   |
| mtu 1500  |
*---*
  |
  V 1514 byte packet
  |
*---*
|  macvtap  |
|   |
| mtu 1480  |
*---*
  |
  V 1514 byte packet
  |
*---*
| lowerdev  |
|   |
| mtu 1480  |
*---*
  |
  V 1514 byte packet
  |
*---*
|   dest|
|   |
| mtu 1500  |
*---*

My thought here is that the lowerdev should not be asked to transmit a
packet that is larger than its MTU. The patch causes the following
behaviour:


*---*
|VM |
|   |
| mtu 1500  |
*---*
  |
  V 1514 byte packet
  |
*---*
|  macvtap  |
|   |
| mtu 1480  |
*---*
  |
  | packet dropped
  X  1500 > 1480

I think this makes macvlan consistent with bridges.

>> As for the other paths, I see that the call to dev_forward_skb() already 
>> has this protection in it, but does the call to dev_queue_xmit_accel() 
>> in macvlan_start_xmit() need similar protection?

I think so. I will have a look and do a v2 if the core idea of the patch
is OK.

Regards,
Daniel

>> 
>
>
> sln
>
>
>> 
>> 
>>>   skb->dev = vlan->lowerdev;
>>>   return dev_queue_xmit(skb);
>>>   }
>>>


Re: [PATCH] ibmveth: Kernel crash LSO offload flag toggle

2017-11-14 Thread Daniel Axtens
Hi Bryant,

This looks a bit better, but...

> The following patch ensures that the bounce_buffer is not null
> prior to using it within skb_copy_from_linear_data.

How would this occur?

Looking at ibmveth.c, I see bounce_buffer being freed in ibmveth_close()
and allocated in ibmveth_open() only. If allocation fails, the whole
opening of the device fails with -ENOMEM.

It seems your test case - changing TSO - causes ibmveth_set_tso() to
trigger an adapter restart - an ibmveth_close(dev) and then an
ibmveth_open(dev). Is this happening in parallel with an out-of-memory
condition - is the memory allocation failing?

Alternatively, could it be the case that you're closing the device while
packets are in flight, and then trying to use a bounce_buffer that's
been freed elsewhere? Do you need to decouple memory freeing from
ibmveth_close?

> The problem can be recreated by toggling Large send offload on/off.
>
> The following script, when run (along with some iperf traffic), recreates
> the crash within 5-10 mins or so.
>
> while true
> do
>   ethtool -k ibmveth0 | grep tcp-segmentation-offload
>   ethtool -K ibmveth0 tso off
>   ethtool -k ibmveth0 | grep tcp-segmentation-offload
>   ethtool -K ibmveth0 tso on
> done
>
> Note: This issue happens the very first time large send offload is
> turned off too (but the above script recreates the issue every time)
>
> [76563.914173] Unable to handle kernel paging request for data at address 
> 0x
> [76563.914197] Faulting instruction address: 0xc0063940
> [76563.914205] Oops: Kernel access of bad area, sig: 11 [#1]
> [76563.914210] SMP NR_CPUS=2048 NUMA pSeries
> [76563.914217] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag 
> udp_diag inet_diag unix_diag af_packet_diag netlink_diag nls_utf8 isofs 
> binfmt_misc pseries_rng rtc_generic autofs4 ibmvfc scsi_transport_fc ibmvscsi 
> ibmveth
> [76563.914251] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0-34-generic 
> #53-Ubuntu
^--- yikes!

There are relevant changes to this area since 4.4:
2c42bf4b4317 ("ibmveth: check return of skb_linearize in ibmveth_start_xmit")
66aa0678efc2 ("ibmveth: Support to enable LSO/CSO for Trunk VEA.")

Does this crash occur on a more recent kernel?

> diff --git a/drivers/net/ethernet/ibm/ibmveth.c 
> b/drivers/net/ethernet/ibm/ibmveth.c
> index f210398200ece..1d29b1649118d 100644
> --- a/drivers/net/ethernet/ibm/ibmveth.c
> +++ b/drivers/net/ethernet/ibm/ibmveth.c
> @@ -1092,8 +1092,14 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff 
> *skb,
>*/
>   if (force_bounce || (!skb_is_nonlinear(skb) &&
>   (skb->len < tx_copybreak))) {
> - skb_copy_from_linear_data(skb, adapter->bounce_buffer,
> -   skb->len);
> + if (adapter->bounce_buffer) {
> + skb_copy_from_linear_data(skb, adapter->bounce_buffer,
> +   skb->len);
> + } else {
> + adapter->tx_send_failed++;
> + netdev->stats.tx_dropped++;
> + goto out;
Finally, as I alluded to at the top of this message, isn't the
disappearance of the bounce-buffer a pretty serious issue? As I
understand it, it means your data structures are now inconsistent. Do
you need to - at the least - be more chatty here?

Regards,
Daniel
> + }
>  
>   descs[0].fields.flags_len = desc_flags | skb->len;
>   descs[0].fields.address = adapter->bounce_buffer_dma;
> -- 
> 2.13.6 (Apple Git-96)


[PATCH] macvlan: verify MTU before lowerdev xmit

2017-11-14 Thread Daniel Axtens
If a macvlan device which is not in bridge mode receives a packet,
it is sent straight to the lowerdev without checking against the
device's MTU. This also happens for multicast traffic.

Add an is_skb_forwardable() check against the lowerdev before
sending the packet out through it. I think this is the simplest
and best way to do it, and is consistent with the use of
dev_forward_skb() in the bridge path.

This is easy to replicate:
 - create a VM with a macvtap connection in private mode
 - set the lowerdev MTU to something low in the host (e.g. 1480)
 - do not set the MTU lower in the guest (e.g. keep at 1500)
 - netperf to a different host with the same high MTU
 - observe that currently, the driver will forward too-big packets
 - observe that with this patch the packets are dropped

Cc: Shannon Nelson 
Signed-off-by: Daniel Axtens 

---

After hearing Shannon's lightning talk on macvlan at netdev I
figured I'd strike while the iron is hot and get this out of my
patch queue where it has been languishing.
---
 drivers/net/macvlan.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index a178c5efd33e..8adcad6798c5 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -534,6 +534,10 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct 
net_device *dev)
}
 
 xmit_world:
+   /* verify MTU */
+   if (!is_skb_forwardable(vlan->lowerdev, skb))
+   return NET_XMIT_DROP;
+
skb->dev = vlan->lowerdev;
return dev_queue_xmit(skb);
 }
-- 
2.11.0



Re: [PATCH] ibmveth: Kernel crash LSO offload flag toggle

2017-11-13 Thread Daniel Axtens
Hi Bryant,

A few things:

1) The commit message could probably be trimmed, especially the stack
traces.

2) What exactly are you changing and why does it fix the issue? I
couldn't figure that out from the commit message.

3) Does this need a Fixes: tag?


>   }
>  
> - netdev->min_mtu = IBMVETH_MIN_MTU;
> - netdev->max_mtu = ETH_MAX_MTU;
> -

4) What does the above hunk do? It seems unrelated...


Regards,
Daniel


Re: [PATCH] bnx2x: drop packets where gso_size is too big for hardware

2017-09-17 Thread Daniel Axtens
Hi Eric,

>> +if (unlikely(skb_shinfo(skb)->gso_size + hlen > 
>> MAX_PACKET_SIZE)) {
>> +BNX2X_ERR("reported gso segment size plus headers "
>> +  "(%d + %d) > MAX_PACKET_SIZE; dropping pkt!",
>> +  skb_shinfo(skb)->gso_size, hlen);
>> +
>> +goto free_and_drop;
>> +}
>> +
>
>
> If you had this test in bnx2x_features_check(), packet could be
> segmented by core networking stack before reaching bnx2x_start_xmit() by
> clearing NETIF_F_GSO_MASK
>
> -> No drop would be involved.
>
> check i40evf_features_check() for similar logic.

So I've been experimenting with this and reading through the core
networking code. If my understanding is correct, disabling GSO will
cause the packet to be segmented, but it will be segmented into
gso_size+header length packets. So in this case (~10kB gso_size) the
resultant packets will still be too big - although at least they don't
cause a crash in that case.

We could continue with this anyway as it at least prevents the crash -
but, and I haven't been able to find a nice definitive answer to this -
are implementations of ndo_start_xmit permitted to assume that the
skb passed in will fit within the MTU? I notice that most callers will
attempt to ensure this - for example ip_output.c, ip6_output.c and
ip_forward.c all contain calls to skb_gso_validate_mtu(). If
implementations are permitted to assume this, perhaps a fix to
openvswitch would be more appropriate?

Regards,
Daniel


Re: [PATCH] bnx2x: drop packets where gso_size is too big for hardware

2017-08-31 Thread Daniel Axtens
Eric Dumazet  writes:

> If you had this test in bnx2x_features_check(), packet could be
> segmented by core networking stack before reaching bnx2x_start_xmit() by
> clearing NETIF_F_GSO_MASK
>
> -> No drop would be involved.

Thanks for the pointer - networking code is all a bit new to me.

I'm just struggling at the moment to figure out what the right way to
calculate the length. My original patch uses gso_size + hlen, but:

 - On reflection, while this solves the immediate bug, I'm not 100% sure
   this is the right thing to be calculating

 - If it is, then we have the problem that hlen is calculated in a bunch
   of weird and wonderful ways which make it a nightmare to extract.

Yuval (or anyone else who groks the driver properly) - what's the right
test to be doing here to make sure we don't write to much data to the
card?

Regards,
Daniel




[PATCH] bnx2x: drop packets where gso_size is too big for hardware

2017-08-30 Thread Daniel Axtens
If a bnx2x card is passed a GSO packet with a gso_size larger than
~9700 bytes, it will cause a firmware error that will bring the card
down:

bnx2x: [bnx2x_attn_int_deasserted3:4323(enP24p1s0f0)]MC assert!
bnx2x: [bnx2x_mc_assert:720(enP24p1s0f0)]XSTORM_ASSERT_LIST_INDEX 0x2
bnx2x: [bnx2x_mc_assert:736(enP24p1s0f0)]XSTORM_ASSERT_INDEX 0x0 = 0x 
0x25e43e47 0x00463e01 0x00010052
bnx2x: [bnx2x_mc_assert:750(enP24p1s0f0)]Chip Revision: everest3, FW Version: 
7_13_1
... (dump of values continues) ...

Detect when gso_size + header length is greater than the maximum
packet size (9700 bytes) and drop the packet.

This raises the obvious question - how do we end up with a packet with
a gso_size that's greater than 9700? This has been observed on a
powerpc system when Open vSwitch is forwarding a packet from an
ibmveth device.

ibmveth is a bit special. It's the driver for communication between
virtual machines (aka 'partitions'/LPARs) running under IBM's
proprietary hypervisor on ppc machines. It allows sending very large
packets (up to 64kB) between LPARs. This involves some quite
'interesting' things: for example, when talking TCP, the MSS is stored
in the checksum field (see ibmveth_rx_mss_helper() in ibmveth.c).

Normally on a box like this, there would be a Virtual I/O Server
(VIOS) partition that owns the physical network card. VIOS lets the
AIX partitions know when they're talking to a real network and that
they should drop their MSS. This works fine if VIOS owns the physical
network card.

However, in this case, a Linux partition owns the card (this is known
as a NovaLink setup). The negotiation between VIOS and AIX uses a
non-standard TCP option, so Linux has never supported that.  Instead,
Linux just supports receiving large packets. It doesn't support any
form of messaging/MSS negotiation back to other LPARs.

To get some clarity about where the large MSS was coming from, I asked
Thomas Falcon, the maintainer of ibmveth, for some background:

"In most cases, large segments are an aggregation of smaller packets
by the Virtual I/O Server (VIOS) partition and then are forwarded to
the Linux LPAR / ibmveth driver. These segments can be as large as
64KB. In this case, since the customer is using Novalink, I believe
what is happening is pretty straightforward: the large segments are
created by the AIX partition and then forwarded to the Linux
partition, ... The ibmveth driver doesn't do any aggregation itself
but just ensures the proper bits are set before sending the frame up
to avoid giving the upper layers indigestion."

It is possible to stop AIX from sending these large segments, but it
requires configuration on each LPAR. While ibmveth's behaviour is
admittedly weird, we should fix this here: it shouldn't be possible
for it to cause a firmware panic on another card.

Cc: Thomas Falcon  # ibmveth
Cc: Yuval Mintz  # bnx2x
Thanks-to: Jay Vosburgh  # veth info
Signed-off-by: Daniel Axtens 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h  |  2 ++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  | 33 +++-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c |  1 -
 3 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index 352beff796ae..b36d54737d70 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -2517,4 +2517,6 @@ void bnx2x_set_rx_ts(struct bnx2x *bp, struct sk_buff 
*skb);
  */
 int bnx2x_vlan_reconfigure_vid(struct bnx2x *bp);
 
+#define MAX_PACKET_SIZE(9700)
+
 #endif /* bnx2x.h */
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 1216c1f1e052..1c5517a9348c 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -3742,6 +3742,7 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
__le16 pkt_size = 0;
struct ethhdr *eth;
u8 mac_type = UNICAST_ADDRESS;
+   unsigned int pkts_compl = 0, bytes_compl = 0;
 
 #ifdef BNX2X_STOP_ON_ERROR
if (unlikely(bp->panic))
@@ -4029,6 +4030,14 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
   skb->len, hlen, skb_headlen(skb),
   skb_shinfo(skb)->gso_size);
 
+   if (unlikely(skb_shinfo(skb)->gso_size + hlen > 
MAX_PACKET_SIZE)) {
+   BNX2X_ERR("reported gso segment size plus headers "
+ "(%d + %d) > MAX_PACKET_SIZE; dropping pkt!",
+ skb_shinfo(skb)->gso_size, hlen);
+
+   goto free_and_drop;
+   }
+
tx_start_bd->bd_flags.as_bitfield |= ETH_TX_BD_FLAGS_SW_LSO;
 
  

[PATCH] openvswitch: fix mis-ordered comment lines for ovs_skb_cb

2017-07-03 Thread Daniel Axtens
I was trying to wrap my head around the meaning of mru, and realised
that the second line of the comment defining it had somehow
ended up after the line defining cutlen, leading to much confusion.

Reorder the lines to make sense.

Signed-off-by: Daniel Axtens 
---
 net/openvswitch/datapath.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index da931bdef8a7..5d8dcd88815f 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -98,8 +98,8 @@ struct datapath {
  * @input_vport: The original vport packet came in on. This value is cached
  * when a packet is received by OVS.
  * @mru: The maximum received fragement size; 0 if the packet is not
- * @cutlen: The number of bytes from the packet end to be removed.
  * fragmented.
+ * @cutlen: The number of bytes from the packet end to be removed.
  */
 struct ovs_skb_cb {
struct vport*input_vport;
-- 
2.11.0



bnx2x - bnx2x_attn_int_deasserted3 - MC assert!

2017-05-17 Thread Daniel Axtens
Hi,

I'm looking into a powerpc 64-bit little-endian system with 4
(IBM-branded) Broadcom Corporation NetXtreme II BCM57800 1/10 Gigabit
Ethernet adaptors installed. It's running 4.11 and using Open vSwitch
and bonded interfaces. We're seeing regular firmware errors coming out
of the network cards when netperf is started as soon as the interfaces
appear:

[60072.330838] bnx2x: [bnx2x_attn_int_deasserted3:4323(enP24p1s0f0)]MC assert!
[60072.330854] bnx2x: 
[bnx2x_mc_assert:720(enP24p1s0f0)]XSTORM_ASSERT_LIST_INDEX 0x2
[60072.330860] bnx2x: [bnx2x_mc_assert:736(enP24p1s0f0)]XSTORM_ASSERT_INDEX 0x0 
= 0x 0x25e43e47 0x00463e01 0x00010052
[60072.330878] bnx2x: [bnx2x_mc_assert:750(enP24p1s0f0)]Chip Revision: 
everest3, FW Version: 7_13_1
... (dump of values follows) ...

I've included the dmesg output with the dump of both crashing cards
below. The errors do not appear if netperf isn't started until a few
minutes after the interfaces come up. The errors are not new in 4.11 -
they have been reproduced in 4.8 and 4.10.

This all seems to come out of firmware rather than Linux, and I don't
have the datasheets to decode the magic numbers, so I was hoping you'd
be able to shed some light on it?

Thanks in advance!

Regards,
Daniel Axtens

== dmesg output ==
[60072.330838] bnx2x: [bnx2x_attn_int_deasserted3:4323(enP24p1s0f0)]MC assert!
[60072.330854] bnx2x: 
[bnx2x_mc_assert:720(enP24p1s0f0)]XSTORM_ASSERT_LIST_INDEX 0x2
[60072.330860] bnx2x: [bnx2x_mc_assert:736(enP24p1s0f0)]XSTORM_ASSERT_INDEX 0x0 
= 0x 0x25e43e47 0x00463e01 0x00010052
[60072.330878] bnx2x: [bnx2x_mc_assert:750(enP24p1s0f0)]Chip Revision: 
everest3, FW Version: 7_13_1
[60072.330881] bnx2x: [bnx2x_attn_int_deasserted3:4329(enP24p1s0f0)]driver 
assert
[60072.330884] bnx2x: [bnx2x_panic_dump:923(enP24p1s0f0)]begin crash dump 
-
[60072.330888] bnx2x: [bnx2x_panic_dump:933(enP24p1s0f0)]def_idx(0xe51f)  
def_att_idx(0x4)  attn_state(0x1)  spq_prod_idx(0x37) next_stats_cnt(0xe517)
[60072.330892] bnx2x: [bnx2x_panic_dump:938(enP24p1s0f0)]DSB: attn bits(0x0)  
ack(0x1)  id(0x0)  idx(0x4)
[60072.330894] bnx2x: [bnx2x_panic_dump:939(enP24p1s0f0)] def (0x0 0x0 0x0 
0x0 0x0 0x0 0x0 0xe605 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0)  igu_sb_id(0x0)  
igu_seg_id(0x1) pf_id(0x0)  vnic_id(0x0)  vf_id(0xff)  vf_valid (0x0) state(0x1)
[60072.330909] bnx2x: [bnx2x_panic_dump:990(enP24p1s0f0)]fp0: 
rx_bd_prod(0x4245)  rx_bd_cons(0x7e)  rx_comp_prod(0x76e1)  
rx_comp_cons(0x7515)  *rx_cons_sb(0x7515)
[60072.330913] bnx2x: [bnx2x_panic_dump:993(enP24p1s0f0)] rx_sge_prod(0x0)  
last_max_sge(0x0)  fp_hc_idx(0x287b)
[60072.330916] bnx2x: [bnx2x_panic_dump:1010(enP24p1s0f0)]fp0: 
tx_pkt_prod(0x211b)  tx_pkt_cons(0x211b)  tx_bd_prod(0x4278)  
tx_bd_cons(0x4277)  *tx_cons_sb(0x211b)
[60072.330920] bnx2x: [bnx2x_panic_dump:1010(enP24p1s0f0)]fp0: tx_pkt_prod(0x0) 
 tx_pkt_cons(0x0)  tx_bd_prod(0x0)  tx_bd_cons(0x0)  *tx_cons_sb(0x0)
[60072.330924] bnx2x: [bnx2x_panic_dump:1010(enP24p1s0f0)]fp0: tx_pkt_prod(0x0) 
 tx_pkt_cons(0x0)  tx_bd_prod(0x0)  tx_bd_cons(0x0)  *tx_cons_sb(0x0)
[60072.330928] bnx2x: [bnx2x_panic_dump:1021(enP24p1s0f0)] run indexes 
(0x287b 0x0)
[60072.330929] bnx2x: [bnx2x_panic_dump:1027(enP24p1s0f0)] indexes (0x0 
0x7515 0x0 0x0 0x0 0x211b 0x0 0x0)pf_id(0x0)  vf_id(0xff)  vf_valid(0x0) 
vnic_id(0x0)  same_igu_sb_1b(0x1) state(0x1)
[60072.330956] bnx2x: [bnx2x_panic_dump:990(enP24p1s0f0)]fp1: 
rx_bd_prod(0x8af8)  rx_bd_cons(0x931)  rx_comp_prod(0xa811)  
rx_comp_cons(0xa645)  *rx_cons_sb(0xa645)
[60072.330960] bnx2x: [bnx2x_panic_dump:993(enP24p1s0f0)] rx_sge_prod(0x0)  
last_max_sge(0x0)  fp_hc_idx(0x21d8)
[60072.330964] bnx2x: [bnx2x_panic_dump:1010(enP24p1s0f0)]fp1: 
tx_pkt_prod(0x1eb8)  tx_pkt_cons(0x1eb8)  tx_bd_prod(0x3dad)  
tx_bd_cons(0x3dac)  *tx_cons_sb(0x1eb8)
[60072.330968] bnx2x: [bnx2x_panic_dump:1010(enP24p1s0f0)]fp1: tx_pkt_prod(0x0) 
 tx_pkt_cons(0x0)  tx_bd_prod(0x0)  tx_bd_cons(0x0)  *tx_cons_sb(0x0)
[60072.330972] bnx2x: [bnx2x_panic_dump:1010(enP24p1s0f0)]fp1: tx_pkt_prod(0x0) 
 tx_pkt_cons(0x0)  tx_bd_prod(0x0)  tx_bd_cons(0x0)  *tx_cons_sb(0x0)
[60072.330975] bnx2x: [bnx2x_panic_dump:1021(enP24p1s0f0)] run indexes 
(0x21d8 0x0)
[60072.330976] bnx2x: [bnx2x_panic_dump:1027(enP24p1s0f0)] indexes (0x0 
0xa645 0x0 0x0 0x0 0x1eb8 0x0 0x0)pf_id(0x0)  vf_id(0xff)  vf_valid(0x0) 
vnic_id(0x0)  same_igu_sb_1b(0x1) state(0x1)
[60072.331003] bnx2x: [bnx2x_panic_dump:990(enP24p1s0f0)]fp2: 
rx_bd_prod(0x1cd5)  rx_bd_cons(0xb0e)  rx_comp_prod(0xdd30)  
rx_comp_cons(0xdb64)  *rx_cons_sb(0xdb64)
[60072.331007] bnx2x: [bnx2x_panic_dump:993(enP24p1s0f0)] rx_sge_prod(0x0)  
last_max_sge(0x0)  fp_hc_idx(0x97d0)
[60072.331011] bnx2x: [bnx2x_panic_dump:1010(enP24p1s0f0)]fp2: 
tx_pkt_prod(0x1ed6)  tx_pkt_cons(0x1ed6)  tx_bd_prod(0x3de9)  
tx_bd_cons(0x3de8)  *tx_cons_sb(0x1ed6)
[60072.331015] bnx2x: [bnx2x_panic_dump:1010(enP24p1s0f0)]fp2: tx_pkt_prod(0x0)