Re: [RFC PATCH 3/5] bnx2x: Add support for segmentation of tunnels with outer checksums

2016-08-31 Thread Alexander Duyck
On Wed, Aug 31, 2016 at 4:31 AM, Yuval Mintz  wrote:
> One question I have regarding the feature, regarding the
> partial offload compatible with ndo_features_check().
>
> Consider the following example -
> Let's assume my adapter is capable of doing outer-csum validation
> for vxlan packets, but only if inner network protocol is IPv4,
> while at the same time it's capable of doing gso offloading for any
> vxlan-encapsulated packet.

When you say validation do you mean generating an outer checksum?  If
I recall ndo_features_check applies to Tx features not Rx features.
I'll assume you mean Tx checksum offload and not validation.

> In that case, I'll publish NETIF_F_GSO_UDP_TUNNEL_CSUM as
> part of my offload capability [& encap capabilities], and in my driver's
> implementation of ndo_features_check() I'd remove the feature in case
> I'd find out SKB is vxlan whose inner network protocol is ipv6.

Right.  That sounds correct so far.

> Is there a way I could have benefited from the partial offload in that
> case? If I understand the feature correctly, if I were to publish
> UDP_TUNNEL_CSUM  under partial-gso, it would mean that stack would
> peel the feature off until it would decide there's actual need for SW GSO,
> thereby denying me the ability of utilizing the HW offload capability for 
> CSUM.
>
> Am I reading it wrong? Or does this trade-off exist?

I think you may have that correct.  Basically any feature you
advertise as partially supporting via GSO_PARTIAL ends up being tied
to the partial offload after that.  Although from what I have seen the
difference isn't really all that great between GSO_PARTIAL versus TSO.
In many cases the stack generates MSS aligned frames anyway so all
GSO_PARTIAL is really doing is unsharing the header and updating the
header fields.


RE: [RFC PATCH 3/5] bnx2x: Add support for segmentation of tunnels with outer checksums

2016-08-31 Thread Yuval Mintz
One question I have regarding the feature, regarding the
partial offload compatible with ndo_features_check().

Consider the following example -
Let's assume my adapter is capable of doing outer-csum validation
for vxlan packets, but only if inner network protocol is IPv4,
while at the same time it's capable of doing gso offloading for any
vxlan-encapsulated packet.

In that case, I'll publish NETIF_F_GSO_UDP_TUNNEL_CSUM as
part of my offload capability [& encap capabilities], and in my driver's
implementation of ndo_features_check() I'd remove the feature in case
I'd find out SKB is vxlan whose inner network protocol is ipv6.

Is there a way I could have benefited from the partial offload in that
case? If I understand the feature correctly, if I were to publish
UDP_TUNNEL_CSUM  under partial-gso, it would mean that stack would
peel the feature off until it would decide there's actual need for SW GSO,
thereby denying me the ability of utilizing the HW offload capability for CSUM.

Am I reading it wrong? Or does this trade-off exist?


Re: [RFC PATCH 3/5] bnx2x: Add support for segmentation of tunnels with outer checksums

2016-08-25 Thread Alexander Duyck
On Wed, Aug 24, 2016 at 10:06 PM, Yuval Mintz  wrote:
>> >> This patch assumes that the bnx2x hardware will ignore existing
>> >> IPv4/v6 header fields for length and checksum as well as the length
>> >> and checksum fields for outer UDP and GRE headers.
>> >>
>> >> I have no means of testing this as I do not have any bnx2x hardware
>> >> but thought I would submit it as an RFC to see if anyone out there
>> >> wants to test this and see if this does in fact enable this
>> >> functionality allowing us to to segment tunneled frames that have an outer
>> checksum.
>> >>
>> >> Signed-off-by: Alexander Duyck 
>> >
>> > So it took me some [well, a lot] time to reach this, but I've finally  
>> > gave it a try.
>> > I saw a performance boost with the partial support - Throughput for
>> > vxlan tunnels with and without udpcsum were almost identical after
>> > this, whereas without this patch the udpcsum prevented GSO and a
>> > TCP/IPv4 connection on top of it got roughly half the throughput.
>> >
>> > However, I did encounter one oddity I couldn't explain - After I've
>> > disabled tx-udp_tnl-segmentation via ethtool on the base interface,
>> > got left with:
>> >tx-gso-partial: on
>> >tx-udp_tnl-segmentation: off
>> >tx-udp_tnl-csum-segmentation: on
>> >
>> > When I ran traffic over both vxlan tunnels the one with the udpcsum
>> > was still Passing gso aggregations to base device to transmit [and the
>> > throughput was same as before], where's the tunnel without the udpcsum
>> > showed only MTU-sized packets reaching the base interface for
>> > transmission [which is what I've expected]
>> >
>> > Any idea why that happened?
>>
>> So the way they are implemented tx-udp_tnl-segmentation and tx-udp_tnl-
>> csum-segmentation are treated as two separate features.
>> The kernel currently gives them the same treatment as NETIF_F_TSO and
>> NETIF_F_TSO6.  You can disable one and the other still functions.
>>
>> Now if you disable tx-gso-partial you should expect to see tx-udp_tnl-csum-
>> segmentation be disabled because it is dependent on the partial GSO offload.
>>
>> - Alex
>
> O.k., thanks.
> Then I'll run some more testing scenarios, and assuming everything works
> fine I'll re-send this. Alex - should I place you at the 'from' field?

If you could do that, then that would be great.  I'm not working at
Mirantis anymore, but I don't believe that the account is setup to
bounce the email so it shouldn't generate any noise for Dave.  If not
you can just leave my original signed off by and add yours if you
want.

- Alex


RE: [RFC PATCH 3/5] bnx2x: Add support for segmentation of tunnels with outer checksums

2016-08-24 Thread Yuval Mintz
> >> This patch assumes that the bnx2x hardware will ignore existing
> >> IPv4/v6 header fields for length and checksum as well as the length
> >> and checksum fields for outer UDP and GRE headers.
> >>
> >> I have no means of testing this as I do not have any bnx2x hardware
> >> but thought I would submit it as an RFC to see if anyone out there
> >> wants to test this and see if this does in fact enable this
> >> functionality allowing us to to segment tunneled frames that have an outer
> checksum.
> >>
> >> Signed-off-by: Alexander Duyck 
> >
> > So it took me some [well, a lot] time to reach this, but I've finally  gave 
> > it a try.
> > I saw a performance boost with the partial support - Throughput for
> > vxlan tunnels with and without udpcsum were almost identical after
> > this, whereas without this patch the udpcsum prevented GSO and a
> > TCP/IPv4 connection on top of it got roughly half the throughput.
> >
> > However, I did encounter one oddity I couldn't explain - After I've
> > disabled tx-udp_tnl-segmentation via ethtool on the base interface,
> > got left with:
> >tx-gso-partial: on
> >tx-udp_tnl-segmentation: off
> >tx-udp_tnl-csum-segmentation: on
> >
> > When I ran traffic over both vxlan tunnels the one with the udpcsum
> > was still Passing gso aggregations to base device to transmit [and the
> > throughput was same as before], where's the tunnel without the udpcsum
> > showed only MTU-sized packets reaching the base interface for
> > transmission [which is what I've expected]
> >
> > Any idea why that happened?
> 
> So the way they are implemented tx-udp_tnl-segmentation and tx-udp_tnl-
> csum-segmentation are treated as two separate features.
> The kernel currently gives them the same treatment as NETIF_F_TSO and
> NETIF_F_TSO6.  You can disable one and the other still functions.
> 
> Now if you disable tx-gso-partial you should expect to see tx-udp_tnl-csum-
> segmentation be disabled because it is dependent on the partial GSO offload.
> 
> - Alex

O.k., thanks.
Then I'll run some more testing scenarios, and assuming everything works
fine I'll re-send this. Alex - should I place you at the 'from' field?


Re: [RFC PATCH 3/5] bnx2x: Add support for segmentation of tunnels with outer checksums

2016-08-24 Thread Alexander Duyck
On Wed, Aug 24, 2016 at 5:33 AM, Yuval Mintz  wrote:
>> This patch assumes that the bnx2x hardware will ignore existing IPv4/v6 
>> header
>> fields for length and checksum as well as the length and checksum fields for
>> outer UDP and GRE headers.
>>
>> I have no means of testing this as I do not have any bnx2x hardware but 
>> thought
>> I would submit it as an RFC to see if anyone out there wants to test this 
>> and see
>> if this does in fact enable this functionality allowing us to to segment 
>> tunneled
>> frames that have an outer checksum.
>>
>> Signed-off-by: Alexander Duyck 
>
> So it took me some [well, a lot] time to reach this, but I've finally  gave 
> it a try.
> I saw a performance boost with the partial support -
> Throughput for vxlan tunnels with and without udpcsum were almost identical
> after this, whereas without this patch the udpcsum prevented GSO and
> a TCP/IPv4 connection on top of it got roughly half the throughput.
>
> However, I did encounter one oddity I couldn't explain -
> After I've disabled tx-udp_tnl-segmentation via ethtool on the base interface,
> got left with:
>tx-gso-partial: on
>tx-udp_tnl-segmentation: off
>tx-udp_tnl-csum-segmentation: on
>
> When I ran traffic over both vxlan tunnels the one with the udpcsum was still
> Passing gso aggregations to base device to transmit [and the throughput was
> same as before], where's the tunnel without the udpcsum showed only
> MTU-sized packets reaching the base interface for transmission [which is what
> I've expected]
>
> Any idea why that happened?

So the way they are implemented tx-udp_tnl-segmentation and
tx-udp_tnl-csum-segmentation are treated as two separate features.
The kernel currently gives them the same treatment as NETIF_F_TSO and
NETIF_F_TSO6.  You can disable one and the other still functions.

Now if you disable tx-gso-partial you should expect to see
tx-udp_tnl-csum-segmentation be disabled because it is dependent on
the partial GSO offload.

- Alex


RE: [RFC PATCH 3/5] bnx2x: Add support for segmentation of tunnels with outer checksums

2016-08-24 Thread Yuval Mintz
> This patch assumes that the bnx2x hardware will ignore existing IPv4/v6 header
> fields for length and checksum as well as the length and checksum fields for
> outer UDP and GRE headers.
> 
> I have no means of testing this as I do not have any bnx2x hardware but 
> thought
> I would submit it as an RFC to see if anyone out there wants to test this and 
> see
> if this does in fact enable this functionality allowing us to to segment 
> tunneled
> frames that have an outer checksum.
> 
> Signed-off-by: Alexander Duyck 

So it took me some [well, a lot] time to reach this, but I've finally  gave it 
a try.
I saw a performance boost with the partial support -
Throughput for vxlan tunnels with and without udpcsum were almost identical
after this, whereas without this patch the udpcsum prevented GSO and 
a TCP/IPv4 connection on top of it got roughly half the throughput.

However, I did encounter one oddity I couldn't explain -
After I've disabled tx-udp_tnl-segmentation via ethtool on the base interface,
got left with:
   tx-gso-partial: on
   tx-udp_tnl-segmentation: off
   tx-udp_tnl-csum-segmentation: on

When I ran traffic over both vxlan tunnels the one with the udpcsum was still
Passing gso aggregations to base device to transmit [and the throughput was
same as before], where's the tunnel without the udpcsum showed only
MTU-sized packets reaching the base interface for transmission [which is what
I've expected]

Any idea why that happened?


[RFC PATCH 3/5] bnx2x: Add support for segmentation of tunnels with outer checksums

2016-04-19 Thread Alexander Duyck
This patch assumes that the bnx2x hardware will ignore existing IPv4/v6
header fields for length and checksum as well as the length and checksum
fields for outer UDP and GRE headers.

I have no means of testing this as I do not have any bnx2x hardware but
thought I would submit it as an RFC to see if anyone out there wants to
test this and see if this does in fact enable this functionality allowing
us to to segment tunneled frames that have an outer checksum.

Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |   13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index d465bd721146..e3459524fb0e 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -13258,14 +13258,21 @@ static int bnx2x_init_dev(struct bnx2x *bp, struct 
pci_dev *pdev,
NETIF_F_RXCSUM | NETIF_F_LRO | NETIF_F_GRO |
NETIF_F_RXHASH | NETIF_F_HW_VLAN_CTAG_TX;
if (!chip_is_e1x) {
-   dev->hw_features |= NETIF_F_GSO_GRE | NETIF_F_GSO_UDP_TUNNEL |
-   NETIF_F_GSO_IPIP | NETIF_F_GSO_SIT;
+   dev->hw_features |= NETIF_F_GSO_GRE | NETIF_F_GSO_GRE_CSUM |
+   NETIF_F_GSO_IPIP | NETIF_F_GSO_SIT |
+   NETIF_F_GSO_UDP_TUNNEL |
+   NETIF_F_GSO_UDP_TUNNEL_CSUM |
+   NETIF_F_GSO_PARTIAL;
dev->hw_enc_features =
NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM | NETIF_F_SG |
NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6 |
NETIF_F_GSO_IPIP |
NETIF_F_GSO_SIT |
-   NETIF_F_GSO_GRE | NETIF_F_GSO_UDP_TUNNEL;
+   NETIF_F_GSO_GRE | NETIF_F_GSO_GRE_CSUM |
+   NETIF_F_GSO_UDP_TUNNEL | NETIF_F_GSO_UDP_TUNNEL_CSUM |
+   NETIF_F_GSO_PARTIAL;
+   dev->gso_partial_features = NETIF_F_GSO_GRE_CSUM |
+   NETIF_F_GSO_UDP_TUNNEL_CSUM;
}
 
dev->vlan_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |