Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-09 Thread Thomas Graf
On 12/09/15 at 05:21pm, David Miller wrote:
> It is clearly the most appropriate middle layer representation.
> 
> The fact that BPF could be generated from any P4 program, yet the
> reverse is not true, tells me everything I need to know.
> 
> I'm sorry if you have either a mental or a time invenstment in P4, but
> I really don't see it as really suitable for this.

I don't. I like the approach and the effect it has on a currently
very vendor secrets oriented environment.

I won't drag this further. I'm perfectly fine if BPF is suitable for
a wide range of hardware models.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-09 Thread David Miller
From: Thomas Graf 
Date: Wed, 9 Dec 2015 23:03:39 +0100

> On 12/09/15 at 09:38am, Alexei Starovoitov wrote:
>> On Wed, Dec 09, 2015 at 01:58:57PM +0100, Thomas Graf wrote:
>> > 
>> > So if the goal is to make the intent available to the hardware in
>> > a format which both the kernel and the hardware can draw the same
>> > conclusions from, wouldn't something like P4 + BPF derived from P4
>> > be a possibly better fit? There is discussion on stateful P4
>> > processing now.
>> 
>> p4 is a high level language and absolutely not suitable for such purpose.
>> bpf as intermediate representation can be generated from p4 or C or other
>> language. There is room to innovate in the language definition on top
>> and in HW design at the bottom. That's the most flexible model.
> 
> If you don't want to discuss it, no problem. But stating that P4
> is a high level language (not sure what this means exactly since
> we exactly _want_ an abstraction away from hardware) and that it's
> not suitable for this purpose is just wrong. P4 has been created
> exactly for the purpose of expressing how a packet should be
> processed by a forwarding element independent of specific hardware.

Just because it was supposeduly designed for this purpose, doesn't
mean it's the most appropriate intermediate language between what
the kernel wants hardware to do and what actually has to happen for
the hardware to do that.

BPF is so much more universal and can cover everything we'd want
hardware to perform, and then some.

Plus it's everywhere in the kernel already, has a full validation and
test suite, full LLVM backend, plus JITs for several prominent
architectures with more on the way.

It is clearly the most appropriate middle layer representation.

The fact that BPF could be generated from any P4 program, yet the
reverse is not true, tells me everything I need to know.

I'm sorry if you have either a mental or a time invenstment in P4, but
I really don't see it as really suitable for this.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-09 Thread Thomas Graf
On 12/09/15 at 09:38am, Alexei Starovoitov wrote:
> On Wed, Dec 09, 2015 at 01:58:57PM +0100, Thomas Graf wrote:
> > 
> > So if the goal is to make the intent available to the hardware in
> > a format which both the kernel and the hardware can draw the same
> > conclusions from, wouldn't something like P4 + BPF derived from P4
> > be a possibly better fit? There is discussion on stateful P4
> > processing now.
> 
> p4 is a high level language and absolutely not suitable for such purpose.
> bpf as intermediate representation can be generated from p4 or C or other
> language. There is room to innovate in the language definition on top
> and in HW design at the bottom. That's the most flexible model.

If you don't want to discuss it, no problem. But stating that P4
is a high level language (not sure what this means exactly since
we exactly _want_ an abstraction away from hardware) and that it's
not suitable for this purpose is just wrong. P4 has been created
exactly for the purpose of expressing how a packet should be
processed by a forwarding element independent of specific hardware.

There is a lot of interesting open source work coming out of that
space and I think we owe it to at least consider P4. The goal is
very much in line with what we want to achieve as Linux community
as well.

I'll wait for your proposal as you stated you are working on
something specific.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-09 Thread David Miller
From: Alexei Starovoitov 
Date: Wed, 9 Dec 2015 09:38:44 -0800

> p4 is a high level language and absolutely not suitable for such purpose.
> bpf as intermediate representation can be generated from p4 or C or other
> language. There is room to innovate in the language definition on top
> and in HW design at the bottom. That's the most flexible model.

+1
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-09 Thread Alexei Starovoitov
On Wed, Dec 09, 2015 at 01:58:57PM +0100, Thomas Graf wrote:
> 
> So if the goal is to make the intent available to the hardware in
> a format which both the kernel and the hardware can draw the same
> conclusions from, wouldn't something like P4 + BPF derived from P4
> be a possibly better fit? There is discussion on stateful P4
> processing now.

p4 is a high level language and absolutely not suitable for such purpose.
bpf as intermediate representation can be generated from p4 or C or other
language. There is room to innovate in the language definition on top
and in HW design at the bottom. That's the most flexible model.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-09 Thread Thomas Graf
On 12/08/15 at 09:45pm, Alexei Starovoitov wrote:
> definetely not 1, not 2 and hardly 3.
> bpf verifier in 2k lines does full code analysis with all branches,
> memory accesses and so on, so it's not hard to understand _intent_
> of the program by any HW backend.
> I agree with John that it's not trivial to convert bpf program into
> parse graph that intel asic understands, but it's not hard either.
> fpga based nic/switch can convert a program into parallel gates.
> netronom nic can JIT it into their instruction set.
> Programmable switch asics can equally understand intent of the
> program and convert it into their firmware.
> The easiest would be arm-based nics.
> In all cases HW will not be able to convert all possible programs,
> but it's not a limitation of instruction set. That's why 1 and 2
> above doesn't really apply.
> Different explanation of the above:
> think of bpf as intermediate representation. When C or some other
> language is used to describe what dataplane suppose to do
> the compiler generates bpf==IR which is later compiled by hw specific
> backend into target. That target can be fpga, asic, npu, etc.
> Some backends will be simple and small enough to stay completely
> within kernel. Some backends (like fpga) would need to
> call_usermodehelper() or similar, since netlist compilation is
> tedious and slow process.

Trying to summarize that, the definition of a BPF program in the
context of this discussion is: a BPF program of which the driver
or firmware/NIC can understand the original intent. Unless the NIC
can JIT, this implies reverse engineering the control flow into
a declarative model.

So if the goal is to make the intent available to the hardware in
a format which both the kernel and the hardware can draw the same
conclusions from, wouldn't something like P4 + BPF derived from P4
be a possibly better fit? There is discussion on stateful P4
processing now.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-08 Thread Alexei Starovoitov
On Wed, Dec 09, 2015 at 02:40:38AM +0100, Thomas Graf wrote:
> 
> I'm still having a difficulty trying to understand what exactly
> the intended proposal around this is. You may have just answered
> my question but just to make sure: When people refer to
> implementing or interpreting BPF in hardware, do they mean:
> 
>  1) A limited BPF instruction set used as descriptive language
> to define match/action logic?
>  2) A specific (versioned) BPF instruction set which hardware
> can support?
>  3) The full BPF instruction set of the current kernel + all
> defined helper functions and tail call support?

definetely not 1, not 2 and hardly 3.
bpf verifier in 2k lines does full code analysis with all branches,
memory accesses and so on, so it's not hard to understand _intent_
of the program by any HW backend.
I agree with John that it's not trivial to convert bpf program into
parse graph that intel asic understands, but it's not hard either.
fpga based nic/switch can convert a program into parallel gates.
netronom nic can JIT it into their instruction set.
Programmable switch asics can equally understand intent of the
program and convert it into their firmware.
The easiest would be arm-based nics.
In all cases HW will not be able to convert all possible programs,
but it's not a limitation of instruction set. That's why 1 and 2
above doesn't really apply.
Different explanation of the above:
think of bpf as intermediate representation. When C or some other
language is used to describe what dataplane suppose to do
the compiler generates bpf==IR which is later compiled by hw specific
backend into target. That target can be fpga, asic, npu, etc.
Some backends will be simple and small enough to stay completely
within kernel. Some backends (like fpga) would need to
call_usermodehelper() or similar, since netlist compilation is
tedious and slow process.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-08 Thread Thomas Graf
On 12/08/15 at 10:10am, Jamal Hadi Salim wrote:
> On 15-12-08 09:23 AM, Jamal Hadi Salim wrote:
> >On 15-12-08 02:33 AM, John Fastabend wrote:
> 
> >;-> I feel a little vindicated with this discussion.
> >
> >Of course you can implement hardware using BPF!
> 
> BTW - Just to be clear; I am not arguing for what that paper
> preaches. What the paper preaches is an academic exercise
> (square hole, round peg - must fit into OF description).
> What i am saying is you can take the ebpf instruction set and
> create a cpu that executes those instructions.

I'm still having a difficulty trying to understand what exactly
the intended proposal around this is. You may have just answered
my question but just to make sure: When people refer to
implementing or interpreting BPF in hardware, do they mean:

 1) A limited BPF instruction set used as descriptive language
to define match/action logic?
 2) A specific (versioned) BPF instruction set which hardware
can support?
 3) The full BPF instruction set of the current kernel + all
defined helper functions and tail call support?

Would programs of 2) and 3) nature be simply rejected or would
the driver convert them somehow?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-08 Thread Jamal Hadi Salim

On 15-12-08 09:23 AM, Jamal Hadi Salim wrote:

On 15-12-08 02:33 AM, John Fastabend wrote:



;-> I feel a little vindicated with this discussion.

Of course you can implement hardware using BPF!


BTW - Just to be clear; I am not arguing for what that paper
preaches. What the paper preaches is an academic exercise
(square hole, round peg - must fit into OF description).
What i am saying is you can take the ebpf instruction set and
create a cpu that executes those instructions.

cheers,
jamal


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-08 Thread Jamal Hadi Salim

On 15-12-08 02:33 AM, John Fastabend wrote:

On 15-12-02 04:15 PM, Tom Herbert wrote:




Just keying off the last statement there...

I think BPF programs are going to be hard to translate into hardware
for most devices. The problem is the BPF programs in general lack
structure. A parse graph would be much more friendly for hardware or
at minimum the BPF program would need to be a some sort of
well-structured program so a driver could turn that into a parse graph.


This might be relevant:
http://richard.systems/research/pdf/IEEE_HPSR_BPF_OPENFLOW.pdf



Thanks Tom interesting read but they seem to argue for a BPF engine in
hardware which I'm still not convinced is necessary and the numbers
provided are for a 1Gbps link where 10Gpbs/100Gbps+ would be more
valuable.

I am still leaning towards a fully programmable parse graph and a set
of basic actions push/pop/set/fwd/etc. This would be useful for other
features not just checksum offloads. I guess it doesn't necessarily
exclude also having 1s complement logic though.



;-> I feel a little vindicated with this discussion.

Of course you can implement hardware using BPF! I think there is an
opportunity for someone to build such hardware, if one is not in
progress of being built yet.
A BPF hardware implementation is just a very different approach;
instead of it being a series of TCAM table hardware implementation
(and/or other  types of implementation which use DRAM etc), it becomes
CPU instructions. Surely one can cast the EBPF bytecode into an ASIC.
My disagreement with Tom is laying a stake that this is how hardware
features are to be exposed.
My disagreement with you is laying a stake in the ground that hardware
oughta be implemented using a series of Tubes^WTCAMS.
When i build a graphics card the API is not how the internal 
implementation works. Everbody conforms to the same driver APIs.

Likewise, we have Linux APIs - and switchdev
is the right direction. Write your driver to switchdev interfaces.
Let a thousand flowers bloom.

BTW: It is bordering on the abstraction-ridiculous when I see the
P4 claims to use ebpf and then somehow translate to use
classifier/actions.


cheers,
jamal
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-07 Thread John Fastabend
On 15-12-02 04:15 PM, Tom Herbert wrote:
> On Wed, Dec 2, 2015 at 3:35 PM, John Fastabend  
> wrote:
>> [...]
>>

 I wonder why we need protocol generic offloads? I know there are
 currently a lot of overlay encapsulation protocols. Are there many more
 coming?

>>> Yes, and assume that there are more coming with an unbounded limit
>>> (for instance I just noticed today that there is a netdev1.1 talk on
>>> supporting GTP in the kernel). Besides, this problem space not just
>>> limited to offload of encapsulation protocols, but how to generalize
>>> offload of any transport, IPv[46], application protocols, protocol
>>> implemented in user space, security protocols, etc.
>>>
 Besides, this offload is about TSO and RSS and they do need to parse the
 packet to get the information where the inner header starts. It is not
 only about checksum offloading.

>>> RSS does not require the device to parse the inner header. All the UDP
>>> encapsulations protocols being defined set the source port to entropy
>>> flow value and most devices already support RSS+UDP (just needs to be
>>> enabled) so this works just fine with dumb NICs. In fact, this is one
>>> of the main motivations of encapsulating UDP in the first place, to
>>> leverage existing RSS and ECMP mechanisms. The more general solution
>>> is to use IPv6 flow label (RFC6438). We need HW support to include the
>>> flow label into the hash for ECMP and RSS, but once we have that much
>>> of the motivation for using UDP goes away and we can get back to just
>>> doing GRE/IP, IPIP, MPLS/IP, etc. (hence eliminate overhead and
>>> complexity of UDP encap).
>>>
 Please provide a sketch up for a protocol generic api that can tell
 hardware where a inner protocol header starts that supports vxlan,
 vxlan-gpe, geneve and ipv6 extension headers and knows which protocol is
 starting at that point.

>>> BPF. Implementing protocol generic offloads are not just a HW concern
>>> either, adding kernel GRO code for every possible protocol that comes
>>> along doesn't scale well. This becomes especially obvious when we
>>> consider how to provide offloads for applications protocols. If the
>>> kernel provides a programmable framework for the offloads then
>>> application protocols, such as QUIC, could use use that without
>>> needing to hack the kernel to support the specific protocol (which no
>>> one wants!). Application protocol parsing in KCM and some other use
>>> cases of BPF have already foreshadowed this, and we are working on a
>>> prototype for a BPF programmable engine in the kernel. Presumably,
>>> this same model could eventually be applied as the HW API to
>>> programmable offload.
>>
>> Just keying off the last statement there...
>>
>> I think BPF programs are going to be hard to translate into hardware
>> for most devices. The problem is the BPF programs in general lack
>> structure. A parse graph would be much more friendly for hardware or
>> at minimum the BPF program would need to be a some sort of
>> well-structured program so a driver could turn that into a parse graph.
>>
> This might be relevant:
> http://richard.systems/research/pdf/IEEE_HPSR_BPF_OPENFLOW.pdf
> 

Thanks Tom interesting read but they seem to argue for a BPF engine in
hardware which I'm still not convinced is necessary and the numbers
provided are for a 1Gbps link where 10Gpbs/100Gbps+ would be more
valuable.

I am still leaning towards a fully programmable parse graph and a set
of basic actions push/pop/set/fwd/etc. This would be useful for other
features not just checksum offloads. I guess it doesn't necessarily
exclude also having 1s complement logic though.

.John

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-07 Thread Jesse Gross
On Sun, Dec 6, 2015 at 7:02 PM, David Ahern  wrote:
> On 12/6/15 6:20 PM, Alexander Duyck wrote:
>>
>> That works for Linux to Linux, but what about the cases where you have
>> a non-Linux endpoint on the other end such as something like a Cisco
>> switch?
>
>
> Why does is matter what kind of switch the NIC is connected to?

I think Cisco was just an example, not anything particular about their
switches. But there are two general problems:

 * Some protocols, like VXLAN, recommend that the UDP checksum be zero
so this is what pretty much everyone implements. As a result,
independent of the merits of using the checksum, most non-Linux
endpoints won't support it.

* The reason why this recommendation exists in the first place is that
most ASIC based switches can't compute/verify UDP checksums. They
slice off the headers and only run that through the chip's core
memory, so the rest of the packet isn't available to compute a
checksum over.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-06 Thread David Ahern

On 12/6/15 6:20 PM, Alexander Duyck wrote:

That works for Linux to Linux, but what about the cases where you have
a non-Linux endpoint on the other end such as something like a Cisco
switch?


Why does is matter what kind of switch the NIC is connected to?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-06 Thread Alexander Duyck
On Sun, Dec 6, 2015 at 1:30 PM, Tom Herbert  wrote:
> On Sun, Dec 6, 2015 at 10:44 AM, Alexander Duyck
>  wrote:
>> On Sun, Dec 6, 2015 at 8:31 AM, Tom Herbert  wrote:
 The only spot I think you and I disagreed on was the approach.  I
 don't know if the hard push back does anything but punish the users by
 delaying the time needed to find a reasonable solution.  I really
 think if we are going to get the hardware vendors to change their
 behavior we have to create a market demand for it.  Having a bit of
 marketable data showing the folly of this approach versus the 1's
 compliment checksum would probably do more to encourage and/or shame
 them into it than simply pushing for this based on engineering
 opinion.

>>> I don't know what "marketable data" means. But I do know that we're
>>> like 70 postings into this thread, into the third patch set regarding
>>> this, yet nobody has bothered to contribute any data on what these
>>> patches do and what the quantifiable benefits are with HW offload of
>>> these protocols. I would test this stuff myself, but I don't have
>>> access to any NICs with necessary support. If someone else can start
>>> testing and providing meaningful data it would be most helpful...
>>
>> Here is an example of something kind of like what I am talking about:
>> http://www.mellanox.com/related-docs/whitepapers/CB_Intel_XL710.pdf
>>
>> I have seen evidence of the gains first hand.  The biggest gain ends
>> up being the result of GRO, and you cannot make use of GRO without
>> some form of Rx checksum offload.
>>
> Right, but we recoup the gains of GRO simply by enabling the UDP
> checksum. This works for all the UDP encapsulations, and probably
> about all NICs in deployment. You don't need protocol specific
> offloads for this. I have posted performance data many times on this,
> it is a clear win.

That works for Linux to Linux, but what about the cases where you have
a non-Linux endpoint on the other end such as something like a Cisco
switch?  That is where having the protocol specific offload is useful
as long as the hardware has sufficient capabilities to support it.

As far as trying to get the vendors to give up their protocol parsing,
it will probably never happen.  It wouldn't surprise me if it is due
to product requirements from Microsoft in order to support things like
RSS, RSC, and filtering on inner header fields.  If Linux doesn't care
about that we can drop support for it, but it still doesn't mean they
can drop those bits from the hardware design so they would likely
interpret this as a request to add a new feature instead of fixing or
replacing the existing checksum approach.

- Alex
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-06 Thread Tom Herbert
On Sun, Dec 6, 2015 at 10:44 AM, Alexander Duyck
 wrote:
> On Sun, Dec 6, 2015 at 8:31 AM, Tom Herbert  wrote:
>>> The only spot I think you and I disagreed on was the approach.  I
>>> don't know if the hard push back does anything but punish the users by
>>> delaying the time needed to find a reasonable solution.  I really
>>> think if we are going to get the hardware vendors to change their
>>> behavior we have to create a market demand for it.  Having a bit of
>>> marketable data showing the folly of this approach versus the 1's
>>> compliment checksum would probably do more to encourage and/or shame
>>> them into it than simply pushing for this based on engineering
>>> opinion.
>>>
>> I don't know what "marketable data" means. But I do know that we're
>> like 70 postings into this thread, into the third patch set regarding
>> this, yet nobody has bothered to contribute any data on what these
>> patches do and what the quantifiable benefits are with HW offload of
>> these protocols. I would test this stuff myself, but I don't have
>> access to any NICs with necessary support. If someone else can start
>> testing and providing meaningful data it would be most helpful...
>
> Here is an example of something kind of like what I am talking about:
> http://www.mellanox.com/related-docs/whitepapers/CB_Intel_XL710.pdf
>
> I have seen evidence of the gains first hand.  The biggest gain ends
> up being the result of GRO, and you cannot make use of GRO without
> some form of Rx checksum offload.
>
Right, but we recoup the gains of GRO simply by enabling the UDP
checksum. This works for all the UDP encapsulations, and probably
about all NICs in deployment. You don't need protocol specific
offloads for this. I have posted performance data many times on this,
it is a clear win.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-06 Thread Alexander Duyck
On Sun, Dec 6, 2015 at 8:31 AM, Tom Herbert  wrote:
>> The only spot I think you and I disagreed on was the approach.  I
>> don't know if the hard push back does anything but punish the users by
>> delaying the time needed to find a reasonable solution.  I really
>> think if we are going to get the hardware vendors to change their
>> behavior we have to create a market demand for it.  Having a bit of
>> marketable data showing the folly of this approach versus the 1's
>> compliment checksum would probably do more to encourage and/or shame
>> them into it than simply pushing for this based on engineering
>> opinion.
>>
> I don't know what "marketable data" means. But I do know that we're
> like 70 postings into this thread, into the third patch set regarding
> this, yet nobody has bothered to contribute any data on what these
> patches do and what the quantifiable benefits are with HW offload of
> these protocols. I would test this stuff myself, but I don't have
> access to any NICs with necessary support. If someone else can start
> testing and providing meaningful data it would be most helpful...

Here is an example of something kind of like what I am talking about:
http://www.mellanox.com/related-docs/whitepapers/CB_Intel_XL710.pdf

I have seen evidence of the gains first hand.  The biggest gain ends
up being the result of GRO, and you cannot make use of GRO without
some form of Rx checksum offload.

- Alex
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-06 Thread Tom Herbert
> The only spot I think you and I disagreed on was the approach.  I
> don't know if the hard push back does anything but punish the users by
> delaying the time needed to find a reasonable solution.  I really
> think if we are going to get the hardware vendors to change their
> behavior we have to create a market demand for it.  Having a bit of
> marketable data showing the folly of this approach versus the 1's
> compliment checksum would probably do more to encourage and/or shame
> them into it than simply pushing for this based on engineering
> opinion.
>
I don't know what "marketable data" means. But I do know that we're
like 70 postings into this thread, into the third patch set regarding
this, yet nobody has bothered to contribute any data on what these
patches do and what the quantifiable benefits are with HW offload of
these protocols. I would test this stuff myself, but I don't have
access to any NICs with necessary support. If someone else can start
testing and providing meaningful data it would be most helpful...
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-05 Thread Alexander Duyck
On Sat, Dec 5, 2015 at 2:27 PM, David Miller  wrote:
> From: Alexander Duyck 
> Date: Sat, 5 Dec 2015 11:34:47 -0800
>
>> I'm only really interested in what options the customers has in order
>> to get this all configured.  As long as there eventually ends up being
>> some path forward I'll be good with whatever ends up happening, though
>> my preference would be to see some option available in the kernel.
>
> Fair enough.
>
> BTW, I don't entirely buy your hardware complexity argument.
>
> When we're looking at checksumming offload via 1's complement in the
> RX descriptor, that is heaps simpler than having a seperate RTL path
> for N different encapsulation technologies and/or protocols.

I fully agree with you.  Basically what I had mentioned is some of the
explanation I was given from the hardware engineers when they pushed
back on my request.

> I'd rather maintain a single 1's complement circuit than N header
> parser engines that trigger the sum at the right range.

Yes, but the thing is the port number and parsers are also needed for
other things like RSS.  You also have to take into account there are
also requirements placed on the vendors by other organizations such as
Microsoft that end up impacting the final design.  As such there are
some parts of the design we cannot convince the hardware vendors to
give up.

> There is definitely long term maintainability and stability value in
> this.

Agreed.  That is why I fully support the request to add the feature,
it is just a matter of convincing the vendors to do so.

The only spot I think you and I disagreed on was the approach.  I
don't know if the hard push back does anything but punish the users by
delaying the time needed to find a reasonable solution.  I really
think if we are going to get the hardware vendors to change their
behavior we have to create a market demand for it.  Having a bit of
marketable data showing the folly of this approach versus the 1's
compliment checksum would probably do more to encourage and/or shame
them into it than simply pushing for this based on engineering
opinion.

- Alex
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-05 Thread David Miller
From: Alexander Duyck 
Date: Sat, 5 Dec 2015 11:34:47 -0800

> I'm only really interested in what options the customers has in order
> to get this all configured.  As long as there eventually ends up being
> some path forward I'll be good with whatever ends up happening, though
> my preference would be to see some option available in the kernel.

Fair enough.

BTW, I don't entirely buy your hardware complexity argument.

When we're looking at checksumming offload via 1's complement in the
RX descriptor, that is heaps simpler than having a seperate RTL path
for N different encapsulation technologies and/or protocols.

I'd rather maintain a single 1's complement circuit than N header
parser engines that trigger the sum at the right range.

There is definitely long term maintainability and stability value in
this.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-05 Thread Alexander Duyck
On Sat, Dec 5, 2015 at 10:03 AM, David Miller  wrote:
> From: Alexander Duyck 
> Date: Sat, 5 Dec 2015 00:24:55 -0800
>
>> Still, hard blocking this isn't necessarily going to push the
>> vendors to change their ways.
>
> Pushing back is different from blocking entirely.

Sorry.  I had the mistaken impression that you were planning to block
this entirely based on earlier comments.

> That means I'm going to be very difficult and make a lot of noise
> until I see the message has seeped in.
>
> It doesn't mean that I won't allow a means to use existing hardware
> offloads.  You'll just have to bear with me, be patient, and survive
> my tantrum on this matter.

I'm only really interested in what options the customers has in order
to get this all configured.  As long as there eventually ends up being
some path forward I'll be good with whatever ends up happening, though
my preference would be to see some option available in the kernel.

- Alex
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-05 Thread Alexander Duyck
On Sat, Dec 5, 2015 at 9:53 AM, Tom Herbert  wrote:
>> Keep in mind I don't represent one of the hardware vendors here
>> anymore.  I am approaching this from the customer point of view.  I
>> would like to have the performance I can get out of the parts I have.
>
> Trying enabling UDP checksum, GRO/GSO, and Remote Checksum Offload in
> VXLAN. Assuming you have a NIC that at least provides UDP checksum
> offload and RSS for UDP (may need to be enabled) you can get good
> VXLAN performance across a varietyof legacy "dumb" NICs.

VXLAN offload support is already there so I can make use of that in
hardware.  I assume we aren't talking about introducing a performance
regression by removing the VXLAN Rx port notification code.

Setting up any given environment I have to work with the hand I am
dealt.  If I can make use of the work you did to do offloading with
standard NICs then I will, but more solutions for the performance with
tunnels is always better.

- Alex
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-05 Thread David Miller
From: Alexander Duyck 
Date: Sat, 5 Dec 2015 00:24:55 -0800

> Still, hard blocking this isn't necessarily going to push the
> vendors to change their ways.

Pushing back is different from blocking entirely.

That means I'm going to be very difficult and make a lot of noise
until I see the message has seeped in.

It doesn't mean that I won't allow a means to use existing hardware
offloads.  You'll just have to bear with me, be patient, and survive
my tantrum on this matter.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-05 Thread Tom Herbert
> Keep in mind I don't represent one of the hardware vendors here
> anymore.  I am approaching this from the customer point of view.  I
> would like to have the performance I can get out of the parts I have.

Trying enabling UDP checksum, GRO/GSO, and Remote Checksum Offload in
VXLAN. Assuming you have a NIC that at least provides UDP checksum
offload and RSS for UDP (may need to be enabled) you can get good
VXLAN performance across a varietyof legacy "dumb" NICs.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-05 Thread Alexander Duyck
On Fri, Dec 4, 2015 at 10:49 PM, David Miller  wrote:
> From: Alexander Duyck 
> Date: Fri, 4 Dec 2015 21:45:09 -0800
>
>> Not having this feature has to in some way impact sales.
>
> I'm glad money trumps clean design and performance these days.
>
> Would they ship a literal turd until some customer asked for
> something better?  You have to be kidding me.

You think they wouldn't?  It all comes down to the bottom line.

Also, do you really think not having support for CHECKSUM_COMPLETE
makes the part a complete turd?  That hasn't stopped anyone from
buying many of the NICs out there that are using the port based
approach for VXLAN up to now.  Really what we are arguing about here
is a "nice to have feature" not something that will make or break a
sale for most people.  It was implemented just well enough to be able
to show gains on marketing data but there are enough corner cases
where the feature won't do much of anything since there is always some
upper limit on the number of ports supported.

> If it's true, then what a sad world we live in.

That is the nature of business.  If there is isn't any significant
impact on the bottom line most companies won't be pushed to take
action.

> And part of this is bogus, the circuit is already there and
> implemented already.  The missing part is putting the value computed
> by that circuit into the receive descriptor.

Yes, but that part doesn't complete the whole piece.  As I stated in
my other email it is still a bit of effort to complete something like
this.

> And furthermore, nobody is going to drop to BSD or DPDK for VXLAN
> tunnels just because I push back hard on this facility.

I agree, that was a bit of hyperbole on my part.  Still, hard blocking
this isn't necessarily going to push the vendors to change their ways.
It just ends up punishing the customers who already own the devices.
You may think the port based approach for the UDP tunnel offloads is a
"literal turd" but the fact is no amount of software changes is going
to do anything to fundamentally change how the hardware was designed,
and like I said there is likely more of that to come.

Keep in mind I don't represent one of the hardware vendors here
anymore.  I am approaching this from the customer point of view.  I
would like to have the performance I can get out of the parts I have.
In the future Mirantis may not buy nor recommend the devices, but I
have an OpenStack environment filled with NICs from various vendors
that support this UDP port number based offload.  I have to come up
with a means of polishing these "turds" in such a way that we can get
the maximum benefit out of them.  The question I would have is if you
see a constructive way of me doing this and working it out through the
kernel network stack, or do I need to suggest a bypass solution such
as ovs-dpdk and just give up on the hope of using kernel networking
with these parts?

- Alex
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-04 Thread Alexander Duyck
On Fri, Dec 4, 2015 at 8:50 PM, David Miller  wrote:
> From: Alexander Duyck 
> Date: Fri, 4 Dec 2015 14:44:00 -0800
>
>> I actually tried to push the generic checksum idea for fm10k back
>> during hardware development but ended up losing that battle.
>
> This chips already have a circuit calculating the 1's complement sum
> over the data as is passed through the FIFO in the RTL, it's merely a
> matter of putting the result it in the descriptor.

Actually it is a bit trickier than that.  The problem is the L4
checksum doesn't include all of the L3 header in the pseudo header.
So in order to add this feature and maintain the current feature you
have to essentially compute two checksums.  One with the pseudo header
and one with the entire L3 header.  In addition there are the fiddly
little details like what to do about VLAN headers since I think they
were included in the checksum if you leave them in the header, but you
have to exclude them if they aren't.  There is also the matter of if
we include the L2 header or not.  Basically there are number of odd
corner cases and such where this really starts to become a pain.

> Relatively speaking, the feature would be almost free.

Right.  I made similar arguments.  The problem is the RTL they have
works, and they don't want to change it unless they have to.  Having a
couple engineers write the RTL, then have some validation engineers
test it, and some driver developer code it up costs money.  In
addition there is risk involved if some flaw slips through one of the
validation efforts resulting in a silicon spin, or even worse if some
flaw ends up being released.

What it comes down to is that from the engineering side we likely
won't be able to influence any change on the hardware design.  We have
to convince the sales and marketing folks for one of the vendors that
implementing this feature would be beneficial to their bottom line if
we want to see any actual change.  Without that impetus there is no
motivation on the vendors side to risk any of their capital trying to
implement a feature that a few of us kernel engineers think would be a
really good idea.

- Alex
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-04 Thread David Miller
From: Alexander Duyck 
Date: Fri, 4 Dec 2015 21:45:09 -0800

> Not having this feature has to in some way impact sales.

I'm glad money trumps clean design and performance these days.

Would they ship a literal turd until some customer asked for
something better?  You have to be kidding me.

If it's true, then what a sad world we live in.

And part of this is bogus, the circuit is already there and
implemented already.  The missing part is putting the value computed
by that circuit into the receive descriptor.

And furthermore, nobody is going to drop to BSD or DPDK for VXLAN
tunnels just because I push back hard on this facility.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-04 Thread Alexander Duyck

On 12/04/2015 04:53 PM, Tom Herbert wrote:

I actually tried to push the generic checksum idea for fm10k back
during hardware development but ended up losing that battle.  The
problem is you have to have some customer willing to spend the cash in
order to get a feature, and the fact is nobody other than Tom has been
pushing for this.


Very well, it is true that I only represent one user of networking
protocols, grant it a very large one. I will shut up now. If other
USERS want to chime in on what is best I'll certainly listen.


Tom,

I'm sorry, but I have a hard time believing you are actually 
representing a large user here.  By large user I assume you are implying 
Facebook?  I agree that your point is very valid on the merits of the 
1's compliment checksum likely being a useful feature, but I just think 
you are going about this the wrong way as obstructing things like this 
does little to impact hardware design decisions.


If we want to win over the manufacturers we would have to speak with 
money as they aren't going to make something unless they are convinced 
they can sell it.  Simply insisting we want some feature doesn't do much 
without the customer demand to back it up.  So, unless you are telling 
me Facebook is going to let this feature influence a purchasing decision 
in the future, the argument is B.S.


Not having this feature has to in some way impact sales.  You need to 
make the 1's compliment checksum a check box type item that if the part 
doesn't have the customer won't buy.  If you were to come up with some 
sort of data demonstrating the need for the feature and were to 
associate it with something such as Open Compute then you would start to 
go a long way towards winning over consumers that they need the feature 
and as a result convincing the manufacturers that they have to provide it.


- Alex
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-04 Thread David Miller
From: Alexander Duyck 
Date: Fri, 4 Dec 2015 14:44:00 -0800

> I actually tried to push the generic checksum idea for fm10k back
> during hardware development but ended up losing that battle.

This chips already have a circuit calculating the 1's complement sum
over the data as is passed through the FIFO in the RTL, it's merely a
matter of putting the result it in the descriptor.

Relatively speaking, the feature would be almost free.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-04 Thread Tom Herbert
> I actually tried to push the generic checksum idea for fm10k back
> during hardware development but ended up losing that battle.  The
> problem is you have to have some customer willing to spend the cash in
> order to get a feature, and the fact is nobody other than Tom has been
> pushing for this.

Very well, it is true that I only represent one user of networking
protocols, grant it a very large one. I will shut up now. If other
USERS want to chime in on what is best I'll certainly listen.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-04 Thread Alexander Duyck
On Fri, Dec 4, 2015 at 12:06 PM, David Miller  wrote:
> From: Hannes Frederic Sowa 
> Date: Fri, 04 Dec 2015 20:59:05 +0100
>
>> Yes, I agree, I am totally with you here. If generic offloading can be
>> realized by NICs I am totally with you that this should be the way to
>> go. I don't see that coming in the next (small number of) years, so I
>> don't see a reason to stop this patchset.
>
> If I just apply this and say "yeah ok", the message is completely lost
> and your prediction about "small number of years" indeed will occur.

It is going to take several years regardless.  It isn't as if any of
these manufacturers can spin a design overnight.  It would likely take
a few years even if they suddenly all decided it was an important
feature to have tomorrow.  I suspect we will probably see more cards
with similar offloads long before any updated cards could come out as
there are already likely a number in the pipeline.

> However if I push back hard on this, as I will, then the message has
> some chance of seeping back to the people designing these chips.
>
> So that's what I'm going to do, like it or not.

The problem is the Linux kernel itself doesn't hold much sway over
hardware manufacturers.  A push back on something like this means they
will just bypass the upstream kernel entirely and only support this
type of offload out-of-tree on Linux or in DPDK.

If you are actually wanting to see the manufacturers change their
habits then the consumers of said cards really need to push back on
this kind of stuff.  As the saying goes money talks, B.S. walks.

> Or can someone convince me that someone who understand this stuff
> is telling the hardware guys to universally put 2's complement
> checksums into the descriptors?
>
> Who is doing that at each and every prominent ethernet hardware
> verndor?
>
> Who?

I actually tried to push the generic checksum idea for fm10k back
during hardware development but ended up losing that battle.  The
problem is you have to have some customer willing to spend the cash in
order to get a feature, and the fact is nobody other than Tom has been
pushing for this.  If it was one of Tom's employer, either Google or
Facebook, that had been telling manufacturers that they wouldn't buy
their product unless it had the feature then you can bet they would
have changed their tune.

If you want to push the manufacturers to change you basically need to
have someone put out some sort of marketable data on how a 1's
compliment checksum approach is superior to the current solution that
just indicates if the checksum is valid.  The problem is I haven't
seen anything like that so either this is due to nobody providing a
part that actually takes this approach, or because the approach is not
superior in terms of performance.  The test that should demonstrate
the superiority of using the 1's compliment checksum would be
something like having a number of VXLAN tunnel ports that exceed the
capabilities of the port filters for a given netdev.  With that you
would have the 1's compliment competing essentially against no offload
at all.

> If I get silence, or some vague non-specific response, then I'm going
> to hold my ground and keep pushing back on this stuff.

Trying to get driver developers to change this is far too late in the
process.  In many cases they hold little sway on the hardware design
which was likely locked down a year or more ago.  It is just preaching
to the choir as I am sure they have plenty of other parts of the
hardware implementation they are not happy with as well.

If anything I would say we need to be able to support the existing
hardware that has some number of filters that will identify these
tunnels via some form of ntuple filter.  The fact is there are already
5 different drivers that do this for VXLAN using vxlan_get_rx_port, I
suspect we will probably see others popping up soon to support GENEVE
and VXLAN-GPE.  By providing support for the existing hardware we can
at least let people make use of their hardware features without having
to circumvent the kernel.

- Alex
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-04 Thread David Miller
From: Tom Herbert 
Date: Fri, 4 Dec 2015 12:13:53 -0800

> On Fri, Dec 4, 2015 at 12:06 PM, David Miller  wrote:
>> From: Hannes Frederic Sowa 
>> Date: Fri, 04 Dec 2015 20:59:05 +0100
>>
>>> Yes, I agree, I am totally with you here. If generic offloading can be
>>> realized by NICs I am totally with you that this should be the way to
>>> go. I don't see that coming in the next (small number of) years, so I
>>> don't see a reason to stop this patchset.
>>
>> If I just apply this and say "yeah ok", the message is completely lost
>> and your prediction about "small number of years" indeed will occur.
>>
>> However if I push back hard on this, as I will, then the message has
>> some chance of seeping back to the people designing these chips.
>>
>> So that's what I'm going to do, like it or not.
>>
>> Or can someone convince me that someone who understand this stuff
>> is telling the hardware guys to universally put 2's complement
>> checksums into the descriptors?
>>
> We're talking about 1's complement checksum (RFC1701). Just to be clear :-)

Right :)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-04 Thread Hannes Frederic Sowa
On Fri, Dec 4, 2015, at 21:43, Tom Herbert wrote:
> On Fri, Dec 4, 2015 at 12:26 PM, Hannes Frederic Sowa
>  wrote:
> > Hi Dave,
> >
> > On Fri, Dec 4, 2015, at 21:06, David Miller wrote:
> >> From: Hannes Frederic Sowa 
> >> Date: Fri, 04 Dec 2015 20:59:05 +0100
> >>
> >> > Yes, I agree, I am totally with you here. If generic offloading can be
> >> > realized by NICs I am totally with you that this should be the way to
> >> > go. I don't see that coming in the next (small number of) years, so I
> >> > don't see a reason to stop this patchset.
> >>
> >> If I just apply this and say "yeah ok", the message is completely lost
> >> and your prediction about "small number of years" indeed will occur.
> >>
> >> However if I push back hard on this, as I will, then the message has
> >> some chance of seeping back to the people designing these chips.
> >>
> >> So that's what I'm going to do, like it or not.
> >>
> >> Or can someone convince me that someone who understand this stuff
> >> is telling the hardware guys to universally put 2's complement
> >> checksums into the descriptors?
> >>
> >> Who is doing that at each and every prominent ethernet hardware
> >> verndor?
> >>
> >> Who?
> >>
> >> If I get silence, or some vague non-specific response, then I'm going
> >> to hold my ground and keep pushing back on this stuff.
> >
> > This is not only about 1's checksumming but also about TSO (and to some
> > smaller degree about RSS, as I tried to explain): if we attach a geneve
> > header in front of a skb we expect the hardware to recognize it and
> > duplicate it while doing the hardware segmentation. The hardware can
> > only do so if it is in knowledge of the specific port (in this case UDP
> > port used for geneve) which is in use for this particular tunneling
> > transport protocol. We currently cannot describe this in a generic way,
> > thus this patchset. (Please correct me if I am wrong!)
> >
> Yes, you are wrong. Port numbers are not used in transmit path to
> signal offload. To perform TSO on UDP encapsulated packets the skb is
> marked with SKB_GSO_UDP_TUNNEL or SKB_GSO_UDP_TUNNEL_CSUM and
> SKB_GSO_TCP, etc. The driver can use information along with the
> offsets of the inner and outer headers in the packet to set up the
> operation in the device.  Some devices only support TSO for VXLAN, but
> regardless SKB_GSO_UDP_TUNNEL is generic for all known UDP
> encapsulations. Protocol specific offload is not needed. Please start
> looking at
> http://people.netfilter.org/pablo/netdev0.1/papers/UDP-Encapsulation-in-Linux.pdf
> and the kernel code to see how things _actually_ work.

I am sorry. Of course we have _inner_ header pointers and the
corresponding gso_types in skb_shinfo to signal that already. Probably I
got confused by some driver sources and commit descriptions I looked
into lately. Thanks for the correction! :)

Bye,
Hannes
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-04 Thread Jesse Gross
On Fri, Dec 4, 2015 at 12:26 PM, Hannes Frederic Sowa
 wrote:
> Hi Dave,
>
> On Fri, Dec 4, 2015, at 21:06, David Miller wrote:
>> From: Hannes Frederic Sowa 
>> Date: Fri, 04 Dec 2015 20:59:05 +0100
>>
>> > Yes, I agree, I am totally with you here. If generic offloading can be
>> > realized by NICs I am totally with you that this should be the way to
>> > go. I don't see that coming in the next (small number of) years, so I
>> > don't see a reason to stop this patchset.
>>
>> If I just apply this and say "yeah ok", the message is completely lost
>> and your prediction about "small number of years" indeed will occur.
>>
>> However if I push back hard on this, as I will, then the message has
>> some chance of seeping back to the people designing these chips.
>>
>> So that's what I'm going to do, like it or not.
>>
>> Or can someone convince me that someone who understand this stuff
>> is telling the hardware guys to universally put 2's complement
>> checksums into the descriptors?
>>
>> Who is doing that at each and every prominent ethernet hardware
>> verndor?
>>
>> Who?
>>
>> If I get silence, or some vague non-specific response, then I'm going
>> to hold my ground and keep pushing back on this stuff.
>
> This is not only about 1's checksumming but also about TSO (and to some
> smaller degree about RSS, as I tried to explain): if we attach a geneve
> header in front of a skb we expect the hardware to recognize it and
> duplicate it while doing the hardware segmentation. The hardware can
> only do so if it is in knowledge of the specific port (in this case UDP
> port used for geneve) which is in use for this particular tunneling
> transport protocol. We currently cannot describe this in a generic way,
> thus this patchset. (Please correct me if I am wrong!)

This isn't really about TSO so much as receive side offloads. However,
the general point still stands.

Checksum is only one component and really the only one that has this
type of generalizable mathematical property. n-tuple offloads, LRO,
etc. are things that are currently supported by the stack and need
this type of support. And encryption, which people are already pushing
for, has the same issue. The fact is that there is no real plan to be
able to support these types of things in a way other than what is
being done in this patchset.

I do believe that there is genuine interest in working to find
solutions to these types of problems as John and et. al. have already
been doing. However, a real, fully general solution is not something
that exists as this point in time, as you can see from all of the
discussion in this thread.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-04 Thread Tom Herbert
On Fri, Dec 4, 2015 at 12:26 PM, Hannes Frederic Sowa
 wrote:
> Hi Dave,
>
> On Fri, Dec 4, 2015, at 21:06, David Miller wrote:
>> From: Hannes Frederic Sowa 
>> Date: Fri, 04 Dec 2015 20:59:05 +0100
>>
>> > Yes, I agree, I am totally with you here. If generic offloading can be
>> > realized by NICs I am totally with you that this should be the way to
>> > go. I don't see that coming in the next (small number of) years, so I
>> > don't see a reason to stop this patchset.
>>
>> If I just apply this and say "yeah ok", the message is completely lost
>> and your prediction about "small number of years" indeed will occur.
>>
>> However if I push back hard on this, as I will, then the message has
>> some chance of seeping back to the people designing these chips.
>>
>> So that's what I'm going to do, like it or not.
>>
>> Or can someone convince me that someone who understand this stuff
>> is telling the hardware guys to universally put 2's complement
>> checksums into the descriptors?
>>
>> Who is doing that at each and every prominent ethernet hardware
>> verndor?
>>
>> Who?
>>
>> If I get silence, or some vague non-specific response, then I'm going
>> to hold my ground and keep pushing back on this stuff.
>
> This is not only about 1's checksumming but also about TSO (and to some
> smaller degree about RSS, as I tried to explain): if we attach a geneve
> header in front of a skb we expect the hardware to recognize it and
> duplicate it while doing the hardware segmentation. The hardware can
> only do so if it is in knowledge of the specific port (in this case UDP
> port used for geneve) which is in use for this particular tunneling
> transport protocol. We currently cannot describe this in a generic way,
> thus this patchset. (Please correct me if I am wrong!)
>
Yes, you are wrong. Port numbers are not used in transmit path to
signal offload. To perform TSO on UDP encapsulated packets the skb is
marked with SKB_GSO_UDP_TUNNEL or SKB_GSO_UDP_TUNNEL_CSUM and
SKB_GSO_TCP, etc. The driver can use information along with the
offsets of the inner and outer headers in the packet to set up the
operation in the device.  Some devices only support TSO for VXLAN, but
regardless SKB_GSO_UDP_TUNNEL is generic for all known UDP
encapsulations. Protocol specific offload is not needed. Please start
looking at 
http://people.netfilter.org/pablo/netdev0.1/papers/UDP-Encapsulation-in-Linux.pdf
and the kernel code to see how things _actually_ work.

> The other way to do it would probably be to enlarge the skb and push the
> structure of the packet into it, so hardware has more semantic knowledge
> about the frames structure. I guess(!!!) DPDK does it like that?
>
> If it would only be about checksuming I probably would agree.
>
> Bye,
> Hannes
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-04 Thread Hannes Frederic Sowa
Hi Dave,

On Fri, Dec 4, 2015, at 21:06, David Miller wrote:
> From: Hannes Frederic Sowa 
> Date: Fri, 04 Dec 2015 20:59:05 +0100
> 
> > Yes, I agree, I am totally with you here. If generic offloading can be
> > realized by NICs I am totally with you that this should be the way to
> > go. I don't see that coming in the next (small number of) years, so I
> > don't see a reason to stop this patchset.
> 
> If I just apply this and say "yeah ok", the message is completely lost
> and your prediction about "small number of years" indeed will occur.
> 
> However if I push back hard on this, as I will, then the message has
> some chance of seeping back to the people designing these chips.
> 
> So that's what I'm going to do, like it or not.
> 
> Or can someone convince me that someone who understand this stuff
> is telling the hardware guys to universally put 2's complement
> checksums into the descriptors?
> 
> Who is doing that at each and every prominent ethernet hardware
> verndor?
> 
> Who?
> 
> If I get silence, or some vague non-specific response, then I'm going
> to hold my ground and keep pushing back on this stuff.

This is not only about 1's checksumming but also about TSO (and to some
smaller degree about RSS, as I tried to explain): if we attach a geneve
header in front of a skb we expect the hardware to recognize it and
duplicate it while doing the hardware segmentation. The hardware can
only do so if it is in knowledge of the specific port (in this case UDP
port used for geneve) which is in use for this particular tunneling
transport protocol. We currently cannot describe this in a generic way,
thus this patchset. (Please correct me if I am wrong!)

The other way to do it would probably be to enlarge the skb and push the
structure of the packet into it, so hardware has more semantic knowledge
about the frames structure. I guess(!!!) DPDK does it like that?

If it would only be about checksuming I probably would agree.

Bye,
Hannes
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-04 Thread Tom Herbert
On Fri, Dec 4, 2015 at 12:06 PM, David Miller  wrote:
> From: Hannes Frederic Sowa 
> Date: Fri, 04 Dec 2015 20:59:05 +0100
>
>> Yes, I agree, I am totally with you here. If generic offloading can be
>> realized by NICs I am totally with you that this should be the way to
>> go. I don't see that coming in the next (small number of) years, so I
>> don't see a reason to stop this patchset.
>
> If I just apply this and say "yeah ok", the message is completely lost
> and your prediction about "small number of years" indeed will occur.
>
> However if I push back hard on this, as I will, then the message has
> some chance of seeping back to the people designing these chips.
>
> So that's what I'm going to do, like it or not.
>
> Or can someone convince me that someone who understand this stuff
> is telling the hardware guys to universally put 2's complement
> checksums into the descriptors?
>
We're talking about 1's complement checksum (RFC1701). Just to be clear :-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-04 Thread David Miller
From: Hannes Frederic Sowa 
Date: Fri, 04 Dec 2015 20:59:05 +0100

> Yes, I agree, I am totally with you here. If generic offloading can be
> realized by NICs I am totally with you that this should be the way to
> go. I don't see that coming in the next (small number of) years, so I
> don't see a reason to stop this patchset.

If I just apply this and say "yeah ok", the message is completely lost
and your prediction about "small number of years" indeed will occur.

However if I push back hard on this, as I will, then the message has
some chance of seeping back to the people designing these chips.

So that's what I'm going to do, like it or not.

Or can someone convince me that someone who understand this stuff
is telling the hardware guys to universally put 2's complement
checksums into the descriptors?

Who is doing that at each and every prominent ethernet hardware
verndor?

Who?

If I get silence, or some vague non-specific response, then I'm going
to hold my ground and keep pushing back on this stuff.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-04 Thread Hannes Frederic Sowa


On Fri, Dec 4, 2015, at 20:59, Hannes Frederic Sowa wrote:
> And then filling out those fields using the offsetof and sizeof of the
> headers, but this seemed to be very difficult a) because they use
> bitmasks (which of course could be converted) or in case of IPv6 a
> schema would have to be specified how to walk down the IPv6 extensions.
> This seems also to be true for NSH. Maybe gcc could help with
> compile-time introspection with bitfields in the future but I doubt that
> for now. Duplicating and maintaining two header structs for one
> tunneling protoco

Seems like I accidentally removed something here:

I wanted to write that maintaining multiple descriptions of tunneling
headers seems not worthwhile for now.

Bye,
Hannes
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-04 Thread Hannes Frederic Sowa
Hi Tom,

On Fri, Dec 4, 2015, at 19:28, Tom Herbert wrote:
> > I do know that, but fact is, the current drivers do it. I am concerned
> > about the amount of entropy in one single 16 bit field used to
> > distinguish flows. Flow labels fine and good, but if current hardware
> > does not support it, it does not help. Imagine containers with lots of
> > applications, 16 bit doesn't seem to fit here.
> >
> Based on what? RSS indirection table is only seven bits so even 16
> bits would be overkill for that. Please provide a concrete example,
> data where 16 bits wouldn't be sufficient.

I don't have concrete evidence: I just noticed that drivers already
implement RSS based on the data we push them over the vxlan offloading
ndos. This patchset achieves the same for geneve. Also if people would
like to implement ntuple filtering within encapsulated packets on the
NIC this is a requirement. I agree this is a bit far fetched but losing
this capability right now doesn't seem worthwhile for now and also not
stopping new protocols being deployed in this manner with specific
offloads.

Also I am not sure if hardware does only provide a 7 bit indirection
table, that would limit them to 128 receive queues/cores where they can
steer their packets to, but I absolutely don't know, it is just a guess.
Given that FM10K has 256 max queues, I could imagine they also use a
larger indirection table, no? But yeah, obviously this would still be
enough. This also very much depends on the hw used hash function,
probably toeplitz hash, and the distribution thereof. I would need to do
more research on this and check out biases.

> >> > Please provide a sketch up for a protocol generic api that can tell
> >> > hardware where a inner protocol header starts that supports vxlan,
> >> > vxlan-gpe, geneve and ipv6 extension headers and knows which protocol is
> >> > starting at that point.
> >> >
> >> BPF. Implementing protocol generic offloads are not just a HW concern
> >> either, adding kernel GRO code for every possible protocol that comes
> >> along doesn't scale well. This becomes especially obvious when we
> >> consider how to provide offloads for applications protocols. If the
> >> kernel provides a programmable framework for the offloads then
> >> application protocols, such as QUIC, could use use that without
> >> needing to hack the kernel to support the specific protocol (which no
> >> one wants!). Application protocol parsing in KCM and some other use
> >> cases of BPF have already foreshadowed this, and we are working on a
> >> prototype for a BPF programmable engine in the kernel. Presumably,
> >> this same model could eventually be applied as the HW API to
> >> programmable offload.
> >
> > So your proposal is like this:
> >
> > dev->ops->ndo_add_offload(struct net_device *, struct bpf_prog *) ?
> >
> > What do network cards do when they don't support bpf in hardware as
> > currently all cards. Should they do program equivalence testing on the
> > bpf program to check if it conforms some of its offload capabilities and
> > activate those for the port they parsed out of the bpf program? I don't
> > really care about more function pointers in struct net_device_ops
> > because it really doesn't matter but what really concerns me is the huge
> > size of the drivers in the kernel. Just tell the driver specifically
> > what is wanted and let them do that. Don't force them to do program
> > inspection or anything.
> >
> Nobody is forcing anyone to do anything. If someone implements generic
> offload like this it's treated just like any other optional feature of
> a NIC.

Yes, I agree, I am totally with you here. If generic offloading can be
realized by NICs I am totally with you that this should be the way to
go. I don't see that coming in the next (small number of) years, so I
don't see a reason to stop this patchset. (Or the more specific one
posted recently.)

All protocols can try to push down their offloading needs to the NIC via
a special generic ndo op hopefully in the future. But hardware currently
doesn't support that, so I can understand why this patchset implements
more specific offloads for specific IETF drafts.

I favor the new ndo op slightly more which is implemented in the new
patch set.

> > About your argument regarding GRO for every possible protocol:
> >
> > Adding GRO for QUIC or SPUD transparently does not work as it breaks the
> > semantics of UDP. UDP is a framed protocol not a streamed one so it does
> > not make sense to add that. You can implement GRO for fragmented UDP,
> > though. The length of the packet is end-to-end information. If you add a
> > new protocol with a new socket type, sure you can add GRO engine
> > transparently for that but not simply peeking data inside UDP if you
> > don't know how the local application uses this data. In case of
> > forwarding you can never do that, it will break the internet actually.
> > In case you are the end host GRO engine can ask the socket what type it
> > 

Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-04 Thread John Fastabend
[...]

 Please provide a sketch up for a protocol generic api that can tell
 hardware where a inner protocol header starts that supports vxlan,
 vxlan-gpe, geneve and ipv6 extension headers and knows which protocol is
 starting at that point.

>>> BPF. Implementing protocol generic offloads are not just a HW concern
>>> either, adding kernel GRO code for every possible protocol that comes
>>> along doesn't scale well. This becomes especially obvious when we
>>> consider how to provide offloads for applications protocols. If the
>>> kernel provides a programmable framework for the offloads then
>>> application protocols, such as QUIC, could use use that without
>>> needing to hack the kernel to support the specific protocol (which no
>>> one wants!). Application protocol parsing in KCM and some other use
>>> cases of BPF have already foreshadowed this, and we are working on a
>>> prototype for a BPF programmable engine in the kernel. Presumably,
>>> this same model could eventually be applied as the HW API to
>>> programmable offload.
>>
>> So your proposal is like this:
>>
>> dev->ops->ndo_add_offload(struct net_device *, struct bpf_prog *) ?
>>
>> What do network cards do when they don't support bpf in hardware as
>> currently all cards. Should they do program equivalence testing on the
>> bpf program to check if it conforms some of its offload capabilities and
>> activate those for the port they parsed out of the bpf program? I don't
>> really care about more function pointers in struct net_device_ops
>> because it really doesn't matter but what really concerns me is the huge
>> size of the drivers in the kernel. Just tell the driver specifically
>> what is wanted and let them do that. Don't force them to do program
>> inspection or anything.
>>
> Nobody is forcing anyone to do anything. If someone implements generic
> offload like this it's treated just like any other optional feature of
> a NIC.
> 

My concern with this approach is it seems to imply either you have
a BPF engine in hardware (via FPGA or NPU) or you do a program
transformation of a BPF program into registers. Possibly by building
the control flow graph and mapping that onto a parse graph. Maybe
this could be done in some library code for drivers to use but it
seems a bit unnecessary to me when we could make an API map to this
class of hardware.

Note I think a ndo_add_opffload is really useful and needed for
one class  of devices but misses the mark slightly for a large class of
devices we have today/tomorrow.

>> About your argument regarding GRO for every possible protocol:
>>
>> Adding GRO for QUIC or SPUD transparently does not work as it breaks the
>> semantics of UDP. UDP is a framed protocol not a streamed one so it does
>> not make sense to add that. You can implement GRO for fragmented UDP,
>> though. The length of the packet is end-to-end information. If you add a
>> new protocol with a new socket type, sure you can add GRO engine
>> transparently for that but not simply peeking data inside UDP if you
>> don't know how the local application uses this data. In case of
>> forwarding you can never do that, it will break the internet actually.
>> In case you are the end host GRO engine can ask the socket what type it
>> is or what framing inside UDP is used. Thus this cannot work on hardware
>> either.
>>
> This is not correct, We already have many instances of GRO being used
> over UDP in several UDP encapsulations, there is no issue with
> breaking UDP semantics. QUIC is a stream based transport like TCP so
> it will fit into the model (granted the fact that this incoming from
> userspace and the per packet security will make it little more
> challenging to implement offload). I don't know if this is needed, but
> I can only assume that server performance in QUIC must be miserable if
> all the I/O is 1350 bytes.
> 
>> I am not very happy with the use cases of BPF outside of tracing and
>> cls_bpf and packet steering.
>>
>> Please don't propose that we should use BPF as the API for HW
>> programmable offloading currently. It does not make sense.
>>
> If you have an alternative, please propose it now.

My proposal is still the Flow API I proposed back in Feb. It maps
well to at least the segment of hardware that exists today and/or will
exist in the very near future. And also requires less mangling by the
driver, kernel, etc.

As a reminder here are the operations I proposed,

On the read-only side for parse graphs,

 get_hdrs : returns a list of header types supported
 get_hdr_graph : returns a parse graph of the header types
 get_actions : returns a list of actions the device supports on nodes
   in the above graph.

Then I also proposed some operations for reading out table formats
but I think you could ignore that for the time being if your main
concern is parsing headers for queue mappings, RSS, etc. These were
more about building pipelines of operations. For completeness the
operations were get

Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-04 Thread Tom Herbert
> I do know that, but fact is, the current drivers do it. I am concerned
> about the amount of entropy in one single 16 bit field used to
> distinguish flows. Flow labels fine and good, but if current hardware
> does not support it, it does not help. Imagine containers with lots of
> applications, 16 bit doesn't seem to fit here.
>
Based on what? RSS indirection table is only seven bits so even 16
bits would be overkill for that. Please provide a concrete example,
data where 16 bits wouldn't be sufficient.

>> > Please provide a sketch up for a protocol generic api that can tell
>> > hardware where a inner protocol header starts that supports vxlan,
>> > vxlan-gpe, geneve and ipv6 extension headers and knows which protocol is
>> > starting at that point.
>> >
>> BPF. Implementing protocol generic offloads are not just a HW concern
>> either, adding kernel GRO code for every possible protocol that comes
>> along doesn't scale well. This becomes especially obvious when we
>> consider how to provide offloads for applications protocols. If the
>> kernel provides a programmable framework for the offloads then
>> application protocols, such as QUIC, could use use that without
>> needing to hack the kernel to support the specific protocol (which no
>> one wants!). Application protocol parsing in KCM and some other use
>> cases of BPF have already foreshadowed this, and we are working on a
>> prototype for a BPF programmable engine in the kernel. Presumably,
>> this same model could eventually be applied as the HW API to
>> programmable offload.
>
> So your proposal is like this:
>
> dev->ops->ndo_add_offload(struct net_device *, struct bpf_prog *) ?
>
> What do network cards do when they don't support bpf in hardware as
> currently all cards. Should they do program equivalence testing on the
> bpf program to check if it conforms some of its offload capabilities and
> activate those for the port they parsed out of the bpf program? I don't
> really care about more function pointers in struct net_device_ops
> because it really doesn't matter but what really concerns me is the huge
> size of the drivers in the kernel. Just tell the driver specifically
> what is wanted and let them do that. Don't force them to do program
> inspection or anything.
>
Nobody is forcing anyone to do anything. If someone implements generic
offload like this it's treated just like any other optional feature of
a NIC.

> About your argument regarding GRO for every possible protocol:
>
> Adding GRO for QUIC or SPUD transparently does not work as it breaks the
> semantics of UDP. UDP is a framed protocol not a streamed one so it does
> not make sense to add that. You can implement GRO for fragmented UDP,
> though. The length of the packet is end-to-end information. If you add a
> new protocol with a new socket type, sure you can add GRO engine
> transparently for that but not simply peeking data inside UDP if you
> don't know how the local application uses this data. In case of
> forwarding you can never do that, it will break the internet actually.
> In case you are the end host GRO engine can ask the socket what type it
> is or what framing inside UDP is used. Thus this cannot work on hardware
> either.
>
This is not correct, We already have many instances of GRO being used
over UDP in several UDP encapsulations, there is no issue with
breaking UDP semantics. QUIC is a stream based transport like TCP so
it will fit into the model (granted the fact that this incoming from
userspace and the per packet security will make it little more
challenging to implement offload). I don't know if this is needed, but
I can only assume that server performance in QUIC must be miserable if
all the I/O is 1350 bytes.

> I am not very happy with the use cases of BPF outside of tracing and
> cls_bpf and packet steering.
>
> Please don't propose that we should use BPF as the API for HW
> programmable offloading currently. It does not make sense.
>
If you have an alternative, please propose it now.

> Bye,
> Hannes
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-03 Thread Andreas Schultz



On 12/03/2015 04:59 PM, Hannes Frederic Sowa wrote:

Hi Tom,

On Wed, Dec 2, 2015, at 20:15, Tom Herbert wrote:

On Wed, Dec 2, 2015 at 8:35 AM, Hannes Frederic Sowa
 wrote:

On Wed, Dec 2, 2015, at 04:50, Tom Herbert wrote:

That completely misses the whole point of the rest of this thread.
Protocol specific offloads are what we are trying to discourage not
encourage. Adding any more ndo functions for this purpose should be an
exception, not the norm. The bar should be naturally high considering
the cost of exposing this to ndo.


Why?

I wonder why we need protocol generic offloads? I know there are
currently a lot of overlay encapsulation protocols. Are there many more
coming?


Yes, and assume that there are more coming with an unbounded limit
(for instance I just noticed today that there is a netdev1.1 talk on
supporting GTP in the kernel). Besides, this problem space not just
limited to offload of encapsulation protocols, but how to generalize
offload of any transport, IPv[46], application protocols, protocol
implemented in user space, security protocols, etc.


GTP seems to be a tunneling protocol also based on TCP, I hope the same
standards apply to it as STT at that time (depending on the
implementation, of course). There are some other protocols on its way, I
see but they can just be realized as kernel modules and that's it.


GTP is UDP based. The standard permits a variable length header (one can
add extensions after a fixed header), but that is seldom (or even never)
used. Tunnel are identified by a 32bit tunnel endpoint id for GTPv1 and
a 64bit flow id for GTPv0. UDP destination ports differ for v1 and v0,
so it's easy to distinguish.

The biggest pain when implementing GTP are the path maintenance procedures.
But this really has nothing to do with offloads

Andreas


I am also not sure I can follow, some time ago the use of TOE (TCP
Offload Engine) were pretty much banished from entering the linux
kernel, has this really changed? It would be needed to do hardware
offloading of all other protocols inside TCP, no?

There are really a lot of tunneling protocols nowadays.


Besides, this offload is about TSO and RSS and they do need to parse the
packet to get the information where the inner header starts. It is not
only about checksum offloading.


RSS does not require the device to parse the inner header. All the UDP
encapsulations protocols being defined set the source port to entropy
flow value and most devices already support RSS+UDP (just needs to be
enabled) so this works just fine with dumb NICs. In fact, this is one
of the main motivations of encapsulating UDP in the first place, to
leverage existing RSS and ECMP mechanisms. The more general solution
is to use IPv6 flow label (RFC6438). We need HW support to include the
flow label into the hash for ECMP and RSS, but once we have that much
of the motivation for using UDP goes away and we can get back to just
doing GRE/IP, IPIP, MPLS/IP, etc. (hence eliminate overhead and
complexity of UDP encap).


I do know that, but fact is, the current drivers do it. I am concerned
about the amount of entropy in one single 16 bit field used to
distinguish flows. Flow labels fine and good, but if current hardware
does not support it, it does not help. Imagine containers with lots of
applications, 16 bit doesn't seem to fit here.


Please provide a sketch up for a protocol generic api that can tell
hardware where a inner protocol header starts that supports vxlan,
vxlan-gpe, geneve and ipv6 extension headers and knows which protocol is
starting at that point.


BPF. Implementing protocol generic offloads are not just a HW concern
either, adding kernel GRO code for every possible protocol that comes
along doesn't scale well. This becomes especially obvious when we
consider how to provide offloads for applications protocols. If the
kernel provides a programmable framework for the offloads then
application protocols, such as QUIC, could use use that without
needing to hack the kernel to support the specific protocol (which no
one wants!). Application protocol parsing in KCM and some other use
cases of BPF have already foreshadowed this, and we are working on a
prototype for a BPF programmable engine in the kernel. Presumably,
this same model could eventually be applied as the HW API to
programmable offload.


So your proposal is like this:

dev->ops->ndo_add_offload(struct net_device *, struct bpf_prog *) ?

What do network cards do when they don't support bpf in hardware as
currently all cards. Should they do program equivalence testing on the
bpf program to check if it conforms some of its offload capabilities and
activate those for the port they parsed out of the bpf program? I don't
really care about more function pointers in struct net_device_ops
because it really doesn't matter but what really concerns me is the huge
size of the drivers in the kernel. Just tell the driver specifically
what is wanted and let them do that. Don't force them to do prog

Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-03 Thread Hannes Frederic Sowa
On Thu, Dec 3, 2015, at 17:35, Andreas Schultz wrote:
> On 12/03/2015 04:59 PM, Hannes Frederic Sowa wrote:
> > Hi Tom,
> >
> > On Wed, Dec 2, 2015, at 20:15, Tom Herbert wrote:
> >> On Wed, Dec 2, 2015 at 8:35 AM, Hannes Frederic Sowa
> >>  wrote:
> >>> On Wed, Dec 2, 2015, at 04:50, Tom Herbert wrote:
>  That completely misses the whole point of the rest of this thread.
>  Protocol specific offloads are what we are trying to discourage not
>  encourage. Adding any more ndo functions for this purpose should be an
>  exception, not the norm. The bar should be naturally high considering
>  the cost of exposing this to ndo.
> >>>
> >>> Why?
> >>>
> >>> I wonder why we need protocol generic offloads? I know there are
> >>> currently a lot of overlay encapsulation protocols. Are there many more
> >>> coming?
> >>>
> >> Yes, and assume that there are more coming with an unbounded limit
> >> (for instance I just noticed today that there is a netdev1.1 talk on
> >> supporting GTP in the kernel). Besides, this problem space not just
> >> limited to offload of encapsulation protocols, but how to generalize
> >> offload of any transport, IPv[46], application protocols, protocol
> >> implemented in user space, security protocols, etc.
> >
> > GTP seems to be a tunneling protocol also based on TCP, I hope the same
> > standards apply to it as STT at that time (depending on the
> > implementation, of course). There are some other protocols on its way, I
> > see but they can just be realized as kernel modules and that's it.
> 
> GTP is UDP based. The standard permits a variable length header (one can
> add extensions after a fixed header), but that is seldom (or even never)
> used. Tunnel are identified by a 32bit tunnel endpoint id for GTPv1 and
> a 64bit flow id for GTPv0. UDP destination ports differ for v1 and v0,
> so it's easy to distinguish.

Ok, thanks for letting me know. Browsing in Wikipedia first mentioned
both TCP and UDP. But I see that v1 only uses UDP.

Bye,
Hannes
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-03 Thread Hannes Frederic Sowa
Hi Tom,

On Wed, Dec 2, 2015, at 20:15, Tom Herbert wrote:
> On Wed, Dec 2, 2015 at 8:35 AM, Hannes Frederic Sowa
>  wrote:
> > On Wed, Dec 2, 2015, at 04:50, Tom Herbert wrote:
> >> That completely misses the whole point of the rest of this thread.
> >> Protocol specific offloads are what we are trying to discourage not
> >> encourage. Adding any more ndo functions for this purpose should be an
> >> exception, not the norm. The bar should be naturally high considering
> >> the cost of exposing this to ndo.
> >
> > Why?
> >
> > I wonder why we need protocol generic offloads? I know there are
> > currently a lot of overlay encapsulation protocols. Are there many more
> > coming?
> >
> Yes, and assume that there are more coming with an unbounded limit
> (for instance I just noticed today that there is a netdev1.1 talk on
> supporting GTP in the kernel). Besides, this problem space not just
> limited to offload of encapsulation protocols, but how to generalize
> offload of any transport, IPv[46], application protocols, protocol
> implemented in user space, security protocols, etc.

GTP seems to be a tunneling protocol also based on TCP, I hope the same
standards apply to it as STT at that time (depending on the
implementation, of course). There are some other protocols on its way, I
see but they can just be realized as kernel modules and that's it.

I am also not sure I can follow, some time ago the use of TOE (TCP
Offload Engine) were pretty much banished from entering the linux
kernel, has this really changed? It would be needed to do hardware
offloading of all other protocols inside TCP, no?

There are really a lot of tunneling protocols nowadays.

> > Besides, this offload is about TSO and RSS and they do need to parse the
> > packet to get the information where the inner header starts. It is not
> > only about checksum offloading.
> >
> RSS does not require the device to parse the inner header. All the UDP
> encapsulations protocols being defined set the source port to entropy
> flow value and most devices already support RSS+UDP (just needs to be
> enabled) so this works just fine with dumb NICs. In fact, this is one
> of the main motivations of encapsulating UDP in the first place, to
> leverage existing RSS and ECMP mechanisms. The more general solution
> is to use IPv6 flow label (RFC6438). We need HW support to include the
> flow label into the hash for ECMP and RSS, but once we have that much
> of the motivation for using UDP goes away and we can get back to just
> doing GRE/IP, IPIP, MPLS/IP, etc. (hence eliminate overhead and
> complexity of UDP encap).

I do know that, but fact is, the current drivers do it. I am concerned
about the amount of entropy in one single 16 bit field used to
distinguish flows. Flow labels fine and good, but if current hardware
does not support it, it does not help. Imagine containers with lots of
applications, 16 bit doesn't seem to fit here.

> > Please provide a sketch up for a protocol generic api that can tell
> > hardware where a inner protocol header starts that supports vxlan,
> > vxlan-gpe, geneve and ipv6 extension headers and knows which protocol is
> > starting at that point.
> >
> BPF. Implementing protocol generic offloads are not just a HW concern
> either, adding kernel GRO code for every possible protocol that comes
> along doesn't scale well. This becomes especially obvious when we
> consider how to provide offloads for applications protocols. If the
> kernel provides a programmable framework for the offloads then
> application protocols, such as QUIC, could use use that without
> needing to hack the kernel to support the specific protocol (which no
> one wants!). Application protocol parsing in KCM and some other use
> cases of BPF have already foreshadowed this, and we are working on a
> prototype for a BPF programmable engine in the kernel. Presumably,
> this same model could eventually be applied as the HW API to
> programmable offload.

So your proposal is like this:

dev->ops->ndo_add_offload(struct net_device *, struct bpf_prog *) ?

What do network cards do when they don't support bpf in hardware as
currently all cards. Should they do program equivalence testing on the
bpf program to check if it conforms some of its offload capabilities and
activate those for the port they parsed out of the bpf program? I don't
really care about more function pointers in struct net_device_ops
because it really doesn't matter but what really concerns me is the huge
size of the drivers in the kernel. Just tell the driver specifically
what is wanted and let them do that. Don't force them to do program
inspection or anything.

About your argument regarding GRO for every possible protocol:

Adding GRO for QUIC or SPUD transparently does not work as it breaks the
semantics of UDP. UDP is a framed protocol not a streamed one so it does
not make sense to add that. You can implement GRO for fragmented UDP,
though. The length of the packet is end-to-end information. If y

Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-02 Thread Alexei Starovoitov
On Wed, Dec 02, 2015 at 03:35:53PM -0800, John Fastabend wrote:
> [...]
> > BPF. Implementing protocol generic offloads are not just a HW concern
> > either, adding kernel GRO code for every possible protocol that comes
> > along doesn't scale well. This becomes especially obvious when we
> > consider how to provide offloads for applications protocols. If the
> > kernel provides a programmable framework for the offloads then
> > application protocols, such as QUIC, could use use that without
> > needing to hack the kernel to support the specific protocol (which no
> > one wants!). Application protocol parsing in KCM and some other use
> > cases of BPF have already foreshadowed this, and we are working on a
> > prototype for a BPF programmable engine in the kernel. Presumably,
> > this same model could eventually be applied as the HW API to
> > programmable offload.
> 
> Just keying off the last statement there...
> 
> I think BPF programs are going to be hard to translate into hardware
> for most devices. The problem is the BPF programs in general lack
> structure. A parse graph would be much more friendly for hardware or
> at minimum the BPF program would need to be a some sort of
> well-structured program so a driver could turn that into a parse graph.

I'm looking at bpf as a way to describe the intent of what HW or SW has to do
and in case of SW it's easy to JIT and execute, but nic/switch doesn't
have to 'execute' bpf instructions. If it's fpga based it can compile
bpf program into parallel gates. Less flexible HW would not be able
to off-load all programs. That's fine. Long term flexible SW will
push HW to be flexible.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-02 Thread Tom Herbert
On Wed, Dec 2, 2015 at 3:35 PM, John Fastabend  wrote:
> [...]
>
>>>
>>> I wonder why we need protocol generic offloads? I know there are
>>> currently a lot of overlay encapsulation protocols. Are there many more
>>> coming?
>>>
>> Yes, and assume that there are more coming with an unbounded limit
>> (for instance I just noticed today that there is a netdev1.1 talk on
>> supporting GTP in the kernel). Besides, this problem space not just
>> limited to offload of encapsulation protocols, but how to generalize
>> offload of any transport, IPv[46], application protocols, protocol
>> implemented in user space, security protocols, etc.
>>
>>> Besides, this offload is about TSO and RSS and they do need to parse the
>>> packet to get the information where the inner header starts. It is not
>>> only about checksum offloading.
>>>
>> RSS does not require the device to parse the inner header. All the UDP
>> encapsulations protocols being defined set the source port to entropy
>> flow value and most devices already support RSS+UDP (just needs to be
>> enabled) so this works just fine with dumb NICs. In fact, this is one
>> of the main motivations of encapsulating UDP in the first place, to
>> leverage existing RSS and ECMP mechanisms. The more general solution
>> is to use IPv6 flow label (RFC6438). We need HW support to include the
>> flow label into the hash for ECMP and RSS, but once we have that much
>> of the motivation for using UDP goes away and we can get back to just
>> doing GRE/IP, IPIP, MPLS/IP, etc. (hence eliminate overhead and
>> complexity of UDP encap).
>>
>>> Please provide a sketch up for a protocol generic api that can tell
>>> hardware where a inner protocol header starts that supports vxlan,
>>> vxlan-gpe, geneve and ipv6 extension headers and knows which protocol is
>>> starting at that point.
>>>
>> BPF. Implementing protocol generic offloads are not just a HW concern
>> either, adding kernel GRO code for every possible protocol that comes
>> along doesn't scale well. This becomes especially obvious when we
>> consider how to provide offloads for applications protocols. If the
>> kernel provides a programmable framework for the offloads then
>> application protocols, such as QUIC, could use use that without
>> needing to hack the kernel to support the specific protocol (which no
>> one wants!). Application protocol parsing in KCM and some other use
>> cases of BPF have already foreshadowed this, and we are working on a
>> prototype for a BPF programmable engine in the kernel. Presumably,
>> this same model could eventually be applied as the HW API to
>> programmable offload.
>
> Just keying off the last statement there...
>
> I think BPF programs are going to be hard to translate into hardware
> for most devices. The problem is the BPF programs in general lack
> structure. A parse graph would be much more friendly for hardware or
> at minimum the BPF program would need to be a some sort of
> well-structured program so a driver could turn that into a parse graph.
>
This might be relevant:
http://richard.systems/research/pdf/IEEE_HPSR_BPF_OPENFLOW.pdf

> .John
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-02 Thread John Fastabend
[...]

>>
>> I wonder why we need protocol generic offloads? I know there are
>> currently a lot of overlay encapsulation protocols. Are there many more
>> coming?
>>
> Yes, and assume that there are more coming with an unbounded limit
> (for instance I just noticed today that there is a netdev1.1 talk on
> supporting GTP in the kernel). Besides, this problem space not just
> limited to offload of encapsulation protocols, but how to generalize
> offload of any transport, IPv[46], application protocols, protocol
> implemented in user space, security protocols, etc.
> 
>> Besides, this offload is about TSO and RSS and they do need to parse the
>> packet to get the information where the inner header starts. It is not
>> only about checksum offloading.
>>
> RSS does not require the device to parse the inner header. All the UDP
> encapsulations protocols being defined set the source port to entropy
> flow value and most devices already support RSS+UDP (just needs to be
> enabled) so this works just fine with dumb NICs. In fact, this is one
> of the main motivations of encapsulating UDP in the first place, to
> leverage existing RSS and ECMP mechanisms. The more general solution
> is to use IPv6 flow label (RFC6438). We need HW support to include the
> flow label into the hash for ECMP and RSS, but once we have that much
> of the motivation for using UDP goes away and we can get back to just
> doing GRE/IP, IPIP, MPLS/IP, etc. (hence eliminate overhead and
> complexity of UDP encap).
> 
>> Please provide a sketch up for a protocol generic api that can tell
>> hardware where a inner protocol header starts that supports vxlan,
>> vxlan-gpe, geneve and ipv6 extension headers and knows which protocol is
>> starting at that point.
>>
> BPF. Implementing protocol generic offloads are not just a HW concern
> either, adding kernel GRO code for every possible protocol that comes
> along doesn't scale well. This becomes especially obvious when we
> consider how to provide offloads for applications protocols. If the
> kernel provides a programmable framework for the offloads then
> application protocols, such as QUIC, could use use that without
> needing to hack the kernel to support the specific protocol (which no
> one wants!). Application protocol parsing in KCM and some other use
> cases of BPF have already foreshadowed this, and we are working on a
> prototype for a BPF programmable engine in the kernel. Presumably,
> this same model could eventually be applied as the HW API to
> programmable offload.

Just keying off the last statement there...

I think BPF programs are going to be hard to translate into hardware
for most devices. The problem is the BPF programs in general lack
structure. A parse graph would be much more friendly for hardware or
at minimum the BPF program would need to be a some sort of
well-structured program so a driver could turn that into a parse graph.

.John
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-02 Thread Tom Herbert
On Wed, Dec 2, 2015 at 8:35 AM, Hannes Frederic Sowa
 wrote:
> On Wed, Dec 2, 2015, at 04:50, Tom Herbert wrote:
>> On Tue, Dec 1, 2015 at 7:49 AM, Hannes Frederic Sowa
>>  wrote:
>> > On Tue, Dec 1, 2015, at 16:44, John W. Linville wrote:
>> >> On Mon, Nov 30, 2015 at 09:26:51PM -0800, Tom Herbert wrote:
>> >> > On Mon, Nov 30, 2015 at 5:28 PM, Jesse Gross  wrote:
>> >>
>> >> > > Based on what we can do today, I see only two real choices: do some
>> >> > > refactoring to clean up the stack a bit or remove the existing VXLAN
>> >> > > offloading altogether. I think this series is trying to do the former
>> >> > > and the result is that the stack is cleaner after than before. That
>> >> > > seems like a good thing.
>> >> >
>> >> > There is a third choice which is to do nothing. Creating an
>> >> > infrastructure that claims to "Generalize udp based tunnel offload"
>> >> > but actually doesn't generalize the mechanism is nothing more than
>> >> > window dressing-- this does nothing to help with the VXLAN to
>> >> > VXLAN-GPE transition for instance. If geneve specific offload is
>> >> > really needed now then that can be should with another ndo function,
>> >> > or alternatively ntuple filter with a device specific action would at
>> >> > least get the stack out of needing to be concerned with that.
>> >> > Regardless, we will work optimize the rest of the stack for devices
>> >> > that implement protocol agnostic mechanisms.
>> >>
>> >> Is there no concern about NDO proliferation? Does the size of the
>> >> netdev_ops structure matter? Beyond that, I can see how a single
>> >> entry point with an enum specifying the offload type isn't really any
>> >> different in the grand scheme of things than having multiple NDOs,
>> >> one per offload.
>> >>
>> >> Given the need to live with existing hardware offloads, I would lean
>> >> toward a consolidated NDO. But if a different NDO per tunnel type is
>> >> preferred, I can be satisified with that.
>> >
>> > Having per-offloading NDOs helps the stack to gather further information
>> > what kind of offloads the driver has even maybe without trying to call
>> > down into the layer (just by comparing to NULL). Checking this inside
>> > the driver offload function clearly does not have this feature. So we
>> > finally can have "ip tunnel please-recommend-type" feature. :)
>> >
>> That completely misses the whole point of the rest of this thread.
>> Protocol specific offloads are what we are trying to discourage not
>> encourage. Adding any more ndo functions for this purpose should be an
>> exception, not the norm. The bar should be naturally high considering
>> the cost of exposing this to ndo.
>
> Why?
>
> I wonder why we need protocol generic offloads? I know there are
> currently a lot of overlay encapsulation protocols. Are there many more
> coming?
>
Yes, and assume that there are more coming with an unbounded limit
(for instance I just noticed today that there is a netdev1.1 talk on
supporting GTP in the kernel). Besides, this problem space not just
limited to offload of encapsulation protocols, but how to generalize
offload of any transport, IPv[46], application protocols, protocol
implemented in user space, security protocols, etc.

> Besides, this offload is about TSO and RSS and they do need to parse the
> packet to get the information where the inner header starts. It is not
> only about checksum offloading.
>
RSS does not require the device to parse the inner header. All the UDP
encapsulations protocols being defined set the source port to entropy
flow value and most devices already support RSS+UDP (just needs to be
enabled) so this works just fine with dumb NICs. In fact, this is one
of the main motivations of encapsulating UDP in the first place, to
leverage existing RSS and ECMP mechanisms. The more general solution
is to use IPv6 flow label (RFC6438). We need HW support to include the
flow label into the hash for ECMP and RSS, but once we have that much
of the motivation for using UDP goes away and we can get back to just
doing GRE/IP, IPIP, MPLS/IP, etc. (hence eliminate overhead and
complexity of UDP encap).

> Please provide a sketch up for a protocol generic api that can tell
> hardware where a inner protocol header starts that supports vxlan,
> vxlan-gpe, geneve and ipv6 extension headers and knows which protocol is
> starting at that point.
>
BPF. Implementing protocol generic offloads are not just a HW concern
either, adding kernel GRO code for every possible protocol that comes
along doesn't scale well. This becomes especially obvious when we
consider how to provide offloads for applications protocols. If the
kernel provides a programmable framework for the offloads then
application protocols, such as QUIC, could use use that without
needing to hack the kernel to support the specific protocol (which no
one wants!). Application protocol parsing in KCM and some other use
cases of BPF have already foreshadowed this, and we are working on a
prototyp

Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-02 Thread Hannes Frederic Sowa
On Wed, Dec 2, 2015, at 04:50, Tom Herbert wrote:
> On Tue, Dec 1, 2015 at 7:49 AM, Hannes Frederic Sowa
>  wrote:
> > On Tue, Dec 1, 2015, at 16:44, John W. Linville wrote:
> >> On Mon, Nov 30, 2015 at 09:26:51PM -0800, Tom Herbert wrote:
> >> > On Mon, Nov 30, 2015 at 5:28 PM, Jesse Gross  wrote:
> >>
> >> > > Based on what we can do today, I see only two real choices: do some
> >> > > refactoring to clean up the stack a bit or remove the existing VXLAN
> >> > > offloading altogether. I think this series is trying to do the former
> >> > > and the result is that the stack is cleaner after than before. That
> >> > > seems like a good thing.
> >> >
> >> > There is a third choice which is to do nothing. Creating an
> >> > infrastructure that claims to "Generalize udp based tunnel offload"
> >> > but actually doesn't generalize the mechanism is nothing more than
> >> > window dressing-- this does nothing to help with the VXLAN to
> >> > VXLAN-GPE transition for instance. If geneve specific offload is
> >> > really needed now then that can be should with another ndo function,
> >> > or alternatively ntuple filter with a device specific action would at
> >> > least get the stack out of needing to be concerned with that.
> >> > Regardless, we will work optimize the rest of the stack for devices
> >> > that implement protocol agnostic mechanisms.
> >>
> >> Is there no concern about NDO proliferation? Does the size of the
> >> netdev_ops structure matter? Beyond that, I can see how a single
> >> entry point with an enum specifying the offload type isn't really any
> >> different in the grand scheme of things than having multiple NDOs,
> >> one per offload.
> >>
> >> Given the need to live with existing hardware offloads, I would lean
> >> toward a consolidated NDO. But if a different NDO per tunnel type is
> >> preferred, I can be satisified with that.
> >
> > Having per-offloading NDOs helps the stack to gather further information
> > what kind of offloads the driver has even maybe without trying to call
> > down into the layer (just by comparing to NULL). Checking this inside
> > the driver offload function clearly does not have this feature. So we
> > finally can have "ip tunnel please-recommend-type" feature. :)
> >
> That completely misses the whole point of the rest of this thread.
> Protocol specific offloads are what we are trying to discourage not
> encourage. Adding any more ndo functions for this purpose should be an
> exception, not the norm. The bar should be naturally high considering
> the cost of exposing this to ndo.

Why?

I wonder why we need protocol generic offloads? I know there are
currently a lot of overlay encapsulation protocols. Are there many more
coming?

Besides, this offload is about TSO and RSS and they do need to parse the
packet to get the information where the inner header starts. It is not
only about checksum offloading.

If those protocols always carry an option length in the header we
probably could make it a little bit more generic, so the protocol
implementation could sound like:

  "Generic Tunnel Offloading besides protocols with chained options"

Unfortunately IPv6 extension headers are exactly a very good example
were this generic offloading would fail horribly as hardware has to
parse the header chain to reach the final (inner) protocol.

How to deal with the next protocol field in vxlan-gpe in a protocol
agnostic way (whoever came up with this)? (it has a special numbering
based on the ietf draft and I don't see any other way to say a network
card please interpret this field as specified by that rfc and this
is not protocol agnostic at all any more). I don't see how this
technically makes any sense and to implement this protocol agnostic.

Checksums maybe can, rest really does not make sense. Especially for NSH
I currently don't see how this can be done in general.

Please provide a sketch up for a protocol generic api that can tell
hardware where a inner protocol header starts that supports vxlan,
vxlan-gpe, geneve and ipv6 extension headers and knows which protocol is
starting at that point.

Bye,
Hannes
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-01 Thread Tom Herbert
On Tue, Dec 1, 2015 at 7:49 AM, Hannes Frederic Sowa
 wrote:
> On Tue, Dec 1, 2015, at 16:44, John W. Linville wrote:
>> On Mon, Nov 30, 2015 at 09:26:51PM -0800, Tom Herbert wrote:
>> > On Mon, Nov 30, 2015 at 5:28 PM, Jesse Gross  wrote:
>>
>> > > Based on what we can do today, I see only two real choices: do some
>> > > refactoring to clean up the stack a bit or remove the existing VXLAN
>> > > offloading altogether. I think this series is trying to do the former
>> > > and the result is that the stack is cleaner after than before. That
>> > > seems like a good thing.
>> >
>> > There is a third choice which is to do nothing. Creating an
>> > infrastructure that claims to "Generalize udp based tunnel offload"
>> > but actually doesn't generalize the mechanism is nothing more than
>> > window dressing-- this does nothing to help with the VXLAN to
>> > VXLAN-GPE transition for instance. If geneve specific offload is
>> > really needed now then that can be should with another ndo function,
>> > or alternatively ntuple filter with a device specific action would at
>> > least get the stack out of needing to be concerned with that.
>> > Regardless, we will work optimize the rest of the stack for devices
>> > that implement protocol agnostic mechanisms.
>>
>> Is there no concern about NDO proliferation? Does the size of the
>> netdev_ops structure matter? Beyond that, I can see how a single
>> entry point with an enum specifying the offload type isn't really any
>> different in the grand scheme of things than having multiple NDOs,
>> one per offload.
>>
>> Given the need to live with existing hardware offloads, I would lean
>> toward a consolidated NDO. But if a different NDO per tunnel type is
>> preferred, I can be satisified with that.
>
> Having per-offloading NDOs helps the stack to gather further information
> what kind of offloads the driver has even maybe without trying to call
> down into the layer (just by comparing to NULL). Checking this inside
> the driver offload function clearly does not have this feature. So we
> finally can have "ip tunnel please-recommend-type" feature. :)
>
That completely misses the whole point of the rest of this thread.
Protocol specific offloads are what we are trying to discourage not
encourage. Adding any more ndo functions for this purpose should be an
exception, not the norm. The bar should be naturally high considering
the cost of exposing this to ndo.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-01 Thread Singhai, Anjali



On 12/1/2015 8:08 AM, John W. Linville wrote:

On Tue, Dec 01, 2015 at 04:49:28PM +0100, Hannes Frederic Sowa wrote:

On Tue, Dec 1, 2015, at 16:44, John W. Linville wrote:

On Mon, Nov 30, 2015 at 09:26:51PM -0800, Tom Herbert wrote:

On Mon, Nov 30, 2015 at 5:28 PM, Jesse Gross  wrote:

Based on what we can do today, I see only two real choices: do some
refactoring to clean up the stack a bit or remove the existing VXLAN
offloading altogether. I think this series is trying to do the former
and the result is that the stack is cleaner after than before. That
seems like a good thing.

There is a third choice which is to do nothing. Creating an
infrastructure that claims to "Generalize udp based tunnel offload"
but actually doesn't generalize the mechanism is nothing more than
window dressing-- this does nothing to help with the VXLAN to
VXLAN-GPE transition for instance. If geneve specific offload is
really needed now then that can be should with another ndo function,
or alternatively ntuple filter with a device specific action would at
least get the stack out of needing to be concerned with that.
Regardless, we will work optimize the rest of the stack for devices
that implement protocol agnostic mechanisms.

Is there no concern about NDO proliferation? Does the size of the
netdev_ops structure matter? Beyond that, I can see how a single
entry point with an enum specifying the offload type isn't really any
different in the grand scheme of things than having multiple NDOs,
one per offload.

Given the need to live with existing hardware offloads, I would lean
toward a consolidated NDO. But if a different NDO per tunnel type is
preferred, I can be satisified with that.

Having per-offloading NDOs helps the stack to gather further information
what kind of offloads the driver has even maybe without trying to call
down into the layer (just by comparing to NULL). Checking this inside
the driver offload function clearly does not have this feature. So we
finally can have "ip tunnel please-recommend-type" feature. :)

That is a valuable insight! Maybe the per-offload NDO isn't such a
bad idea afterall... :-)

John
This helps me understand why having a separate ndo op might still be ok. 
Thanks for the feedback. I will go back to that model.  Also  I think I 
did finally understand the discussion on using a single 2's compliment 
checksum method

for future silicon.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-01 Thread John W. Linville
On Tue, Dec 01, 2015 at 04:49:28PM +0100, Hannes Frederic Sowa wrote:
> On Tue, Dec 1, 2015, at 16:44, John W. Linville wrote:
> > On Mon, Nov 30, 2015 at 09:26:51PM -0800, Tom Herbert wrote:
> > > On Mon, Nov 30, 2015 at 5:28 PM, Jesse Gross  wrote:
> > 
> > > > Based on what we can do today, I see only two real choices: do some
> > > > refactoring to clean up the stack a bit or remove the existing VXLAN
> > > > offloading altogether. I think this series is trying to do the former
> > > > and the result is that the stack is cleaner after than before. That
> > > > seems like a good thing.
> > > 
> > > There is a third choice which is to do nothing. Creating an
> > > infrastructure that claims to "Generalize udp based tunnel offload"
> > > but actually doesn't generalize the mechanism is nothing more than
> > > window dressing-- this does nothing to help with the VXLAN to
> > > VXLAN-GPE transition for instance. If geneve specific offload is
> > > really needed now then that can be should with another ndo function,
> > > or alternatively ntuple filter with a device specific action would at
> > > least get the stack out of needing to be concerned with that.
> > > Regardless, we will work optimize the rest of the stack for devices
> > > that implement protocol agnostic mechanisms.
> > 
> > Is there no concern about NDO proliferation? Does the size of the
> > netdev_ops structure matter? Beyond that, I can see how a single
> > entry point with an enum specifying the offload type isn't really any
> > different in the grand scheme of things than having multiple NDOs,
> > one per offload.
> > 
> > Given the need to live with existing hardware offloads, I would lean
> > toward a consolidated NDO. But if a different NDO per tunnel type is
> > preferred, I can be satisified with that.
> 
> Having per-offloading NDOs helps the stack to gather further information
> what kind of offloads the driver has even maybe without trying to call
> down into the layer (just by comparing to NULL). Checking this inside
> the driver offload function clearly does not have this feature. So we
> finally can have "ip tunnel please-recommend-type" feature. :)

That is a valuable insight! Maybe the per-offload NDO isn't such a
bad idea afterall... :-)

John
-- 
John W. LinvilleSomeday the world will need a hero, and you
linvi...@tuxdriver.com  might be all we have.  Be ready.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-01 Thread Hannes Frederic Sowa
On Tue, Dec 1, 2015, at 16:44, John W. Linville wrote:
> On Mon, Nov 30, 2015 at 09:26:51PM -0800, Tom Herbert wrote:
> > On Mon, Nov 30, 2015 at 5:28 PM, Jesse Gross  wrote:
> 
> > > Based on what we can do today, I see only two real choices: do some
> > > refactoring to clean up the stack a bit or remove the existing VXLAN
> > > offloading altogether. I think this series is trying to do the former
> > > and the result is that the stack is cleaner after than before. That
> > > seems like a good thing.
> > 
> > There is a third choice which is to do nothing. Creating an
> > infrastructure that claims to "Generalize udp based tunnel offload"
> > but actually doesn't generalize the mechanism is nothing more than
> > window dressing-- this does nothing to help with the VXLAN to
> > VXLAN-GPE transition for instance. If geneve specific offload is
> > really needed now then that can be should with another ndo function,
> > or alternatively ntuple filter with a device specific action would at
> > least get the stack out of needing to be concerned with that.
> > Regardless, we will work optimize the rest of the stack for devices
> > that implement protocol agnostic mechanisms.
> 
> Is there no concern about NDO proliferation? Does the size of the
> netdev_ops structure matter? Beyond that, I can see how a single
> entry point with an enum specifying the offload type isn't really any
> different in the grand scheme of things than having multiple NDOs,
> one per offload.
> 
> Given the need to live with existing hardware offloads, I would lean
> toward a consolidated NDO. But if a different NDO per tunnel type is
> preferred, I can be satisified with that.

Having per-offloading NDOs helps the stack to gather further information
what kind of offloads the driver has even maybe without trying to call
down into the layer (just by comparing to NULL). Checking this inside
the driver offload function clearly does not have this feature. So we
finally can have "ip tunnel please-recommend-type" feature. :)

Bye,
Hannes
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-01 Thread John W. Linville
On Mon, Nov 30, 2015 at 09:26:51PM -0800, Tom Herbert wrote:
> On Mon, Nov 30, 2015 at 5:28 PM, Jesse Gross  wrote:

> > Based on what we can do today, I see only two real choices: do some
> > refactoring to clean up the stack a bit or remove the existing VXLAN
> > offloading altogether. I think this series is trying to do the former
> > and the result is that the stack is cleaner after than before. That
> > seems like a good thing.
> 
> There is a third choice which is to do nothing. Creating an
> infrastructure that claims to "Generalize udp based tunnel offload"
> but actually doesn't generalize the mechanism is nothing more than
> window dressing-- this does nothing to help with the VXLAN to
> VXLAN-GPE transition for instance. If geneve specific offload is
> really needed now then that can be should with another ndo function,
> or alternatively ntuple filter with a device specific action would at
> least get the stack out of needing to be concerned with that.
> Regardless, we will work optimize the rest of the stack for devices
> that implement protocol agnostic mechanisms.

Is there no concern about NDO proliferation? Does the size of the
netdev_ops structure matter? Beyond that, I can see how a single
entry point with an enum specifying the offload type isn't really any
different in the grand scheme of things than having multiple NDOs,
one per offload.

Given the need to live with existing hardware offloads, I would lean
toward a consolidated NDO. But if a different NDO per tunnel type is
preferred, I can be satisified with that.

John
-- 
John W. LinvilleSomeday the world will need a hero, and you
linvi...@tuxdriver.com  might be all we have.  Be ready.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-30 Thread Alexander Duyck
On Mon, Nov 30, 2015 at 7:48 PM, David Miller  wrote:
> From: "Singhai, Anjali" 
> Date: Mon, 30 Nov 2015 21:42:37 +
>
>> The reason for receive being different than transmit is, on TX side
>> driver can provide the meta data for where the checksum field is and
>> what is the length that needs to be check summed to the HW on a per
>> packet basis. On Rx the HW parser has to parse the packet to
>> identify the tunnel type and based on that figure out the checksum
>> locations and length in the packet, so definitely HW has to parse
>> the packet and it can parse only based on next header type
>> information or in case of udp tunnels based on udp port mapping to a
>> particular protocol. I am not sure why you say it doesn't need to
>> parse the packet, maybe I am miss- understanding something.
>> Although it's not difficult to reduce protocol ossification on the
>> RX side but it is certainly different and particularly in case of
>> udp-tunnels it needs the port to protocol mapping.
>
> You're just proving more and more why doing anything other than 2's
> complement checksum provision in the RX descriptor is stupid.
>
> Let me know when you guys enter this century.
>
> I'll tell you right now that your arguments are akin to trying to
> climb up a wall which is vertical.  I can assure you that you will not
> reach your destination, so save your self some scratching and clawing
> and accept reality.
>
> Doing anything other than providing 2's complement checksums in the RX
> descriptor doesn't work.  We know this.

While I fully agree that the 2's compliment is the way to go we still
have to deal with all of the legacy hardware out there.  In addition
while the 2's compliment bit works for the checksums what are vendors
expected to do about other offloads that will need to parse inner
headers such as RSS, LRO, or network flow classificiation?  The
problem is that at some point the hardware does need to know how to
parse the tunnel headers for the purposes of doing offloads besides
checksum.

> So we will not add to our core architecture and frameworks anything
> that directly facilitates designs which we know are suboptimal.  And
> protocol specific support for tunnel offloading is suboptimal and not
> the way forward.
>
> I completly agree with Tom, his goals, his vision, and his priorities
> when it comes to handling this stuff.  Don't fight it.

I have to disagree here.  I really feel that going beyond
check-summing what Tom has proposed might be a step backwards.

We end up needing some sort of mechanism for identifying what the
tunnel frames will look like when doing any sort of Rx parsing.  Just
applying a generic offload for everything only really works on things
that are truly generic such as the 2's compliment checksum.  Tom had
mentioned possibly using something like the ntuple/nfc interface.  I
would argue that is kind of the direction this is going in but with a
few flaws.   For example, the notifier probably should pass a pointer
to the udp_port_cfg structure instead of just the family and port.
This way if a given UDP tunnel has IP endpoints it would be possible
to setup an ntuple/nfc style filter rule to only offload that tunnel
instead generically just basing things off of the port and family.

In addition it might be worthwhile to add a type field similar to what
is already in the fou_cfg block to the udp_port_cfg.  It could
probably just be moved from fou_cfg into udp_port_cfg for use by the
other tunnels, you might even move the protocol field while you are at
it.  Then they could be treated as a type of "action" indication to
the drivers that the given attributes will be associated with this
type of tunnel.  I would want to make sure that we give each unique
tunnel type its own value for the "action" so that if for example in
the future someone decided to offload fou, gue, l2tp, or whatever in
hardware they would have a way of identifying the incoming frames as
such and parse the inner headers.

Anyway that is my $0.02 on this.

- Alex
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-30 Thread Tom Herbert
On Mon, Nov 30, 2015 at 5:28 PM, Jesse Gross  wrote:
> On Mon, Nov 30, 2015 at 5:02 PM, Tom Herbert  wrote:
>> On Mon, Nov 30, 2015 at 4:25 PM, Jesse Gross  wrote:
>>> On Sun, Nov 29, 2015 at 7:21 PM, David Miller  wrote:
 From: Tom Herbert 
 Date: Mon, 23 Nov 2015 13:53:44 -0800

> The bad effect of this model is that it is encourages HW vendors to
> continue implement HW protocol specific support for encapsulations, we
> get so much more benefit if they implement protocol generic
> mechanisms.

 +1
>>>
>>> Regardless of what happens in the future, I think the main question is
>>> how this relates to the code that is currently present in the tree. We
>>> already have NDOs for VXLAN offloading, which is about as protocol
>>> specific as you can get. In my mind, this series is strictly an
>>> improvement to what is already there - it pulls all hardware
>>> offloading code out of the various protocol implementations and VXLAN
>>> out of the driver interface. That seems like a pretty nice cleanup to
>>> me.
>>
>> Jesse,
>>
>> I don't think VXLAN is a good role model here. Consider that Cisco now
>> is basically trying to obsolete VXLAN in favor of VXLAN-GPE. VXLAN-GPE
>> is not compatible with VXLAN, so in order to get the same HW offloads
>> talking VXLAN-GPE users will probably need to swap out their HW. If I
>> am misreading this situation let me know, but to me this doesn't sound
>> like a model the stack should endorse.
>
> The point that I was trying to make is that we already have VXLAN
> offloading in the stack and it exists there today. The way that it is
> implemented is about as protocol specific as you can get - protocols
> need to know about offloads and drivers need to know about protocols.
> I would like to find a way to at least make the code cleaner while we
> wait for hardware to evolve.
>
> Based on what we can do today, I see only two real choices: do some
> refactoring to clean up the stack a bit or remove the existing VXLAN
> offloading altogether. I think this series is trying to do the former
> and the result is that the stack is cleaner after than before. That
> seems like a good thing.

There is a third choice which is to do nothing. Creating an
infrastructure that claims to "Generalize udp based tunnel offload"
but actually doesn't generalize the mechanism is nothing more than
window dressing-- this does nothing to help with the VXLAN to
VXLAN-GPE transition for instance. If geneve specific offload is
really needed now then that can be should with another ndo function,
or alternatively ntuple filter with a device specific action would at
least get the stack out of needing to be concerned with that.
Regardless, we will work optimize the rest of the stack for devices
that implement protocol agnostic mechanisms.

Tom
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-30 Thread David Miller

Please learn how to properly quote people and respond to list postings.

The material from Tom you are quoting looks like it is something you
are writing.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-30 Thread David Miller
From: Tom Herbert 
Date: Mon, 30 Nov 2015 13:48:36 -0800

> Please look at how CHECKSUM_COMPLETE interface works. Description is
> in sk_buff.h or
> http://people.netfilter.org/pablo/netdev0.1/papers/UDP-Encapsulation-in-Linux.pdf.

+1
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-30 Thread David Miller
From: "Singhai, Anjali" 
Date: Mon, 30 Nov 2015 21:42:37 +

> The reason for receive being different than transmit is, on TX side
> driver can provide the meta data for where the checksum field is and
> what is the length that needs to be check summed to the HW on a per
> packet basis. On Rx the HW parser has to parse the packet to
> identify the tunnel type and based on that figure out the checksum
> locations and length in the packet, so definitely HW has to parse
> the packet and it can parse only based on next header type
> information or in case of udp tunnels based on udp port mapping to a
> particular protocol. I am not sure why you say it doesn't need to
> parse the packet, maybe I am miss- understanding something.
> Although it's not difficult to reduce protocol ossification on the
> RX side but it is certainly different and particularly in case of
> udp-tunnels it needs the port to protocol mapping.

You're just proving more and more why doing anything other than 2's
complement checksum provision in the RX descriptor is stupid.

Let me know when you guys enter this century.

I'll tell you right now that your arguments are akin to trying to
climb up a wall which is vertical.  I can assure you that you will not
reach your destination, so save your self some scratching and clawing
and accept reality.

Doing anything other than providing 2's complement checksums in the RX
descriptor doesn't work.  We know this.

So we will not add to our core architecture and frameworks anything
that directly facilitates designs which we know are suboptimal.  And
protocol specific support for tunnel offloading is suboptimal and not
the way forward.

I completly agree with Tom, his goals, his vision, and his priorities
when it comes to handling this stuff.  Don't fight it.

You also need to learn how to properly reply to list postings, it
looks terrible, as if you're replying to Tom then adding in my comment
which is what you are actually repling to.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-30 Thread Jesse Gross
On Mon, Nov 30, 2015 at 5:02 PM, Tom Herbert  wrote:
> On Mon, Nov 30, 2015 at 4:25 PM, Jesse Gross  wrote:
>> On Sun, Nov 29, 2015 at 7:21 PM, David Miller  wrote:
>>> From: Tom Herbert 
>>> Date: Mon, 23 Nov 2015 13:53:44 -0800
>>>
 The bad effect of this model is that it is encourages HW vendors to
 continue implement HW protocol specific support for encapsulations, we
 get so much more benefit if they implement protocol generic
 mechanisms.
>>>
>>> +1
>>
>> Regardless of what happens in the future, I think the main question is
>> how this relates to the code that is currently present in the tree. We
>> already have NDOs for VXLAN offloading, which is about as protocol
>> specific as you can get. In my mind, this series is strictly an
>> improvement to what is already there - it pulls all hardware
>> offloading code out of the various protocol implementations and VXLAN
>> out of the driver interface. That seems like a pretty nice cleanup to
>> me.
>
> Jesse,
>
> I don't think VXLAN is a good role model here. Consider that Cisco now
> is basically trying to obsolete VXLAN in favor of VXLAN-GPE. VXLAN-GPE
> is not compatible with VXLAN, so in order to get the same HW offloads
> talking VXLAN-GPE users will probably need to swap out their HW. If I
> am misreading this situation let me know, but to me this doesn't sound
> like a model the stack should endorse.

The point that I was trying to make is that we already have VXLAN
offloading in the stack and it exists there today. The way that it is
implemented is about as protocol specific as you can get - protocols
need to know about offloads and drivers need to know about protocols.
I would like to find a way to at least make the code cleaner while we
wait for hardware to evolve.

Based on what we can do today, I see only two real choices: do some
refactoring to clean up the stack a bit or remove the existing VXLAN
offloading altogether. I think this series is trying to do the former
and the result is that the stack is cleaner after than before. That
seems like a good thing.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-30 Thread Tom Herbert
On Mon, Nov 30, 2015 at 4:25 PM, Jesse Gross  wrote:
> On Sun, Nov 29, 2015 at 7:21 PM, David Miller  wrote:
>> From: Tom Herbert 
>> Date: Mon, 23 Nov 2015 13:53:44 -0800
>>
>>> The bad effect of this model is that it is encourages HW vendors to
>>> continue implement HW protocol specific support for encapsulations, we
>>> get so much more benefit if they implement protocol generic
>>> mechanisms.
>>
>> +1
>
> Regardless of what happens in the future, I think the main question is
> how this relates to the code that is currently present in the tree. We
> already have NDOs for VXLAN offloading, which is about as protocol
> specific as you can get. In my mind, this series is strictly an
> improvement to what is already there - it pulls all hardware
> offloading code out of the various protocol implementations and VXLAN
> out of the driver interface. That seems like a pretty nice cleanup to
> me.

Jesse,

I don't think VXLAN is a good role model here. Consider that Cisco now
is basically trying to obsolete VXLAN in favor of VXLAN-GPE. VXLAN-GPE
is not compatible with VXLAN, so in order to get the same HW offloads
talking VXLAN-GPE users will probably need to swap out their HW. If I
am misreading this situation let me know, but to me this doesn't sound
like a model the stack should endorse.

Tom
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-30 Thread Jesse Gross
On Sun, Nov 29, 2015 at 7:21 PM, David Miller  wrote:
> From: Tom Herbert 
> Date: Mon, 23 Nov 2015 13:53:44 -0800
>
>> The bad effect of this model is that it is encourages HW vendors to
>> continue implement HW protocol specific support for encapsulations, we
>> get so much more benefit if they implement protocol generic
>> mechanisms.
>
> +1

Regardless of what happens in the future, I think the main question is
how this relates to the code that is currently present in the tree. We
already have NDOs for VXLAN offloading, which is about as protocol
specific as you can get. In my mind, this series is strictly an
improvement to what is already there - it pulls all hardware
offloading code out of the various protocol implementations and VXLAN
out of the driver interface. That seems like a pretty nice cleanup to
me.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-30 Thread Singhai, Anjali


-Original Message-
From: Tom Herbert [mailto:t...@herbertland.com] 
Sent: Monday, November 30, 2015 8:36 AM
To: Singhai, Anjali 
Cc: Linux Kernel Network Developers ; Jesse Gross 
; Patil, Kiran 
Subject: Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

On Mon, Nov 23, 2015 at 1:02 PM, Anjali Singhai Jain  
wrote:
> Replace add/del ndo ops for vxlan_port with tunnel_port so that all 
> UDP based tunnels can use the same ndo op. Add a parameter to pass 
> tunnel type to the ndo_op.
>
Please consider using RX ntuple filters for this instead of a new ndo op. The 
vxlan ndo op essentailly implements a limited filter with a rule to match a 
destination UDP port and the the action of processing the packet as vxlan. 
ntuple filters generalizes that so that the filtering becomes arbitrary. We'll 
need the ability to filter on 4-tuple when we implement tunnels to go through 
firewalls or for offloading other UDP protocols such SPUD or QUIC.

Tom

- Tom I am not sure I agree with this suggestion. The easiest way to let the 
hardware know about port to protocol mapping in case of udp-based tunnels is 
when we add udp offloads for the ports aka gro etc in the stack. This way the 
user gets benefit of tunnel offloads from the HWs that support it without 
having to do any extra filter setups from ethtool. Just like ip/tcp/udp 
checksum and TSO support, the user does not have to turn this ON specifically 
if they plan to use those protocols (of course they can turn it off). Besides 
these are not true filters in that sense, they are not used to guide packets to 
any particular destination in this case, rather used to identify packets for 
checksum and TSO purpose.
And I agree with your patch series that reduces protocol ossification of the 
stack and driver interface. My point is this set of patches help with that goal 
and not really hurt because any new tunnel support would mean no change in the 
interface and just a new type in the enum and then the drivers can decide to do 
the magic setup in the HW in their driver based on this new type without ever 
having to touch the interface. So try to explain to me why this is causing 
protocol ossification because I don't believe so. And I think the ntupe 
interface should remain for the purpose of filters which are used to route 
packet or drop them. Not for packet identification and checksum offload support.

> Change all drivers to use the generalized udp tunnel offload
>
> Patch was compile tested with x86_64_defconfig.
>
> Signed-off-by: Kiran Patil 
> Signed-off-by: Anjali Singhai Jain 
> ---
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 15 ++---
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c| 13 +---
>  drivers/net/ethernet/emulex/benet/be_main.c  | 14 +---
>  drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  | 27 
>  drivers/net/ethernet/intel/i40e/i40e_main.c  | 41 
> +---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c| 17 +++---
>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c   | 21 
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 17 +++---
>  drivers/net/vxlan.c  | 23 +++--
>  include/linux/netdevice.h| 34 ++--
>  include/net/udp_tunnel.h |  6 
>  11 files changed, 157 insertions(+), 71 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c 
> b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> index 2273576..ad2782f 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> @@ -47,6 +47,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -10124,11 +10125,14 @@ static void __bnx2x_add_vxlan_port(struct 
> bnx2x *bp, u16 port)  }
>
>  static void bnx2x_add_vxlan_port(struct net_device *netdev,
> -sa_family_t sa_family, __be16 port)
> +sa_family_t sa_family, __be16 port,
> +u32 type)
>  {
> struct bnx2x *bp = netdev_priv(netdev);
> u16 t_port = ntohs(port);
>
> +   if (type != UDP_TUNNEL_VXLAN)
> +   return;
> __bnx2x_add_vxlan_port(bp, t_port);  }
>
> @@ -10152,11 +10156,14 @@ static void __bnx2x_del_vxlan_port(struct 
> bnx2x *bp, u16 port)  }
>
>  static void bnx2x_del_vxlan_port(struct net_device *netdev,
> -sa_family_t sa_family, __be16 port)
> +sa_family_t sa_family, __be16 port,
> +u32 type)
>  {
> struct bnx2x *bp = netdev_priv(netdev);
> u16 t_port = 

Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-30 Thread Tom Herbert
On Mon, Nov 30, 2015 at 1:42 PM, Singhai, Anjali
 wrote:
>
>
> -Original Message-
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Sunday, November 29, 2015 7:23 PM
> To: t...@herbertland.com
> Cc: Brandeburg, Jesse ; Singhai, Anjali 
> ; je...@kernel.org; netdev@vger.kernel.org; Patil, 
> Kiran 
> Subject: Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
>
> From: Tom Herbert 
> Date: Tue, 24 Nov 2015 09:32:11 -0800
>
>>>
>>> FWIW, I've brought the issue to the attention of the architects here,
>>> and we will likely be able to make changes in this space.  Intel
>>> hardware (as demonstrated by your patches) already is able to deal
>>> with this de-ossification on transmit.  Receive is a whole different beast.
>>>
>> Please provide the specifics on why "Receive is a whole different
>> beast.". Generic receive checksum is already a subset of the
>> functionality that you must have implement to support the protocol
>> specific offloads. All the hardware needs to do is calculate the 1's
>> complement checksum of the packet and return the value on the to the
>> host with that packet. That's it. No parsing of headers, no worrying
>> about the pseudo header, no dealing with any encapsulation. Just do
>> the calculation, return the result to the host and the driver converts
>> this to CHECKSUM_COMPLETE. I find it very hard to believe that this is
>> any harder than specific support the next protocol du jour.
>
> The reason for receive being different than transmit is, on TX side driver 
> can provide the meta data for where the checksum field is and what is the 
> length that needs to be check summed to the HW on a per packet basis. On Rx 
> the HW parser has to parse the packet to identify the tunnel type and based 
> on that figure out the checksum locations and length in the packet, so 
> definitely HW has to parse the packet and it can parse only based on next 
> header type information or in case of udp tunnels based on udp port mapping 
> to a particular protocol. I am not sure why you say it doesn't need to parse 
> the packet, maybe I am miss- understanding something.  Although it's not 
> difficult to reduce protocol ossification on the RX side but it is certainly 
> different and particularly in case of udp-tunnels it needs the port to 
> protocol mapping.
>
Please look at how CHECKSUM_COMPLETE interface works. Description is
in sk_buff.h or
http://people.netfilter.org/pablo/netdev0.1/papers/UDP-Encapsulation-in-Linux.pdf.

Thanks,
Tom
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-30 Thread Singhai, Anjali


-Original Message-
From: David Miller [mailto:da...@davemloft.net] 
Sent: Sunday, November 29, 2015 7:23 PM
To: t...@herbertland.com
Cc: Brandeburg, Jesse ; Singhai, Anjali 
; je...@kernel.org; netdev@vger.kernel.org; Patil, 
Kiran 
Subject: Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

From: Tom Herbert 
Date: Tue, 24 Nov 2015 09:32:11 -0800

>>
>> FWIW, I've brought the issue to the attention of the architects here, 
>> and we will likely be able to make changes in this space.  Intel 
>> hardware (as demonstrated by your patches) already is able to deal 
>> with this de-ossification on transmit.  Receive is a whole different beast.
>>
> Please provide the specifics on why "Receive is a whole different 
> beast.". Generic receive checksum is already a subset of the 
> functionality that you must have implement to support the protocol 
> specific offloads. All the hardware needs to do is calculate the 1's 
> complement checksum of the packet and return the value on the to the 
> host with that packet. That's it. No parsing of headers, no worrying 
> about the pseudo header, no dealing with any encapsulation. Just do 
> the calculation, return the result to the host and the driver converts 
> this to CHECKSUM_COMPLETE. I find it very hard to believe that this is 
> any harder than specific support the next protocol du jour.

The reason for receive being different than transmit is, on TX side driver can 
provide the meta data for where the checksum field is and what is the length 
that needs to be check summed to the HW on a per packet basis. On Rx the HW 
parser has to parse the packet to identify the tunnel type and based on that 
figure out the checksum locations and length in the packet, so definitely HW 
has to parse the packet and it can parse only based on next header type 
information or in case of udp tunnels based on udp port mapping to a particular 
protocol. I am not sure why you say it doesn't need to parse the packet, maybe 
I am miss- understanding something.  Although it's not difficult to reduce 
protocol ossification on the RX side but it is certainly different and 
particularly in case of udp-tunnels it needs the port to protocol mapping.

+1
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-30 Thread Singhai, Anjali


-Original Message-
From: David Miller [mailto:da...@davemloft.net] 
Sent: Sunday, November 29, 2015 7:22 PM
To: t...@herbertland.com
Cc: Singhai, Anjali ; netdev@vger.kernel.org; 
je...@kernel.org; Patil, Kiran 
Subject: Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

From: Tom Herbert 
Date: Mon, 23 Nov 2015 13:53:44 -0800

> The bad effect of this model is that it is encourages HW vendors to 
> continue implement HW protocol specific support for encapsulations, we 
> get so much more benefit if they implement protocol generic 
> mechanisms.
Dave, at least Intel parts have a protocol generic model for tunneled packet 
offloads and hence we are able to extend our support to newer tunnel types. We 
do  not have protocol specific support in the HW, but since the udp based 
tunnels do not have a packet type for the tunnel header, the HW needs to know 
which udp port should be mapped to which specific encapsulation. Otherwise 
encapsulated types like NVGRE we can identify through packet type and program 
the HW to account for the header. The newer patches for sure reduce the 
protocol ossification since in communalizes all the different tunnels into one 
interface so that any further support to a newer udp tunnel type requires just 
a type definition and if the driver/HW can support it, minor driver changes to 
set the right bits for HW. No interface change for sure. And I think that is 
definitely a step in the right direction.

+1
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-30 Thread Tom Herbert
On Mon, Nov 23, 2015 at 1:02 PM, Anjali Singhai Jain
 wrote:
> Replace add/del ndo ops for vxlan_port with tunnel_port so that all UDP
> based tunnels can use the same ndo op. Add a parameter to pass tunnel
> type to the ndo_op.
>
Please consider using RX ntuple filters for this instead of a new ndo
op. The vxlan ndo op essentailly implements a limited filter with a
rule to match a destination UDP port and the the action of processing
the packet as vxlan. ntuple filters generalizes that so that the
filtering becomes arbitrary. We'll need the ability to filter on
4-tuple when we implement tunnels to go through firewalls or for
offloading other UDP protocols such SPUD or QUIC.

Tom

> Change all drivers to use the generalized udp tunnel offload
>
> Patch was compile tested with x86_64_defconfig.
>
> Signed-off-by: Kiran Patil 
> Signed-off-by: Anjali Singhai Jain 
> ---
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 15 ++---
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c| 13 +---
>  drivers/net/ethernet/emulex/benet/be_main.c  | 14 +---
>  drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  | 27 
>  drivers/net/ethernet/intel/i40e/i40e_main.c  | 41 
> +---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c| 17 +++---
>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c   | 21 
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 17 +++---
>  drivers/net/vxlan.c  | 23 +++--
>  include/linux/netdevice.h| 34 ++--
>  include/net/udp_tunnel.h |  6 
>  11 files changed, 157 insertions(+), 71 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c 
> b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> index 2273576..ad2782f 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> @@ -47,6 +47,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -10124,11 +10125,14 @@ static void __bnx2x_add_vxlan_port(struct bnx2x 
> *bp, u16 port)
>  }
>
>  static void bnx2x_add_vxlan_port(struct net_device *netdev,
> -sa_family_t sa_family, __be16 port)
> +sa_family_t sa_family, __be16 port,
> +u32 type)
>  {
> struct bnx2x *bp = netdev_priv(netdev);
> u16 t_port = ntohs(port);
>
> +   if (type != UDP_TUNNEL_VXLAN)
> +   return;
> __bnx2x_add_vxlan_port(bp, t_port);
>  }
>
> @@ -10152,11 +10156,14 @@ static void __bnx2x_del_vxlan_port(struct bnx2x 
> *bp, u16 port)
>  }
>
>  static void bnx2x_del_vxlan_port(struct net_device *netdev,
> -sa_family_t sa_family, __be16 port)
> +sa_family_t sa_family, __be16 port,
> +u32 type)
>  {
> struct bnx2x *bp = netdev_priv(netdev);
> u16 t_port = ntohs(port);
>
> +   if (type != UDP_TUNNEL_VXLAN)
> +   return;
> __bnx2x_del_vxlan_port(bp, t_port);
>  }
>  #endif
> @@ -13008,8 +13015,8 @@ static const struct net_device_ops bnx2x_netdev_ops = 
> {
> .ndo_set_vf_link_state  = bnx2x_set_vf_link_state,
> .ndo_features_check = bnx2x_features_check,
>  #ifdef CONFIG_BNX2X_VXLAN
> -   .ndo_add_vxlan_port = bnx2x_add_vxlan_port,
> -   .ndo_del_vxlan_port = bnx2x_del_vxlan_port,
> +   .ndo_add_udp_tunnel_port= bnx2x_add_vxlan_port,
> +   .ndo_del_udp_tunnel_port= bnx2x_del_vxlan_port,
>  #endif
>  };
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
> b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index f2d0dc9..5b96ddf 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -5421,7 +5421,7 @@ static void bnxt_cfg_ntp_filters(struct bnxt *bp)
>  #endif /* CONFIG_RFS_ACCEL */
>
>  static void bnxt_add_vxlan_port(struct net_device *dev, sa_family_t 
> sa_family,
> -   __be16 port)
> +   __be16 port, u32 type)
>  {
> struct bnxt *bp = netdev_priv(dev);
>
> @@ -5431,6 +5431,9 @@ static void bnxt_add_vxlan_port(struct net_device *dev, 
> sa_family_t sa_family,
> if (sa_family != AF_INET6 && sa_family != AF_INET)
> return;
>
> +   if (type != UDP_TUNNEL_VXLAN)
> +   return;
> +
> if (bp->vxlan_port_cnt && bp->vxlan_port != port)
> return;
>
> @@ -5443,7 +5446,7 @@ static void bnxt_add_vxlan_port(struct net_device *dev, 
> sa_family_t sa_family,
>  }
>
>  static void bnxt_del_vxlan_port(struct net_device *dev, sa_family_t 
> sa_family,
> -   __be16 port)
> +   __be16 port, u32 type)
>  {
> struct bnxt *bp = ne

Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-29 Thread David Miller
From: Tom Herbert 
Date: Tue, 24 Nov 2015 09:32:11 -0800

>>
>> FWIW, I've brought the issue to the attention of the architects here,
>> and we will likely be able to make changes in this space.  Intel
>> hardware (as demonstrated by your patches) already is able to deal with
>> this de-ossification on transmit.  Receive is a whole different beast.
>>
> Please provide the specifics on why "Receive is a whole different
> beast.". Generic receive checksum is already a subset of the
> functionality that you must have implement to support the protocol
> specific offloads. All the hardware needs to do is calculate the 1's
> complement checksum of the packet and return the value on the to the
> host with that packet. That's it. No parsing of headers, no worrying
> about the pseudo header, no dealing with any encapsulation. Just do
> the calculation, return the result to the host and the driver converts
> this to CHECKSUM_COMPLETE. I find it very hard to believe that this is
> any harder than specific support the next protocol du jour.

+1
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-29 Thread David Miller
From: Tom Herbert 
Date: Mon, 23 Nov 2015 13:53:44 -0800

> The bad effect of this model is that it is encourages HW vendors to
> continue implement HW protocol specific support for encapsulations, we
> get so much more benefit if they implement protocol generic
> mechanisms.

+1
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-24 Thread Tom Herbert
On Tue, Nov 24, 2015 at 10:37 AM, David Miller  wrote:
> From: Hannes Frederic Sowa 
> Date: Tue, 24 Nov 2015 18:43:35 +0100
>
>> On Tue, Nov 24, 2015, at 18:32, Tom Herbert wrote:
>>> As you said this in only feedback and nobody is forcing anyone to do
>>> anything. But encouraging HW vendors to provide generic mechanisms so
>>> that your users can use whatever protocol they want is the exact
>>> _opposite_ of punishing users, this is very much a pro-user direction.
>>
>> Some users will suffer worse performance in case we don't correctly set
>> ip_summed for a specific protocol before we do the copy operations from
>> user space into skbs but if they are always done in the driver.
>
> Your concern presumes that looking backwards is as important as looking
> forward.
>
> We want to simplify things _and_ move away from protocol specific
> csums, and if some old crufty hardware based systems pay some performance
> cost for this I say so be it.
>
> So this is not a valid argument against Tom's changes in my mind.
>
And for that matter these arguments have nothing to do with these UDP
encapsulation patches at all, they seem to be directed to the patches
to eliminate NETIF_F_IP{V6}_CSUM so please post on that thread.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-24 Thread Hannes Frederic Sowa
On Tue, Nov 24, 2015, at 19:37, David Miller wrote:
> From: Hannes Frederic Sowa 
> Date: Tue, 24 Nov 2015 18:43:35 +0100
> 
> > On Tue, Nov 24, 2015, at 18:32, Tom Herbert wrote:
> >> As you said this in only feedback and nobody is forcing anyone to do
> >> anything. But encouraging HW vendors to provide generic mechanisms so
> >> that your users can use whatever protocol they want is the exact
> >> _opposite_ of punishing users, this is very much a pro-user direction.
> > 
> > Some users will suffer worse performance in case we don't correctly set
> > ip_summed for a specific protocol before we do the copy operations from
> > user space into skbs but if they are always done in the driver.
> 
> Your concern presumes that looking backwards is as important as looking
> forward.
> 
> We want to simplify things _and_ move away from protocol specific
> csums, and if some old crufty hardware based systems pay some performance
> cost for this I say so be it.
> 
> So this is not a valid argument against Tom's changes in my mind.

I agree with you and we should move forward with this. It is just
something to keep in mind IMHO. Otherwise maintenance of those
additional bits did not hurt a lot IMHO.

Bye,
Hannes
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-24 Thread David Miller
From: Hannes Frederic Sowa 
Date: Tue, 24 Nov 2015 18:43:35 +0100

> On Tue, Nov 24, 2015, at 18:32, Tom Herbert wrote:
>> As you said this in only feedback and nobody is forcing anyone to do
>> anything. But encouraging HW vendors to provide generic mechanisms so
>> that your users can use whatever protocol they want is the exact
>> _opposite_ of punishing users, this is very much a pro-user direction.
> 
> Some users will suffer worse performance in case we don't correctly set
> ip_summed for a specific protocol before we do the copy operations from
> user space into skbs but if they are always done in the driver.

Your concern presumes that looking backwards is as important as looking
forward.

We want to simplify things _and_ move away from protocol specific
csums, and if some old crufty hardware based systems pay some performance
cost for this I say so be it.

So this is not a valid argument against Tom's changes in my mind.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-24 Thread Hannes Frederic Sowa


On Tue, Nov 24, 2015, at 18:52, Tom Herbert wrote:
> On Tue, Nov 24, 2015 at 9:43 AM, Hannes Frederic Sowa
>  wrote:
> > On Tue, Nov 24, 2015, at 18:32, Tom Herbert wrote:
> >> As you said this in only feedback and nobody is forcing anyone to do
> >> anything. But encouraging HW vendors to provide generic mechanisms so
> >> that your users can use whatever protocol they want is the exact
> >> _opposite_ of punishing users, this is very much a pro-user direction.
> >
> > Some users will suffer worse performance in case we don't correctly set
> > ip_summed for a specific protocol before we do the copy operations from
> > user space into skbs but if they are always done in the driver.
> >
> Please be specific. Who are the users, what is exact performance
> regression, what are specific protocols in question?

Depending on ip_summed after a connect on a socket and having the
correct dst_entry we either copy data from user space and calculate the
checksum concurrently or let the hardware do that
(csum_partial_from_user).

In case someone owns a IPv4 capable but not IPv6 capable checksum
offloading NIC they will end up in the slow path in the driver either
for IPv4 xor IPv6 because out of one we could not set the correct
ip_summed mode before copy to the kernel.

Data during transmit seems to be cold already, so a proper checksum
while we touch the data anyway would be favorable.

Bye,
Hannes
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-24 Thread Tom Herbert
On Tue, Nov 24, 2015 at 9:43 AM, Hannes Frederic Sowa
 wrote:
> On Tue, Nov 24, 2015, at 18:32, Tom Herbert wrote:
>> As you said this in only feedback and nobody is forcing anyone to do
>> anything. But encouraging HW vendors to provide generic mechanisms so
>> that your users can use whatever protocol they want is the exact
>> _opposite_ of punishing users, this is very much a pro-user direction.
>
> Some users will suffer worse performance in case we don't correctly set
> ip_summed for a specific protocol before we do the copy operations from
> user space into skbs but if they are always done in the driver.
>
Please be specific. Who are the users, what is exact performance
regression, what are specific protocols in question?

> Bye,
> Hannes
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-24 Thread Hannes Frederic Sowa
On Tue, Nov 24, 2015, at 18:32, Tom Herbert wrote:
> As you said this in only feedback and nobody is forcing anyone to do
> anything. But encouraging HW vendors to provide generic mechanisms so
> that your users can use whatever protocol they want is the exact
> _opposite_ of punishing users, this is very much a pro-user direction.

Some users will suffer worse performance in case we don't correctly set
ip_summed for a specific protocol before we do the copy operations from
user space into skbs but if they are always done in the driver.

Bye,
Hannes
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-24 Thread Tom Herbert
>
> FWIW, I've brought the issue to the attention of the architects here,
> and we will likely be able to make changes in this space.  Intel
> hardware (as demonstrated by your patches) already is able to deal with
> this de-ossification on transmit.  Receive is a whole different beast.
>
Please provide the specifics on why "Receive is a whole different
beast.". Generic receive checksum is already a subset of the
functionality that you must have implement to support the protocol
specific offloads. All the hardware needs to do is calculate the 1's
complement checksum of the packet and return the value on the to the
host with that packet. That's it. No parsing of headers, no worrying
about the pseudo header, no dealing with any encapsulation. Just do
the calculation, return the result to the host and the driver converts
this to CHECKSUM_COMPLETE. I find it very hard to believe that this is
any harder than specific support the next protocol du jour.

> I think that trying to force an agenda with no fore-warning and also
> punishing the users in order to get hardware vendors to change is the
> wrong way to go about this.  All you end up with is people just asking
> you why their hardware doesn't work in the kernel.
>
As you said this in only feedback and nobody is forcing anyone to do
anything. But encouraging HW vendors to provide generic mechanisms so
that your users can use whatever protocol they want is the exact
_opposite_ of punishing users, this is very much a pro-user direction.

> You have a proposal, let's codify it and enable it for the future, and
> especially be *really* clear what you want hardware vendors to
> implement so that they get it right.  MS does this by publishing
> specifications and being clear what MUST be implemented and what COULD
> be implemented.
>

Linux does not mandate HW implementation like MS, what we we do is
define driver interfaces which allow for a variety of different HW
implementations. The stack-driver checksum interface is described at
the top of skbuff.h. If this interface description is not clear enough
please let me know and we can fix that. If it is helpful we can
publish our requirements of new NICs at Facebook for reference.

Tom
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-23 Thread Alexander Duyck

On 11/23/2015 01:02 PM, Anjali Singhai Jain wrote:

Replace add/del ndo ops for vxlan_port with tunnel_port so that all UDP
based tunnels can use the same ndo op. Add a parameter to pass tunnel
type to the ndo_op.

Change all drivers to use the generalized udp tunnel offload

Patch was compile tested with x86_64_defconfig.

Signed-off-by: Kiran Patil 
Signed-off-by: Anjali Singhai Jain 
---


[...]


diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
index cb2f89f..72415aa 100644
--- a/include/net/udp_tunnel.h
+++ b/include/net/udp_tunnel.h
@@ -9,6 +9,12 @@
  #include 
  #endif

+enum udp_tunnel_type {
+   UDP_TUNNEL_UNSPEC,
+   UDP_TUNNEL_VXLAN,
+   UDP_TUNNEL_GENEVE,
+};
+
  struct udp_port_cfg {
u8  family;




I'm not a fan of UDP_TUNNEL_UNSPEC.  If you are going to implement a 
"tunnel type" field it should specify tunnel type 1:1, not just 
generically refer to UNSPEC for everything that isn't VXLAN or GENEVE. 
This way we can avoid any issues with anyone implementing an offload 
that later relies on their tunnel type value being equal to 0.


- Alex
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-23 Thread Jesse Brandeburg
On Mon, 23 Nov 2015 16:38:59 -0800
Tom Herbert  wrote:
> >> >
> >> > Sorry, I still don't like this. Grant it least it gets rid of of VXLAN
> >> > specific ops, but the problem is there no such things as a common set
> >> > of encapsulations in the kernel (e.g. foo-over-udp adds a bunch of
> >> > encapsulations not represented here), no defined common set of device

Tom, thanks for your feedback.

Is anyone using foo-over-udp besides you?
I know a lot of people using VxLAN and many who want Geneve offloads.
The performance gain of using hardware offload in this area is
non-trivial (like 300% or more)

> >> > functionality that needs this, and this precludes the use of the RX
> >> > accelerations to be available from a userpsace  implementation.
> >>
> >> Regardless, I think this is at least a good cleanup of what is already
> >> there compared to having VXLAN-specific NDOs. We can always add
> >> additional things in the future.
> >
> > Agreed with Jesse that this will help not hurt,  when we are ready to 
> > cross the bridge for removing RX side Protocol ossification.
> >
> The time is now to address the protocol ossification problem. HW
> vendors leaking out support for random protocols one at a time really
> isn't helpful at all. Unfortunately, it's pretty clear that many
> vendors aren't going to fix this on their own volition. Fixing the
> interfaces to "encourage" change seems to be a way we can help things
> a long from kernel perspective.

So we (as a kernel community) have users *NOW* who want this
feature, and hardware that is available *now* that has this feature.
Do you think we should wait for a unicorn to arrive that has a fully
programmable de-ossified checksum engine?  How long?

Agree that we can start to address the Protocol Ossification problem by
working with hardware vendors, but that is a multi-year process to get
to new silicon with these changes.  Those with fully programmable
firmware engines might be able to get a change done sooner, but that
requires a non-trivial effort by the vendor that isn't reusable in
other operating systems, or maybe isn't possible at all due to hardware
limits.

FWIW, I've brought the issue to the attention of the architects here,
and we will likely be able to make changes in this space.  Intel
hardware (as demonstrated by your patches) already is able to deal with
this de-ossification on transmit.  Receive is a whole different beast.

I think that trying to force an agenda with no fore-warning and also
punishing the users in order to get hardware vendors to change is the
wrong way to go about this.  All you end up with is people just asking
you why their hardware doesn't work in the kernel.

You have a proposal, let's codify it and enable it for the future, and
especially be *really* clear what you want hardware vendors to
implement so that they get it right.  MS does this by publishing
specifications and being clear what MUST be implemented and what COULD
be implemented.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-23 Thread Tom Herbert
On Mon, Nov 23, 2015 at 4:32 PM, Singhai, Anjali
 wrote:
>
>
>> -Original Message-
>> From: Jesse Gross [mailto:je...@kernel.org]
>> Sent: Monday, November 23, 2015 2:50 PM
>> To: Tom Herbert
>> Cc: Singhai, Anjali; Linux Kernel Network Developers; Patil, Kiran
>> Subject: Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
>>
>> On Mon, Nov 23, 2015 at 1:53 PM, Tom Herbert 
>> wrote:
>> >> diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
>> >> index cb2f89f..72415aa 100644
>> >> --- a/include/net/udp_tunnel.h
>> >> +++ b/include/net/udp_tunnel.h
>> >> @@ -9,6 +9,12 @@
>> >>  #include 
>> >>  #endif
>> >>
>> >> +enum udp_tunnel_type {
>> >> +   UDP_TUNNEL_UNSPEC,
>> >> +   UDP_TUNNEL_VXLAN,
>> >> +   UDP_TUNNEL_GENEVE,
>> >> +};
>> >> +
>> >
>> > Sorry, I still don't like this. Grant it least it gets rid of of VXLAN
>> > specific ops, but the problem is there no such things as a common set
>> > of encapsulations in the kernel (e.g. foo-over-udp adds a bunch of
>> > encapsulations not represented here), no defined common set of device
>> > functionality that needs this, and this precludes the use of the RX
>> > accelerations to be available from a userpsace  implementation.
>>
>> Regardless, I think this is at least a good cleanup of what is already
>> there compared to having VXLAN-specific NDOs. We can always add
>> additional things in the future.
>
> Agreed with Jesse that this will help not hurt,  when we are ready to cross 
> the bridge for removing RX side Protocol ossification.
>
The time is now to address the protocol ossification problem. HW
vendors leaking out support for random protocols one at a time really
isn't helpful at all. Unfortunately, it's pretty clear that many
vendors aren't going to fix this on their own volition. Fixing the
interfaces to "encourage" change seems to be a way we can help things
a long from kernel perspective.

Tom
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-23 Thread Singhai, Anjali


> -Original Message-
> From: Jesse Gross [mailto:je...@kernel.org]
> Sent: Monday, November 23, 2015 2:50 PM
> To: Tom Herbert
> Cc: Singhai, Anjali; Linux Kernel Network Developers; Patil, Kiran
> Subject: Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
> 
> On Mon, Nov 23, 2015 at 1:53 PM, Tom Herbert 
> wrote:
> >> diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
> >> index cb2f89f..72415aa 100644
> >> --- a/include/net/udp_tunnel.h
> >> +++ b/include/net/udp_tunnel.h
> >> @@ -9,6 +9,12 @@
> >>  #include 
> >>  #endif
> >>
> >> +enum udp_tunnel_type {
> >> +   UDP_TUNNEL_UNSPEC,
> >> +   UDP_TUNNEL_VXLAN,
> >> +   UDP_TUNNEL_GENEVE,
> >> +};
> >> +
> >
> > Sorry, I still don't like this. Grant it least it gets rid of of VXLAN
> > specific ops, but the problem is there no such things as a common set
> > of encapsulations in the kernel (e.g. foo-over-udp adds a bunch of
> > encapsulations not represented here), no defined common set of device
> > functionality that needs this, and this precludes the use of the RX
> > accelerations to be available from a userpsace  implementation.
> 
> Regardless, I think this is at least a good cleanup of what is already
> there compared to having VXLAN-specific NDOs. We can always add
> additional things in the future.

Agreed with Jesse that this will help not hurt,  when we are ready to cross the 
bridge for removing RX side Protocol ossification. 



Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-23 Thread Jesse Gross
On Mon, Nov 23, 2015 at 1:53 PM, Tom Herbert  wrote:
>> diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
>> index cb2f89f..72415aa 100644
>> --- a/include/net/udp_tunnel.h
>> +++ b/include/net/udp_tunnel.h
>> @@ -9,6 +9,12 @@
>>  #include 
>>  #endif
>>
>> +enum udp_tunnel_type {
>> +   UDP_TUNNEL_UNSPEC,
>> +   UDP_TUNNEL_VXLAN,
>> +   UDP_TUNNEL_GENEVE,
>> +};
>> +
>
> Sorry, I still don't like this. Grant it least it gets rid of of VXLAN
> specific ops, but the problem is there no such things as a common set
> of encapsulations in the kernel (e.g. foo-over-udp adds a bunch of
> encapsulations not represented here), no defined common set of device
> functionality that needs this, and this precludes the use of the RX
> accelerations to be available from a userpsace  implementation.

Regardless, I think this is at least a good cleanup of what is already
there compared to having VXLAN-specific NDOs. We can always add
additional things in the future.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-23 Thread Tom Herbert
> diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
> index cb2f89f..72415aa 100644
> --- a/include/net/udp_tunnel.h
> +++ b/include/net/udp_tunnel.h
> @@ -9,6 +9,12 @@
>  #include 
>  #endif
>
> +enum udp_tunnel_type {
> +   UDP_TUNNEL_UNSPEC,
> +   UDP_TUNNEL_VXLAN,
> +   UDP_TUNNEL_GENEVE,
> +};
> +

Sorry, I still don't like this. Grant it least it gets rid of of VXLAN
specific ops, but the problem is there no such things as a common set
of encapsulations in the kernel (e.g. foo-over-udp adds a bunch of
encapsulations not represented here), no defined common set of device
functionality that needs this, and this precludes the use of the RX
accelerations to be available from a userpsace  implementation.

The bad effect of this model is that it is encourages HW vendors to
continue implement HW protocol specific support for encapsulations, we
get so much more benefit if they implement protocol generic
mechanisms. For instance, it is much better that they return
CHECKSUM_COMPLETE rather than giving us checksum unnecessary
indication for a TCP checksum within VXLAN.

If the devices needs to be configured for some protocol specific
actions then ntuple filters on the port seems like the right
interface. Really the only common NIC offload that might need this at
all is LRO. RSS, the checksum offloads, and LSO (for UDP tunnels) can
all be implemented generically without regard to the specific
encapsulation being used.

Tom
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-23 Thread kbuild test robot
Hi Anjali,

[auto build test ERROR on net/master]
[also build test ERROR on v4.4-rc2 next-20151123]

url:
https://github.com/0day-ci/linux/commits/Anjali-Singhai-Jain/Generalize-udp-based-tunnels-and-add-geneve-offload/20151124-044959
config: i386-randconfig-x006-11230317 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:53:0:
   include/net/udp_tunnel.h: In function 'udp_tunnel_handle_offloads':
>> include/net/udp_tunnel.h:112:9: error: implicit declaration of function 
>> 'iptunnel_handle_offloads' [-Werror=implicit-function-declaration]
 return iptunnel_handle_offloads(skb, udp_csum, type);
^
   include/net/udp_tunnel.h:112:9: warning: return makes pointer from integer 
without a cast [-Wint-conversion]
   cc1: some warnings being treated as errors

vim +/iptunnel_handle_offloads +112 include/net/udp_tunnel.h

c29a70d2 Pravin B Shelar 2015-08-26  106  
6a93cc90 Andy Zhou   2014-09-16  107  static inline struct sk_buff 
*udp_tunnel_handle_offloads(struct sk_buff *skb,
6a93cc90 Andy Zhou   2014-09-16  108
 bool udp_csum)
6a93cc90 Andy Zhou   2014-09-16  109  {
6a93cc90 Andy Zhou   2014-09-16  110int type = udp_csum ? 
SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
6a93cc90 Andy Zhou   2014-09-16  111  
6a93cc90 Andy Zhou   2014-09-16 @112return 
iptunnel_handle_offloads(skb, udp_csum, type);
6a93cc90 Andy Zhou   2014-09-16  113  }
6a93cc90 Andy Zhou   2014-09-16  114  
cfdf1e1b Jesse Gross 2014-11-10  115  static inline void 
udp_tunnel_gro_complete(struct sk_buff *skb, int nhoff)

:: The code at line 112 was first introduced by commit
:: 6a93cc9052748c6355ec9d5b6c38b77f85f1cb0d udp-tunnel: Add a few more UDP 
tunnel APIs

:: TO: Andy Zhou 
:: CC: David S. Miller 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-23 Thread kbuild test robot
Hi Anjali,

[auto build test ERROR on net/master]
[also build test ERROR on v4.4-rc2 next-20151123]

url:
https://github.com/0day-ci/linux/commits/Anjali-Singhai-Jain/Generalize-udp-based-tunnels-and-add-geneve-offload/20151124-044959
config: x86_64-randconfig-x001-11230704 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/net/ethernet/broadcom/bnxt/bnxt.c: In function 'bnxt_add_vxlan_port':
>> drivers/net/ethernet/broadcom/bnxt/bnxt.c:5436:14: error: 'UDP_TUNNEL_VXLAN' 
>> undeclared (first use in this function)
 if (type != UDP_TUNNEL_VXLAN)
 ^
   drivers/net/ethernet/broadcom/bnxt/bnxt.c:5436:14: note: each undeclared 
identifier is reported only once for each function it appears in
   drivers/net/ethernet/broadcom/bnxt/bnxt.c: In function 'bnxt_del_vxlan_port':
   drivers/net/ethernet/broadcom/bnxt/bnxt.c:5461:14: error: 'UDP_TUNNEL_VXLAN' 
undeclared (first use in this function)
 if (type != UDP_TUNNEL_VXLAN)
 ^

vim +/UDP_TUNNEL_VXLAN +5436 drivers/net/ethernet/broadcom/bnxt/bnxt.c

  5430  if (!netif_running(dev))
  5431  return;
  5432  
  5433  if (sa_family != AF_INET6 && sa_family != AF_INET)
  5434  return;
  5435  
> 5436  if (type != UDP_TUNNEL_VXLAN)
  5437  return;
  5438  
  5439  if (bp->vxlan_port_cnt && bp->vxlan_port != port)

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


[PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-11-23 Thread Anjali Singhai Jain
Replace add/del ndo ops for vxlan_port with tunnel_port so that all UDP
based tunnels can use the same ndo op. Add a parameter to pass tunnel
type to the ndo_op.

Change all drivers to use the generalized udp tunnel offload

Patch was compile tested with x86_64_defconfig.

Signed-off-by: Kiran Patil 
Signed-off-by: Anjali Singhai Jain 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 15 ++---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c| 13 +---
 drivers/net/ethernet/emulex/benet/be_main.c  | 14 +---
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  | 27 
 drivers/net/ethernet/intel/i40e/i40e_main.c  | 41 +---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c| 17 +++---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   | 21 
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 17 +++---
 drivers/net/vxlan.c  | 23 +++--
 include/linux/netdevice.h| 34 ++--
 include/net/udp_tunnel.h |  6 
 11 files changed, 157 insertions(+), 71 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 2273576..ad2782f 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -47,6 +47,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -10124,11 +10125,14 @@ static void __bnx2x_add_vxlan_port(struct bnx2x *bp, 
u16 port)
 }
 
 static void bnx2x_add_vxlan_port(struct net_device *netdev,
-sa_family_t sa_family, __be16 port)
+sa_family_t sa_family, __be16 port,
+u32 type)
 {
struct bnx2x *bp = netdev_priv(netdev);
u16 t_port = ntohs(port);
 
+   if (type != UDP_TUNNEL_VXLAN)
+   return;
__bnx2x_add_vxlan_port(bp, t_port);
 }
 
@@ -10152,11 +10156,14 @@ static void __bnx2x_del_vxlan_port(struct bnx2x *bp, 
u16 port)
 }
 
 static void bnx2x_del_vxlan_port(struct net_device *netdev,
-sa_family_t sa_family, __be16 port)
+sa_family_t sa_family, __be16 port,
+u32 type)
 {
struct bnx2x *bp = netdev_priv(netdev);
u16 t_port = ntohs(port);
 
+   if (type != UDP_TUNNEL_VXLAN)
+   return;
__bnx2x_del_vxlan_port(bp, t_port);
 }
 #endif
@@ -13008,8 +13015,8 @@ static const struct net_device_ops bnx2x_netdev_ops = {
.ndo_set_vf_link_state  = bnx2x_set_vf_link_state,
.ndo_features_check = bnx2x_features_check,
 #ifdef CONFIG_BNX2X_VXLAN
-   .ndo_add_vxlan_port = bnx2x_add_vxlan_port,
-   .ndo_del_vxlan_port = bnx2x_del_vxlan_port,
+   .ndo_add_udp_tunnel_port= bnx2x_add_vxlan_port,
+   .ndo_del_udp_tunnel_port= bnx2x_del_vxlan_port,
 #endif
 };
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index f2d0dc9..5b96ddf 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -5421,7 +5421,7 @@ static void bnxt_cfg_ntp_filters(struct bnxt *bp)
 #endif /* CONFIG_RFS_ACCEL */
 
 static void bnxt_add_vxlan_port(struct net_device *dev, sa_family_t sa_family,
-   __be16 port)
+   __be16 port, u32 type)
 {
struct bnxt *bp = netdev_priv(dev);
 
@@ -5431,6 +5431,9 @@ static void bnxt_add_vxlan_port(struct net_device *dev, 
sa_family_t sa_family,
if (sa_family != AF_INET6 && sa_family != AF_INET)
return;
 
+   if (type != UDP_TUNNEL_VXLAN)
+   return;
+
if (bp->vxlan_port_cnt && bp->vxlan_port != port)
return;
 
@@ -5443,7 +5446,7 @@ static void bnxt_add_vxlan_port(struct net_device *dev, 
sa_family_t sa_family,
 }
 
 static void bnxt_del_vxlan_port(struct net_device *dev, sa_family_t sa_family,
-   __be16 port)
+   __be16 port, u32 type)
 {
struct bnxt *bp = netdev_priv(dev);
 
@@ -5453,6 +5456,8 @@ static void bnxt_del_vxlan_port(struct net_device *dev, 
sa_family_t sa_family,
if (sa_family != AF_INET6 && sa_family != AF_INET)
return;
 
+   if (type != UDP_TUNNEL_VXLAN)
+   return;
if (bp->vxlan_port_cnt && bp->vxlan_port == port) {
bp->vxlan_port_cnt--;
 
@@ -5491,8 +5496,8 @@ static const struct net_device_ops bnxt_netdev_ops = {
 #ifdef CONFIG_RFS_ACCEL
.ndo_rx_flow_steer  = bnxt_rx_flow_steer,
 #endif
-   .ndo_add_vxlan_port = bnxt_add_vxlan_port,
-   .ndo_del_vxlan_port = bnxt_del_vxlan_port,
+   .ndo_add_udp_tunnel_port= bnxt_add_vxlan_port,
+   .ndo_del_udp_tunnel_port