On Thu, Jun 9, 2016 at 12:23 PM, Hannes Frederic Sowa <han...@redhat.com> wrote:
> On 09.06.2016 18:14, Alexander Duyck wrote:
>> On Thu, Jun 9, 2016 at 3:57 AM, Hannes Frederic Sowa <han...@redhat.com> 
>> wrote:
>>> On 09.06.2016 04:33, Alexander Duyck wrote:
>>>> On Wed, Jun 8, 2016 at 3:20 PM, Hannes Frederic Sowa <han...@redhat.com> 
>>>> wrote:
>>>>> The remaining problem regarding offloads would be that, without the
>>>>> special offloading rule, we get into the situation by default that the
>>>>> vxlan stream will only be processed on one single core, because we tell
>>>>> network cards not to hash the udp ports into rxhash.  That hurts a lot
>>>>> in the case of vxlan, where we bias the flow identification on the
>>>>> source port when no offloading is available.
>>>>
>>>> Most NICs offer the option of hashing on UDP ports.  In the case of
>>>> the Intel NICs I know you can turn on UDP port hashing with ethtool
>>>> via "ethtool -N <iface> rx-flow-hash udp4 sdfn".  You can do the same
>>>> thing using "udp6" for IPv6 based tunnels.  That is usually enough to
>>>> cover all the bases, and since not too many people are passing
>>>> fragmented UDP traffic, enabling UDP hashing isn't too big of a deal as
>>>> long as that remains the case.
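
(For reference, the full commands look something like the following;
"eth0" is just a placeholder for the actual interface name:)

    # hash IPv4/IPv6 UDP flows on src/dst address and src/dst port
    ethtool -N eth0 rx-flow-hash udp4 sdfn
    ethtool -N eth0 rx-flow-hash udp6 sdfn

    # check which fields are currently used for the hash
    ethtool -n eth0 rx-flow-hash udp4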
>>>
>>> True, I am wondering how safe it is given the reordering effects it has
>>> on UDP, and thus on other non-vxlan management protocols on the hypervisors.
>>>
>>> At the time UDP port hashing was disabled, the message from upstream was
>>> pretty clear, and I don't think anything should change here for the
>>> default case.
>>>
>>> Are the port hashing features also global or tweakable per VF?
>>
>> That one depends on the device.  I think in the case of some of the
>> newer NICs the VFs support separate RSS tables.  The ones that have
>> shared RSS tables typically share how they compute the hashes.  So for
>> example with igb and ixgbe you get a shared hash computation where the
>> PF will impact the VFs.  One easy fix for the reordering though is to
>> simply disable RSS on the VFs, which in many cases will likely happen
>> anyway unless the guest has multiple VCPUs.
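
(In practice that can be done from the guest with something along these
lines; "eth1" stands in for the VF interface, and not every VF driver
exposes both knobs:)

    # reduce the VF to a single queue pair, which disables RSS outright
    ethtool -L eth1 combined 1

    # or keep the queues but point the whole RSS indirection table at queue 0
    ethtool -X eth1 equal 1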
>
> Sounds like a bad limitation. I assume multiple VCPUs are used in VMs (I
> even do that).

Right.  So do I.  However, many VFs are still greatly limited in the
number of queues they can support.

Also, in terms of impact on the VFs, having UDP hashing enabled for
RSS is only really an issue if you have a mix of fragmented and
non-fragmented traffic for the same flow.

> Hypothetically, for IPv4 vxlan in a datacenter, can't we randomize the
> IPv4 address bits (e.g. the lower 2 bytes) and isolate it properly, just
> as we do with the transport protocol for vxlan?  But that is becoming ugly...

Mangling the address would probably be even worse.

> We break ICMP already with UDP source port randomization.

I hadn't thought about that before.  Is that also the reason why we
don't have any PMTU discovery for UDP tunnels?

>> In the case of ixgbe it just occurred to me that there is also the
>> option of applying flow director rules, and it would be possible to
>> just add a rule for each CPU so that you split the UDP source port
>> space up based on something like the lower 4 bits, assuming 16 queues
>> for instance.
>
> Deploying that across whatever hardware happens to be in use will also be terrible.

Right.  I never said it was going to be completely pretty.  Still it
is no worse than the kind of stuff we already have going on, since we
are applying many of these offloads per device and not isolating the
VFs from the PF.
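
Just to sketch what the flow director idea quoted above would look like
("eth0" and the 16-queue split are only placeholders; ethtool masks mark
the bits to ignore, and ixgbe applies one global mask to all of its flow
director rules):

    # one rule per queue, keyed on the low 4 bits of the UDP source port
    # 0xfff0 masks out (ignores) the upper 12 bits of the port
    for i in $(seq 0 15); do
        ethtool -N eth0 flow-type udp4 src-port $i m 0xfff0 action $i
    done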

I find all of this to be much more palatable than stuff like remote
checksum offload and the like in order to try and make this work.
With the current igb, ixgbe, or i40e driver and an outer checksum
being present I can offload just about any of the UDP tunnel types
supported by the kernel, including FOU, GUE, and VXLAN-GPE, and get
full hardware offloads for segmentation and Rx checksum.
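
(As an illustration of "an outer checksum being present", a vxlan device
with outer UDP checksums enabled can be created roughly like this; the
device names, VNI, and remote address are only placeholders:)

    # vxlan with the outer UDP checksum enabled on transmit
    ip link add vxlan0 type vxlan id 42 remote 192.0.2.1 \
        dstport 4789 dev eth0 udpcsum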

- Alex
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
