On Wed, Sep 6, 2017 at 10:43 AM, Alexander Duyck
<alexander.du...@gmail.com> wrote:
> On Wed, Sep 6, 2017 at 9:17 AM, Tom Herbert <t...@herbertland.com> wrote:
>> On Tue, Sep 5, 2017 at 8:06 PM, Alexander Duyck
>> <alexander.du...@gmail.com> wrote:
>>> On Tue, Sep 5, 2017 at 2:13 PM, Tom Herbert <t...@herbertland.com> wrote:
>>>>> The situation with encapsulation is even more complicated:
>>>>>
>>>>> We are basically only interested in the UDP/vxlan/Ethernet/IP/UDP
>>>>> constellation. If we do the fragmentation inside the vxlan tunnel and
>>>>> carry over the skb hash to all resulting UDP/vxlan packets source ports,
>>>>> we are fine and reordering on the receiver NIC won't happen in this
>>>>> case. If the fragmentation happens on the outer UDP header, this will
>>>>> result in reordering of the inner L2 flow. Unfortunately this depends on
>>>>> how the vxlan tunnel was set up, how other devices do that and (I
>>>>> believe so) on the kernel version.
>>>>>
>>>> This really isn't that complicated. The assumption that an IP network
>>>> always delivers packets in order is simply wrong. The inventors of
>>>> VXLAN must have know full well that when you use IP, packets can and
>>>> eventually will be delivered out of order. This isn't just because of
>>>> fragmentation, there are many other reasons that packets can be
>>>> delivered OOO. This also must have been known when IP/GRE and any
>>>> other protocol that carries L2 over IP was invented. If OOO is an
>>>> issue for these protocols then they need to be fixed-- this is not a
>>>> concern with IP protocol nor the stack.
>>>>
>>>> Tom
>>>
>>> As far as a little background on the original patch I believe the
>>> issue that was fixed by the patch was a video streaming application
>>> that was sending/receiving a mix of fragmented and non-fragmented
>>> packets. Receiving them out of order due to the fragmentation was
>>> causing issues with stutters in the video and so we ended up disabling
>>> UDP by default in the NICs listed. We decided to go that way as UDP
>>> RSS was viewed as a performance optimization, while the out-of-order
>>> problems were viewed as a functionality issue.
>>>
>> Hi Alex,
>>
>> Thanks for the details! Were you able to find the root cause for this?
>> In particular, it would be interesting to know if it is the kernel or
>> device that introduced the jitter, or if it's the application that
>> doesn't handle OOO well...
>>
>> Tom
>
> It is hard to say since my memory of the events from 7 years ago is
> pretty vague at this point, but I'm pretty sure it was the
> application. Basically getting the frames out of order was causing
> them to have to drop video data if I recall correctly.
>
Oh, I didn't notice that patch was from 2010. Maybe the application
has been fixed by now! :-)

Perhaps, it's time to try to turn UDP hashing on again by default?
Even if NICs aren't doing this, there are network devices that are
looking at UDP for ECMP and that doesn't seem to be causing widespread
problems currently...

Tom

Reply via email to