Hi All,

In last week's OVN meeting we discussed this issue. I want to summarize
the discussion here and ask a few questions.

Based on the suggestions I got from Ben a few weeks ago, I took the
approach of checking the packet length using a new OVS action -
chk_pkt_len_gt(length) - which is an unattractive name (my apologies).

The idea is to use this action along the lines of
"actions=chk_pkt_len_gt(1500)->NXM_NX_REG0[0], resubmit(..)".
The new action chk_pkt_len_gt(1500) sets the reg0[0] bit to 1 if the
packet length is greater than 1500, and to 0 otherwise.

And later in the pipeline have a flow like

match=reg0=0x1/0x1, ... actions=icmp{....} /* To send ICMP type 3
(Destination Unreachable), code 4 (Fragmentation Needed), as per
RFC 1191, back to the sender */
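
Putting the two pieces together, the flows could look roughly like the
sketch below. This is only an illustration: chk_pkt_len_gt does not exist
yet, the table numbers are arbitrary, and icmp{...} stands in for whatever
mechanism ends up generating the reply.

  table=20, ip
      actions=chk_pkt_len_gt(1500)->NXM_NX_REG0[0], resubmit(,21)
  table=21, priority=100, reg0=0x1/0x1, ip
      actions=icmp{...}      /* ICMP type 3, code 4 back to the sender */
  table=21, priority=0
      actions=resubmit(,22)  /* packet fits, continue the pipeline */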

During the discussion, Ben instead proposed a new action that would check
the packet against the MTU of a given outport, something like
check_mtu(router_port_xyzzy). This action would raise an exception if the
packet size is greater than the MTU of the outport. Please see [1] for the
chat logs.
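
As I understood the proposal, usage would look something like the line
below (purely hypothetical - neither the action nor its exception
mechanism exists today):

  actions=check_mtu(router_port_xyzzy), output:router_port_xyzzy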

I explored this new action - check_mtu - a bit, and I am not sure we can
solve the problem reported here using it. When the chassis hosting the
gateway router port receives the packet from the compute chassis over the
tunnel port, it runs the router pipeline, does the NATting, and sends the
packet to the ingress pipeline of the provider network logical switch (the
one with the localnet port). From the localnet patch port the packet enters
the provider bridge (br-ex) and is pushed out of the physical interface. If
the MTU of this physical interface is smaller than the size of the packet,
OVS drops the packet.
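
To summarize the path the packet takes, as described above:

  VM -> compute chassis -> tunnel -> gateway chassis
      -> router pipeline (NAT)
      -> provider network logical switch -> localnet patch port
      -> br-ex -> physical interface  (dropped here if the interface
                                       MTU is smaller than the packet)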

The issue with the check_mtu action is that OVN doesn't program the
provider bridges and is unaware of the physical interfaces connected to
them. So I am not sure we can use this approach to solve this problem,
although there could still be a use case for the check_mtu action.

Ben, do you have any comments on this? Did I misunderstand what you were
trying to say? Please correct me if my understanding is wrong.

To solve this MTU issue with OVN, I will go ahead with the chk_pkt_len_gt
action. Please let me know if there are any concerns here. If the approach
seems fine, please suggest a better name for the action.

There is also another missing piece. When OVN sends the ICMP type 3,
code 4 packet, RFC 1191 says:

***

the router MUST include the MTU of that next-hop network in the
low-order 16 bits of the ICMP header field that is labelled
"unused" in the ICMP specification

***
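
For reference, this is the layout of the ICMP Destination Unreachable
message as extended by RFC 1191, with the Next-Hop MTU carried in the
low-order 16 bits of the second 32-bit word:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 3    |   Code = 4    |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           unused = 0          |         Next-Hop MTU          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Internet Header + 64 bits of Original Datagram Data      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+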


I don't think we presently have any OVS action to set this MTU value.
Either we need to support setting this field with a new OVS action, or we
frame the complete ICMP packet and set the MTU value in ovn-controller
itself (in which case we cannot use the existing OVN action "icmp{..}").
Is it reasonable to add this new action in OVS? Or should ovn-controller
take care of it with a new OVN action? Any thoughts on this, please?
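
Just to make the discussion concrete, from the logical-flow side a new OVN
action could end up looking something like the sketch below, regardless of
whether OVS or ovn-controller fills in the field. icmp4.frag_mtu is a name
I made up for this example; nothing like it exists today, and the real
design could differ completely.

  icmp4 {
      eth.dst <-> eth.src;
      ip4.dst <-> ip4.src;
      icmp4.type = 3;         /* Destination Unreachable */
      icmp4.code = 4;         /* Fragmentation Needed */
      icmp4.frag_mtu = 1500;  /* hypothetical: Next-Hop MTU field */
      ...
  };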


Thanks

Numan


[1] - https://botbot.me/freenode/openvswitch/2018-10-25/?tz=Asia/Kolkata

(Looks like botbot.me will be shut down soon -
https://lincolnloop.com/blog/saying-goodbye-botbotme/)

 So I copied it to a pastebin as well - http://paste.openstack.org/show/733616/





On Mon, Sep 24, 2018 at 6:28 PM Daniel Alvarez Sanchez <dalva...@redhat.com>
wrote:

> Resending this email as I can't see it in [0] for some reason.
> [0] https://mail.openvswitch.org/pipermail/ovs-dev/2018-September/
>
>
>
>
> On Fri, Sep 21, 2018 at 2:36 PM Daniel Alvarez Sanchez <
> dalva...@redhat.com> wrote:
>
>> Hi folks,
>>
>> After talking to Numan and reading the log from yesterday's IRC meeting,
>> it looks like there's some confusion around the issue.
>>
>> jpettit | I should look at the initial bug report again, but is it not
>> sufficient to configure a smaller MTU within the VM?
>>
>> Imagine the case where some host from the external network (MTU 1500)
>> sends 1000B UDP packets to the VM (MTU 200). When OVN attempts to deliver
>> the packet to the VM it won't fit, and the application running there will
>> never get the packet.
>>
>> With the reference implementation (or if namespaces were used, which Han
>> suggests is what NSX does), the packet would be handled by the IP stack on
>> the gateway node. An ICMP need-to-frag would be sent back to the sender
>> and - if it's not blocked by some firewall - the IP stack on the sender
>> node would fragment this and subsequent packets to fit the MTU on the
>> receiver.
>>
>> Also, generally we don't want to configure small MTUs on the VMs for
>> performance reasons, as it would also impact east/west traffic where
>> jumbo frames appear to work.
>>
>> Thanks a lot for bringing this up on the meeting!
>> Daniel
>>
>> On Mon, Aug 13, 2018 at 5:23 PM Miguel Angel Ajo Pelayo <
>> majop...@redhat.com> wrote:
>> >
>> > Yeah, later on we found that it was, again, more important than we
>> > think.
>> >
>> > For example, there are still cases not covered by TCP MSS negotiation
>> > (or for UDP/other protocols):
>> >
>> > Imagine you have two clouds, both with an internal MTU (let’s imagine
>> > MTUb on cloud B, and MTUa on cloud A), and an external transit
>> > network with a 1500 MTU (MTUc).
>> >
>> > MTUa > MTUc, and MTUb > MTUc.
>> >
>> > Also, imagine that VMa in cloud A, has a floating IP (DNAT_SNAT NAT),
>> > and VMb in cloud B has also a floating IP.
>> >
>> > VMa tries to establish a connection to VMb's FIP, and announces
>> > MSSa = MTUa - (IP + TCP overhead); VMb ACKs the TCP SYN request
>> > with MSSb = MTUb - (IP + TCP overhead).
>> >
>> > So the agreed value will be min(MSSa, MSSb), but… the transit network's
>> > MSSc will always be smaller: MSSc < min(MSSa, MSSb).
>> >
>> > In ML2/OVS deployments, those big packets will get fragmented at the
>> > router edge, and an ICMP notification will be sent to the sender of the
>> > packets to indicate that fragmenting at the source is necessary.
>> >
>> >
>> > I guess we can also replicate this with 2 VMs on the same cloud with
>> > MSSa > MSSb, where they try to talk to each other via floating IP.
>> >
>> >
>> > So going back to the point, I guess we need to implement some OpenFlow
>> > extension to match packets by size, redirecting those to a slow path
>> > (ovn-controller) so we can fragment and/or send an ICMP back to the
>> > source asking it to fragment?
>> >
>> > Any advice on what the procedure would be here (OpenFlow land,
>> > kernel-wise, even in terms of our source code and design, so we could
>> > implement this)?
>> >
>> >
>> > Best regards,
>> > Miguel Ángel.
>> >
>> >
>> > On 3 August 2018 at 17:41:05, Daniel Alvarez Sanchez (
>> dalva...@redhat.com) wrote:
>> >
>> > Maybe ICMP is not that critical, but it seems like not having the ICMP
>> > 'need to frag' on UDP communications could break applications that rely
>> > on it to reduce the size of their packets? I wonder...
>> >
>> > Thanks!
>> > Daniel
>> >
>> > On Fri, Aug 3, 2018 at 5:20 PM Miguel Angel Ajo Pelayo <
>> majop...@redhat.com> wrote:
>> >>
>> >>
>> >> We didn’t understand why an MTU mismatch worked in one direction (N/S)
>> >> but not in the other (S/N)… and we found that it’s actually working
>> >> (at least for TCP, via MSS negotiation); we had a misconfiguration in
>> >> one of the physical interfaces.
>> >>
>> >> So, in the case of TCP we are fine. TCP is smart enough to negotiate
>> properly.
>> >>
>> >> Other protocols, like ICMP with the DF flag, or UDP… would not get the
>> >> ICMP that notifies the sender about the MTU mismatch.
>> >>
>> >> I suspect that the most common cases are covered, and that it’s not
>> >> worth pursuing what I was asking for, at least with a high priority,
>> >> but I’d like to hear opinions.
>> >>
>> >>
>> >> Best regards,
>> >> Miguel Ángel.
>> >>
>> >> On 3 August 2018 at 08:11:01, Miguel Angel Ajo Pelayo (
>> majop...@redhat.com) wrote:
>> >>
>> >> I’m going to capture some example traffic and try to figure out which
>> >> RFCs talk about that behaviour so we can come up with a consistent
>> >> solution. I can document it in the project.
>> >>
>> >> To be honest, when I looked at it, I was expecting that the router
>> >> would fragment, and I ended up discovering that we had this path MTU
>> >> discovery mechanism in play for IPv4.
>> >>
>> >> On 2 August 2018 at 22:21:28, Ben Pfaff (b...@ovn.org) wrote:
>> >>
>> >> On Thu, Aug 02, 2018 at 01:19:57PM -0700, Ben Pfaff wrote:
>> >> > On Wed, Aug 01, 2018 at 10:46:07AM -0400, Miguel Angel Ajo Pelayo
>> wrote:
>> >> > > Hi Ben, ICMP is used as a signal from the router to tell the sender
>> >> > > “next hop has a lower mtu, please send smaller packets”; we would
>> >> > > need at least something in OVS to slow-path the “bigger than X”
>> >> > > packets. At that point ovn-controller could take care of constructing
>> >> > > the ICMP packet and sending it to the source.
>> >> >
>> >> > Yes.
>> >> >
>> >> > > But I guess, that we still need the kernel changes to match on
>> >> > > those “big packets”.
>> >> >
>> >> > Maybe. If we only need to worry about ICMP, though, we can set up OVN
>> >> > so that it always slow-paths ICMP.
>> >>
>> >> Oh, I think maybe I was just being slow. The ICMP is generated, not
>> >> processed. Never mind.
>>
