On Wed, Apr 20, 2022 at 3:16 PM Numan Siddique <num...@ovn.org> wrote:
>
> On Wed, Apr 20, 2022 at 1:15 PM Ihar Hrachyshka <ihrac...@redhat.com> wrote:
> >
> > Thanks for your review. See below.
> >
> > On Tue, Apr 19, 2022 at 11:43 AM Numan Siddique <num...@ovn.org> wrote:
> > >
> > > On Tue, Mar 29, 2022 at 8:47 PM Ihar Hrachyshka <ihrac...@redhat.com> 
> > > wrote:
> > > >
> > > > This version of the series switched to supporting multiple chassis
> > > > set in the requested-chassis option. This allows more than two
> > > > chassis to be specified for the same port.
> > > >
> > > > Also, this series includes a patch that allows disabling tunneling
> > > > enforcement for ports with multiple chassis set in requested-chassis.
> > > > (This is useful when the tunneling network is MTU constrained.)
> > > >
> > > > v0: initial draft (single patch)
> > > > v1: split into pieces
> > > > v1: renamed options: migration-destination ->
> > > >                      requested-additional-chassis,
> > > >                      migration-unblocked ->
> > > >                      additional-chassis-activated
> > > > v1: introduced options:activation-strategy=rarp to allow for other
> > > >     strategies / having default no-op strategy
> > > > v1: implement in-memory port-activated tracking to avoid races
> > > > v1: numerous code cleanup / bug fixes
> > > > v1: special handling for localnet attached switches
> > > > v2: added ddlog implementation
> > > > v2: re-inject RARP packet after vswitch updates flows
> > > > v3: re-sent as a single series
> > > > v4: redesign to reuse requested-chassis option
> > > > v4: support >2 chassis per port
> > > > v4: allow disabling tunneling enforcement when n_chassis >= 2
> > > >
> > > > Ihar Hrachyshka (15):
> > > >   Introduce chassis_is_vtep
> > > >   northd: introduce separate function to look up chassis
> > > >   northd: separate code for nb->sb port binding chassis update
> > > >   Pass chassis and encap into get_port_binding_tun
> > > >   Introduce match_outport_dp_and_port_keys in physical.c
> > > >   Split code to set zone info into put_zones_ofpacts
> > > >   Use get_port_binding_tun instead of chassis_tunnel_find
> > > >   Tag all packets that arrived from a tunnel as LOCAL_ONLY
> > > >   Update port-up on main chassis only
> > > >   Support LSP:options:requested-chassis as a list
> > > >   Clone packets to both port chassis
> > > >   Implement RARP activation strategy for ports
> > > >   Reinject RARP packet when activation-strategy=rarp
> > > >   Allow to disable tunneling enforcement for multi-chassis port
> > > >   Update NEWS file about new live migration options
> > >
> > > Hi Ihar,
> > >
> > > Thanks for the patch series.
> > >
> > > I have a few comments.
> > >
> > > 1)  Do you think we really need to support multiple Port_Binding
> > > encaps for this use case?  If you look at this commit -
> > > https://github.com/ovn-org/ovn/commit/dd527a283cd8b67ca481374302a630f5635de6ac
> > > - encaps for port bindings were added for a specific use case.
> > >     If the CMS desires to configure multiple encaps for a logical port,
> > > then I think it's OK to not support multiple requested-chassis for such
> > > ports.  What do you think? Do you see any use case for this in
> > > OpenStack?
> > >     IMO supporting multiple encaps complicates the code in
> > > binding.c, and I'd prefer if we can avoid it.
> > >
> >
> > I don't have an explicit request to support it for OpenStack. The idea
> > behind the implementation was to make the binding behavior identical to
> > the base "single chassis" case.
> >
> > That said, it seems to me the explicit VF-to-PF mapping case described
> > in the commit message you linked to applies to both single chassis and
> > multiple chassis the same way. Perhaps the other chassis has a
> > different VF-PF relationship because NICs are different / VF
> > configuration is different. In this case it makes sense that every
> > chassis is able to choose its own tunnel IP preference, no?
> >
> > Re: code complexity, do you mean append_additional_encap,
> > remove_additional_encap_for_chassis,
> > update_port_additional_encap_if_needed and the way the code locates
> > the matching encap object by comparing chassis names? Or do you mean
> > something else?
>
> Yes.  These ones.  I guess we can support it later if there is a use case.
> Although I won't object if you think it's valuable to be consistent and
> if it's good to have multiple requested-chassis
> support for port binding encaps.
>
>
> >
> > > 2)  In patch 12 (RARP activation strategy), ovn-controller is
> > > programming OF flows in logical flow space (i.e., table 8).  I think
> > > it's better to avoid it.  Instead I'd suggest adding these flows in
> > > the new physical table - table 73 (see lflow.h).
> >
> > I am happy to adjust, but could you please clarify why we should avoid
> > using table=8 here? Would e.g. moving the activation flows to table=0
> > address the concern? I may be missing some basics of table/flow design,
> > so I hope you can educate me. :)
>
> OVN separates OF flows into 2 scopes - logical OF flows
> and physical OF flows.
> Logical OF flows are generated from the logical flows in the SB DB,
> and physical OF flows are generated by ovn-controller (in physical.c).
>
> Since ovn-controller is generating these flows, I think it's better
> that they are added in the physical table scope.
>
> Yes.  I think it's also fine to add the activation flows in table 0.
>
> >
> > >      You can modify the flow in table 0 to resubmit to table 73 if the
> > > OVS interface port is created on the additional chassis.  And table 73
> > > can have a controller action for RARP packets and a generic drop flow.
> > >
> > > 2.a)  In the proposed patch series, when the pinctrl thread receives a
> > > RARP packet, it deletes the flows.  Instead I'd suggest doing
> > > the below.
> > >          In table 73, add a flow with match = rarp and actions =
> > > controller(..), learn(...)
> > >         The first action will send the packet to the controller.  When
> > > the pinctrl thread receives it, it can tell the main thread to set
> > > the option additional-chassis-activated for the port binding.
> > >         The second OF action, "learn", can add a new higher priority
> > > flow to resubmit the packets to table 8.  This way, the pinctrl thread
> > > doesn't need to modify or delete any OpenFlow flows when it receives a
> > > RARP activation packet.  What do you think?
> >
> > This is perhaps my misunderstanding, but the thinking behind deleting
> > flows was to keep the number of flows and table jumps to the minimum:
> > once a binding location is activated (presumably happening in quick
> > succession), table=8 is cleaned up and looks the same as it would
> > with a single-chassis binding.
>
> My understanding (maybe I misunderstood) is that once the VM is
> migrated successfully, the CMS would clear the additional
> requested-chassis option.
>

Yes, and in the context of live migration the time when two chassis
are serving the same port binding would indeed be minuscule (counted
in seconds, not hours). But consider that the scope of the patch
series was expanded to include long-running multi-chassis ports, for
functionality similar to port mirroring. (Though this series leaves
filtering the traffic copied to other chassis out of scope - this
could be implemented on top of the series. Note there is an RFC for
port mirroring on the ML that, in my view, would need to be adjusted
to use the basic multi-chassis functionality instead of reimplementing
the whole traffic cloning mechanism. But that's a separate topic.)
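
For reference, with the v4 naming, the CMS would request such a
long-running multi-chassis binding with something along these lines
(a sketch only: lp1/ch1/ch2 are made-up names, and I'm assuming the
comma-separated list syntax that patch 10 introduces):

    ovn-nbctl set Logical_Switch_Port lp1 \
        'options:requested-chassis="ch1,ch2"' \
        options:activation-strategy=rarp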

> This is what I understand
>
> 1.  A VM with lport lp1 is present on compute chassis CH1.
> 2.  During the migration process, the CMS will set
> requested-chassis="CH1, CH2" and it will also set the activation
> strategy to rarp.
> 3. When ovn-controller on CH2 sees the OVS interface for lp1, it
> will "claim (additional)" it and add the flows. Since the activation
> strategy is set, it will add the RARP match flow with a controller
> action and a drop flow in table 73.
> 4. The VM is successfully migrated to CH2 and libvirt sends a RARP packet.
> 5. ovn-controller will handle this packet and will set the
> "additional-chassis-activated" flag in lp1's port binding.
> 6. If the learn action is used, it will do the magic of allowing the
> traffic from the lp1 VM.  If not, then ovn-controller will delete the
> flow that drops the packets.
> 7. The CMS sees this and clears CH1 from the requested-chassis option.
> 8. ovn-controller on CH1 will delete the flows.
> 9. ovn-controller on CH2 will become the primary chassis for lp1; it
> will delete the flows in table 73 (from the main thread, not from the
> pinctrl thread) and will re-add the flow in table 0 to resubmit to
> table 8.
>
>
> Is my understanding above correct?  Will the CMS do step 7?
>
> But I guess step (9) could disrupt the traffic from the VM for a tiny
> bit when the flow in table 0 is deleted and re-added.  I guess that's
> why you added the flows in table 8, to avoid any disruption?
>

Yes, and because the delay may turn out to be more than a tiny bit if
something goes wrong with database connectivity / translation during
the flip to exclusive use by the new chassis.
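
For concreteness, here is my reading of the arrangement from step 3
(plus the table-0 resubmit you suggested earlier), in rough ovs-ofctl
syntax; the table numbers are from this discussion, while the
priorities, ofport and userdata are made up:

    # additional chassis, binding not yet activated:
    # divert the port's traffic to table 73
    table=0,  priority=150, in_port=<lp1-ofport>, actions=resubmit(,73)

    # table 73: punt RARP to ovn-controller, drop everything else
    table=73, priority=100, rarp, actions=controller(userdata=...)
    table=73, priority=0, actions=drop

On activation, the diverting flow and the table-73 flows would then
either be deleted (as in the current series) or overridden by a
learned higher-priority resubmit, per your learn() suggestion.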

> If this is the concern, then perhaps ovn-controller can add 2 flows in
> table 0 whenever it claims a logical port and it is not primary:
>    - A higher priority flow in table 0 to resubmit to table 73
>    - A lower priority flow in table 0 to resubmit to table 8 (which
> will take the normal path)
>
> ovn-controller can delete the higher priority flow when it becomes primary.
>
> Regarding your comment on keeping table jumps to a minimum, I think
> it's not a big concern because flow translation happens only for the
> first packet, when there is an upcall; subsequent packets are handled
> by the kernel datapath flow (until the datapath flow expires).
>
>
> >
> > If we are to introduce table=73 and learn(), then we'll
> >
> > 1) always jump table=0 -> table=73;
> > 2) add another flow to jump back table=73 -> table=8 for every
> > activated multi-chassis port chassis.
> >
> > and these flows will live through the lifetime of the chassis binding.
>
> See comments above.
>
>
> > In comparison, with the current implementation, once the port is
> > activated, there are no new flows / tables retained.
> >
> > Maybe I wrongly assumed that having extra flows / additional tables
> > is not advisable if we can avoid it; if that's not a concern, then we
> > could just as well implement this with learn() adding new flows.
> >
> > What's bad about deleting flows on activation compared to adding new ones?
>
> IMO it is better that the ovn-controller main thread does the flow
> programming and the pinctrl thread just handles
> packet-ins/packet-outs.
> This patch set programs flows from the pinctrl thread, and that is my concern.
>

We could probably offload the business of flow deletion to the same
mechanism that the series uses to offload RARP reinjection to the main
thread (e.g. store the ports in a global list, then notify the main
thread); see the sketch below.
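
Something like this, as it would sit in controller/pinctrl.c (which
already pulls in sset.h and ovs-thread.h). A rough sketch only: I'm
assuming pinctrl.c's existing pinctrl_mutex and notify_pinctrl_main();
all the other names are made up:

    /* Ports activated by the pinctrl thread, waiting for the main
     * thread to delete their activation flows. */
    static struct sset activated_lports OVS_GUARDED_BY(pinctrl_mutex)
        = SSET_INITIALIZER(&activated_lports);

    /* pinctrl thread: remember the port, then wake the main thread. */
    static void
    note_activated_port(const char *lport_name)
    {
        ovs_mutex_lock(&pinctrl_mutex);
        sset_add(&activated_lports, lport_name);
        ovs_mutex_unlock(&pinctrl_mutex);
        notify_pinctrl_main();  /* existing seq-based wakeup */
    }

    /* main thread: drain the set and delete the activation flows. */
    static void
    run_activated_ports(void)
    {
        struct sset to_handle = SSET_INITIALIZER(&to_handle);

        ovs_mutex_lock(&pinctrl_mutex);
        sset_swap(&activated_lports, &to_handle);
        ovs_mutex_unlock(&pinctrl_mutex);

        const char *name;
        SSET_FOR_EACH (name, &to_handle) {
            /* Delete this binding's activation flows here, from the
             * main thread (e.g. via the ofctrl desired flow table). */
        }
        sset_destroy(&to_handle);
    }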

>
>
> >
> > >         One use case of learn which ovn-controller uses -
> > > https://github.com/ovn-org/ovn/blob/main/controller/lflow.c#L1752
> > >         Also I don't think there is a need to set the "pause" flag for
> > > the OVN action - activation_strategy_rarp
> > >         Do you see any problem with this?
> > >
> >
> > I'll have to learn about the pause flag, but you actually raised one
> > thing that I was not sure about: to implement a controller() action in
> > pinctrl.c, I had to add a new enum id for the handler, which forced me
> > to write some boilerplate in lib/actions.c to make the codebase
> > compile. BUT: as far as I understand, the code there isn't really used
> > anywhere, since the OVN action translation is not injected through a
> > Logical_Flow. Instead, the flow to capture RARP is implemented as a
> > direct controller_op. Do you see any problem with this approach?
> >
>
> I'd ideally have preferred if everything was done via logical flows,
> like how we handle virtual ports (using the bind_vport() action).
> But in this case we don't want ovn-controller to update the SB DB
> (when the RARP is seen) and then have ovn-northd generate the proper
> lflows, etc.
>
> Regarding the "pause" flag, we use it for the put_dhcp_opts() OVN
> action for DHCP handling.  put_dhcp_opts() translates to a controller
> action with the pause flag set.
> What it means is that when ovs-vswitchd sends the packet to
> ovn-controller, it pauses or freezes the packet, and the pipeline
> resumes from the paused point when ovn-controller resumes it.
>
> In this case I don't see the RARP packets getting resumed.  Instead I
> see that patch 13 is re-injecting RARP packets as packet-out.
>

Do you mean I won't need to manually inject RARP if I do pause? I'll
give it a try!
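
If so, the activation flow could presumably collapse to a single
controller(pause) flow, something like (a sketch; the userdata is
elided):

    table=73, priority=100, rarp, actions=controller(pause, userdata=...)

with the main thread resuming the frozen pipeline after the port
binding is marked activated, instead of patch 13's manual packet-out.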

>
>
> > >
> > > 3) Patch 11 (Clone packets to both port chassis) is adding flows in
> > > table 38 to tunnel the packet to the other requested chassis.  Table
> > > 38 is meant for local output and I think it should not do this.
> > > Table 37 is meant for tunnelling.
> > >      Suppose a logical port has 2 requested chassis - ch1 and ch2 -
> > > and there is an OVS port for this logical port on both ch1 and ch2;
> > > then I'd suggest adding/modifying the flow in table 37 to tunnel to
> > > the other requested chassis and then do a resubmit to 38.
> > >
> >
> > That's fair, I'll update the series to shuffle flows in tables 37/38.
> >
> > >
> > > These are the comments I had at this point.  Let me know what you think.
> > >
> >
> > One other thing that I am still not sure about is the intended
> > localnet-LS behavior.
> >
> > Are we set on not supporting multiple requested-chassis for ports on
> > such switches? If so, do we just ignore (and warn in logs) multiple
> > chassis set when a localnet is attached to a switch? (We'll also need
> > to handle the other direction - when a localnet is attached to a
> > switch AFTER some ports are assigned to multiple chassis.)
>
> I just replied to the other thread where we discussed this.
> I'm fine supporting this for localnet switches using tunnelling.
>
> Thanks
> Numan
>
> >
> > > Thanks
> > > Numan
> > >
> > >
> > >
> > > >
> > > >  NEWS                        |    3 +
> > > >  controller/binding.c        |  284 ++++++++--
> > > >  controller/binding.h        |    5 +
> > > >  controller/if-status.c      |   15 +-
> > > >  controller/if-status.h      |    1 +
> > > >  controller/lport.c          |   42 +-
> > > >  controller/ovn-controller.c |    4 +-
> > > >  controller/physical.c       |  450 ++++++++++++----
> > > >  controller/pinctrl.c        |  293 +++++++++-
> > > >  controller/pinctrl.h        |    5 +
> > > >  include/ovn/actions.h       |    9 +
> > > >  lib/actions.c               |   40 +-
> > > >  northd/northd.c             |  109 +++-
> > > >  northd/ovn-northd.c         |    7 +-
> > > >  ovn-nb.xml                  |   25 +-
> > > >  ovn-sb.ovsschema            |   19 +-
> > > >  ovn-sb.xml                  |   73 ++-
> > > >  tests/ovn.at                | 1000 +++++++++++++++++++++++++++++++++++
> > > >  utilities/ovn-trace.c       |    3 +
> > > >  19 files changed, 2166 insertions(+), 221 deletions(-)
> > > >
> > > > --
> > > > 2.34.1
> > > >
> > > >
> > >
> >
> >
>

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
