On Sat, Feb 8, 2025 at 2:34 AM Han Zhou <[email protected]> wrote:
>
>
>
> On Fri, Feb 7, 2025 at 1:09 PM Ilya Maximets <[email protected]> wrote:
> >
> > On 2/7/25 21:45, Ilya Maximets wrote:
> > > On 2/7/25 15:01, Ilya Maximets wrote:
> > >> On 2/7/25 05:31, Numan Siddique wrote:
> > >>> On Thu, Feb 6, 2025 at 3:57 AM Ilya Maximets <[email protected]> wrote:
> > >>>>
> > >>>> This patch set introduces ability to directly connect switches,
> > >>>> including transit switches, in order to achieve higher total port count
> > >>>> and locality of changes in L2 topologies spread across multiple
> > >>>> availability zones.  And while tailored for this use case, the changes
> > >>>> do not impose any limitations and should allow for all kinds of other
> > >>>> different topologies.
> > >>>>
> > >>>> Amount of the logic code changes is relatively small, most of the diff
> > >>>> are new tests for the introduced functionality.
> > >>>>
> > >>>>
> > >>>> Version 2:
> > >>>>
> > >>>>   * Rebased on top of latest changes on the main branch.
> > >>>>   * Improved validation of the peer in northd. [Mark]
> > >>>>   * Added a test for a switch port with an address set. [Mark]
> > >>>>
> > >>>>
> > >>>> Ilya Maximets (2):
> > >>>>   northd: Add support for spine-leaf logical switch topology.
> > >>>>   ic: Add support for spine-leaf topology for transit switches.
> > >>>
> > >>> Hi Ilya,
> > >>>
> > >>> Thanks for adding this feature.  I applied both the patches to the
> > >>> main.  I had to do a minor rebase.
> > >>
> > >> Thanks, Numan and Mark!
> > >>
> > >>>
> > >>> I see one small issue  with the spine switch.  Since all the spine
> > >>> switch ports have unknown address,
> > >>> the packet will be cloned to N - 1 logical switches if N switches 
> > >>> connect to it.
> > >>>
> > >>> I think we should enable "fdb learn" in the spine switch with the below 
> > >>> changes
> > >>>
> > >>> -----------------------------------------------
> > >>> diff --git a/northd/northd.c b/northd/northd.c
> > >>> index 880112c3b9..e77936fbe9 100644
> > >>> --- a/northd/northd.c
> > >>> +++ b/northd/northd.c
> > >>> @@ -5682,6 +5682,7 @@ build_lswitch_learn_fdb_op(
> > >>>      ovs_assert(op->nbsp);
> > >>>
> > >>>      if (!op->n_ps_addrs && op->has_unknown && (!strcmp(op->nbsp->type, 
> > >>> "") ||
> > >>> +        !strcmp(op->nbsp->type, "switch") ||
> > >>>          (lsp_is_localnet(op->nbsp) && 
> > >>> localnet_can_learn_mac(op->nbsp)))) {
> > >>>          ds_clear(match);
> > >>>          ds_clear(actions);
> > >>> -----------------------------------------------
> > >>>
> > >>> What do you think ?  Any concerns or objections ?   If not, can you
> > >>> submit a follow up patch to enable this ?
> > >>
> > >> This seems reasonable.  I missed the part that we enable FDB 
> > >> automatically
> > >> for some ports with "unknown".  Will look into that and submit a fix.
> > >
> > > So, I gave this some more thoughts and I'm not sure if we actually need
> > > FDB for "switch" ports in a common case.  In a non-IC setup OVN actually
> > > knows the destination, if it's part of OVN network.  The whole processing
> > > of the spine switch will happen on the source node and only the egress
> > > pipeline of the leaf switch will be executed on the destination node.
> > > If we have actual VM ports with "unknown", then those will have the FDB
> > > enabled, but we should not need to consult FDB for the spine switch itself
> > > otherwise.  Does that make sense?

You're right.  But enabling mac learning for "switch" ports does have
some benefits.
 By clone what I mean is that during the first packet upcall flow
translation, vswitchd will
clone the packet to all the leaf switch pipelines and yes, the final
datapath flow will
of course not clone the packets.    So it's not a big deal.  But I
think it's better
to enable fdb for switch ports.


> >
> > Thinking more about this again, there is still an issue if we have "unknown"
> > ports in leaf switches.  Since those switches are separate from the spine,
> > will broadcast in the spine and then every leaf that has "unknown" will also
> > broadcast within itself.  And so, "unknown"  VM ports will receive traffic
> > destined to ports on other leaf switches.  So, yes, we still need FDB for
> > this case.

Ack.

> >
> > >
> > > OTOH, what we may actually need is FDB learning for remote ports in IC
> > > setup.  In this case, OVN doesn't actually know the whole topology and
> > > can't figure out to which of the remote ports the packet should go, so it
> > > will broadcast.  The problem here, however, is that FDB stages are in the
> > > ingress pipeline, and they will not be executed on the remote node that
> > > actually needs them.  So, we may need to create FDB stages in the egress
> > > pipeline for packets coming from remote ports.  Seems like a generic
> > > issue with "unknown" ports on a transit switch.
> > >
> > > Mark, Numan, Dumitru, what do you think?
> >
> > But this issue also still stands as we need a way to learn MAC addresses
> > from packet arriving from a remote port, so we either need FDB in the egress
> > pipeline of the transit switch for remote ports, or we need an external
> > service like ovn-ic synchronizing FDB entries between AZs, which is not 
> > good.
> >
>
> Hi Ilya, I agree with you that synchronizing FDB entries between AZs is a 
> really bad idea. FDB in the egress pipeline (and learning the mac for inport 
> of the spine switch) sounds reasonable.

I agree.  FDB in the egress pipeline sounds reasonable.

If a spine transit switch has N ports,  then each packet (both request
and reply) will get flooded to
all the remote ports of the spine transit switch.  I tested it and I
can confirm.

If we can't add FDB in the egress pipeline in a reasonable time for
25.03,  I'd suggest documenting this
flood behavior and mark this feature as experimental so that the users
of 25.03 are well informed of this.


Thanks
Numan


>
> Thanks,
> Han
>
> > >
> > > Best regards, Ilya Maximets
> > _______________________________________________
> > dev mailing list
> > [email protected]
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to