On 2/9/25 9:47 PM, Numan Siddique wrote:
> On Sat, Feb 8, 2025 at 2:34 AM Han Zhou <[email protected]> wrote:
>>
>>
>>
>> On Fri, Feb 7, 2025 at 1:09 PM Ilya Maximets <[email protected]> wrote:
>>>
>>> On 2/7/25 21:45, Ilya Maximets wrote:
>>>> On 2/7/25 15:01, Ilya Maximets wrote:
>>>>> On 2/7/25 05:31, Numan Siddique wrote:
>>>>>> On Thu, Feb 6, 2025 at 3:57 AM Ilya Maximets <[email protected]> wrote:
>>>>>>>
>>>>>>> This patch set introduces ability to directly connect switches,
>>>>>>> including transit switches, in order to achieve higher total port count
>>>>>>> and locality of changes in L2 topologies spread across multiple
>>>>>>> availability zones.  And while tailored for this use case, the changes
>>>>>>> do not impose any limitations and should allow for all kinds of other
>>>>>>> different topologies.
>>>>>>>
>>>>>>> Amount of the logic code changes is relatively small, most of the diff
>>>>>>> are new tests for the introduced functionality.
>>>>>>>
>>>>>>>
>>>>>>> Version 2:
>>>>>>>
>>>>>>>   * Rebased on top of latest changes on the main branch.
>>>>>>>   * Improved validation of the peer in northd. [Mark]
>>>>>>>   * Added a test for a switch port with an address set. [Mark]
>>>>>>>
>>>>>>>
>>>>>>> Ilya Maximets (2):
>>>>>>>   northd: Add support for spine-leaf logical switch topology.
>>>>>>>   ic: Add support for spine-leaf topology for transit switches.
>>>>>>
>>>>>> Hi Ilya,
>>>>>>
>>>>>> Thanks for adding this feature.  I applied both the patches to the
>>>>>> main.  I had to do a minor rebase.
>>>>>
>>>>> Thanks, Numan and Mark!
>>>>>
>>>>>>
>>>>>> I see one small issue  with the spine switch.  Since all the spine
>>>>>> switch ports have unknown address,
>>>>>> the packet will be cloned to N - 1 logical switches if N switches 
>>>>>> connect to it.
>>>>>>
>>>>>> I think we should enable "fdb learn" in the spine switch with the below 
>>>>>> changes
>>>>>>
>>>>>> -----------------------------------------------
>>>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>>>> index 880112c3b9..e77936fbe9 100644
>>>>>> --- a/northd/northd.c
>>>>>> +++ b/northd/northd.c
>>>>>> @@ -5682,6 +5682,7 @@ build_lswitch_learn_fdb_op(
>>>>>>      ovs_assert(op->nbsp);
>>>>>>
>>>>>>      if (!op->n_ps_addrs && op->has_unknown && (!strcmp(op->nbsp->type, 
>>>>>> "") ||
>>>>>> +        !strcmp(op->nbsp->type, "switch") ||
>>>>>>          (lsp_is_localnet(op->nbsp) && 
>>>>>> localnet_can_learn_mac(op->nbsp)))) {
>>>>>>          ds_clear(match);
>>>>>>          ds_clear(actions);
>>>>>> -----------------------------------------------
>>>>>>
>>>>>> What do you think ?  Any concerns or objections ?   If not, can you
>>>>>> submit a follow up patch to enable this ?
>>>>>
>>>>> This seems reasonable.  I missed the part that we enable FDB automatically
>>>>> for some ports with "unknown".  Will look into that and submit a fix.
>>>>
>>>> So, I gave this some more thoughts and I'm not sure if we actually need
>>>> FDB for "switch" ports in a common case.  In a non-IC setup OVN actually
>>>> knows the destination, if it's part of OVN network.  The whole processing
>>>> of the spine switch will happen on the source node and only the egress
>>>> pipeline of the leaf switch will be executed on the destination node.
>>>> If we have actual VM ports with "unknown", then those will have the FDB
>>>> enabled, but we should not need to consult FDB for the spine switch itself
>>>> otherwise.  Does that make sense?
> 
> You're right.  But enabling mac learning for "switch" ports does have
> some benefits.
>  By clone what I mean is that during the first packet upcall flow
> translation, vswitchd will
> clone the packet to all the leaf switch pipelines and yes, the final
> datapath flow will
> of course not clone the packets.    So it's not a big deal.  But I
> think it's better
> to enable fdb for switch ports.
> 
> 
>>>
>>> Thinking more about this again, there is still an issue if we have "unknown"
>>> ports in leaf switches.  Since those switches are separate from the spine,
>>> will broadcast in the spine and then every leaf that has "unknown" will also
>>> broadcast within itself.  And so, "unknown"  VM ports will receive traffic
>>> destined to ports on other leaf switches.  So, yes, we still need FDB for
>>> this case.
> 
> Ack.
> 
>>>
>>>>
>>>> OTOH, what we may actually need is FDB learning for remote ports in IC
>>>> setup.  In this case, OVN doesn't actually know the whole topology and
>>>> can't figure out to which of the remote ports the packet should go, so it
>>>> will broadcast.  The problem here, however, is that FDB stages are in the
>>>> ingress pipeline, and they will not be executed on the remote node that
>>>> actually needs them.  So, we may need to create FDB stages in the egress
>>>> pipeline for packets coming from remote ports.  Seems like a generic
>>>> issue with "unknown" ports on a transit switch.
>>>>
>>>> Mark, Numan, Dumitru, what do you think?
>>>
>>> But this issue also still stands as we need a way to learn MAC addresses
>>> from packet arriving from a remote port, so we either need FDB in the egress
>>> pipeline of the transit switch for remote ports, or we need an external
>>> service like ovn-ic synchronizing FDB entries between AZs, which is not 
>>> good.
>>>
>>
>> Hi Ilya, I agree with you that synchronizing FDB entries between AZs is a 
>> really bad idea. FDB in the egress pipeline (and learning the mac for inport 
>> of the spine switch) sounds reasonable.
> 
> I agree.  FDB in the egress pipeline sounds reasonable.
> 

+1

I think it's reasonable implementation for all "learning features" in IC
mode.  We already do that (for the same reason) for IGMP/MLD:

https://github.com/ovn-org/ovn/commit/c6b20c9940ed41f028347f559b21a25fb7625eb4

> If a spine transit switch has N ports,  then each packet (both request
> and reply) will get flooded to
> all the remote ports of the spine transit switch.  I tested it and I
> can confirm.
> 
> If we can't add FDB in the egress pipeline in a reasonable time for
> 25.03,  I'd suggest documenting this
> flood behavior and mark this feature as experimental so that the users
> of 25.03 are well informed of this.
> 

Sounds like something we can add after branching (either learning in the
egress pipeline or marking the feature as experimental) indeed.

Thanks,
Dumitru

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to