[ovs-dev] [ovn] Unable to mix Load Balancers with allow-stateless ACL
Hi all, I tried to upgrade OVN from 22.09.1 to the latest version, and our internal tests showed that commit [0] broke a scenario in which a logical switch uses both Load Balancers and allow-stateless ACLs at the same time. Prior to this change, all traffic directed to the load balancer's IP address was passed to conntrack and worked correctly, even though allow-stateless rules were present that covered, for example, all other traffic except this LB. We use this mix because we need both LBs and stateless handling for all traffic except LB traffic. Also, this patch was backported to minor releases, where it brought a major behavior change (we now can't upgrade to 22.09.2+ without reverting the mentioned patch). Is there any advice on how this can be fixed (other than a revert in our local repo)? 0: https://github.com/ovn-org/ovn/commit/a0f82efdd9dfd3ef2d9606c1890e353df1097a51 -- Regards, Vladislav Odintsov ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
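For readers who want to picture the setup, here is a minimal sketch of the kind of configuration described above: a load balancer attached to a logical switch, plus an allow-stateless ACL that covers everything except traffic to the LB VIP. The switch name, LB name, addresses, ACL priority and direction are all placeholders chosen for illustration and are not taken from the report. The script only prints ovn-nbctl commands; it does not claim to reproduce the regression.

```python
import shlex


def build_commands(switch, lb, vip, vip_port, backends, protocol="tcp"):
    """Return ovn-nbctl argv lists for an LB plus a stateless ACL that skips the VIP.

    All names and addresses passed in are hypothetical placeholders.
    """
    backend_list = ",".join(backends)
    # Stateless handling for all IPv4 traffic except traffic destined to the LB VIP.
    match = f"ip4 && ip4.dst != {vip}"
    return [
        ["ovn-nbctl", "lb-add", lb, f"{vip}:{vip_port}", backend_list, protocol],
        ["ovn-nbctl", "ls-lb-add", switch, lb],
        ["ovn-nbctl", "acl-add", switch, "from-lport", "100", match, "allow-stateless"],
    ]


if __name__ == "__main__":
    commands = build_commands(
        "sw0", "lb0", "10.0.0.10", 80, ["10.0.0.2:80", "10.0.0.3:80"]
    )
    for argv in commands:
        # shlex.join quotes the ACL match so the line can be pasted into a shell.
        print(shlex.join(argv))
```

Keeping the VIP out of the stateless match is just one way to express "stateless for everything except the LB"; the original setup may have used different matches or priorities.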
Re: [ovs-dev] OVN technical community meeting - Oct 21st
Hi, On 21.10.2024 12:25, Dumitru Ceara wrote: > Hi Frode, Felix, > > On 10/21/24 08:07, Felix Huettner wrote: >> On Sat, Oct 19, 2024 at 08:52:39AM +0200, Frode Nordahl wrote: >>> On Fri, 18 Oct 2024, 15:56, Dumitru Ceara wrote: >>> >>>> Hi everyone, >>>> >>>> Just a quick reminder for anyone interested in joining the next OVN >>>> technical meeting. It's scheduled to happen Monday, October 21st, at >>>> 3PM UTC. >>>> >>>> Meeting details: >>>> Date/Time: Monday, October 21st, 3PM UTC >>>> Link: https://meet.google.com/zns-gqsd-jdn >>>> Agenda: >>>> >>>> https://docs.google.com/document/d/1dG4GwcYOSs4uArPGtOoaP5tH4KCto-GH_C3tIXSnZZ8/ >>>> >>>> For now I added two potential items to the agenda: >>>> - BGP series discussion - to see what parts of this can make it in 25.03 >>>> - OVN 25.03 development cycle - to see in general what new features the >>>> community is planning to add to 25.03 >>>> >>>> Feel free to add/suggest more items if you wish. >>>> >>>> Looking forward to seeing you then! >>>> >>> I just realized that I have a conflicting internal meeting/presentation >>> that I cannot move in this time slot, actually about our upstream plans >>> this cycle. >>> >>> Any chance we could move the meeting 45 minutes to 3:45PM UTC? >> would be fine for me as well. >> > I'm not sure if that would work for others that already accepted the > meeting (7 other people until now). Maybe we can wait a bit longer for > those who are in the western hemisphere to start their week? > > Me personally, I'm also available at 3:45PM UTC so another option could > be to have a follow up sync-up then? What do you guys think? 3:45 PM UTC is fine for me. > > Regards, > Dumitru -- Regards, Vladislav Odintsov
Re: [ovs-dev] [PATCH ovn branch-22.03] ovs: Bump submodule to OVS 3.0.7.
Thank you, Dumitru, for the bumps in all branches. regards, Vladislav Odintsov > On 18 Oct 2024, at 18:46, Dumitru Ceara wrote: > > On 10/18/24 10:13, Vladislav Odintsov wrote: >> Nit: commit subject should be adjusted to: "ovs: Bump submodule to OVS >> v3.0.7.". >> > > Thanks for pointing that out! > > I fixed it up and applied the patch to 22.03. > > Thanks, > Dumitru > >> That was my mistake in previous message. >> >>> On 18.10.2024 10:55, Dumitru Ceara wrote: >>> From: Vladislav Odintsov >>> >>> This picks up the following relevant OVS changes: >>> 4198bcdfb compiler: Fix errors in Clang 17 ubsan checks. >>> ec048ab62 vlog: Destroy async_append first then close log_fd. >>> dbaf7271c hash, jhash: Fix unaligned access to the hash remainder. >>> 85285fb45 ovs-atomic: Fix inclusion of Clang header by GCC 14. >>> e040f35b2 vconn: Count vconn_sent regardless of log level. >>> ... and others. >>> >>> Reported-at: >>> https://mail.openvswitch.org/pipermail/ovs-dev/2024-October/417627.html >>> Signed-off-by: Vladislav Odintsov >>> --- >>> ovs | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/ovs b/ovs >>> index 94191b7a49..55ee005bef 16 >>> --- a/ovs >>> +++ b/ovs >>> @@ -1 +1 @@ >>> -Subproject commit 94191b7a4926204510931770c52992c9ea24d4e2 >>> +Subproject commit 55ee005bef94566fe829b667960d7bb9ac925974 >> > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn branch-22.09] ovs: Bump submodule to OVS 3.0.7.
Nit: commit subject should be adjusted to: "ovs: Bump submodule to OVS v3.0.7.". On 18.10.2024 10:54, Dumitru Ceara wrote: > From: Vladislav Odintsov > > This picks up the following relevant OVS changes: >4198bcdfb compiler: Fix errors in Clang 17 ubsan checks. >ec048ab62 vlog: Destroy async_append first then close log_fd. >dbaf7271c hash, jhash: Fix unaligned access to the hash remainder. >85285fb45 ovs-atomic: Fix inclusion of Clang header by GCC 14. >e040f35b2 vconn: Count vconn_sent regardless of log level. >... and others. > > Reported-at: > https://mail.openvswitch.org/pipermail/ovs-dev/2024-October/417627.html > Signed-off-by: Vladislav Odintsov > --- > ovs | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/ovs b/ovs > index 94191b7a49..55ee005bef 16 > --- a/ovs > +++ b/ovs > @@ -1 +1 @@ > -Subproject commit 94191b7a4926204510931770c52992c9ea24d4e2 > +Subproject commit 55ee005bef94566fe829b667960d7bb9ac925974 -- Regards, Vladislav Odintsov ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn branch-22.03] ovs: Bump submodule to OVS 3.0.7.
Nit: commit subject should be adjusted to: "ovs: Bump submodule to OVS v3.0.7.". That was my mistake in previous message. On 18.10.2024 10:55, Dumitru Ceara wrote: > From: Vladislav Odintsov > > This picks up the following relevant OVS changes: >4198bcdfb compiler: Fix errors in Clang 17 ubsan checks. >ec048ab62 vlog: Destroy async_append first then close log_fd. >dbaf7271c hash, jhash: Fix unaligned access to the hash remainder. >85285fb45 ovs-atomic: Fix inclusion of Clang header by GCC 14. >e040f35b2 vconn: Count vconn_sent regardless of log level. >... and others. > > Reported-at: > https://mail.openvswitch.org/pipermail/ovs-dev/2024-October/417627.html > Signed-off-by: Vladislav Odintsov > --- > ovs | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/ovs b/ovs > index 94191b7a49..55ee005bef 16 > --- a/ovs > +++ b/ovs > @@ -1 +1 @@ > -Subproject commit 94191b7a4926204510931770c52992c9ea24d4e2 > +Subproject commit 55ee005bef94566fe829b667960d7bb9ac925974 -- Regards, Vladislav Odintsov ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn branch-22.06] ovs: Bump submodule to OVS 3.0.7.
Nit: commit subject should be adjusted to: "ovs: Bump submodule to OVS v3.0.7.". On 18.10.2024 10:54, Dumitru Ceara wrote: > From: Vladislav Odintsov > > This picks up the following relevant OVS changes: >4198bcdfb compiler: Fix errors in Clang 17 ubsan checks. >ec048ab62 vlog: Destroy async_append first then close log_fd. >dbaf7271c hash, jhash: Fix unaligned access to the hash remainder. >85285fb45 ovs-atomic: Fix inclusion of Clang header by GCC 14. >e040f35b2 vconn: Count vconn_sent regardless of log level. >... and others. > > Reported-at: > https://mail.openvswitch.org/pipermail/ovs-dev/2024-October/417627.html > Signed-off-by: Vladislav Odintsov > --- > ovs | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/ovs b/ovs > index 94191b7a49..55ee005bef 16 > --- a/ovs > +++ b/ovs > @@ -1 +1 @@ > -Subproject commit 94191b7a4926204510931770c52992c9ea24d4e2 > +Subproject commit 55ee005bef94566fe829b667960d7bb9ac925974 -- Regards, Vladislav Odintsov ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn branch-24.09] ovs: Bump submodule to latest OVS branch-3.4.
Hi Dumitru, On 15.10.2024 18:34, Dumitru Ceara wrote: > On 10/14/24 16:41, Vladislav Odintsov wrote: >> On 14.10.2024 16:13, Dumitru Ceara wrote: >>> On 10/14/24 15:01, Vladislav Odintsov wrote: >>>> On 14.10.2024 14:47, Dumitru Ceara wrote: >>>>> On 10/13/24 10:19, Vladislav Odintsov wrote: >>>>>> This picks up the following relevant OVS changes: >>>>>> a15ce086d ofproto-dpif: Improve load balancing in dp_hash select >>>>>> groups. >>>>>> 76ba41b5c vconn: Always properly free flow stats reply. >>>>>> 64cb90507 ovsdb-idl: Fix IDL memory leak. >>>>>> ... and others. >>>>>> >>>>>> Reported-at: >>>>>> https://mail.openvswitch.org/pipermail/ovs-dev/2024-October/417627.html >>>>>> Signed-off-by: Vladislav Odintsov >>>>>> --- >>>>> Hi Vladislav, >>>>> >>>>>> ovs | 2 +- >>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/ovs b/ovs >>>>>> index c598c05c8..a15ce086d 16 >>>>>> --- a/ovs >>>>>> +++ b/ovs >>>>>> @@ -1 +1 @@ >>>>>> -Subproject commit c598c05c85b2d38874a0ce8f7f088f6aae4fdabc >>>>>> +Subproject commit a15ce086d41f9dfe6c1589333413b8e777401ef0 >>>>> This would make branch-24.09 use a newer submodule version than the OVN >>>>> main branch does. >>>>> >>>>> I think we need this commit on main too, what do you think? >>>> Hi Dumitru, >>>> >>>> Agree. >>>> >>>> The patch: >>>> https://patchwork.ozlabs.org/project/ovn/patch/20241014125151.74094-1-vlodintsov@k2.cloud/ >>>> >>> I was talking offline to Ilya about all these bumps and he made a good >>> point: while the tip of OVS branch-3.y and the latest v3.y.z get almost >>> the same amount of testing in the OVS repo it might be a bit better to >>> use v3.y.z instead. That's because likely external users of OVS run >>> tagged OVS releases in production so those might get more external testing. 
>>> >>> I had a quick look at the main differences between choosing the tip of >>> branch-3.y and v3.y.z on all branches and I think we'd only miss: >>> >>> 99e7cf9cce1c vconn: Always properly free flow stats reply. >>> f59f19bf69a4 ovsdb-idl: Fix IDL memory leak. >>> >>> which might be OK. >>> >>> If you agree I can change your patches on all branches (no need to post >>> new ones) and apply them. >>> >>> What do you think? >> Well, I'm totally fine with this. Please feel free to modify my patches. >> > I prepared them here, it would be great if you could double check: > > https://github.com/dceara/ovn/tree/refs/heads/tmp-branch-24.03 > https://github.com/dceara/ovn/tree/refs/heads/tmp-branch-23.09 > https://github.com/dceara/ovn/tree/refs/heads/tmp-branch-23.03 > https://github.com/dceara/ovn/tree/refs/heads/tmp-branch-22.12 > https://github.com/dceara/ovn/tree/refs/heads/tmp-branch-22.09 Could you please update commit subject to "ovs: Bump submodule to OVS 3.0.7." for patches within branches branch-22.09, branch-22.06, branch-22.03? > https://github.com/dceara/ovn/tree/refs/heads/tmp-branch-22.06 > https://github.com/dceara/ovn/tree/refs/heads/tmp-branch-22.03 > > I skipped main and branch-24.09 because those are already on the latest > OVS v3.4.z. Thanks for the update, other than commit subject, these patches look good to me! > > >> Also, shouldn't we update documentation to reflect this approach in >> Documentation/internals/ovs_submodule.rst for further bumps? And if > We should, you're right, I'll prepare a patch. > >> talking about documentation, I've got one note, which should be covered >> by new process. Imagine situation, where quite old OVS branch (let's >> say, 3.0) has a wanted commit (for example, fix for build with new >> compiler or latest libs), but the new patch release is not created >> because it is not a critical problem). I'd say we either need to request >> OVS community to bump patch release or bump from release to commit sha. 
>> What do you think here? Or, just leave it as is and decide how to bump >> in flexible manner in each individual case? >> > I think this is not the common case so maybe we can leave it flexible > for now. > > Regards, > Dumitru > -- Regards, Vladislav Odintsov ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [ovn-dev] Port mirroring filter support in ovn.
useful to have the ability to encapsulate this traffic with VXLAN, GENEVE, GRE or ERSPAN tunneling (different network analysis solutions support these protocols for mirrored traffic; AWS does the same), set the outer src/dst IP and the correct tunnel key, and inject it into the LS pipeline, so that this encapsulated (actually, double-encapsulated) packet can traverse the overlay to any point of the infrastructure, relying on overlay routing. This gives us the ability to send the mirrored traffic to any destination inside the overlay topology: another subnet, even another availability zone, or outside of OVN but inside the same VRF. However, we found that the OF encap() action supports only nsh and mpls. Can you advise whether it is possible to send double-encapsulated traffic with OVN somehow? Or should we extend the encap() functionality in this case? Potentially this double encapsulation can be reused for a feature similar to AWS Gateway Load Balancers [1]. 1: https://aws.amazon.com/elasticloadbalancing/gateway-load-balancer/ > >> since both options are currently not possible, I would greatly >> appreciate any insights or advice you may have regarding these approaches. >> > Thanks, > Dumitru -- Regards, Vladislav Odintsov
Re: [ovs-dev] [PATCH ovn branch-24.09] ovs: Bump submodule to latest OVS branch-3.4.
On 14.10.2024 16:13, Dumitru Ceara wrote: > On 10/14/24 15:01, Vladislav Odintsov wrote: >> On 14.10.2024 14:47, Dumitru Ceara wrote: >>> On 10/13/24 10:19, Vladislav Odintsov wrote: >>>> This picks up the following relevant OVS changes: >>>> a15ce086d ofproto-dpif: Improve load balancing in dp_hash select >>>> groups. >>>> 76ba41b5c vconn: Always properly free flow stats reply. >>>> 64cb90507 ovsdb-idl: Fix IDL memory leak. >>>> ... and others. >>>> >>>> Reported-at: >>>> https://mail.openvswitch.org/pipermail/ovs-dev/2024-October/417627.html >>>> Signed-off-by: Vladislav Odintsov >>>> --- >>> Hi Vladislav, >>> >>>>ovs | 2 +- >>>>1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/ovs b/ovs >>>> index c598c05c8..a15ce086d 16 >>>> --- a/ovs >>>> +++ b/ovs >>>> @@ -1 +1 @@ >>>> -Subproject commit c598c05c85b2d38874a0ce8f7f088f6aae4fdabc >>>> +Subproject commit a15ce086d41f9dfe6c1589333413b8e777401ef0 >>> This would make branch-24.09 use a newer submodule version than the OVN >>> main branch does. >>> >>> I think we need this commit on main too, what do you think? >> Hi Dumitru, >> >> Agree. >> >> The patch: >> https://patchwork.ozlabs.org/project/ovn/patch/20241014125151.74094-1-vlodintsov@k2.cloud/ >> > I was talking offline to Ilya about all these bumps and he made a good > point: while the tip of OVS branch-3.y and the latest v3.y.z get almost > the same amount of testing in the OVS repo it might be a bit better to > use v3.y.z instead. That's because likely external users of OVS run > tagged OVS releases in production so those might get more external testing. > > I had a quick look at the main differences between choosing the tip of > branch-3.y and v3.y.z on all branches and I think we'd only miss: > > 99e7cf9cce1c vconn: Always properly free flow stats reply. > f59f19bf69a4 ovsdb-idl: Fix IDL memory leak. > > which might be OK. > > If you agree I can change your patches on all branches (no need to post > new ones) and apply them. 
> > What do you think? Well, I'm totally fine with this. Please feel free to modify my patches. Also, shouldn't we update the documentation in Documentation/internals/ovs_submodule.rst to reflect this approach for further bumps? And speaking of documentation, I have one note that should be covered by the new process. Imagine a situation where a quite old OVS branch (let's say, 3.0) has a wanted commit (for example, a fix for building with a new compiler or the latest libs), but a new patch release has not been created because it is not a critical problem. I'd say we either need to ask the OVS community to cut a new patch release, or bump from the release to a commit SHA. What do you think? Or should we just leave it as is and decide how to bump flexibly in each individual case? > > Thanks, > Dumitru > -- Regards, Vladislav Odintsov
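To make the "tagged release vs. commit SHA" question above a bit more concrete, here is a minimal sketch of one possible selection rule: prefer the newest v3.y.z tag of the matching series, and fall back to the branch tip only when the wanted fix has not been released yet. This is an illustration under assumptions, not the documented OVN process. The function names, the `wanted_fix_released` flag and the tag list in the example are hypothetical.

```python
import re

# Matches release tags of the form vX.Y.Z; anything else is ignored.
TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def latest_patch_tag(tags, series):
    """Return the highest vX.Y.Z tag within the given 'X.Y' series, or None."""
    major, minor = (int(part) for part in series.split("."))
    best = None  # (patch number, tag name)
    for tag in tags:
        m = TAG_RE.match(tag)
        if not m:
            continue
        x, y, z = (int(g) for g in m.groups())
        if (x, y) == (major, minor) and (best is None or z > best[0]):
            best = (z, tag)
    return best[1] if best else None


def choose_bump_target(tags, series, wanted_fix_released):
    """Prefer a tagged release; fall back to the branch tip for unreleased fixes."""
    tag = latest_patch_tag(tags, series)
    if tag is not None and wanted_fix_released:
        return tag
    return f"branch-{series}"


if __name__ == "__main__":
    # Hypothetical tag list, for illustration only.
    tags = ["v3.0.6", "v3.0.7", "v3.1.5", "not-a-release"]
    print(choose_bump_target(tags, "3.0", wanted_fix_released=True))
    print(choose_bump_target(tags, "3.0", wanted_fix_released=False))
```

Comparing the patch number as an integer matters once a series goes past .9, since a plain string sort would rank a two-digit patch number below a single-digit one.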
Re: [ovs-dev] [PATCH ovn branch-24.09] ovs: Bump submodule to latest OVS branch-3.4.
On 14.10.2024 14:47, Dumitru Ceara wrote: > On 10/13/24 10:19, Vladislav Odintsov wrote: >> This picks up the following relevant OVS changes: >>a15ce086d ofproto-dpif: Improve load balancing in dp_hash select groups. >>76ba41b5c vconn: Always properly free flow stats reply. >>64cb90507 ovsdb-idl: Fix IDL memory leak. >>... and others. >> >> Reported-at: >> https://mail.openvswitch.org/pipermail/ovs-dev/2024-October/417627.html >> Signed-off-by: Vladislav Odintsov >> --- > Hi Vladislav, > >> ovs | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/ovs b/ovs >> index c598c05c8..a15ce086d 16 >> --- a/ovs >> +++ b/ovs >> @@ -1 +1 @@ >> -Subproject commit c598c05c85b2d38874a0ce8f7f088f6aae4fdabc >> +Subproject commit a15ce086d41f9dfe6c1589333413b8e777401ef0 > This would make branch-24.09 use a newer submodule version than the OVN > main branch does. > > I think we need this commit on main too, what do you think? Hi Dumitru, Agree. The patch: https://patchwork.ozlabs.org/project/ovn/patch/20241014125151.74094-1-vlodintsov@k2.cloud/ > > Thanks, > Dumitru > -- Regards, Vladislav Odintsov ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn] ovs: Bump submodule to latest OVS branch-3.4.
This picks up the following relevant OVS changes: a15ce086d ofproto-dpif: Improve load balancing in dp_hash select groups. 76ba41b5c vconn: Always properly free flow stats reply. 64cb90507 ovsdb-idl: Fix IDL memory leak. ... and others. Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2024-October/417627.html Signed-off-by: Vladislav Odintsov --- ovs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ovs b/ovs index c598c05c8..a15ce086d 16 --- a/ovs +++ b/ovs @@ -1 +1 @@ -Subproject commit c598c05c85b2d38874a0ce8f7f088f6aae4fdabc +Subproject commit a15ce086d41f9dfe6c1589333413b8e777401ef0 -- 2.46.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
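When reviewing a series of bump patches like this one, it can help to pull the old and new submodule commits out of the diff mechanically. The sketch below assumes the patch text is available with its original line breaks (the archive rendering here is flattened) and that the submodule lines look like the `Subproject commit` lines in the patch above. The function name is made up for this example.

```python
import re

# A submodule bump shows up as one removed and one added "Subproject commit" line.
SUBPROJECT_RE = re.compile(r"^([-+])Subproject commit ([0-9a-f]{40})$", re.MULTILINE)


def submodule_bump(diff_text):
    """Return (old_sha, new_sha) from a submodule diff; None for a missing side."""
    old = new = None
    for sign, sha in SUBPROJECT_RE.findall(diff_text):
        if sign == "-":
            old = sha
        else:
            new = sha
    return old, new


if __name__ == "__main__":
    sample = "\n".join([
        "diff --git a/ovs b/ovs",
        "--- a/ovs",
        "+++ b/ovs",
        "@@ -1 +1 @@",
        "-Subproject commit c598c05c85b2d38874a0ce8f7f088f6aae4fdabc",
        "+Subproject commit a15ce086d41f9dfe6c1589333413b8e777401ef0",
    ])
    old_sha, new_sha = submodule_bump(sample)
    print(old_sha[:9], "->", new_sha[:9])
```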
[ovs-dev] [PATCH ovn branch-22.12 v2] ovs: Bump submodule to latest OVS branch-3.1.
This picks up the following relevant OVS changes: a0af48b75 ofproto-dpif: Improve load balancing in dp_hash select groups. 99e7cf9cc vconn: Always properly free flow stats reply. f59f19bf6 ovsdb-idl: Fix IDL memory leak. 7694dfacb compiler: Fix errors in Clang 17 ubsan checks. faf175155 vlog: Destroy async_append first then close log_fd. 483bc24e4 hash, jhash: Fix unaligned access to the hash remainder. bd5b5d3b3 ovs-atomic: Fix inclusion of Clang header by GCC 14. bb61b5fe8 vconn: Count vconn_sent regardless of log level. ... and others. Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2024-October/417627.html Signed-off-by: Vladislav Odintsov --- v2: There was an error: previous patch was bumped as branch-22.09 by mistake. New version bumps OVS submodule to latest branch-3.1 commit instead of branch-3.0. --- ovs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ovs b/ovs index 8fd5f77cd..a0af48b75 16 --- a/ovs +++ b/ovs @@ -1 +1 @@ -Subproject commit 8fd5f77cd84ea04667f987c7b84181604dc99f60 +Subproject commit a0af48b753ef3215091356f112bbb89737f286d9 -- 2.46.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn branch-22.03] ovs: Bump submodule to latest OVS branch-3.0.
This picks up the following relevant OVS changes: 876584141 vconn: Always properly free flow stats reply. 2d60ee374 ovsdb-idl: Fix IDL memory leak. 4198bcdfb compiler: Fix errors in Clang 17 ubsan checks. ec048ab62 vlog: Destroy async_append first then close log_fd. dbaf7271c hash, jhash: Fix unaligned access to the hash remainder. 85285fb45 ovs-atomic: Fix inclusion of Clang header by GCC 14. e040f35b2 vconn: Count vconn_sent regardless of log level. ... and others. Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2024-October/417627.html Signed-off-by: Vladislav Odintsov --- ovs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ovs b/ovs index 94191b7a4..a9fb87867 16 --- a/ovs +++ b/ovs @@ -1 +1 @@ -Subproject commit 94191b7a4926204510931770c52992c9ea24d4e2 +Subproject commit a9fb878679fb08bc3807cf455b0d3ce1797be26d -- 2.46.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn branch-22.06] ovs: Bump submodule to latest OVS branch-3.0.
This picks up the following relevant OVS changes: 876584141 vconn: Always properly free flow stats reply. 2d60ee374 ovsdb-idl: Fix IDL memory leak. 4198bcdfb compiler: Fix errors in Clang 17 ubsan checks. ec048ab62 vlog: Destroy async_append first then close log_fd. dbaf7271c hash, jhash: Fix unaligned access to the hash remainder. 85285fb45 ovs-atomic: Fix inclusion of Clang header by GCC 14. e040f35b2 vconn: Count vconn_sent regardless of log level. ... and others. Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2024-October/417627.html Signed-off-by: Vladislav Odintsov --- ovs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ovs b/ovs index 94191b7a4..a9fb87867 16 --- a/ovs +++ b/ovs @@ -1 +1 @@ -Subproject commit 94191b7a4926204510931770c52992c9ea24d4e2 +Subproject commit a9fb878679fb08bc3807cf455b0d3ce1797be26d -- 2.46.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn branch-22.09] ovs: Bump submodule to latest OVS branch-3.0.
This picks up the following relevant OVS changes: 876584141 vconn: Always properly free flow stats reply. 2d60ee374 ovsdb-idl: Fix IDL memory leak. 4198bcdfb compiler: Fix errors in Clang 17 ubsan checks. ec048ab62 vlog: Destroy async_append first then close log_fd. dbaf7271c hash, jhash: Fix unaligned access to the hash remainder. 85285fb45 ovs-atomic: Fix inclusion of Clang header by GCC 14. e040f35b2 vconn: Count vconn_sent regardless of log level. ... and others. Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2024-October/417627.html Signed-off-by: Vladislav Odintsov --- ovs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ovs b/ovs index 94191b7a4..a9fb87867 16 --- a/ovs +++ b/ovs @@ -1 +1 @@ -Subproject commit 94191b7a4926204510931770c52992c9ea24d4e2 +Subproject commit a9fb878679fb08bc3807cf455b0d3ce1797be26d -- 2.46.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn branch-22.12] ovs: Bump submodule to latest OVS branch-3.0.
This picks up the following relevant OVS changes: 876584141 vconn: Always properly free flow stats reply. 2d60ee374 ovsdb-idl: Fix IDL memory leak. 4198bcdfb compiler: Fix errors in Clang 17 ubsan checks. ec048ab62 vlog: Destroy async_append first then close log_fd. dbaf7271c hash, jhash: Fix unaligned access to the hash remainder. 85285fb45 ovs-atomic: Fix inclusion of Clang header by GCC 14. e040f35b2 vconn: Count vconn_sent regardless of log level. ... and others. Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2024-October/417627.html Signed-off-by: Vladislav Odintsov --- ovs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ovs b/ovs index 94191b7a4..a9fb87867 16 --- a/ovs +++ b/ovs @@ -1 +1 @@ -Subproject commit 94191b7a4926204510931770c52992c9ea24d4e2 +Subproject commit a9fb878679fb08bc3807cf455b0d3ce1797be26d -- 2.46.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn branch-23.03] ovs: Bump submodule to latest OVS branch-3.1.
This picks up the following relevant OVS changes: a0af48b75 ofproto-dpif: Improve load balancing in dp_hash select groups. 99e7cf9cc vconn: Always properly free flow stats reply. f59f19bf6 ovsdb-idl: Fix IDL memory leak. 7694dfacb compiler: Fix errors in Clang 17 ubsan checks. faf175155 vlog: Destroy async_append first then close log_fd. 483bc24e4 hash, jhash: Fix unaligned access to the hash remainder. bd5b5d3b3 ovs-atomic: Fix inclusion of Clang header by GCC 14. bb61b5fe8 vconn: Count vconn_sent regardless of log level. ... and others. Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2024-October/417627.html Signed-off-by: Vladislav Odintsov --- ovs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ovs b/ovs index 8fd5f77cd..a0af48b75 16 --- a/ovs +++ b/ovs @@ -1 +1 @@ -Subproject commit 8fd5f77cd84ea04667f987c7b84181604dc99f60 +Subproject commit a0af48b753ef3215091356f112bbb89737f286d9 -- 2.46.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn branch-23.06] ovs: Bump submodule to latest OVS branch-3.1.
This picks up the following relevant OVS changes: a0af48b75 ofproto-dpif: Improve load balancing in dp_hash select groups. 99e7cf9cc vconn: Always properly free flow stats reply. f59f19bf6 ovsdb-idl: Fix IDL memory leak. 7694dfacb compiler: Fix errors in Clang 17 ubsan checks. faf175155 vlog: Destroy async_append first then close log_fd. 483bc24e4 hash, jhash: Fix unaligned access to the hash remainder. bd5b5d3b3 ovs-atomic: Fix inclusion of Clang header by GCC 14. bb61b5fe8 vconn: Count vconn_sent regardless of log level. ... and others. Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2024-October/417627.html Signed-off-by: Vladislav Odintsov --- ovs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ovs b/ovs index 8fd5f77cd..a0af48b75 16 --- a/ovs +++ b/ovs @@ -1 +1 @@ -Subproject commit 8fd5f77cd84ea04667f987c7b84181604dc99f60 +Subproject commit a0af48b753ef3215091356f112bbb89737f286d9 -- 2.46.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn branch-23.09] ovs: Bump submodule to branch-3.2.
From: Dumitru Ceara Specifically the following commit: 4102674b3e ovsdb-idl: Preserve change_seqno when deleting rows. Without it, in specific cases, the IDL might incorrectly report deletion of yet to be seen records. This commit differs from original by bumping OVS submodule to branch-3.2 related commit ec1d73016 ("ovsdb-idl: Preserve change_seqno when deleting rows.") Signed-off-by: Dumitru Ceara Acked-by: Ilya Maximets (cherry picked from commit 66ef6709678486f7abf88db10eed15fb72edcc4a) Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2024-October/417627.html Signed-off-by: Vladislav Odintsov --- The current revision differs from the original commit referred to above: - the OVS submodule commit SHA is different - the change at classifier_lookup() is discarded - the commit message is adjusted @Dumitru, @Ilya, I'm not sure how to correctly handle Signed-off-by and Acked-by tags, or modifications to commit content and message, when cherry-picking; I just wanted to preserve credit. If it is not right to keep the tags, or to modify a backport like this, please let me know and I can re-send v2 as a normal non-backport patch. 
---
 controller/ofctrl.c | 2 +-
 ovs                 | 2 +-
 tests/test-ovn.c    | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/controller/ofctrl.c b/controller/ofctrl.c
index 497890eed..718baac18 100644
--- a/controller/ofctrl.c
+++ b/controller/ofctrl.c
@@ -3059,7 +3059,7 @@ ofctrl_inject_pkt(const struct ovsrec_bridge *br_int, const char *flow_s,
     uint64_t packet_stub[128 / 8];
     struct dp_packet packet;
     dp_packet_use_stub(&packet, packet_stub, sizeof packet_stub);
-    flow_compose(&packet, &uflow, NULL, 64);
+    flow_compose(&packet, &uflow, NULL, 64, false);
 
     uint64_t ofpacts_stub[1024 / 8];
     struct ofpbuf ofpacts = OFPBUF_STUB_INITIALIZER(ofpacts_stub);
diff --git a/ovs b/ovs
index c88a35fc2..c2f287013 160000
--- a/ovs
+++ b/ovs
@@ -1 +1 @@
-Subproject commit c88a35fc29f0c0eb6189853bfc738c2100d4860f
+Subproject commit c2f287013025ad5b0c40e0c7fc3a9042d4899ce1
diff --git a/tests/test-ovn.c b/tests/test-ovn.c
index 16d2d779d..6f38b1493 100644
--- a/tests/test-ovn.c
+++ b/tests/test-ovn.c
@@ -1238,7 +1238,7 @@ test_expr_to_packets(struct ovs_cmdl_context *ctx OVS_UNUSED)
     uint64_t packet_stub[128 / 8];
     struct dp_packet packet;
     dp_packet_use_stub(&packet, packet_stub, sizeof packet_stub);
-    flow_compose(&packet, &uflow, NULL, 64);
+    flow_compose(&packet, &uflow, NULL, 64, false);
 
     struct ds output = DS_EMPTY_INITIALIZER;
     const uint8_t *buf = dp_packet_data(&packet);
-- 
2.46.2
[ovs-dev] [PATCH ovn branch-24.03] ovs: Bump submodule to latest OVS branch-3.3.
This picks up the following relevant OVS changes: 618944a79 ofproto-dpif: Improve load balancing in dp_hash select groups. bb49e027c vconn: Always properly free flow stats reply. 58ff23947 ovsdb-idl: Fix IDL memory leak. f02dc3cfe vlog: Destroy async_append first then close log_fd. 01eca18be hash, jhash: Fix unaligned access to the hash remainder. ... and others. Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2024-October/417627.html Signed-off-by: Vladislav Odintsov --- ovs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ovs b/ovs index f19448b86..618944a79 16 --- a/ovs +++ b/ovs @@ -1 +1 @@ -Subproject commit f19448b8618967a108ec6f34713dd811ce1d1334 +Subproject commit 618944a79fec8e98d5880ca2bbb60304855d4437 -- 2.46.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn branch-24.09] ovs: Bump submodule to latest OVS branch-3.4.
This picks up the following relevant OVS changes: a15ce086d ofproto-dpif: Improve load balancing in dp_hash select groups. 76ba41b5c vconn: Always properly free flow stats reply. 64cb90507 ovsdb-idl: Fix IDL memory leak. ... and others. Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2024-October/417627.html Signed-off-by: Vladislav Odintsov --- ovs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ovs b/ovs index c598c05c8..a15ce086d 16 --- a/ovs +++ b/ovs @@ -1 +1 @@ -Subproject commit c598c05c85b2d38874a0ce8f7f088f6aae4fdabc +Subproject commit a15ce086d41f9dfe6c1589333413b8e777401ef0 -- 2.46.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn branch-23.03] ovs: Bump submodule to tip of OVS branch-3.2.
From: Dumitru Ceara This picks up the following relevant commit: cd8ffc956c3c ovs-atomic: Fix inclusion of Clang header by GCC 14. Without this builds on Fedora 40 (rawhide) are broken due to failing to compile the submodule. Signed-off-by: Dumitru Ceara Acked-by: Numan Siddique Signed-off-by: Numan Siddique (cherry picked from commit f224c6e5f69c099ddb008f99dba2e19a902a612f) Signed-off-by: Vladislav Odintsov --- Without this patch there are errors building OVN on modern systems. I kindly request that this patch be backported down to 22.03 LTS, including the already officially unsupported branches 23.03, 22.09 and 22.06, since we internally still need to base our development on the 22.09 branch. Thanks in advance if it is possible to make an exception, ignore the backport rules for non-LTS releases, and patch them too. --- ovs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ovs b/ovs index 8fd5f77cd..49e64f13b 160000 --- a/ovs +++ b/ovs @@ -1 +1 @@ -Subproject commit 8fd5f77cd84ea04667f987c7b84181604dc99f60 +Subproject commit 49e64f13b2c965f5b53a65eeab70ac2e3f0bf69a -- 2.46.2
[ovs-dev] [PATCH ovn branch-23.09] ovs: Bump submodule to tip of OVS branch-3.2.
From: Dumitru Ceara

This picks up the following relevant commit:
  cd8ffc956c3c ovs-atomic: Fix inclusion of Clang header by GCC 14.

Without this builds on Fedora 40 (rawhide) are broken due to failing to
compile the submodule.

Signed-off-by: Dumitru Ceara
Acked-by: Numan Siddique
Signed-off-by: Numan Siddique
(cherry picked from commit f224c6e5f69c099ddb008f99dba2e19a902a612f)
Signed-off-by: Vladislav Odintsov
---
Without this patch there are errors building OVN on modern systems.
I kindly request that this patch be backported down to 22.03 LTS, including
the already officially unsupported branches 23.09, 23.03 and 22.09, since we
internally still need to base our development on the 22.09 branch.
Thanks in advance if it is possible to make an exception and ignore the
backport rules for non-LTS releases and patch them too.
---
 ovs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ovs b/ovs
index 8fd5f77cd..49e64f13b 160000
--- a/ovs
+++ b/ovs
@@ -1 +1 @@
-Subproject commit 8fd5f77cd84ea04667f987c7b84181604dc99f60
+Subproject commit 49e64f13b2c965f5b53a65eeab70ac2e3f0bf69a
--
2.46.2
Re: [ovs-dev] RFC OVN: fabric integration
On 16.09.2024 17:41, Dumitru Ceara wrote:
On 9/12/24 19:45, Roberto Bartzen Acosta wrote:

> - The result of this synchronization is basically:
>   - SB->NB: creating/deleting/updating Logical_Router_Static_Route
>     entries for learned routes in the Routing_Information_Base
>     table (using the key).

Why would you see the need to push the learned routes to the northbound database? I would see this as just creating chaos in the CMS. I would rather keep them only in the SB and let northd do the merging.

This is not necessary at all! It would not be necessary to sync a new table with NB since the learned routes can be redistributed as static routes via OVN-IC, for example. The solution as a whole needs to be generic enough that we can redistribute/learn static routes in addition to directly connected routes. So, as long as northd merges these routes, it should work perfectly fine! Of course we need to be careful with the route policies that can block some prefix.

While I agree that it's probably not desirable to write these dynamic routes in the NB I think it would be useful to have a way to dump both "static" and "dynamic" routing table contents from a single place. It would make debugging easier.

I totally agree that it is highly desirable for the CMS to be able to dump BGP-learnt routes from OVN, since OVN is the only source of this data. This is needed not only for debugging purposes, but also to return these routes to end users through the CMS API. The NB is a good place to reflect useful information back to the CMS. It already has some read-only data populated by ovn-northd (NB_Global.options.max_tunid, Logical_Switch_Port.options.tag, Logical_Router_Static_Route.external_ids.ic-learned-route and others). It seems to me that the recommended way to "gather" such information into the CMS is through the NB database. There is currently one example of an anti-pattern, though: the Service_Monitor SB table. We use it to dump information about the states of Load Balancers' backends (online/offline). Giving the CMS direct access to the SB is not a clean solution, since the SB is not a config plane but OVN internals.

E.g.: ovn-nbctl --all lr-route-list [dumps all static routes and also learnt (received) routes]

However, that means we either need ovn-nbctl to connect to the SB or we propagate the information to the NB for nbctl to read.

Listing BGP routes only optionally can give the administrator a wrong view of routing: the static and ic-learnt routes alone are insufficient to understand the real routing table if BGP routes are in the game. Also, giving ovn-nbctl access to the SB seems to be a huge breakage of the NB/SB split concept, so it looks like ovn-northd needs to sync this information back to the NB.

Regards,
Dumitru
[ovs-dev] [PATCH ovn] news: Fix indentation for an entry.
Signed-off-by: Vladislav Odintsov --- NEWS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/NEWS b/NEWS index 318f63195..e0a48aaa7 100644 --- a/NEWS +++ b/NEWS @@ -59,7 +59,7 @@ OVN v24.09.0 - xx xxx - The NB_Global.debug_drop_domain_id configured value is now overridden by the ID associated with the Sampling_App record created for drop sampling (Sampling_App.type configured as "drop"). -- Add support for ACL sampling through the new Sample_Collector and Sample + - Add support for ACL sampling through the new Sample_Collector and Sample tables. Sampling is supported for both traffic that creates new connections and for traffic that is part of an existing connection. - Add "external_ids:ovn-encap-ip-default" config for ovn-controller to -- 2.45.2
[ovs-dev] [PATCH ovn] news: Fix indentation for an entry.
Signed-off-by: Vladislav Odintsov --- NEWS | 2 +- ovs | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/NEWS b/NEWS index 318f63195..e0a48aaa7 100644 --- a/NEWS +++ b/NEWS @@ -59,7 +59,7 @@ OVN v24.09.0 - xx xxx - The NB_Global.debug_drop_domain_id configured value is now overridden by the ID associated with the Sampling_App record created for drop sampling (Sampling_App.type configured as "drop"). -- Add support for ACL sampling through the new Sample_Collector and Sample + - Add support for ACL sampling through the new Sample_Collector and Sample tables. Sampling is supported for both traffic that creates new connections and for traffic that is part of an existing connection. - Add "external_ids:ovn-encap-ip-default" config for ovn-controller to diff --git a/ovs b/ovs index 0aa14d912..bf1b16364 16 --- a/ovs +++ b/ovs @@ -1 +1 @@ -Subproject commit 0aa14d912d9a29d07ebc727007a1f21e3639eea5 +Subproject commit bf1b16364b3f01b0ff5f2f6e76842e666226a17b -- 2.45.2
Re: [ovs-dev] [PATCH ovn v3] northd: Support routing over other address families.
Hi Frode,

regards,
Vladislav Odintsov

> On 26 Aug 2024, at 09:22, Frode Nordahl wrote:
>
> On Sun, Aug 25, 2024 at 8:43 AM Vladislav Odintsov wrote:
>>
>> Hi Felix,
>>
>> I’m wondering which task or problem you want to achieve with this change?
>> While this is a definitely useful feature in physical world, where network
>> switches can use EUI-64 link-local addresses, how do you plan to use it with
>> OVN?
>> Do you have plans to implement auto-generated LRP unique mac addresses and
>> EUI-64 IPv6 LLAs in order to utilize this patch feature to remove IPAM
>> complexity from CMS on allocating addresses for peering networks? Or, you’re
>> mixing it somehow with Logical_Router_Port.options.prefix feature?
>> If not, why not just use IPv4 LLAs, why IPv6?
>>
>> So, could you please explain the entire use case in more detail.
>
> FWIW; just wanted to chime in with our interest / support for this
> being a useful addition.
>
> We plan to use it together with the stream of work coming out of the
> OVN fabric integration thread [0] this coming cycle. The use case is
> to form relationships with the physical ToR switches, which as you
> point out typically use IPv6 LLA to implement BGP "unnumbered"
> functionality.
>
> IPv6 LLAs are there by default today in OVN and in the ToRs, so I'd
> flip the question on the head and ask why would you want to use IPv4
> LLAs?

Oh, that was the missing piece of the puzzle! Please forgive my ignorance, I didn’t know this feature was already implemented. Now the picture is totally clear. Thanks!

Though the CMS still has to ensure that we configure unique MAC addresses, right? Shouldn’t we go ahead and add support for generating them?
> > 0: https://mail.openvswitch.org/pipermail/ovs-dev/2024-August/416296.html > > -- > Frode Nordahl > >> regards, >> Vladislav Odintsov >> >>>> On 22 Apr 2024, at 18:46, Felix Huettner via dev >>>> wrote: >>> In most cases IPv4 packets are routed only over other IPv4 networks and >>> IPv6 packets are routed only over IPv6 networks. However there is no >>> inherent reason for this limitation. Routing IPv4 packets over IPv6 >>> networks just requires the router to contain a route for an IPv4 network >>> with an IPv6 nexthop. >>> >>> This was previously prevented in OVN in ovn-nbctl and northd. By >>> removing these filters the forwarding will work if the mac addresses are >>> prepopulated. >>> >>> If the mac addresses are not prepopulated we will attempt to resolve them >>> using >>> the original address family of the packet and not the address family of the >>> nexthop. This will fail and we will not forward the packet. >>> >>> This feature can for example be used by service providers to >>> interconnect multiple IPv4 networks of a customer without needing to >>> negotiate free IPv4 addresses by just using any IPv6 address. >>> >>> Signed-off-by: Felix Huettner >>> --- >>> v2->v3: fix uninitialized variable >>> v1->v2: >>> - move ipv4 info to parsed_route >>> - add tests for lr-route-add >>> - switch tests to use fmt_pkt >>> - some minor test cleanups >>> NEWS | 4 + >>> northd/northd.c | 57 ++--- >>> tests/ovn-nbctl.at| 26 ++- >>> tests/ovn.at | 511 ++ >>> utilities/ovn-nbctl.c | 12 +- >>> 5 files changed, 571 insertions(+), 39 deletions(-) >>> >>> diff --git a/NEWS b/NEWS >>> index 141f1831c..14a935c86 100644 >>> --- a/NEWS >>> +++ b/NEWS >>> @@ -13,6 +13,10 @@ Post v24.03.0 >>>"lflow-stage-to-oftable STAGE_NAME" that converts stage name into >>> OpenFlow >>>table id. >>> - Rename the ovs-sandbox script to ovn-sandbox. >>> + - Allow Static Routes where the address families of ip_prefix and nexthop >>> +diverge (e.g. IPv4 packets over IPv6 links). 
This is currently limited >>> to >>> +nexthops that have their mac addresses prepopulated (so >>> +dynamic_neigh_routers must be false). >>> >>> OVN v24.03.0 - 01 Mar 2024 >>> -- >>> diff --git a/northd/northd.c b/northd/northd.c >>> index 331d9c267..f2357af15 100644 >>> --- a/northd/northd.c >>> +++ b/northd/northd.c >>> @@ -10194,6 +10194,8 @@ struct parsed_route { >>>const struct nbrec_logical_router_static_route *rou
Re: [ovs-dev] [PATCH ovn v3] northd: Support routing over other address families.
Hi Felix,

Which task or problem are you trying to solve with this change?
While this is definitely a useful feature in the physical world, where network switches can use EUI-64 link-local addresses, how do you plan to use it with OVN?
Do you plan to implement auto-generated unique LRP MAC addresses and EUI-64 IPv6 LLAs, so that this feature can take the IPAM complexity of allocating addresses for peering networks away from the CMS? Or are you combining it somehow with the Logical_Router_Port.options.prefix feature?
If not, why not just use IPv4 LLAs? Why IPv6?

So, could you please explain the entire use case in more detail?

regards,
Vladislav Odintsov

> On 22 Apr 2024, at 18:46, Felix Huettner via dev wrote:
> In most cases IPv4 packets are routed only over other IPv4 networks and
> IPv6 packets are routed only over IPv6 networks. However there is no
> inherent reason for this limitation. Routing IPv4 packets over IPv6
> networks just requires the router to contain a route for an IPv4 network
> with an IPv6 nexthop.
>
> This was previously prevented in OVN in ovn-nbctl and northd. By
> removing these filters the forwarding will work if the mac addresses are
> prepopulated.
>
> If the mac addresses are not prepopulated we will attempt to resolve them
> using
> the original address family of the packet and not the address family of the
> nexthop. This will fail and we will not forward the packet.
>
> This feature can for example be used by service providers to
> interconnect multiple IPv4 networks of a customer without needing to
> negotiate free IPv4 addresses by just using any IPv6 address.
> > Signed-off-by: Felix Huettner > --- > v2->v3: fix uninitialized variable > v1->v2: > - move ipv4 info to parsed_route > - add tests for lr-route-add > - switch tests to use fmt_pkt > - some minor test cleanups > NEWS | 4 + > northd/northd.c | 57 ++--- > tests/ovn-nbctl.at| 26 ++- > tests/ovn.at | 511 ++ > utilities/ovn-nbctl.c | 12 +- > 5 files changed, 571 insertions(+), 39 deletions(-) > > diff --git a/NEWS b/NEWS > index 141f1831c..14a935c86 100644 > --- a/NEWS > +++ b/NEWS > @@ -13,6 +13,10 @@ Post v24.03.0 > "lflow-stage-to-oftable STAGE_NAME" that converts stage name into OpenFlow > table id. > - Rename the ovs-sandbox script to ovn-sandbox. > + - Allow Static Routes where the address families of ip_prefix and nexthop > +diverge (e.g. IPv4 packets over IPv6 links). This is currently limited to > +nexthops that have their mac addresses prepopulated (so > +dynamic_neigh_routers must be false). > > OVN v24.03.0 - 01 Mar 2024 > -- > diff --git a/northd/northd.c b/northd/northd.c > index 331d9c267..f2357af15 100644 > --- a/northd/northd.c > +++ b/northd/northd.c > @@ -10194,6 +10194,8 @@ struct parsed_route { > const struct nbrec_logical_router_static_route *route; > bool ecmp_symmetric_reply; > bool is_discard_route; > +bool is_ipv4_prefix; > +bool is_ipv4_nexthop; > }; > > static uint32_t > @@ -10219,6 +10221,8 @@ parsed_routes_add(struct ovn_datapath *od, const > struct hmap *lr_ports, > /* Verify that the next hop is an IP address with an all-ones mask. 
*/ > struct in6_addr nexthop; > unsigned int plen; > +bool is_ipv4_nexthop = true; > +bool is_ipv4_prefix; > bool is_discard_route = !strcmp(route->nexthop, "discard"); > bool valid_nexthop = route->nexthop[0] && !is_discard_route; > if (valid_nexthop) { > @@ -10237,6 +10241,7 @@ parsed_routes_add(struct ovn_datapath *od, const > struct hmap *lr_ports, > UUID_ARGS(&route->header_.uuid)); > return NULL; > } > +is_ipv4_nexthop = IN6_IS_ADDR_V4MAPPED(&nexthop); > } > > /* Parse ip_prefix */ > @@ -10248,18 +10253,7 @@ parsed_routes_add(struct ovn_datapath *od, const > struct hmap *lr_ports, > UUID_ARGS(&route->header_.uuid)); > return NULL; > } > - > -/* Verify that ip_prefix and nexthop have same address familiy. */ > -if (valid_nexthop) { > -if (IN6_IS_ADDR_V4MAPPED(&prefix) != IN6_IS_ADDR_V4MAPPED(&nexthop)) > { > -static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); > -VLOG_WARN_RL(&rl, "Address family doesn't match between > 'ip_prefix'" > - " %s and 'nexthop' %s in static route "UUID_FMT, > - route->ip_prefix, route->nexthop, > -
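To illustrate the check that this patch relaxes, here is a minimal standalone sketch; it is not the northd code. It assumes, in line with the quoted diff, that both addresses are held as `struct in6_addr` with IPv4 stored in v4-mapped form, so `IN6_IS_ADDR_V4MAPPED()` tells the families apart. The helper names `parse_ip46()` and `route_is_mixed_family()` are hypothetical and exist only for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: parse an IPv4 or IPv6 literal into an in6_addr.
 * IPv4 addresses are stored in v4-mapped form (::ffff:a.b.c.d). */
bool
parse_ip46(const char *s, struct in6_addr *out)
{
    struct in_addr v4;

    if (inet_pton(AF_INET, s, &v4) == 1) {
        memset(out, 0, sizeof *out);
        out->s6_addr[10] = 0xff;
        out->s6_addr[11] = 0xff;
        memcpy(&out->s6_addr[12], &v4, sizeof v4);
        return true;
    }
    return inet_pton(AF_INET6, s, out) == 1;
}

/* Hypothetical helper: returns 1 if the prefix and the nexthop belong to
 * different address families, 0 if they match, -1 if either address
 * fails to parse. */
int
route_is_mixed_family(const char *prefix_ip, const char *nexthop_ip)
{
    struct in6_addr prefix, nexthop;

    if (!parse_ip46(prefix_ip, &prefix)
        || !parse_ip46(nexthop_ip, &nexthop)) {
        return -1;
    }

    /* Same flags that the patch stores in struct parsed_route. */
    bool is_ipv4_prefix = IN6_IS_ADDR_V4MAPPED(&prefix);
    bool is_ipv4_nexthop = IN6_IS_ADDR_V4MAPPED(&nexthop);

    return is_ipv4_prefix != is_ipv4_nexthop;
}
```

Before the patch, a route for which this comparison is true was rejected with a warning; with the patch, the two flags are recorded on the parsed route instead, so that later stages can pick the matching address family for each part.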
Re: [ovs-dev] [Patch ovn v9] northd: Routing protocol port redirection.
Thank you Martin and Dumitru for your work on this!

On 13.08.2024 22:46, Dumitru Ceara wrote:
On 8/12/24 19:30, Vladislav Odintsov wrote:
On 13.08.2024 00:14, Dumitru Ceara wrote:
On 8/12/24 18:21, Martin Kalcok wrote:

This change adds two new LRP options:
  * routing-protocol-redirect
  * routing-protocols

These allow redirection of routing protocol traffic to a Logical Switch Port. This enables external routing daemons to listen on an interface bound to an LSP and effectively act as if they were listening on (and speaking from) LRP's IP address.

Option 'routing-protocols' takes a comma-separated list of routing protocols whose traffic should be redirected. Currently supported are BGP (tcp 179) and BFD (udp 3784).

Option 'routing-protocol-redirect' expects a string with an LSP name.

When both of these options are set, any traffic entering LS that's destined for LRP's IP addresses (including IPv6 LLA) and routing protocol's port number, is redirected to the LSP specified in the 'routing-protocol-redirect' value.

NOTE: this feature is experimental and may be subject to removal/change in the future.

Signed-off-by: Martin Kalcok
---
v9 contains small changes based on the review of v8:
  * Simplified search for the port specified in 'routing-protocol-redirect', using 'ovn_port_find'
  * As a result of this change, a new possible warning was added when LRP is not connected to the same LS as the LSP specified in 'routing-protocol-redirect'.
  * Datapath test for this feature now includes verification of BFD's UDP traffic.
    * These tests required some more care as Ncat produced false positive results even when sending to a port where nothing was listening. My assumption is that Ncat tries to assert success of a UDP connection based on the lack of an ICMP Port Unreachable message, and LR probably does not generate these?
  * nit: null pointer checks changed from 'if (p == NULL)' to 'if (!p)' for consistency.

This version looks good to me!

Acked-by: Dumitru Ceara

Vladislav, do you happen to have some time to try this version out on your end too?

Yes, I have just finished testing. With centralized routing, BGP and BFD work well (in my setup there are two VRFs with BGP and BFD peerings configured as a hairpin inside one node, peering with each other):

# sh run
<...snip...>
router bgp 64512 vrf dxvif-62C25580
 bgp router-id 169.254.252.1
 no bgp ebgp-requires-policy
 no bgp network import-check
 neighbor 169.254.252.2 remote-as 64513
 neighbor 169.254.252.2 bfd
exit
!
router bgp 64513 vrf dxvif-9ED34880
 bgp router-id 169.254.252.2
 no bgp ebgp-requires-policy
 no bgp network import-check
 neighbor 169.254.252.1 remote-as 64512
 neighbor 169.254.252.1 bfd
 !
 address-family ipv4 unicast
  network 10.0.0.0/24
 exit-address-family
exit

# sh ip bgp vrf all

Instance dxvif-62C25580:
BGP table version is 1, local router ID is 169.254.252.1, vrf id 760
Default local pref 100, local AS 64512
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
              i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

    Network          Next Hop            Metric LocPrf Weight Path
 *> 10.0.0.0/24      169.254.252.2            0             0 64513 i

Displayed 1 routes and 1 total paths

Instance dxvif-9ED34880:
BGP table version is 1, local router ID is 169.254.252.2, vrf id 763
Default local pref 100, local AS 64513
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
              i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

    Network          Next Hop            Metric LocPrf Weight Path
 *> 10.0.0.0/24      0.0.0.0                  0         32768 i

Displayed 1 routes and 1 total paths

# sh bfd peers
BFD Peers:
  peer 169.254.252.1 local-address 169.254.252.2 vrf dxvif-9ED34880 interface dx-9ED34880-v
    ID: 3199966869
    Remote ID: 2401437121
    Active mode
    Status: up
    Uptime: 26 second(s)
    Diagnostics: ok
    Remote diagnostics: ok
    Peer Type: dynamic
    RTT min/avg/max: 0/0/0 usec
    Local timers:
      Detect-multiplier: 3
      Receive interval: 300ms
      Transmission interval: 300ms
      Echo receive interval: 50ms
      Echo transmission interval: disabled
    Remote timers:
      Detect-multiplier: 3
      Receive interval: 300ms
      Transmission interval: 300ms
      Echo rec
Re: [ovs-dev] [Patch ovn v9] northd: Routing protocol port redirection.
Status: up Uptime: 26 second(s) Diagnostics: ok Remote diagnostics: ok Peer Type: dynamic RTT min/avg/max: 0/0/0 usec Local timers: Detect-multiplier: 3 Receive interval: 300ms Transmission interval: 300ms Echo receive interval: 50ms Echo transmission interval: disabled Remote timers: Detect-multiplier: 3 Receive interval: 300ms Transmission interval: 300ms Echo receive interval: 50ms At this point this looks good with a note, that for LB VIPs and NAT addresses we do use ha-chassis-group with primary/secondary chassis, which is currently not supported by the redirect feature. This probably should be somehow addressed in a future development. Tested-by: Vladislav Odintsov Mark, Numan, Han, as discussed before branching, I think it's fine to include this experimental feature on branch 24.09 (and in v24.09.0) too. Do you guys agree? Thanks, Dumitru northd/northd.c | 231 northd/northd.h | 7 ++ northd/ovn-northd.8.xml | 54 ++ ovn-nb.xml | 42 tests/ovn-northd.at | 93 tests/system-ovn.at | 149 ++ 6 files changed, 576 insertions(+) diff --git a/northd/northd.c b/northd/northd.c index 5ad30d854..8a240d93d 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -14002,6 +14002,234 @@ build_arp_resolve_flows_for_lrp(struct ovn_port *op, } } +static void +build_routing_protocols_redirect_rule__( +const char *s_addr, const char *redirect_port_name, int protocol_port, +const char *proto, bool is_ipv6, struct ovn_port *ls_peer, +struct lflow_table *lflows, struct ds *match, struct ds *actions, +struct lflow_ref *lflow_ref) +{ +int ip_ver = is_ipv6 ? 
6 : 4; +ds_clear(actions); +ds_put_format(actions, "outport = \"%s\"; output;", redirect_port_name); + +/* Redirect packets in the input pipeline destined for LR's IP + * and the routing protocol's port to the LSP specified in + * 'routing-protocol-redirect' option.*/ +ds_clear(match); +ds_put_format(match, "ip%d.dst == %s && %s.dst == %d", ip_ver, s_addr, + proto, protocol_port); +ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, 100, + ds_cstr(match), + ds_cstr(actions), + lflow_ref); + +/* To accomodate "peer" nature of the routing daemons, redirect also + * replies to the daemons' client requests. */ +ds_clear(match); +ds_put_format(match, "ip%d.dst == %s && %s.src == %d", ip_ver, s_addr, + proto, protocol_port); +ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, 100, + ds_cstr(match), + ds_cstr(actions), + lflow_ref); +} + +static void +apply_routing_protocols_redirect__( +const char *s_addr, const char *redirect_port_name, int protocol_flags, +bool is_ipv6, struct ovn_port *ls_peer, struct lflow_table *lflows, +struct ds *match, struct ds *actions, struct lflow_ref *lflow_ref) +{ +if (protocol_flags & REDIRECT_BGP) { +build_routing_protocols_redirect_rule__(s_addr, redirect_port_name, +179, "tcp", is_ipv6, ls_peer, +lflows, match, actions, +lflow_ref); +} + +if (protocol_flags & REDIRECT_BFD) { +build_routing_protocols_redirect_rule__(s_addr, redirect_port_name, +3784, "udp", is_ipv6, ls_peer, +lflows, match, actions, +lflow_ref); +} + +/* Because the redirected port shares IP and MAC addresses with the LRP, + * special consideration needs to be given to the signaling protocols. 
*/ +ds_clear(actions); +ds_put_format(actions, + "clone { outport = \"%s\"; output; }; " + "outport = %s; output;", + redirect_port_name, ls_peer->json_key); +if (is_ipv6) { +/* Ensure that redirect port receives copy of NA messages destined to + * its IP.*/ +ds_clear(match); +ds_put_format(match, "ip6.dst == %s && nd_na", s_addr); +ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, 100, + ds_cstr(match), + ds_cstr(actions), + lflow_ref); +} else { +/* Ensure that redirect port receives copy of ARP replies destined to
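The option handling described in this patch can be sketched in isolation as follows. This is a simplified illustration, not the northd implementation: the flag names `REDIRECT_BGP` and `REDIRECT_BFD` and the match format string come from the quoted diff, while the flag values, the function names `parse_routing_protocols()` and `format_redirect_match()`, the exact-case token matching, and the silent handling of unknown tokens are assumptions made for this example.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Flag names follow the patch; the numeric values are assumptions. */
enum {
    REDIRECT_BGP = 1 << 0,
    REDIRECT_BFD = 1 << 1,
};

/* Hypothetical parser for the comma-separated 'routing-protocols' value.
 * Unknown tokens are silently ignored here; real code may warn instead. */
int
parse_routing_protocols(const char *value)
{
    int flags = 0;
    const char *p = value;

    while (*p) {
        size_t n = strcspn(p, ",");     /* Length of the current token. */

        if (n == 3 && !strncmp(p, "BGP", 3)) {
            flags |= REDIRECT_BGP;
        } else if (n == 3 && !strncmp(p, "BFD", 3)) {
            flags |= REDIRECT_BFD;
        }
        p += n;
        if (*p == ',') {
            p++;                        /* Skip the separator. */
        }
    }
    return flags;
}

/* Builds the "destination port" match using the format string from the
 * patch.  Returns what snprintf() returns. */
int
format_redirect_match(char *buf, size_t size, const char *s_addr,
                      const char *proto, int port, int is_ipv6)
{
    return snprintf(buf, size, "ip%d.dst == %s && %s.dst == %d",
                    is_ipv6 ? 6 : 4, s_addr, proto, port);
}
```

With the value "BGP,BFD" both flags are set. For an LRP address of 169.254.252.1, the BGP case produces the match `ip4.dst == 169.254.252.1 && tcp.dst == 179`; the patch adds a second flow with `tcp.src` in place of `tcp.dst` to cover replies to connections initiated by the daemon.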
Re: [ovs-dev] [ovn] OF connection dropped when OVS port mac is changes
Hi Ilya,

thanks for the quick response!

On 12.08.2024 23:42, Ilya Maximets wrote:
On 8/12/24 18:33, Ilya Maximets wrote:
On 8/12/24 18:24, Vladislav Odintsov wrote:

Hi,

I've run into a behavior of OVS/OVN (I couldn't determine which part is to blame) where the OpenFlow connection is lost if I change the MAC address of a logical port added to br-int, which is claimed as a normal VIF port by ovn-controller.

Here is a simple reproducer script:

ovn-nbctl ls-add test
ovn-nbctl lsp-add test test1
ip li add test1 type dummy
ovs-vsctl add-port br-int test1
ip li set test1 add 00:00:00:00:00:01

When the last line is executed, I see in ovn-controller.log that the OpenFlow connection is re-established:

2024-08-12T15:42:02.493Z|03705|rconn(ovn_pinctrl0)|INFO|unix:/run/openvswitch/br-int.mgmt: connection closed by peer
2024-08-12T15:42:02.494Z|00222|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connection closed by peer
2024-08-12T15:42:02.494Z|00223|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connection closed by peer
2024-08-12T15:42:03.495Z|00224|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connecting...
2024-08-12T15:42:03.495Z|00225|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connecting...
2024-08-12T15:42:03.495Z|03706|rconn(ovn_pinctrl0)|INFO|unix:/run/openvswitch/br-int.mgmt: connecting...
2024-08-12T15:42:03.495Z|00226|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connected In ovs-vswitchd with enabled DBG logs there is: 2024-08-12T16:01:08.694Z|3796712|poll_loop|DBG|wakeup due to [POLLIN] on fd 18 (NETLINK_ROUTE<->NETLINK_ROUTE) at lib/netlink-socket.c:1409 (0% CPU usage) 2024-08-12T16:01:08.695Z|3796713|poll_loop|DBG|wakeup due to [POLLIN] on fd 16 (NETLINK_ROUTE<->NETLINK_ROUTE) at lib/netlink-socket.c:1409 (0% CPU usage) 2024-08-12T16:01:08.695Z|3796714|netlink_socket|DBG|Dropped 505 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate 2024-08-12T16:01:08.695Z|3796715|netlink_socket|DBG|nl_sock_recv__ (Success): nl(len:1360, type=16(family-defined), flags=0, seq=0, pid=0 2024-08-12T16:01:08.695Z|3796716|dpif|DBG|Dropped 15 log messages in last 0 seconds (most recently, 0 seconds ago) due to excessive rate 2024-08-12T16:01:08.695Z|3796717|dpif|DBG|system@ovs-system: device internet is on port 1 2024-08-12T16:01:08.695Z|3796718|dpif|DBG|system@ovs-system: device br-ext is on port 2 2024-08-12T16:01:08.695Z|3796719|jsonrpc|DBG|unix:/var/run/openvswitch/db.sock: send request, method="transact", params=["Open_vSwitch",{"where":[["_uuid","==",["uuid","acae6b73-de5f-46e4-a3d9-7874efd43cb4"]]],"row":{"mac_in_use":"00:00:00:00:00:02"},"op":"update","table":"Interface"},{"lock":"ovs_vswitchd","op":"assert"}], id=4284801 2024-08-12T16:01:08.699Z|3796720|poll_loop|DBG|wakeup due to [POLLIN] on fd 37 (FIFO pipe:[103980657]) at vswitchd/bridge.c:421 (0% CPU usage) 2024-08-12T16:01:08.699Z|3796721|poll_loop|DBG|wakeup due to [POLLIN] on fd 17 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (0% CPU usage) 2024-08-12T16:01:08.699Z|00690|poll_loop(urcu5)|DBG|wakeup due to [POLLIN] on fd 56 (FIFO pipe:[103975681]) at lib/ovs-rcu.c:363 2024-08-12T16:01:08.699Z|3796722|jsonrpc|DBG|unix:/var/run/openvswitch/db.sock: received notification, method="update3", 
params=[["monid","Open_vSwitch"],"----",{"Interface":{"acae6b73-de5f-46e4-a3d9-7874efd43cb4":{"modify":{"mac_in_use":"00:00:00:00:00:02"] 2024-08-12T16:01:08.699Z|3796723|jsonrpc|DBG|unix:/var/run/openvswitch/db.sock: received reply, result=[{"count":1},{}], id=4284801 2024-08-12T16:01:08.699Z|3796724|vconn|DBG|unix#121: sent (Success): OFPT_PORT_STATUS (OF1.5) (xid=0x0): MOD: 15(test1): addr:00:00:00:00:00:02 config: PORT_DOWN state: LINK_DOWN speed: 0 Mbps now, 0 Mbps max 2024-08-12T16:01:08.699Z|3796725|bridge|INFO|bridge br-int: using datapath ID 0002 2024-08-12T16:01:08.699Z|3796726|rconn|INFO|br-int<->unix#121: disconnecting 2024-08-12T16:01:08.699Z|3796727|rconn|DBG|br-int<->unix#121: entering DISCONNECTED 2024-08-12T16:01:08.699Z|3796728|rconn|INFO|br-int<->unix#122: disconnecting 2024-08-12T16:01:08.699Z|3796729|rconn|DBG|br-int<->unix#122: entering DISCONNECTED 2024-08-12T16:01:08.699Z|3796730|rconn|INFO|br-int<->unix#123: disconnecting 2024-08-12T16:01:08.699Z|3796731|rconn|DBG|br-int<->unix#123: entering DISCONNECTED 2024-08-12T16:01:08.699Z|3796732|jsonrpc|DBG|unix:/var/run/openvswitch/db.sock: send request, method="transact", params=["Open_vSwitch",{"where":[["_uuid","==",["uuid","5c37aa7c-9394-4068-8a4d-ab23705036d7"]]],"row":{"datapath_id
[ovs-dev] [ovn] OF connection dropped when OVS port mac is changes
Hi,

I've run into a behavior of OVS/OVN (I couldn't determine which part is to blame) where the OpenFlow connection is lost if I change the MAC address of a logical port added to br-int, which is claimed as a normal VIF port by ovn-controller.

Here is a simple reproducer script:

ovn-nbctl ls-add test
ovn-nbctl lsp-add test test1
ip li add test1 type dummy
ovs-vsctl add-port br-int test1
ip li set test1 add 00:00:00:00:00:01

When the last line is executed, I see in ovn-controller.log that the OpenFlow connection is re-established:

2024-08-12T15:42:02.493Z|03705|rconn(ovn_pinctrl0)|INFO|unix:/run/openvswitch/br-int.mgmt: connection closed by peer
2024-08-12T15:42:02.494Z|00222|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connection closed by peer
2024-08-12T15:42:02.494Z|00223|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connection closed by peer
2024-08-12T15:42:03.495Z|00224|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connecting...
2024-08-12T15:42:03.495Z|00225|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connecting...
2024-08-12T15:42:03.495Z|03706|rconn(ovn_pinctrl0)|INFO|unix:/run/openvswitch/br-int.mgmt: connecting...
2024-08-12T15:42:03.495Z|00226|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connected In ovs-vswitchd with enabled DBG logs there is: 2024-08-12T16:01:08.694Z|3796712|poll_loop|DBG|wakeup due to [POLLIN] on fd 18 (NETLINK_ROUTE<->NETLINK_ROUTE) at lib/netlink-socket.c:1409 (0% CPU usage) 2024-08-12T16:01:08.695Z|3796713|poll_loop|DBG|wakeup due to [POLLIN] on fd 16 (NETLINK_ROUTE<->NETLINK_ROUTE) at lib/netlink-socket.c:1409 (0% CPU usage) 2024-08-12T16:01:08.695Z|3796714|netlink_socket|DBG|Dropped 505 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate 2024-08-12T16:01:08.695Z|3796715|netlink_socket|DBG|nl_sock_recv__ (Success): nl(len:1360, type=16(family-defined), flags=0, seq=0, pid=0 2024-08-12T16:01:08.695Z|3796716|dpif|DBG|Dropped 15 log messages in last 0 seconds (most recently, 0 seconds ago) due to excessive rate 2024-08-12T16:01:08.695Z|3796717|dpif|DBG|system@ovs-system: device internet is on port 1 2024-08-12T16:01:08.695Z|3796718|dpif|DBG|system@ovs-system: device br-ext is on port 2 2024-08-12T16:01:08.695Z|3796719|jsonrpc|DBG|unix:/var/run/openvswitch/db.sock: send request, method="transact", params=["Open_vSwitch",{"where":[["_uuid","==",["uuid","acae6b73-de5f-46e4-a3d9-7874efd43cb4"]]],"row":{"mac_in_use":"00:00:00:00:00:02"},"op":"update","table":"Interface"},{"lock":"ovs_vswitchd","op":"assert"}], id=4284801 2024-08-12T16:01:08.699Z|3796720|poll_loop|DBG|wakeup due to [POLLIN] on fd 37 (FIFO pipe:[103980657]) at vswitchd/bridge.c:421 (0% CPU usage) 2024-08-12T16:01:08.699Z|3796721|poll_loop|DBG|wakeup due to [POLLIN] on fd 17 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (0% CPU usage) 2024-08-12T16:01:08.699Z|00690|poll_loop(urcu5)|DBG|wakeup due to [POLLIN] on fd 56 (FIFO pipe:[103975681]) at lib/ovs-rcu.c:363 2024-08-12T16:01:08.699Z|3796722|jsonrpc|DBG|unix:/var/run/openvswitch/db.sock: received notification, method="update3", 
params=[["monid","Open_vSwitch"],"----",{"Interface":{"acae6b73-de5f-46e4-a3d9-7874efd43cb4":{"modify":{"mac_in_use":"00:00:00:00:00:02"] 2024-08-12T16:01:08.699Z|3796723|jsonrpc|DBG|unix:/var/run/openvswitch/db.sock: received reply, result=[{"count":1},{}], id=4284801 2024-08-12T16:01:08.699Z|3796724|vconn|DBG|unix#121: sent (Success): OFPT_PORT_STATUS (OF1.5) (xid=0x0): MOD: 15(test1): addr:00:00:00:00:00:02 config: PORT_DOWN state: LINK_DOWN speed: 0 Mbps now, 0 Mbps max 2024-08-12T16:01:08.699Z|3796725|bridge|INFO|bridge br-int: using datapath ID 0002 2024-08-12T16:01:08.699Z|3796726|rconn|INFO|br-int<->unix#121: disconnecting 2024-08-12T16:01:08.699Z|3796727|rconn|DBG|br-int<->unix#121: entering DISCONNECTED 2024-08-12T16:01:08.699Z|3796728|rconn|INFO|br-int<->unix#122: disconnecting 2024-08-12T16:01:08.699Z|3796729|rconn|DBG|br-int<->unix#122: entering DISCONNECTED 2024-08-12T16:01:08.699Z|3796730|rconn|INFO|br-int<->unix#123: disconnecting 2024-08-12T16:01:08.699Z|3796731|rconn|DBG|br-int<->unix#123: entering DISCONNECTED 2024-08-12T16:01:08.699Z|3796732|jsonrpc|DBG|unix:/var/run/openvswitch/db.sock: send request, method="transact", params=["Open_vSwitch",{"where":[["_uuid","==",["uuid","5c37aa7c-9394-4068-8a4d-ab23705036d7"]]],"row":{"datapath_id":"0002"},"op":"update","table":"Bridge"},{"lock":"ovs_vswitchd","op":"assert"}], id=4284802 2024-08-12T16:01:08.701Z|3796733|poll_loop|DBG|wakeup due to [POLLIN] on fd 37 (FIFO pipe:[103980657]) at vswitchd/bridge.c:421 (0% CPU usage) 2024-08-12T16:01:08.701Z|3796734|poll_loop|DBG|wakeup due to [POLLIN] on fd 17 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (0% CPU usage) 2024-08-12T16:01:08.701Z|00691|poll_loop(urcu5)|DBG|wakeup due to [POLLIN] on fd 56 (FIFO pipe:[103975681]) at lib/ovs-rcu.c:236 2024-08-12T16:01:08.701Z|37967
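The debug log above shows the bridge announcing a new datapath ID right after `mac_in_use` changes, and only then dropping its OpenFlow connections. That sequence is consistent with ovs-vswitchd deriving the default bridge MAC, and from it the datapath ID, from the lowest eligible port MAC when neither `other-config:hwaddr` nor `other-config:datapath-id` is set on the bridge. The sketch below models that selection in a simplified form. The type and function names are hypothetical, and the real logic in ovs-vswitchd applies further filters that are omitted here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical container for one port's MAC address. */
struct mac {
    uint8_t b[MAC_LEN];
};

/* Simplified eligibility check (an assumption of this sketch): skip
 * all-zero, multicast and locally administered addresses. */
bool
mac_is_eligible(const struct mac *m)
{
    static const uint8_t zero[MAC_LEN];

    return memcmp(m->b, zero, MAC_LEN) != 0
           && !(m->b[0] & 0x01)    /* Multicast bit. */
           && !(m->b[0] & 0x02);   /* Locally administered bit. */
}

/* Returns the datapath ID built from the lowest eligible MAC among the
 * 'n' ports, or 0 if no port has an eligible MAC. */
uint64_t
pick_datapath_id(const struct mac *ports, size_t n)
{
    const struct mac *best = NULL;
    uint64_t dpid = 0;

    for (size_t i = 0; i < n; i++) {
        if (mac_is_eligible(&ports[i])
            && (!best || memcmp(ports[i].b, best->b, MAC_LEN) < 0)) {
            best = &ports[i];
        }
    }
    for (size_t i = 0; best && i < MAC_LEN; i++) {
        dpid = (dpid << 8) | best->b[i];
    }
    return dpid;
}
```

Under this model, giving a port the address 00:00:00:00:00:02 makes it the lowest eligible MAC, so the datapath ID changes and existing controller connections are reset. If that is indeed the cause, pinning the bridge address with `other-config:hwaddr` on br-int should keep the datapath ID stable; this is an untested suggestion, not something stated in the visible part of this thread.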
Re: [ovs-dev] [Patch ovn v8] northd: Routing protocol port redirection.
On 12.08.2024 20:20, martin.kal...@canonical.com wrote: Sorry, one more follow-up question. On Mon, 2024-08-12 at 15:09 +0200,martin.kal...@canonical.com wrote: Hi Dumitru, thanks for the fast review. On Mon, 2024-08-12 at 14:41 +0200, Dumitru Ceara wrote: On 8/12/24 13:44, Martin Kalcok wrote: This change adds two new LRP options: * routing-protocol-redirect * routing-protocols These allow redirection of a routing protocol traffic to an Logical Switch Port. This enables external routing daemons to listen on an interface bound to an LSP and effectively act as if they were listening on (and speaking from) LRP's IP address. Option 'routing-protocols' takes a comma-separated list of routing protocols whose traffic should be redirected. Currently supported are BGP (tcp 179) and BFD (udp 3784). Option 'routing-protocol-redirect' expects a string with an LSP name. When both of these options are set, any traffic entering LS that's destined for LRP's IP addresses (including IPv6 LLA) and routing protocol's port number, is redirected to the LSP specified in the 'routing-protocol-redirect' value. NOTE: this feature is experimental and may be subject to removal/change in the future. Signed-off-by: Martin Kalcok --- Hi Martin, v8 patch fixes intermittent segfault issue present in previous versions. It was caused by mistakingly using peer port's 'lflow_ref' (op->peer->lflow_ref) when inserting rules into the peer's datapath. I assumed that when rules are injected into the peer's datapath, it should use peer port's 'lflow_ref'. However that is not the case[0] and the thread-unsafe nature of 'lflow_ref'[1] caused crashes when another thread tried to use peer port's lflow_ref at the same time. Ah, good catch! Yes, lflow_ref is not straightforward. I'm running tests for this feature in loop and I'm at 500+ successful executions, whereas before I would see crashes in about 10 attempts. 
While I was looking at this patch with fresh eyes, I also included few nits: * rename 'redirect_port' var. to more descriptive 'redirect_port_name' * rename potentially confusing iterator variable 'lsp_peer' to more descriptive 'lsp_in_peer' * relaxed overly cautious string comparison 'if (s1_len == s2_len && !strncmp(s1, s2, s1_len))' to more simple 'if (!strcmp(s1, s2))' as I believe that we can safely assume that both s1 and s2 are null-terminated strings [0] https://github.com/ovn-org/ovn/blob/f0a368143e492c798d3233439f9f097f1c9690cd/northd/northd.c#L13953-L13957 [1] https://github.com/ovn-org/ovn/blob/f0a368143e492c798d3233439f9f097f1c9690cd/northd/northd.h#L684 I have a couple of small comments below. Otherwise, the code looks good to me! northd/northd.c | 226 northd/northd.h | 7 ++ northd/ovn-northd.8.xml | 54 ++ ovn-nb.xml | 42 tests/ovn-northd.at | 93 + tests/system-ovn.at | 120 + 6 files changed, 542 insertions(+) diff --git a/northd/northd.c b/northd/northd.c index 5ad30d854..53012de89 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -14002,6 +14002,230 @@ build_arp_resolve_flows_for_lrp(struct ovn_port *op, } } +static void +build_routing_protocols_redirect_rule__( + const char *s_addr, const char *redirect_port_name, int protocol_port, + const char *proto, bool is_ipv6, struct ovn_port *ls_peer, + struct lflow_table *lflows, struct ds *match, struct ds *actions, + struct lflow_ref *lflow_ref) +{ + int ip_ver = is_ipv6 ? 
6 : 4; + ds_clear(actions); + ds_put_format(actions, "outport = \"%s\"; output;", redirect_port_name); + + /* Redirect packets in the input pipeline destined for LR's IP + * and the routing protocol's port to the LSP specified in + * 'routing-protocol-redirect' option.*/ + ds_clear(match); + ds_put_format(match, "ip%d.dst == %s && %s.dst == %d", ip_ver, s_addr, + proto, protocol_port); + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, 100, + ds_cstr(match), + ds_cstr(actions), + lflow_ref); + + /* To accomodate "peer" nature of the routing daemons, redirect also + * replies to the daemons' client requests. */ + ds_clear(match); + ds_put_format(match, "ip%d.dst == %s && %s.src == %d", ip_ver, s_addr, + proto, protocol_port); + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, 100, + ds_cstr(match), + ds_cstr(actions), + lflow_ref); +} + +static void +apply_routing_protocols_redirect__( + const char *s_addr, const char *redirect_port_name, int protocol_flags, + bool is_ipv6, struct ovn_port *ls_peer, struct lflow_table *lflows, + struct ds *match, struct ds *actions, struct lflow_ref *lflow_ref) +{ + if (protocol_flags & R
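The string-comparison nit mentioned above rests on the fact that, for NUL-terminated strings, a length check followed by strncmp() gives the same answer as a plain strcmp(). A small sketch of that equivalence follows; both helper names are hypothetical and neither function is taken from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* The "overly cautious" form: compare lengths first, then the bytes.
 * Hypothetical helper, for illustration only. */
static bool
names_equal_cautious(const char *s1, const char *s2)
{
    size_t s1_len = strlen(s1);
    size_t s2_len = strlen(s2);

    return s1_len == s2_len && !strncmp(s1, s2, s1_len);
}

/* The relaxed form: valid as long as both inputs are NUL-terminated.
 * Hypothetical helper, for illustration only. */
static bool
names_equal_simple(const char *s1, const char *s2)
{
    assert(s1 && s2);
    return !strcmp(s1, s2);
}
```

Both helpers already call strlen() or strcmp() on their arguments, so the length guard adds no protection against unterminated input; it only repeats work that strcmp() does anyway.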
Re: [ovs-dev] [Patch ovn v5] northd: Routing protocol port redirection.
On 09.08.2024 16:51, martin.kal...@canonical.com wrote:
On Fri, 2024-08-09 at 11:28 +0200, Dumitru Ceara wrote:
On Friday, August 9, 2024, wrote:

I don't mind merging them, but while we are at the topic. Are there benefits in processing performance in "fewer, more complex rules" vs "more, less complex rules"? Or is it just to improve readability?

In this case I don’t expect we’ll have a lot of ports with redirect enabled so, from my perspective, it’s just readability (I was ok with the 2 flow version too). From an openflow perspective there’s no difference.

Thanks, I think I'll have to keep the current 2-line implementation in and add this into the list of future improvements. @Vladislav I do appreciate the review and the feedback, and I don't want to look like I'm just ignoring it, but due to the time pressure (freeze today) and this change resulting in tests/docs change (which always takes me way more than I expect) i don't think I'll fit it in. It doesn't look like it but the overhead adds up.

Sure, no problem. I'm going to test v6. Will return with feedback soon. Stay tuned :)

Martin.

Thanks, Dumitru

Martin.

On Fri, 2024-08-09 at 11:08 +0200, Dumitru Ceara wrote:
On Friday, August 9, 2024, Vladislav Odintsov wrote:

Don't we want to merge these two conditions into one logical flow? E.g.: "(ip%d.dst == %s && (%s.dst == %d && %s.src == %d)"

Sorry, there is typo. It should be: "(ip%d.dst == %s && (%s.dst == %d || %s.src == %d)" ?

This will make one logical flow per LRP IP per protocol instead of two.

I didn’t test this but I think that looks ok.

Regards, Dumitru
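The merged match discussed in this thread can be sketched in plain C as follows. This is only a minimal illustration under stated assumptions: it uses snprintf() rather than OVN's 'struct ds' helpers, the function name format_merged_match() is hypothetical, and the unmatched opening parenthesis from the quoted string is dropped so that the expression is balanced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of OVN: formats a single match expression
 * that covers both directions (protocol port as destination OR as source),
 * in place of the two separate matches built in the patch.
 * Returns the number of characters snprintf() would have written. */
static int
format_merged_match(char *buf, size_t len, const char *s_addr,
                    const char *proto, int protocol_port, bool is_ipv6)
{
    int ip_ver = is_ipv6 ? 6 : 4;

    assert(buf && len > 0);
    return snprintf(buf, len,
                    "ip%d.dst == %s && (%s.dst == %d || %s.src == %d)",
                    ip_ver, s_addr, proto, protocol_port,
                    proto, protocol_port);
}
```

Called with the address "192.0.2.1", protocol "tcp", port 179 and is_ipv6 set to false, this yields the string: ip4.dst == 192.0.2.1 && (tcp.dst == 179 || tcp.src == 179). As noted in the thread, the OpenFlow result is not expected to differ, so the choice between one merged flow and two flows is mostly about readability.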
Re: [ovs-dev] [Patch ovn v5] northd: Routing protocol port redirection.
On 09.08.2024 15:48, Vladislav Odintsov wrote: Hi Martin, Dumitru, I'm gonna to perform a quick testing after rebase and get back with a feedback. Please see one question below. On 09.08.2024 15:25, martin.kal...@canonical.com wrote: Hi Dumitru, thank you for the fast review. I'll post v6 momentarily, I just want to do quick fresh end-to-end setup/test. I also want to add that you were correct about the ARP/ND. It didn't work properly and it was living off the entries populated by unicast traffic from the peer, however the entries were flapping wildly and it was discovered when I threw the BFD into the mix (real blessing in disguise). The slower BFD protocol was able to deal with it without disconnects, but BFD was disconnecting every once in a while. Hence the `clone`rules for the ARP replies and IPv6 NAs. On Fri, 2024-08-09 at 08:19 +0200, Dumitru Ceara wrote: On 8/9/24 04:45, Martin Kalcok wrote: This change adds two new LRP options: * routing-protocol-redirect * routing-protocols These allow redirection of a routing protocol traffic to an Logical Switch Port. This enables external routing daemons to listen on an interface bound to an LSP and effectively act as if they were listening on (and speaking from) LRP's IP address. Option 'routing-protocols' takes a comma-separated list of routing protocols whose traffic should be redirected. Currently supported are BGP (tcp 179) and BFD (udp 3784). Option 'routing-protocol-redirect' expects a string with an LSP name. When both of these options are set, any traffic entering LS that's destined for LRP's IP addresses (including IPv6 LLA) and routing protocol's port number, is redirected to the LSP specified in the 'routing-protocol-redirect' value. NOTE: this feature is experimental and may be subject to removal/change in the future. Signed-off-by: Martin Kalcok --- Hi Martin, Thanks for v5! Unfortunately it needs a rebase. Also there were some minor things that need to be fixed in the man pages. 
While at it I have a few other small comments. v5 includes: * change from pure BGP redirect to a configurable protocol redirection * fixes issue with local routing daemon not being able to receive reply traffic when contacting its peer. * ARP and ND traffic is cloned to the redirected port, to properly populate information about its neighbors. northd/northd.c | 217 northd/northd.h | 7 ++ northd/ovn-northd.8.xml | 54 ++ ovn-nb.xml | 42 tests/ovn-northd.at | 93 + tests/system-ovn.at | 100 ++ 6 files changed, 513 insertions(+) diff --git a/northd/northd.c b/northd/northd.c index 4353df07d..39b1998fd 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -13048,6 +13048,221 @@ build_arp_resolve_flows_for_lrp(struct ovn_port *op, } } +static void +build_r_p_redirect_rule__( I think I prefer a more explicit name, e.g., build_routing_protocols_redirect_rule__(). + const char *s_addr, const char *redirect_port_name, int protocol_port, + const char *proto, bool is_ipv6, struct ovn_port *ls_peer, + struct lflow_table *lflows, struct ds *match, struct ds *actions) +{ + int ip_ver = is_ipv6 ? 6 : 4; + ds_clear(actions); + ds_put_format(actions, "outport = \"%s\"; output;", redirect_port_name); + + /* Redirect packets in the input pipeline destined for LR's IP + * and the routing protocol's port to the LSP specified in + * 'routing-protocol-redirect' option.*/ + ds_clear(match); + ds_put_format(match, "ip%d.dst == %s && %s.dst == %d", ip_ver, s_addr, + proto, protocol_port); + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, 100, + ds_cstr(match), + ds_cstr(actions), + ls_peer->lflow_ref); + + /* To accomodate "peer" nature of the routing daemons, redirect also + * replies to the daemons' client requests. 
*/ + ds_clear(match); + ds_put_format(match, "ip%d.dst == %s && %s.src == %d", ip_ver, s_addr, + proto, protocol_port); + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, 100, + ds_cstr(match), + ds_cstr(actions), + ls_peer->lflow_ref); +} Don't we want to merge these two conditions into one logical flow? E.g.: "(ip%d.dst == %s && (%s.dst == %d && %s.src == %d)" Sorry, there is typo. It should be: "(ip%d.dst == %s && (%s.dst == %d || %s.src == %d)" ? This will make one logical flow per LRP IP per protocol instead of two. + +static void +apply_r_p_redirect__( Same here, I think I prefer apply_routing_protocols_redirect__().
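The commit message quoted above describes 'routing-protocols' as a comma-separated list, with BGP and BFD currently supported. A minimal sketch of turning such a value into a bitmask could look like the block below. The REDIRECT_BGP name is used elsewhere in this thread's patch context; REDIRECT_BFD, the numeric flag values, the function name parse_routing_protocols(), and the exact upper-case token spelling are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/* Flag values are assumptions for illustration; the patch defines its own. */
#define REDIRECT_BGP (1 << 0)
#define REDIRECT_BFD (1 << 1)

/* Hypothetical parser: converts a value such as "BGP,BFD" into a bitmask.
 * Unknown tokens are silently ignored here; real code would likely warn.
 * Surrounding whitespace is not trimmed. */
static int
parse_routing_protocols(const char *value)
{
    int flags = 0;

    assert(value);
    while (*value) {
        /* Length of the current token, up to the next comma or the end. */
        size_t len = strcspn(value, ",");

        if (len == 3 && !strncmp(value, "BGP", len)) {
            flags |= REDIRECT_BGP;
        } else if (len == 3 && !strncmp(value, "BFD", len)) {
            flags |= REDIRECT_BFD;
        }
        value += len;
        if (*value == ',') {
            value++;   /* Skip the separator before the next token. */
        }
    }
    return flags;
}
```

The resulting mask can then be tested per protocol, in the same spirit as the 'protocol_flags & REDIRECT_BGP' check in the patch, to decide which tcp/udp port rules to generate.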
Re: [ovs-dev] [Patch ovn v5] northd: Routing protocol port redirection.
Hi Martin, Dumitru, I'm gonna to perform a quick testing after rebase and get back with a feedback. Please see one question below. On 09.08.2024 15:25, martin.kal...@canonical.com wrote: Hi Dumitru, thank you for the fast review. I'll post v6 momentarily, I just want to do quick fresh end-to-end setup/test. I also want to add that you were correct about the ARP/ND. It didn't work properly and it was living off the entries populated by unicast traffic from the peer, however the entries were flapping wildly and it was discovered when I threw the BFD into the mix (real blessing in disguise). The slower BFD protocol was able to deal with it without disconnects, but BFD was disconnecting every once in a while. Hence the `clone`rules for the ARP replies and IPv6 NAs. On Fri, 2024-08-09 at 08:19 +0200, Dumitru Ceara wrote: On 8/9/24 04:45, Martin Kalcok wrote: This change adds two new LRP options: * routing-protocol-redirect * routing-protocols These allow redirection of a routing protocol traffic to an Logical Switch Port. This enables external routing daemons to listen on an interface bound to an LSP and effectively act as if they were listening on (and speaking from) LRP's IP address. Option 'routing-protocols' takes a comma-separated list of routing protocols whose traffic should be redirected. Currently supported are BGP (tcp 179) and BFD (udp 3784). Option 'routing-protocol-redirect' expects a string with an LSP name. When both of these options are set, any traffic entering LS that's destined for LRP's IP addresses (including IPv6 LLA) and routing protocol's port number, is redirected to the LSP specified in the 'routing-protocol-redirect' value. NOTE: this feature is experimental and may be subject to removal/change in the future. Signed-off-by: Martin Kalcok --- Hi Martin, Thanks for v5! Unfortunately it needs a rebase. Also there were some minor things that need to be fixed in the man pages. While at it I have a few other small comments. 
v5 includes: * change from pure BGP redirect to a configurable protocol redirection * fixes issue with local routing daemon not being able to receive reply traffic when contacting its peer. * ARP and ND traffic is cloned to the redirected port, to properly populate information about its neighbors. northd/northd.c | 217 northd/northd.h | 7 ++ northd/ovn-northd.8.xml | 54 ++ ovn-nb.xml | 42 tests/ovn-northd.at | 93 + tests/system-ovn.at | 100 ++ 6 files changed, 513 insertions(+) diff --git a/northd/northd.c b/northd/northd.c index 4353df07d..39b1998fd 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -13048,6 +13048,221 @@ build_arp_resolve_flows_for_lrp(struct ovn_port *op, } } +static void +build_r_p_redirect_rule__( I think I prefer a more explicit name, e.g., build_routing_protocols_redirect_rule__(). + const char *s_addr, const char *redirect_port_name, int protocol_port, + const char *proto, bool is_ipv6, struct ovn_port *ls_peer, + struct lflow_table *lflows, struct ds *match, struct ds *actions) +{ + int ip_ver = is_ipv6 ? 6 : 4; + ds_clear(actions); + ds_put_format(actions, "outport = \"%s\"; output;", redirect_port_name); + + /* Redirect packets in the input pipeline destined for LR's IP + * and the routing protocol's port to the LSP specified in + * 'routing-protocol-redirect' option.*/ + ds_clear(match); + ds_put_format(match, "ip%d.dst == %s && %s.dst == %d", ip_ver, s_addr, + proto, protocol_port); + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, 100, + ds_cstr(match), + ds_cstr(actions), + ls_peer->lflow_ref); + + /* To accomodate "peer" nature of the routing daemons, redirect also + * replies to the daemons' client requests. */ + ds_clear(match); + ds_put_format(match, "ip%d.dst == %s && %s.src == %d", ip_ver, s_addr, + proto, protocol_port); + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, 100, + ds_cstr(match), + ds_cstr(actions), + ls_peer->lflow_ref); +} Don't we want to merge these two conditions into one logical flow? 
E.g.: "(ip%d.dst == %s && (%s.dst == %d && %s.src == %d)" ? This will make one logical flow per LRP IP per protocol instead of two. + +static void +apply_r_p_redirect__( Same here, I think I prefer apply_routing_protocols_redirect__(). + const char *s_addr, const char *redirect_port_name, int protocol_flags, + bool is_ipv6, struct ovn_port *ls_peer, struct lflow_table *lflows, + struct ds *match, struct ds *actions) +{ + if (protocol_flags & REDIRECT_BGP) { + build_r_p_redirect_rule__(s_addr, redirect_port_name, 179, "tcp", + is_ipv6,
Re: [ovs-dev] [Patch ovn v4] northd: BGP port redirecting.
With OVN we do use IPv4 LLA for now. IPv6 is in future plans.

On 08.08.2024 19:31, martin.kal...@canonical.com wrote:

Would it be useful to redirect only traffic for LRP's IPv6 LLA? @Vladislav, in your setups, do you use ipv4 or ipv6 LLAs for setting up BGP peering?
Re: [ovs-dev] [Patch ovn v4] northd: BGP port redirecting.
On 08.08.2024 18:45, Dumitru Ceara wrote: On 8/8/24 11:14, Vladislav Odintsov wrote: On 08.08.2024 15:51,martin.kal...@canonical.com wrote: On Thu, 2024-08-08 at 10:46 +0200, Dumitru Ceara wrote: On 8/8/24 10:42, Vladislav Odintsov wrote: On 08.08.2024 03:16, Dumitru Ceara wrote: Re-adding the dev list. On 8/7/24 18:12, Vladislav Odintsov wrote: Hi Dumitru, Hi Vladislav, I'd like to add some thoughts to your inputs. Thanks for that, I added some more of my own below. :) On 06.08.2024 19:23, Dumitru Ceara wrote: Hi Martin, Sorry, I was reviewing V3 but I was slow in actually sending out the email. On 8/6/24 13:17, Martin Kalcok wrote: This change adds a 'bgp-redirect' option to LRP that allows redirecting of BGP control plane traffic to an arbitrary LSP in its peer LS. The option expects a string with a LSP name. When set, any traffic entering LS that's destined for any of the LRP's IP addresses (including IPv6 LLA) is redirected to the LSP specified in the option's value. This enables external BGP daemons to listen on an interface bound to a LSP and effectively act as if they were listening on (and speaking from) LRP's IP address. Signed-off-by: Martin Kalcok --- Strictly on this version of the patch, and with the thought in mind that we might want to consider this feature experimental [0] and maybe change it (NB too) in the future, I left a few comments inline. With those addressed I think the patch looks OK to me. [0] https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmich...@redhat.com/ Now, for the point when we'd remove the "experimental" tag: In general, it makes me a bit uneasy that we have to share the MAC and IP between the LRP and the VIF of logical switch port that's connected to the same switch as the LRP. I was thinking of alternatives for the future and the only things I could come up with until now are: 1. 
Create a separate, "disconnected" logical switch: ovn-nbctl ls-add bgp Add the bgp-daemon port to it: ovn-nbctl lsp-add bgp bgp-daemon. Then we don't need "unknown" either, I think. But it's not possible today to move packets from the ingress pipeline of a logical switch ("public" in this test) to the egress pipeline of a different logical switch ("bgp" in my example). It also feels icky to implement that. 2. Add the ability to bind an OVS port directly to a logical router port: then we could do the same type of redirect you do here but instead in the logical router pipeline. The advantage is we don't have to drop any non-bgp traffic in the switch pipeline. The switch keeps functioning as it does today. Maybe an advantage of this second alternative would be that we can easily attach a filtering option to this LRP (e.g., LRP.options:control-traffic-filter) to allow us to be more flexible in what kind of traffic we forward to the actuall routing protocol daemon that runs behind that OVS port - Vladislav also mentioned during the meeting it might be interesting to forward BFD packets to the FRR (or whatever application implements the routing protocol) instance too. The idea to be able to bind LRP to OVS port sounds very interesting to me. But with a note that it's not a pure "bind", but a partial: as you wrote with some "filter" applied to pass not all the traffic. The main idea here is to pass only control plane traffic destined to LRP's As we're discussing on the other thread (Martin pointed it out) we also probably need to involve conntrack and allow replies to any kind of traffic initiated by the entity behind the LRP's VIF (e.g., BGP sessions initiated from the host). addresses. Other than that seems odd. Transit traffic should be remained in LR pipeline otherwise it will have no difference with a regular VIF LSP. 
Definitely, traffic that needs to be DNATed (DNAT/unSNAT rules or LB rules that will change the destination IP from LRP IP to something else) should not be affected. All other "transit" traffic doesn't have LRP IP, does it? You're right. @Dumitru, @Martin, what if we just redirect all traffic destined to LRP IPs to redirect port? Is there any drawbacks? For security it is possible to optionally use ACLs with or without conntrack. It's up to user/CMS. This seems quite simple in the code and very flexible. So no additional flows seems to be needed in future to support any other routing protocols (or for another possible non-routing usecases). Won't this break all "transit" traffic that has destination IP the LRP IP (DNAT, LB), etc? I'm afraid so. I guess I was just tired yesterday when I made that proposal for general redirect. Given that the redirect is implemented in the LS pipeline, it would eat up all the traffic for LRP's IP before the DNAT/unSNAT occurs in the LR pipeline. I'll give it a quick
Re: [ovs-dev] [Patch ovn v4] northd: BGP port redirecting.
On 08.08.2024 15:51, martin.kal...@canonical.com wrote: On Thu, 2024-08-08 at 10:46 +0200, Dumitru Ceara wrote: On 8/8/24 10:42, Vladislav Odintsov wrote: On 08.08.2024 03:16, Dumitru Ceara wrote: Re-adding the dev list. On 8/7/24 18:12, Vladislav Odintsov wrote: Hi Dumitru, Hi Vladislav, I'd like to add some thoughts to your inputs. Thanks for that, I added some more of my own below. :) On 06.08.2024 19:23, Dumitru Ceara wrote: Hi Martin, Sorry, I was reviewing V3 but I was slow in actually sending out the email. On 8/6/24 13:17, Martin Kalcok wrote: This change adds a 'bgp-redirect' option to LRP that allows redirecting of BGP control plane traffic to an arbitrary LSP in its peer LS. The option expects a string with a LSP name. When set, any traffic entering LS that's destined for any of the LRP's IP addresses (including IPv6 LLA) is redirected to the LSP specified in the option's value. This enables external BGP daemons to listen on an interface bound to a LSP and effectively act as if they were listening on (and speaking from) LRP's IP address. Signed-off-by: Martin Kalcok --- Strictly on this version of the patch, and with the thought in mind that we might want to consider this feature experimental [0] and maybe change it (NB too) in the future, I left a few comments inline. With those addressed I think the patch looks OK to me. [0] https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmich...@redhat.com/ Now, for the point when we'd remove the "experimental" tag: In general, it makes me a bit uneasy that we have to share the MAC and IP between the LRP and the VIF of logical switch port that's connected to the same switch as the LRP. I was thinking of alternatives for the future and the only things I could come up with until now are: 1. Create a separate, "disconnected" logical switch: ovn-nbctl ls-add bgp Add the bgp-daemon port to it: ovn-nbctl lsp-add bgp bgp-daemon. Then we don't need "unknown" either, I think. 
But it's not possible today to move packets from the ingress pipeline of a logical switch ("public" in this test) to the egress pipeline of a different logical switch ("bgp" in my example). It also feels icky to implement that. 2. Add the ability to bind an OVS port directly to a logical router port: then we could do the same type of redirect you do here but instead in the logical router pipeline. The advantage is we don't have to drop any non-bgp traffic in the switch pipeline. The switch keeps functioning as it does today. Maybe an advantage of this second alternative would be that we can easily attach a filtering option to this LRP (e.g., LRP.options:control-traffic-filter) to allow us to be more flexible in what kind of traffic we forward to the actuall routing protocol daemon that runs behind that OVS port - Vladislav also mentioned during the meeting it might be interesting to forward BFD packets to the FRR (or whatever application implements the routing protocol) instance too. The idea to be able to bind LRP to OVS port sounds very interesting to me. But with a note that it's not a pure "bind", but a partial: as you wrote with some "filter" applied to pass not all the traffic. The main idea here is to pass only control plane traffic destined to LRP's As we're discussing on the other thread (Martin pointed it out) we also probably need to involve conntrack and allow replies to any kind of traffic initiated by the entity behind the LRP's VIF (e.g., BGP sessions initiated from the host). addresses. Other than that seems odd. Transit traffic should be remained in LR pipeline otherwise it will have no difference with a regular VIF LSP. Definitely, traffic that needs to be DNATed (DNAT/unSNAT rules or LB rules that will change the destination IP from LRP IP to something else) should not be affected. All other "transit" traffic doesn't have LRP IP, does it? You're right. @Dumitru, @Martin, what if we just redirect all traffic destined to LRP IPs to redirect port? 
Are there any drawbacks? For security, it is possible to optionally use ACLs with or without conntrack. It's up to the user/CMS. This seems quite simple in the code and very flexible, so no additional flows seem to be needed in the future to support other routing protocols (or other possible non-routing use cases).

Won't this break all "transit" traffic whose destination IP is the LRP IP (DNAT, LB, etc.)?

I'm afraid so. I guess I was just tired yesterday when I made that proposal for a general redirect. Given that the redirect is implemented in the LS pipeline, it would eat up all the traffic for the LRP's IP before the DNAT/unSNAT occurs in the LR pipeline. I'll give it a quick test in case I'm wrong here again.

Oops, this will definitely break those cases.
Re: [ovs-dev] [Patch ovn v4] northd: BGP port redirecting.
On 08.08.2024 03:16, Dumitru Ceara wrote: Re-adding the dev list. On 8/7/24 18:12, Vladislav Odintsov wrote: Hi Dumitru, Hi Vladislav, I'd like to add some thoughts to your inputs. Thanks for that, I added some more of my own below. :) On 06.08.2024 19:23, Dumitru Ceara wrote: Hi Martin, Sorry, I was reviewing V3 but I was slow in actually sending out the email. On 8/6/24 13:17, Martin Kalcok wrote: This change adds a 'bgp-redirect' option to LRP that allows redirecting of BGP control plane traffic to an arbitrary LSP in its peer LS. The option expects a string with a LSP name. When set, any traffic entering LS that's destined for any of the LRP's IP addresses (including IPv6 LLA) is redirected to the LSP specified in the option's value. This enables external BGP daemons to listen on an interface bound to a LSP and effectively act as if they were listening on (and speaking from) LRP's IP address. Signed-off-by: Martin Kalcok --- Strictly on this version of the patch, and with the thought in mind that we might want to consider this feature experimental [0] and maybe change it (NB too) in the future, I left a few comments inline. With those addressed I think the patch looks OK to me. [0] https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmich...@redhat.com/ Now, for the point when we'd remove the "experimental" tag: In general, it makes me a bit uneasy that we have to share the MAC and IP between the LRP and the VIF of logical switch port that's connected to the same switch as the LRP. I was thinking of alternatives for the future and the only things I could come up with until now are: 1. Create a separate, "disconnected" logical switch: ovn-nbctl ls-add bgp Add the bgp-daemon port to it: ovn-nbctl lsp-add bgp bgp-daemon. Then we don't need "unknown" either, I think. 
But it's not possible today to move packets from the ingress pipeline of a logical switch ("public" in this test) to the egress pipeline of a different logical switch ("bgp" in my example). It also feels icky to implement that. 2. Add the ability to bind an OVS port directly to a logical router port: then we could do the same type of redirect you do here but instead in the logical router pipeline. The advantage is we don't have to drop any non-bgp traffic in the switch pipeline. The switch keeps functioning as it does today. Maybe an advantage of this second alternative would be that we can easily attach a filtering option to this LRP (e.g., LRP.options:control-traffic-filter) to allow us to be more flexible in what kind of traffic we forward to the actuall routing protocol daemon that runs behind that OVS port - Vladislav also mentioned during the meeting it might be interesting to forward BFD packets to the FRR (or whatever application implements the routing protocol) instance too. The idea to be able to bind LRP to OVS port sounds very interesting to me. But with a note that it's not a pure "bind", but a partial: as you wrote with some "filter" applied to pass not all the traffic. The main idea here is to pass only control plane traffic destined to LRP's As we're discussing on the other thread (Martin pointed it out) we also probably need to involve conntrack and allow replies to any kind of traffic initiated by the entity behind the LRP's VIF (e.g., BGP sessions initiated from the host). addresses. Other than that seems odd. Transit traffic should be remained in LR pipeline otherwise it will have no difference with a regular VIF LSP. Definitely, traffic that needs to be DNATed (DNAT/unSNAT rules or LB rules that will change the destination IP from LRP IP to something else) should not be affected. All other "transit" traffic doesn't have LRP IP, does it? You're right. @Dumitru, @Martin, what if we just redirect all traffic destined to LRP IPs to redirect port? 
Is there any drawbacks? For security it is possible to optionally use ACLs with or without conntrack. It's up to user/CMS. This seems quite simple in the code and very flexible. So no additional flows seems to be needed in future to support any other routing protocols (or for another possible non-routing usecases). Having a filter (or match like in ACL or Logical_Router_Policies) looks more flexible in terms of used protocols. This can coexist with proposal from current patch, where the flow is not programmable from user. I think so too. Regards, Dumitru But again, if we consider the current feature "experimental", we can spend more time during the next release cycle to figure out what's best. v4 of this patch renames the feature from "bgp-mirror" to "bgp-redirect" as discussed during community meeting. northd/northd.c | 108 northd/ovn
Re: [ovs-dev] [Patch ovn v4] northd: BGP port redirecting.
On 07.08.2024 16:23, Dumitru Ceara wrote: On 8/7/24 11:17,martin.kal...@canonical.com wrote: Hi Dumitru and Vladislav, Thank you both for the review and the feedback On Wed, 2024-08-07 at 09:17 +0200, Dumitru Ceara wrote: On 8/6/24 19:22, Vladislav Odintsov wrote: Hi Martin, Dumitru, Hi Vladislav, On 06.08.2024 19:23, Dumitru Ceara wrote: Hi Martin, Sorry, I was reviewing V3 but I was slow in actually sending out the email. On 8/6/24 13:17, Martin Kalcok wrote: This change adds a 'bgp-redirect' option to LRP that allows redirecting of BGP control plane traffic to an arbitrary LSP in its peer LS. The option expects a string with a LSP name. When set, any traffic entering LS that's destined for any of the LRP's IP addresses (including IPv6 LLA) is redirected to the LSP specified in the option's value. This enables external BGP daemons to listen on an interface bound to a LSP and effectively act as if they were listening on (and speaking from) LRP's IP address. Signed-off-by: Martin Kalcok --- Strictly on this version of the patch, and with the thought in mind that we might want to consider this feature experimental [0] and maybe change it (NB too) in the future, I left a few comments inline. With those addressed I think the patch looks OK to me. [0] https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmich...@redhat.com/ I'm keeping an eye on this one and I think you made a good point Dumitru about the need for clear way to mark feature as experimental. I'll chime in that patch discussion as well. Thanks! Now, for the point when we'd remove the "experimental" tag: In general, it makes me a bit uneasy that we have to share the MAC and IP between the LRP and the VIF of logical switch port that's connected to the same switch as the LRP. I think that in one way or another, the requirement for the interface to share IP/MAC with the LRP will remain. 
The reason for sharing them is to allow operating BGP unnumbered which uses IPv6 LLA instead of pre-configured IP addresses. I think we don't need it with any of the alternative options below but we can discuss those in detail after branching, in the 25.03 cycle. I was thinking of alternatives for the future and the only things I could come up with until now are: 1. Create a separate, "disconnected" logical switch: ovn-nbctl ls-add bgp Add the bgp-daemon port to it: ovn-nbctl lsp-add bgp bgp-daemon. Then we don't need "unknown" either, I think. But it's not possible today to move packets from the ingress pipeline of a logical switch ("public" in this test) to the egress pipeline of a different logical switch ("bgp" in my example). It also feels icky to implement that. 2. Add the ability to bind an OVS port directly to a logical router port: then we could do the same type of redirect you do here but instead in the logical router pipeline. The advantage is we don't have to drop any non-bgp traffic in the switch pipeline. The switch keeps functioning as it does today. I like the second approach a lot. It would indeed feel more "in place" to have this logic in the LR, instead of LS. I didn't think this would be possible initially. Ack, maybe we should come back to this discussion after branching. Maybe an advantage of this second alternative would be that we can easily attach a filtering option to this LRP (e.g., LRP.options:control-traffic-filter) to allow us to be more flexible in what kind of traffic we forward to the actual routing protocol daemon that runs behind that OVS port - Vladislav also mentioned during the meeting it might be interesting to forward BFD packets to the FRR (or whatever application implements the routing protocol) instance too. @Martin, I'm again kindly asking about possible support for BFD redirect.
Is it possible to incorporate these changes as an additional configuration knob ("bfd-redirect") into your patch so we can start using this functionality internally and possibly give some feedback (hopefully positive)? It doesn't seem to be a very big change but looks very attractive for us. I can send a separate patch for this on top of your patch, but technically it won't be able to make it into 24.09 because formally the soft-freeze is already in progress. As an option I can send a patch, which can be squashed into this one. What do you think? I didn't check with other maintainers (added them in CC now), but given that we agreed to try to accept the port redirecting patch as experimental I think it's fine to expand it to allow BFD support too. Posting a follow up patch on top of this one is fine from my perspective. Indeed the change is not that big code-wise, but given that it would require some forethought about the implementation I was a bit hesitant to include it in here. I didn't want to rock th
Re: [ovs-dev] [Patch ovn v4] northd: BGP port redirecting.
Hi Martin, Dumitru, On 06.08.2024 19:23, Dumitru Ceara wrote: Hi Martin, Sorry, I was reviewing V3 but I was slow in actually sending out the email. On 8/6/24 13:17, Martin Kalcok wrote: This change adds a 'bgp-redirect' option to LRP that allows redirecting of BGP control plane traffic to an arbitrary LSP in its peer LS. The option expects a string with a LSP name. When set, any traffic entering LS that's destined for any of the LRP's IP addresses (including IPv6 LLA) is redirected to the LSP specified in the option's value. This enables external BGP daemons to listen on an interface bound to a LSP and effectively act as if they were listening on (and speaking from) LRP's IP address. Signed-off-by: Martin Kalcok --- Strictly on this version of the patch, and with the thought in mind that we might want to consider this feature experimental [0] and maybe change it (NB too) in the future, I left a few comments inline. With those addressed I think the patch looks OK to me. [0] https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmich...@redhat.com/ Now, for the point when we'd remove the "experimental" tag: In general, it makes me a bit uneasy that we have to share the MAC and IP between the LRP and the VIF of logical switch port that's connected to the same switch as the LRP. I was thinking of alternatives for the future and the only things I could come up with until now are: 1. Create a separate, "disconnected" logical switch: ovn-nbctl ls-add bgp Add the bgp-daemon port to it: ovn-nbctl lsp-add bgp bgp-daemon. Then we don't need "unknown" either, I think. But it's not possible today to move packets from the ingress pipeline of a logical switch ("public" in this test) to the egress pipeline of a different logical switch ("bgp" in my example). It also feels icky to implement that. 2. 
Add the ability to bind an OVS port directly to a logical router port: then we could do the same type of redirect you do here but instead in the logical router pipeline. The advantage is we don't have to drop any non-bgp traffic in the switch pipeline. The switch keeps functioning as it does today. Maybe an advantage of this second alternative would be that we can easily attach a filtering option to this LRP (e.g., LRP.options:control-traffic-filter) to allow us to be more flexible in what kind of traffic we forward to the actual routing protocol daemon that runs behind that OVS port - Vladislav also mentioned during the meeting it might be interesting to forward BFD packets to the FRR (or whatever application implements the routing protocol) instance too. @Martin, I'm again kindly asking about possible support for BFD redirect. Is it possible to incorporate these changes as an additional configuration knob ("bfd-redirect") into your patch so we can start using this functionality internally and possibly give some feedback (hopefully positive)? It doesn't seem to be a very big change but looks very attractive for us. I can send a separate patch for this on top of your patch, but technically it won't be able to make it into 24.09 because formally the soft-freeze is already in progress. As an option I can send a patch, which can be squashed into this one. What do you think? But again, if we consider the current feature "experimental", we can spend more time during the next release cycle to figure out what's best. v4 of this patch renames the feature from "bgp-mirror" to "bgp-redirect" as discussed during community meeting. 
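As a rough illustration of the BFD redirect request above: the patch matches BGP traffic with "ip%d.dst == %s && tcp.dst == 179", and a BFD rule could plausibly follow the same shape. The sketch below is standalone C, not northd code. format_redirect_match() and the two port macros are hypothetical names introduced only for this example, and UDP 3784 is assumed as the single-hop BFD control port (RFC 5881).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Well-known ports: BGP listens on TCP 179; single-hop BFD control
 * packets are sent to UDP 3784 (RFC 5881). */
#define BGP_TCP_PORT 179
#define BFD_UDP_PORT 3784

/* Hypothetical helper, not part of the patch: formats a logical flow
 * match for control plane traffic destined to one LRP address.
 * Returns 0 on success and -1 if 'buf' is too small. */
static int
format_redirect_match(char *buf, size_t len, const char *addr,
                      bool is_ipv6, const char *l4_proto, int dst_port)
{
    assert(buf && addr && l4_proto);
    int n = snprintf(buf, len, "ip%d.dst == %s && %s.dst == %d",
                     is_ipv6 ? 6 : 4, addr, l4_proto, dst_port);
    return (n >= 0 && (size_t) n < len) ? 0 : -1;
}
```

Called with ("fe80::1", true, "udp", BFD_UDP_PORT) it produces a match of the form ip6.dst == fe80::1 && udp.dst == 3784. Whether such a flow should be installed unconditionally or only behind a separate option is exactly the open question in this thread.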
northd/northd.c | 108 northd/ovn-northd.8.xml | 23 + ovn-nb.xml | 14 ++ tests/ovn-northd.at | 58 + tests/system-ovn.at | 86 5 files changed, 289 insertions(+) diff --git a/northd/northd.c b/northd/northd.c index 4353df07d..088104f25 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -13048,6 +13048,113 @@ build_arp_resolve_flows_for_lrp(struct ovn_port *op, } } +static void +build_bgp_redirect_rule__( +const char *s_addr, const char *bgp_port_name, bool is_ipv6, +struct ovn_port *ls_peer, struct lflow_table *lflows, +struct ds *match, struct ds *actions) +{ +int ip_ver = is_ipv6 ? 6 : 4; +/* Redirect packets in the input pipeline destined for LR's IP to + * the port specified in 'bgp-redirect' option. + */ +ds_clear(match); +ds_clear(actions); +ds_put_format(match, "ip%d.dst == %s && tcp.dst == 179", ip_ver, s_addr); +ds_put_format(actions, "outport = \"%s\"; output;", bgp_port_name); +ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, 100, + ds_cstr(match), + ds_cstr(actions), + ls_peer->lflow_ref); + + +/* Drop any traffic originating from 'bgp-redirect' port that does + * not originate from BGP daemon port. This blocks unnecessary + * traffic like ARP bro
[ovs-dev] [PATCH ovn] tests: Fix typo in read-only sb ssl-ciphers test.
Though this typo does not affect test correctness, it is worth fixing properly. Fixes: bcc650a29d3f ("tests: Fix ssl-ciphers RO sb test with old openssl.") Signed-off-by: Vladislav Odintsov --- tests/ovn.at | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/ovn.at b/tests/ovn.at index b31afbfb3..5b81fc210 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -37939,7 +37939,7 @@ AT_CHECK([ovn-sbctl --db=ssl:127.0.0.1:$TCP_PORT \ --ca-cert=$PKIDIR/testpki-cacert.pem \ --ssl-ciphers='HIGH:!aNULL:!MD5:@SECLEVEL=1' \ --ssl-protocols='TLSv1,TLSv1.1,TLSv1.2' \ -chassis-add ch vxlan 1.2.4.8 2>&1 | grep 'transaction error]', [0], [dnl +chassis-add ch vxlan 1.2.4.8 2>&1 | grep 'transaction error'], [0], [dnl ovn-sbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"} ], [ignore]) -- 2.45.2
Re: [ovs-dev] [Patch ovn v2 1/1] northd: BGP port mirroring.
Hi Martin, Frode, I'd like to summarize the OVN technical meeting discussion. First, there was a discussion about the naming of the new option. Ales proposed the terms "forward" and "redirect" (IIRC). Do we want to reflect the exact behavior of the new option? "redirect" from my perspective could be a good candidate. Please correct me if I'm wrong in the next 4 points: 1. In the current BGP support patch series [0] OVN only installs NAT and LB VIP addresses into a separate routing table via Netlink, which should then be redistributed by an external routing daemon (FRR, Bird, etc.). The external routing daemon is configured outside of OVN. 2. OVN doesn't import routes received and installed into the separate kernel routing table by the routing daemon. "OVN to outside" direction routes are configured as normal Logical_Router_Static_Route in OVN_Northbound by CMS/external automation, while "outside to OVN" direction routes are installed on the external router(s) automatically with BGP. 3. If a user has more than one BGP peering with a Leaf/external Router and needs fast (sub-second) failover for the "OVN to outside" direction, BFD should be configured for ECMP Logical_Router_Static_Routes from the OVN side with these external router IPs as a nexthop group. On the external router side BFD within BGP must be configured. 4. If BFD traffic is forwarded from the LRP to the "bgp daemon" LSP unconditionally, the functionality from #3 will be broken. I'm wondering, if we don't configure BFD for ECMP routes from OVN and use external tooling for route learning, could we conditionally add BFD redirecting rules with a separate option? Would it be safe or are there any negative consequences? 
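To put the sub-second failover from point 3 into numbers, here is a small standalone C sketch of the BFD asynchronous-mode detection time as described in RFC 5880: the peer's detect multiplier times the larger of the local required min RX interval and the peer's desired min TX interval. The function name and the example values (300 ms intervals, multiplier 3) are illustrative assumptions, not OVN or FRR defaults.

```c
#include <assert.h>
#include <stdint.h>

/* Detection time (in ms) as seen by the local system in BFD asynchronous
 * mode: the remote detect multiplier times the negotiated interval, which
 * is the larger of our required min RX interval and the remote desired
 * min TX interval.  Hypothetical helper for illustration only. */
static uint32_t
bfd_detection_time_ms(uint32_t local_required_min_rx_ms,
                      uint32_t remote_desired_min_tx_ms,
                      uint32_t remote_detect_mult)
{
    assert(remote_detect_mult > 0);
    uint32_t interval_ms =
        local_required_min_rx_ms > remote_desired_min_tx_ms
        ? local_required_min_rx_ms
        : remote_desired_min_tx_ms;
    return remote_detect_mult * interval_ms;
}
```

With 300 ms intervals and a multiplier of 3 this gives 900 ms, whereas a BGP session without BFD is only torn down once its hold timer expires (RFC 4271 suggests 90 seconds as the default hold time).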
0: https://patchwork.ozlabs.org/project/ovn/list/?series=416659 On 26.07.2024 23:10, Vladislav Odintsov wrote: On 26.07.2024 21:21, martin.kal...@canonical.com wrote: On Fri, 2024-07-26 at 21:07 +0700, Vladislav Odintsov wrote: Hi Frode, On 26.07.2024 19:17, Frode Nordahl wrote: On Fri, Jul 26, 2024 at 11:28 AM Vladislav Odintsov wrote: Hi Martin, thanks for the patch. Typically for faster BGP (or other routing protocols) convergence BFD signalling is used. Would you mind adding flows to forward BFD traffic to the same LSP as it is already done for BGP? It could be an additional option like ("enable-bfd-for-bgp") or something like this, or we can install flows unconditionally. UDP/3784 is the default BFD proto/port. BFD is indeed an important part for fast convergence of routing protocols, it is however also an important part of end to end liveness detection for a data path. In this work our goal is to exchange control plane information with an external daemon so that it can take care of the routing protocol state machine. We do want to keep the data path in OVS so that users can benefit from all the data path implementations it has to offer, including hardware acceleration. With this in mind, does it really make sense for the external daemon to speak BFD, or would it be better to integrate with the OVN managed BFD for static routes which is implemented in the OVS data path? Typically BFD for routing protocols is configured in routing daemons (on both sides of peering), because the main routing daemon (e.g. bgpd) has to get notifications from the BFD engine (e.g. bfdd) that the connection is lost. OVS-based BFD sessions seem to have nothing to do here. I proposed to install "redirect" flows similar to BGP: forward udp/3784 to the dedicated LSP for the routing daemon. Once the OpenFlow flows are installed, the control plane is not needed for BFD to work. 
The OVS datapath in this case just forwards traffic from the external network (for instance, a leaf switch) to the internal OVS port of the routing daemon. 
Hope this explains the idea more clearly. Wouldn't this be like having multiple "sources of truth" in the system? On one hand there's OVN injecting routes [0], that are picked up by the BGP daemon, and on the other there's a BFD daemon that will be removing them if it believes that they are unreachable. Couldn't this lead to some flapping? It shouldn't. Just in case - I'm not talking about the OVN BFD for static routes feature. I mean BFD within the routing daemon. The BFD daemon is just a "sidecar" for BGP to notify the latter that the connectivity is lost. After BFD detects a connectivity failure it notifies BGP, which terminates the BGP session and removes routes learnt from the dead peer. [0] This works for both sides: for the routing daemon on the "OVN side" and for the external BGP speaker. This can protect against 2 types of failures: 1. L3 Gateway failure: power outage, physical disconnection, kernel panic, OVS failure, etc. If ha-chassis-group is configured, other OVN cluster nodes will detect this failure through OVS-based BFD and trigger failover to the next ha-chassis in the group. At the same time external BGP speaker will also detect that BFD se
Re: [ovs-dev] [Patch ovn v2 1/1] northd: BGP port mirroring.
On 26.07.2024 21:21, martin.kal...@canonical.com wrote: On Fri, 2024-07-26 at 21:07 +0700, Vladislav Odintsov wrote: Hi Frode, On 26.07.2024 19:17, Frode Nordahl wrote: On Fri, Jul 26, 2024 at 11:28 AM Vladislav Odintsov wrote: Hi Martin, thanks for the patch. Typically for faster BGP (or other routing protocols) convergence BFD signalling is used. Would you mind adding flows to forward BFD traffic to the same LSP as it is already done for BGP? It could be an additional option like ("enable-bfd-for-bgp") or something like this, or we can install flows unconditionally. UDP/3784 is the default BFD proto/port. BFD is indeed an important part for fast convergence of routing protocols, it is however also an important part of end to end liveness detection for a data path. In this work our goal is to exchange control plane information with an external daemon so that it can take care of the routing protocol state machine. We do want to keep the data path in OVS so that users can benefit from all the data path implementations it has to offer, including hardware acceleration. With this in mind, does it really make sense for the external daemon to speak BFD, or would it be better to integrate with the OVN managed BFD for static routes which is implemented in the OVS data path? Typically BFD for routing protocols is configured in routing daemons (on both sides of peering), because the main routing daemon (e.g. bgpd) has to get notifications from the BFD engine (e.g. bfdd) that the connection is lost. OVS-based BFD sessions seem to have nothing to do here. I proposed to install "redirect" flows similar to BGP: forward udp/3784 to the dedicated LSP for the routing daemon. Once the OpenFlow flows are installed, the control plane is not needed for BFD to work. The OVS datapath in this case just forwards traffic from the external network (for instance, a leaf switch) to the internal OVS port of the routing daemon. Hope this explains the idea more clearly. 
Wouldn't this be like having multiple "sources of truth" in the system? 
On one hand there's OVN injecting routes [0], that are picked up by the BGP daemon, and on the other there's a BFD daemon that will be removing them if it believes that they are unreachable. Couldn't this lead to some flapping? It shouldn't. Just in case - I'm not talking about the OVN BFD for static routes feature. I mean BFD within the routing daemon. The BFD daemon is just a "sidecar" for BGP to notify the latter that the connectivity is lost. After BFD detects a connectivity failure it notifies BGP, which terminates the BGP session and removes routes learnt from the dead peer. [0] This works for both sides: for the routing daemon on the "OVN side" and for the external BGP speaker. This can protect against 2 types of failures: 1. L3 Gateway failure: power outage, physical disconnection, kernel panic, OVS failure, etc. If ha-chassis-group is configured, other OVN cluster nodes will detect this failure through OVS-based BFD and trigger failover to the next ha-chassis in the group. At the same time the external BGP speaker will also detect that the BFD session went down, terminate the BGP session with the routing daemon and send a BGP update removing the routes through it, because they are not reachable anymore. Then the next l3gateway by priority claims the chassis-redirect LRP and starts to advertise NAT/LB VIP addresses. 2. Leaf/external router failure. In the case where we have two or more peers/leafs, these leafs can go up and down. FRR should install/delete routes through each of them as fast as possible. Here BFD also helps to detect failures faster than BGP keepalives do. IIUC, in the current BGP integration OVN doesn't import routes learned from BGP and installed by FRR into the VRF. But this should be done in the future so that the LR could send traffic only to an alive peer. For clarity, I've prepared a small illustration of the interacting components in drawio [1] and PNG [2] formats. 
There are two pairs of curved arrows between FRR running in a VRF and 2 external BGP speakers. 
The light blue and orange arrows show BGP session traffic datapaths and the dashed red and blue lines show BFD traffic datapaths. Please correct me if I misunderstood your point. 0: https://docs.frrouting.org/en/latest/bfd.html#bgp-bfd-configuration 1: https://s3.k2.cloud/vlodintsov/public-artifacts/ovn-native-bgp.drawio 2: https://s3.k2.cloud/vlodintsov/public-artifacts/ovn-native-bgp.drawio.png [0] https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416038.html Also, do we need to hard-code the BGP protocol, or can it be generalized so that an end user can pass a proto and optionally a port to forward? This can bring OSPF or other dynamic routing protocols support. What do you think? While it is true that this may apply generally to other routing protocols, it does make it more complicated to configure. Well, I thought about it more and agree with you. To implement OSP
Re: [ovs-dev] [Patch ovn v2 1/1] northd: BGP port mirroring.
Hi Frode, On 26.07.2024 19:17, Frode Nordahl wrote: On Fri, Jul 26, 2024 at 11:28 AM Vladislav Odintsov wrote: Hi Martin, thanks for the patch. Typically for faster BGP (or other routing protocols) convergence BFD signalling is used. Would you mind adding flows to forward BFD traffic to the same LSP as it is already done for BGP? It could be an additional option like ("enable-bfd-for-bgp") or something like this, or we can install flows unconditionally. UDP/3784 is the default BFD proto/port. BFD is indeed an important part for fast convergence of routing protocols, it is however also an important part of end to end liveness detection for a data path. In this work our goal is to exchange control plane information with an external daemon so that it can take care of the routing protocol state machine. We do want to keep the data path in OVS so that users can benefit from all the data path implementations it has to offer, including hardware acceleration. With this in mind, does it really make sense for the external daemon to speak BFD, or would it be better to integrate with the OVN managed BFD for static routes which is implemented in the OVS data path? Typically BFD for routing protocols is configured in routing daemons (on both sides of peering), because the main routing daemon (e.g. bgpd) has to get notifications from the BFD engine (e.g. bfdd) that the connection is lost. OVS-based BFD sessions seem to have nothing to do here. I proposed to install "redirect" flows similar to BGP: forward udp/3784 to the dedicated LSP for the routing daemon. Once the OpenFlow flows are installed, the control plane is not needed for BFD to work. The OVS datapath in this case just forwards traffic from the external network (for instance, a leaf switch) to the internal OVS port of the routing daemon. Hope this explains the idea more clearly. 
Also, do we need to hard-code the BGP protocol, or can it be generalized so that an end user can pass a proto and optionally a port to forward? 
This can bring OSPF or other dynamic routing protocols support. What do you think? While it is true that this may apply generally to other routing protocols, it does make it more complicated to configure. Well, I thought about it more and agree with you. To implement OSPF there must be more work done like handling multicast traffic for a specific IPv4 or IPv6 address. This could be done in a separate patch when OSPF support is needed. -- Frode Nordahl On 25.07.2024 02:02, Mark Michelson wrote: Hi Martin, thanks for the patch. I have one note below, but other than that it looks good to me. On 7/16/24 02:59, Martin Kalcok wrote: This change adds a 'bgp-mirror' option to LRP that allows mirroring of BGP control plane traffic to an arbitrary LSP in its peer LS. The option expects a string with a LSP name. When set, any traffic entering LS that's destined for any of the LRP's IP addresses (including IPv6 LLA) is redirected to the LSP specified in the option's value. This enables external BGP daemons to listen on an interface bound to a LSP and effectively act as if they were listening on (and speaking from) LRP's IP address. Signed-off-by: Martin Kalcok --- northd/northd.c | 87 + northd/ovn-northd.8.xml | 23 +++ ovn-nb.xml | 14 +++ tests/ovn-northd.at | 45 + tests/system-ovn.at | 86 5 files changed, 255 insertions(+) diff --git a/northd/northd.c b/northd/northd.c index 4353df07d..e07bf68cc 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -13048,6 +13048,92 @@ build_arp_resolve_flows_for_lrp(struct ovn_port *op, } } +static void +build_bgp_redirect_rule__( +const char *s_addr, const char *bgp_port_name, bool is_ipv6, +struct ovn_port *ls_peer, struct lflow_table *lflows, +struct ds *match, struct ds *actions) +{ +int ip_ver = is_ipv6 ? 6 : 4; +/* Redirect packets in the input pipeline destined for LR's IP to + * the port specified in 'bgp-mirror' option. 
+ */ +ds_clear(match); +ds_clear(actions); +ds_put_format(match, "ip%d.dst == %s && tcp.dst == 179", ip_ver, s_addr); +ds_put_format(actions, "outport = \"%s\"; output;", bgp_port_name); +ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, 100, + ds_cstr(match), + ds_cstr(actions), + ls_peer->lflow_ref); + + +/* Drop any traffic originating from 'bgp-mirror' port that does + * not originate from BGP daemon port. This blocks unnecessary + * traffic like ARP broadcasts or IPv6 router solicitation packets + * from the dummy 'bgp-mirror' port. + */ +ds_clear(match); +ds_put_format(match, "inport == \"%s\"", bgp_port_name); +ovn_
Re: [ovs-dev] [Patch ovn v2 1/1] northd: BGP port mirroring.
Hi Martin, thanks for the patch. Typically for faster BGP (or other routing protocols) convergence BFD signalling is used. Would you mind adding flows to forward BFD traffic to the same LSP as it is already done for BGP? It could be an additional option like ("enable-bfd-for-bgp") or something like this, or we can install flows unconditionally. UDP/3784 is the default BFD proto/port. Also, do we need to hard-code the BGP protocol, or can it be generalized so that an end user can pass a proto and optionally a port to forward? This can bring OSPF or other dynamic routing protocols support. What do you think? On 25.07.2024 02:02, Mark Michelson wrote: Hi Martin, thanks for the patch. I have one note below, but other than that it looks good to me. On 7/16/24 02:59, Martin Kalcok wrote: This change adds a 'bgp-mirror' option to LRP that allows mirroring of BGP control plane traffic to an arbitrary LSP in its peer LS. The option expects a string with a LSP name. When set, any traffic entering LS that's destined for any of the LRP's IP addresses (including IPv6 LLA) is redirected to the LSP specified in the option's value. This enables external BGP daemons to listen on an interface bound to a LSP and effectively act as if they were listening on (and speaking from) LRP's IP address. Signed-off-by: Martin Kalcok --- northd/northd.c | 87 + northd/ovn-northd.8.xml | 23 +++ ovn-nb.xml | 14 +++ tests/ovn-northd.at | 45 + tests/system-ovn.at | 86 5 files changed, 255 insertions(+) diff --git a/northd/northd.c b/northd/northd.c index 4353df07d..e07bf68cc 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -13048,6 +13048,92 @@ build_arp_resolve_flows_for_lrp(struct ovn_port *op, } } +static void +build_bgp_redirect_rule__( + const char *s_addr, const char *bgp_port_name, bool is_ipv6, + struct ovn_port *ls_peer, struct lflow_table *lflows, + struct ds *match, struct ds *actions) +{ + int ip_ver = is_ipv6 ? 
6 : 4; + /* Redirect packets in the input pipeline destined for LR's IP to + * the port specified in 'bgp-mirror' option. + */ + ds_clear(match); + ds_clear(actions); + ds_put_format(match, "ip%d.dst == %s && tcp.dst == 179", ip_ver, s_addr); + ds_put_format(actions, "outport = \"%s\"; output;", bgp_port_name); + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, 100, + ds_cstr(match), + ds_cstr(actions), + ls_peer->lflow_ref); + + + /* Drop any traffic originating from 'bgp-mirror' port that does + * not originate from BGP daemon port. This blocks unnecessary + * traffic like ARP broadcasts or IPv6 router solicitation packets + * from the dummy 'bgp-mirror' port. + */ + ds_clear(match); + ds_put_format(match, "inport == \"%s\"", bgp_port_name); + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_CHECK_PORT_SEC, 80, + ds_cstr(match), + REGBIT_PORT_SEC_DROP " = 1; next;", + ls_peer->lflow_ref); + + ds_put_format(match, + " && ip%d.src == %s && tcp.src == 179", + ip_ver, + s_addr); + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_CHECK_PORT_SEC, 81, + ds_cstr(match), + REGBIT_PORT_SEC_DROP " = check_in_port_sec(); next;", + ls_peer->lflow_ref); +} + +static void +build_lrouter_bgp_redirect( + struct ovn_port *op, struct lflow_table *lflows, + struct ds *match, struct ds *actions) +{ + /* LRP has to have a peer.*/ + if (op->peer == NULL) { + return; + } + /* LRP has to have NB record.*/ + if (op->nbrp == NULL) { + return; + } + + /* Proceed only for LRPs that have 'bgp-mirror' option set. Value of this + * option is the name of LSP to which a BGP traffic will be mirrored. 
+ */ + const char *bgp_port = smap_get(&op->nbrp->options, "bgp-mirror"); + if (bgp_port == NULL) { + return; + } + + if (op->cr_port != NULL) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); + VLOG_WARN_RL(&rl, "Option 'bgp-mirror' is not supported on" + " Distributed Gateway Port '%s'", op->key); + return; + } Somewhere around here would be a good place to ensure that "bgp_port" exists on op->peer. If the port does not exist, then print a warning and return early. It would also be a good idea to add this as part of the ovn-northd test. Set the router port's bgp_port to a nonexistent port on the connected logical switch and ensure that BGP-related logical flows are not installed. + + /* Mirror traffic destined for LRP's IPs and default BGP port + * to the port defined in 'bgp-mirror' option. + */ + for (size_t
Re: [ovs-dev] [PATCH ovn v2 4/5] northd: Add options for distributed route exchange.
Hi Frode, First of all, thanks for the patch set and for starting the work on OVN native BGP integration! I’ve got some questions/comments, please see inline. regards, Vladislav Odintsov On 19 Jul 2024, at 09:10, Frode Nordahl wrote: Add three new options for Logical Router Ports that control ovn-controller route exchange for NAT addresses and LB VIPs. Load Balancers already have structured data in the Southbound database which the ovn-controller can use directly. NAT addresses are however currently only expressed as specialized rules in the Port_Binding table nat_addresses column on LSP peer records for LRPs, used for (G)ARP processing, as well as logical flow rules for OpenFlow processing. Options considered for how to redistribute these addresses to the ovn-controllers in a structured way: * Introduce even more conditional processing of the lsp nat_addresses column. * Parse ct_dnat records in the Logical_Flow table. * Add column to the Port_Binding table. * Copy the Northbound NAT table over to the Southbound database, similar to what is done with Load Balancers. * Populate Port_Binding table nat_addresses column on LRPs peer record (the proposed approach). The Port_Binding table LRP peer records nat_addresses column is currently unused; populate it with NAT addresses for route exchange when the redistribute-nat LRP option is set to 'true'. The options are only processed for gateway routers. 
Signed-off-by: Frode Nordahl --- controller/pinctrl.c | 8 +-- northd/northd.c | 22 +++ ovn-nb.xml | 45 ++ ovn-sb.xml | 51 +++- tests/ovn-northd.at | 35 ++ 5 files changed, 158 insertions(+), 3 deletions(-) diff --git a/controller/pinctrl.c b/controller/pinctrl.c index 708240e24..d9ef97ce1 100644 --- a/controller/pinctrl.c +++ b/controller/pinctrl.c @@ -6428,11 +6428,15 @@ get_nat_addresses_and_keys(struct ovsdb_idl_index *sbrec_port_binding_by_name, const struct sbrec_port_binding *pb; pb = lport_lookup_by_name(sbrec_port_binding_by_name, gw_port); -if (!pb) { +if (!pb || !pb->datapath) { continue; } -if (pb->n_nat_addresses) { +/* We only want to consider nat_addresses column for LS datapaths. */ +const char *logical_switch = smap_get(&pb->datapath->external_ids, + "logical-switch"); + +if (pb->n_nat_addresses && logical_switch) { for (int i = 0; i < pb->n_nat_addresses; i++) { consider_nat_address(sbrec_port_binding_by_name, pb->nat_addresses[i], pb, diff --git a/northd/northd.c b/northd/northd.c index 6898daa00..10d78b561 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -3939,6 +3939,28 @@ sync_pb_for_lrp(struct ovn_port *op, } if (chassis_name) { smap_add(&new, "l3gateway-chassis", chassis_name); +if (smap_get_bool(&op->nbrp->options, "maintain-vrf", false)) { +smap_add(&new, "maintain-vrf", "true"); +} +if (smap_get_bool(&op->nbrp->options, + "redistribute-nat", false)) { +smap_add(&new, "redistribute-nat", "true"); + +size_t n_nats = 0; +char **nats = NULL; +nats = get_nat_addresses(op, &n_nats, false, false, NULL); +sbrec_port_binding_set_nat_addresses(op->sb, + (const char **) nats, + n_nats); +for (size_t i = 0; i < n_nats; i++) { +free(nats[i]); +} +free(nats); +} +if (smap_get_bool(&op->nbrp->options, + "redistribute-lb-vips", false)) { +smap_add(&new, "redistribute-lb-vips", "true"); +} } } diff --git a/ovn-nb.xml b/ovn-nb.xml index 9552534f6..7a5c1be57 100644 --- a/ovn-nb.xml +++ b/ovn-nb.xml @@ -3451,6 +3451,51 @@ or option. 
+ + + + When configured the ovn-controller will redistribute + host routes to Load Balancer VIPs that are local to its chassis and + associated with the LR datapath. Though I'm not a native English speaker, it seems that this sentence should be adjusted because it is unclear what "host routes" are and why they should be redistributed TO Load Balancer VIPs. Also, datapath is an internal term, should we use just "Logical Router" instead? Also, I
[ovs-dev] [PATCH ovn] tests: Fix ssl-ciphers RO sb test with old openssl.
The test "read-only sb db:pssl access with ssl-ciphers and ssl-protocols" fails when running with openssl which doesn't support some of passed values. For instance, on openssl 1.0.2 there is no support for 'SECLEVEL' and test fails due to extra string in stderr, which is asserted as a part of test: ./ovn.at:37851: ovn-sbctl --db=ssl:127.0.0.1:$TCP_PORT \ --private-key=$PKIDIR/testpki-test-privkey.pem \ --certificate=$PKIDIR/testpki-test-cert.pem \ --ca-cert=$PKIDIR/testpki-cacert.pem \ --ssl-ciphers='HIGH:!aNULL:!MD5:@SECLEVEL=1' \ --ssl-protocols='TLSv1,TLSv1.1,TLSv1.2' \ chassis-add ch vxlan 1.2.4.8 --- - 2024-07-05 13:48:11.697647047 +0300 +++ /builddir/build/BUILD/ovn-24.03.90/tests/testsuite.dir/at-groups/520/stderr 2024-07-05 13:48:11.694353357 +0300 @@ -1,2 +1,3 @@ +2024-07-05T10:48:11Z|1|stream_ssl|ERR|SSL_CTX_set_cipher_list: error:140E6118:SSL routines:SSL_CIPHER_PROCESS_RULESTR:invalid command ovn-sbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"} This patch fixes the test adding grep of expected transaction error. 
CC: Aliasgar Ginwala
Fixes: 620203f9f0d9 ("Fix segfault due to ssl-ciphers.")
Signed-off-by: Vladislav Odintsov
---
 tests/ovn.at | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/ovn.at b/tests/ovn.at
index 87a64499f..2341f52d5 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -37854,9 +37854,9 @@ AT_CHECK([ovn-sbctl --db=ssl:127.0.0.1:$TCP_PORT \
 --ca-cert=$PKIDIR/testpki-cacert.pem \
 --ssl-ciphers='HIGH:!aNULL:!MD5:@SECLEVEL=1' \
 --ssl-protocols='TLSv1,TLSv1.1,TLSv1.2' \
-chassis-add ch vxlan 1.2.4.8], [1], [ignore],
-[ovn-sbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"}
-])
+chassis-add ch vxlan 1.2.4.8 2>&1 | grep 'transaction error'], [0], [dnl
+ovn-sbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"}
+], [ignore])
 OVS_APP_EXIT_AND_WAIT([ovsdb-server])

 AT_CLEANUP
--
2.45.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
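The idea behind the change can be sketched outside of Autotest. This is only a minimal Python illustration, not the test itself: instead of asserting on the whole of stderr, which may carry extra OpenSSL diagnostics on older versions, only the lines containing the expected marker are kept. The function name is an assumption made for the example; the sample log lines are copied from the failure quoted in the commit message.

```python
def keep_matching_lines(output: str, marker: str = "transaction error") -> str:
    """Return only the lines of `output` that contain `marker` (grep-like)."""
    kept = [line for line in output.splitlines() if marker in line]
    return "\n".join(kept)


# Combined stdout/stderr as seen with OpenSSL 1.0.2 (copied from the log).
noisy = (
    "2024-07-05T10:48:11Z|1|stream_ssl|ERR|SSL_CTX_set_cipher_list: "
    "error:140E6118:SSL routines:SSL_CIPHER_PROCESS_RULESTR:invalid command\n"
    'ovn-sbctl: transaction error: {"details":"insert operation not allowed '
    'when database server is in read only mode","error":"not allowed"}\n'
)

# The only line the test should compare against.
expected = (
    'ovn-sbctl: transaction error: {"details":"insert operation not allowed '
    'when database server is in read only mode","error":"not allowed"}'
)

filtered = keep_matching_lines(noisy)
```

With this shape the comparison no longer depends on which cipher-list keywords the local OpenSSL understands.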
Re: [ovs-dev] [PATCH ovn v6 0/2] Add support to disable VXLAN mode.
Hi Alexey,

The discussion of explicit configuration versus automatic detection of VXLAN mode can be found in the following thread [1].

1: https://mail.openvswitch.org/pipermail/ovs-dev/2024-April/412986.html

regards,
Vladislav Odintsov

-Original Message-
From: dev on behalf of Aleksey Baulin via dev
Reply to: Aleksey Baulin
Date: Friday, 5 July 2024 at 14:32
To: "ovs-dev@openvswitch.org"
Subject: Re: [ovs-dev] [PATCH ovn v6 0/2] Add support to disable VXLAN mode.

I missed the discussion on this patch set. I can see that it's been accepted. Still, I'd like to ask a question.

Originally there was the VTEP-only VxLAN scenario that worked great in OVN. Then, the patch by Ihar Hrachyshka provided support for internal VxLANs. The tunnel id space was severely limited which, in turn, became the limitation for the VTEP-only VxLAN scenario as well. In other words, the patch that introduced internal VxLANs broke the scenario for VTEP-only VxLANs.

In essence, Vladislav Odintsov asserts that the patch by Ihar Hrachyshka introduced an implicit configuration option the state of which was determined in software (the function is_vxlan_mode()). The new patch by Vladislav Odintsov makes that option explicit. That makes the behavior of a configured cluster completely depend on its value - as opposed to determining the behavior in software from configuration.

On the one hand, that makes it simple - one option to control whether the cluster supports internal VxLANs or it is VTEP-only VxLANs. On the other hand I can't help but wonder - why can't that be determined in software - like in the original patch? I would think that all necessary data is present in a configuration and there's no need for an extra explicit option. With the new patch from Vladislav Odintsov it can be done just once in one place.

So the question is: what are the pros and cons of each variant? Why is the variant with a new option chosen?

Thanks!
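For readers comparing the two variants discussed above, here is a rough Python sketch of the decision order used by the patched is_vxlan_mode(): the explicit NB_Global option is consulted first, and only then does the automatic detection from the chassis encaps apply. The data shapes (a plain dict for NB_Global:options, a list of encap-type lists for the Chassis table) are assumptions made for illustration, and the option parsing is a simplification of what smap_get_bool() with a default of true appears to do.

```python
from typing import Iterable, Mapping, Sequence


def is_vxlan_mode(nb_options: Mapping[str, str],
                  chassis_encap_types: Iterable[Sequence[str]]) -> bool:
    """Sketch of the patched decision order (not the northd code itself).

    nb_options stands in for NB_Global:options; chassis_encap_types is a
    hypothetical stand-in for the Chassis table, one sequence of encap
    types per chassis.
    """
    # Explicit knob first: vxlan_mode=false disables VXLAN mode outright.
    # Assumption: anything other than "false" keeps the default (true).
    if nb_options.get("vxlan_mode", "true").lower() == "false":
        return False
    # Otherwise fall back to auto-detection: any chassis with a VXLAN encap.
    return any(encap == "vxlan"
               for encaps in chassis_encap_types
               for encap in encaps)
```

In this shape the automatic behaviour stays the default, so existing deployments are unaffected, while a cluster that uses VXLAN only towards HW VTEP switches can opt out explicitly.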
Re: [ovs-dev] [PATCH ovn v6 0/2] Add support to disable VXLAN mode.
Thank you all!

regards,
Vladislav Odintsov

-Original Message-
From: dev on behalf of Dumitru Ceara
Date: Friday, 28 June 2024 at 12:03
To: Vladislav Odintsov, "d...@openvswitch.org"
Subject: Re: [ovs-dev] [PATCH ovn v6 0/2] Add support to disable VXLAN mode.

On 6/7/24 15:54, Vladislav Odintsov wrote:
> v6:
> - Addressed Mark's review comments:
> 1. Removed global variable "vxlan_mode" change from "global" engine node.
> 2. Configuration knob "disable_vxlan_mode" was renamed to "vxlan_mode"
> v5:
> - Addressed Ihar's review comments:
> 1. fixed errors after incorrect conflicts solving on rebase;
> 2. changed VXLAN mode naming to capitalized;
> 3. clarified VXLAN mode in ovn-architecture man page.
> v4:
> - Addressed Dumitru's and Ihar's review comments;
> - single patch was split into two:
> 1. function call replaced with a global variable `vxlan_mode`;
> 2. introduced `disable_vxlan_mode` configuration knob;
> - rebased onto latest main branch.
> v3:
> - Removed accidental ovs submodule change.
> v2:
> - Added NEWS item.
>
> Vladislav Odintsov (2):
> northd: Make `vxlan_mode` a global variable.
> northd: Add support for disabling vxlan mode.
>

Thanks, Vladislav, Ihar and Ales! I applied this series to main.

Best regards,
Dumitru
Re: [ovs-dev] [PATCH ovn v5 1/2] northd: Make `vxlan_mode` a global variable.
I’ve posted new version (v6) of patch set: https://patchwork.ozlabs.org/project/ovn/list/?series=410010 > On 6 Jun 2024, at 22:40, Ihar Hrachyshka wrote: > > On Thu, Jun 6, 2024 at 1:27 AM Vladislav Odintsov <mailto:odiv...@gmail.com>> wrote: > >> Thanks Mark for such a detailed answer! >> >> I agree with your points, and also was thinking about them, but could not >> value their importance in terms of I-P logic. You helped with that. >> >> I’d prefer to apply my proposal to revert back to “bool is_vxlan_mode()” >> to make the “global” a true global. Will submit v6. >> >> > Happy we are doing it. Mark is more eloquent than me. :) > > >> regards, >> Vladislav Odintsov >> >>> On 5 Jun 2024, at 23:13, Mark Michelson wrote: >>> >>> On 6/5/24 08:51, Vladislav Odintsov wrote: >>>> Hi Mark, >>>> Thanks for the review! >>>> Please, see below. >>>> regards, >>>> Vladislav Odintsov >>>> -Original Message- >>>> From: Mark Michelson >>>> Date: Tuesday, 4 June 2024 at 03:45 >>>> To: Vladislav Odintsov , >>>> Subject: Re: [ovs-dev] [PATCH ovn v5 1/2] northd: Make `vxlan_mode` a >> global variable. >>>>Hi Vladislav, >>>>Generally speaking, I agree with this change. However, I think that >>>>setting a global variable from an incremental processing engine node >>>>runner feels wrong. >>>> The init_vxlan_mode() is called inside the en_global_config_run() only >> to >>>> initialize global value, which is then read by >> get_ovn_max_dp_key_local() to >>>> fill the "max_tunid" variable inside incremental processing engine node. >>>> Which drawbacks do you see of such variable initialization? >>> >>> The biggest drawbacks are: >>> * Reasoning about "ownership" of the vxlan_mode global variable >>> * Maintenance of the en_global_config I-P engine node. >>> >>> On the first point, since vxlan_mode is a global variable in northd.c, >> it's not obvious that the owner of this data is the en_global_config engine >> node. 
It's an easy mistake for someone to reference the variable before it >> has been initialized, for instance. However, if the boolean is on the >> ed_type_global_config struct, then it's clear to see that this data is >> scoped to the en_global_config engine node. >>> >>> On the second point, if someone were to overhaul the en_global_config >> engine node, it would be an easy mistake to make to not notice that >> vxlan_mode is being set by the engine node. I could see a developer >> splitting the node into separate nodes, for instance. In doing so, the >> developer could easily miss that the global vxlan_mode is being set by the >> engine node, since it's hidden behind an init_ function call. However, >> placing vxlan_mode on the ed_type_global_config makes it more clear that >> the en_global_config engine node is responsible for setting the value. >> >>> >>>>I think that instead, the "vxlan_mode" variable you have introduced >>>>should be a field on struct ed_type_global_config. This way, the >> engine >>>>node is only modifying data local to itself. >>>> I guess, that moving this to the struct ed_type_global_config will make >> the code >>>> a bit more complex: we have to pass this variable through all function >> calls to >>>> be able to read vxlan_mode value inside >> ovn_datapath_assign_requested_tnl_id(), >>>> ovn_port_assign_requested_tnl_id() and ovn_port_allocate_key(). >>> >>> I think dependency injection makes the code easier to read, understand, >> and maintain rather than making it more complex. It's clearer that the data >> from the en_global_config engine node is needed in all of the functions you >> listed if those functions require an ed_type_global_config argument. >>> >>>> Apart of this, the "vxlan_mode" variable has the same "global" meaning >> as >>>> "use_ct_inv_match", "check_lsp_is_up", "use_common_zone" and other >> global >>>> variables, which configure the global OVN behaviour. 
The difference is >> that it >>>> is required to read its value inside the en_global_config_run() to >> reflect the >>>> max_tunid back to NB_Global. >>> >>> Personally, I don't l
[ovs-dev] [PATCH ovn v6 2/2] northd: Add support for disabling vxlan mode.
Commit [1] introduced the "VXLAN mode" concept. It limited the available tunnel IDs because of the lack of space in the VXLAN VNI. In VXLAN mode OVN is limited to 4095 datapaths (LRs or non-transit LSs) and 2047 logical ports per datapath.

Prior to this patch VXLAN mode was enabled automatically if at least one chassis had an encap of VXLAN type. In scenarios where one wants to use VXLAN only for a HW VTEP (RAMP) switch, such a limitation makes no sense.

This patch adds support for explicitly disabling VXLAN mode via the Northbound database.

1: https://github.com/ovn-org/ovn/commit/b07f1bc3d068

Acked-By: Ihar Hrachyshka
Fixes: b07f1bc3d068 ("Add VXLAN support for non-VTEP datapath bindings")
Signed-off-by: Vladislav Odintsov
---
 NEWS | 4
 northd/en-global-config.c | 8 +++-
 northd/northd.c | 10 --
 northd/northd.h | 3 ++-
 ovn-architecture.7.xml| 6 ++
 ovn-nb.xml| 10 ++
 tests/ovn-northd.at | 29 +
 7 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/NEWS b/NEWS
index 3bdc55172..aa1669d9c 100644
--- a/NEWS
+++ b/NEWS
@@ -31,6 +31,10 @@ Post v24.03.0
 has been renamed to "options:ic-route-denylist" in order to comply with inclusive language guidelines. The previous name is still recognized to aid with backwards compatibility.
+ - Added new global config option NB_Global:options:vxlan_mode to support
+ability to disable "VXLAN mode" to extend available tunnel IDs space for
+datapaths from 4095 to 16711680. For more details see man ovn-nb(5) for
+mentioned option.
OVN v24.03.0 - 01 Mar 2024 -- diff --git a/northd/en-global-config.c b/northd/en-global-config.c index df0f8e58c..784538a14 100644 --- a/northd/en-global-config.c +++ b/northd/en-global-config.c @@ -117,7 +117,8 @@ en_global_config_run(struct engine_node *node , void *data) char *max_tunid = xasprintf("%d", get_ovn_max_dp_key_local( -is_vxlan_mode(sbrec_chassis_table))); +is_vxlan_mode(&nb->options, + sbrec_chassis_table))); smap_replace(options, "max_tunid", max_tunid); free(max_tunid); @@ -534,6 +535,11 @@ check_nb_options_out_of_sync(const struct nbrec_nb_global *nb, return true; } +if (config_out_of_sync(&nb->options, &config_data->nb_options, + "vxlan_mode", false)) { +return true; +} + return false; } diff --git a/northd/northd.c b/northd/northd.c index 6d118a19a..a4937b472 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -886,8 +886,13 @@ join_datapaths(const struct nbrec_logical_switch_table *nbrec_ls_table, } bool -is_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table) +is_vxlan_mode(const struct smap *nb_options, + const struct sbrec_chassis_table *sbrec_chassis_table) { +if (!smap_get_bool(nb_options, "vxlan_mode", true)) { +return false; +} + const struct sbrec_chassis *chassis; SBREC_CHASSIS_TABLE_FOR_EACH (chassis, sbrec_chassis_table) { for (int i = 0; i < chassis->n_encaps; i++) { @@ -17605,7 +17610,8 @@ ovnnb_db_run(struct northd_input *input_data, use_common_zone = smap_get_bool(input_data->nb_options, "use_common_zone", false); -vxlan_mode = is_vxlan_mode(input_data->sbrec_chassis_table); +vxlan_mode = is_vxlan_mode(input_data->nb_options, + input_data->sbrec_chassis_table); build_datapaths(ovnsb_txn, input_data->nbrec_logical_switch_table, diff --git a/northd/northd.h b/northd/northd.h index 987f82954..2f2fdb673 100644 --- a/northd/northd.h +++ b/northd/northd.h @@ -790,7 +790,8 @@ lr_has_multiple_gw_ports(const struct ovn_datapath *od) } bool -is_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table); 
+is_vxlan_mode(const struct smap *nb_options, + const struct sbrec_chassis_table *sbrec_chassis_table); uint32_t get_ovn_max_dp_key_local(bool _vxlan_mode); diff --git a/ovn-architecture.7.xml b/ovn-architecture.7.xml index e32d1a9f7..640944faf 100644 --- a/ovn-architecture.7.xml +++ b/ovn-architecture.7.xml @@ -2920,4 +2920,10 @@ the future, gateways that do not support encapsulations with large amounts of metadata may continue to have a reduced feature set. + +VXLAN mode is recommended to be disabled if VXLAN encap at +hypervisors is needed only to support HW VTEP L2 Gateway functionality. +See man ovn-nb(5) for table NB_Global column +options key vxlan_mode for more details. + diff --git a/ovn-nb.xml b/ovn-nb.xml inde
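The two limits quoted in the NEWS entry (4095 and 16711680) follow from simple bit arithmetic. The sketch below reproduces them in Python; the constant definitions are my reading of the macros named in the patch (OVN_MAX_DP_KEY, OVN_MAX_DP_GLOBAL_NUM, OVN_MAX_DP_VXLAN_KEY) and should be treated as assumptions rather than copied definitions.

```python
# Assumed definitions, chosen to match the numbers quoted in the NEWS entry.
OVN_MAX_DP_KEY = (1 << 24) - 1         # full 24-bit datapath tunnel key space
OVN_MAX_DP_GLOBAL_NUM = (1 << 16) - 1  # keys set aside for global datapaths
OVN_MAX_DP_VXLAN_KEY = (1 << 12) - 1   # 12 bits of the VNI left for datapaths


def get_ovn_max_dp_key_local(vxlan_mode: bool) -> int:
    """Highest locally allocatable datapath key, mirroring the patched helper."""
    if vxlan_mode:
        # The global-key reservation does not apply in VXLAN mode.
        return OVN_MAX_DP_VXLAN_KEY
    return OVN_MAX_DP_KEY - OVN_MAX_DP_GLOBAL_NUM


vxlan_limit = get_ovn_max_dp_key_local(True)    # 4095
default_limit = get_ovn_max_dp_key_local(False)  # 16711680
```

This value is what en_global_config_run() writes back to NB_Global as "max_tunid", so disabling VXLAN mode changes the advertised limit as well as the allocation range.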
[ovs-dev] [PATCH ovn v6 1/2] northd: Make `vxlan_mode` a global variable.
This simplifies the code; the subsequent commit, which explicitly disables VXLAN mode, is based on these changes. Also, the "VXLAN mode" term is introduced in the ovn-architecture man page.

Signed-off-by: Vladislav Odintsov
---
 northd/en-global-config.c | 3 +- northd/northd.c | 76 --- northd/northd.h | 5 ++- ovn-architecture.7.xml| 10 +++--- 4 files changed, 41 insertions(+), 53 deletions(-) diff --git a/northd/en-global-config.c b/northd/en-global-config.c index 28c78a12c..df0f8e58c 100644 --- a/northd/en-global-config.c +++ b/northd/en-global-config.c @@ -116,7 +116,8 @@ en_global_config_run(struct engine_node *node , void *data) } char *max_tunid = xasprintf("%d", -get_ovn_max_dp_key_local(sbrec_chassis_table)); +get_ovn_max_dp_key_local( +is_vxlan_mode(sbrec_chassis_table))); smap_replace(options, "max_tunid", max_tunid); free(max_tunid); diff --git a/northd/northd.c b/northd/northd.c index 9f81afccb..6d118a19a 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -90,6 +90,10 @@ static bool use_ct_inv_match = true; */ static bool default_acl_drop; +/* If this option is 'true' northd will use limited 24-bit space for datapath + * and ports tunnel key allocation (12 bits for each instead of default 16). */ +static bool vxlan_mode; + #define MAX_OVN_TAGS 4096 @@ -881,7 +885,7 @@ join_datapaths(const struct nbrec_logical_switch_table *nbrec_ls_table, } } -static bool +bool is_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table) { const struct sbrec_chassis *chassis; @@ -896,25 +900,22 @@ is_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table) } uint32_t -get_ovn_max_dp_key_local(const struct sbrec_chassis_table *sbrec_chassis_table) +get_ovn_max_dp_key_local(bool _vxlan_mode) { -if (is_vxlan_mode(sbrec_chassis_table)) { -/* OVN_MAX_DP_GLOBAL_NUM doesn't apply for vxlan mode. */ +if (_vxlan_mode) { +/* OVN_MAX_DP_GLOBAL_NUM doesn't apply for VXLAN mode. 
*/ return OVN_MAX_DP_VXLAN_KEY; } return OVN_MAX_DP_KEY - OVN_MAX_DP_GLOBAL_NUM; } static void -ovn_datapath_allocate_key(const struct sbrec_chassis_table *sbrec_ch_table, - struct hmap *datapaths, struct hmap *dp_tnlids, +ovn_datapath_allocate_key(struct hmap *datapaths, struct hmap *dp_tnlids, struct ovn_datapath *od, uint32_t *hint) { if (!od->tunnel_key) { od->tunnel_key = ovn_allocate_tnlid(dp_tnlids, "datapath", -OVN_MIN_DP_KEY_LOCAL, -get_ovn_max_dp_key_local(sbrec_ch_table), -hint); +OVN_MIN_DP_KEY_LOCAL, get_ovn_max_dp_key_local(vxlan_mode), hint); if (!od->tunnel_key) { if (od->sb) { sbrec_datapath_binding_delete(od->sb); @@ -927,7 +928,6 @@ ovn_datapath_allocate_key(const struct sbrec_chassis_table *sbrec_ch_table, static void ovn_datapath_assign_requested_tnl_id( -const struct sbrec_chassis_table *sbrec_chassis_table, struct hmap *dp_tnlids, struct ovn_datapath *od) { const struct smap *other_config = (od->nbs @@ -936,8 +936,7 @@ ovn_datapath_assign_requested_tnl_id( uint32_t tunnel_key = smap_get_int(other_config, "requested-tnl-key", 0); if (tunnel_key) { const char *interconn_ts = smap_get(other_config, "interconn-ts"); -if (!interconn_ts && is_vxlan_mode(sbrec_chassis_table) && -tunnel_key >= 1 << 12) { +if (!interconn_ts && vxlan_mode && tunnel_key >= 1 << 12) { static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1); VLOG_WARN_RL(&rl, "Tunnel key %"PRIu32" for datapath %s is " "incompatible with VXLAN", tunnel_key, @@ -985,7 +984,6 @@ build_datapaths(struct ovsdb_idl_txn *ovnsb_txn, const struct nbrec_logical_switch_table *nbrec_ls_table, const struct nbrec_logical_router_table *nbrec_lr_table, const struct sbrec_datapath_binding_table *sbrec_dp_table, -const struct sbrec_chassis_table *sbrec_chassis_table, struct ovn_datapaths *ls_datapaths, struct ovn_datapaths *lr_datapaths, struct ovs_list *lr_list) @@ -1000,12 +998,11 @@ build_datapaths(struct ovsdb_idl_txn *ovnsb_txn, struct hmap dp_tnlids = HMAP_INITIALIZER(&dp_tnlids); struct 
ovn_datapath *od; LIST_FOR_EACH (od, list, &both) { -ovn_datapath_assign_requested_tnl_id(sbrec_chassis_table, &dp_tnlids, -
[ovs-dev] [PATCH ovn v6 0/2] Add support to disable VXLAN mode.
v6:
  - Addressed Mark's review comments:
    1. Removed the change of global variable "vxlan_mode" from the "global" engine node.
    2. Configuration knob "disable_vxlan_mode" was renamed to "vxlan_mode".
v5:
  - Addressed Ihar's review comments:
    1. fixed errors caused by incorrect conflict resolution on rebase;
    2. changed VXLAN mode naming to capitalized;
    3. clarified VXLAN mode in ovn-architecture man page.
v4:
  - Addressed Dumitru's and Ihar's review comments;
  - single patch was split into two:
    1. function call replaced with a global variable `vxlan_mode`;
    2. introduced `disable_vxlan_mode` configuration knob;
  - rebased onto latest main branch.
v3:
  - Removed accidental ovs submodule change.
v2:
  - Added NEWS item.

Vladislav Odintsov (2):
  northd: Make `vxlan_mode` a global variable.
  northd: Add support for disabling vxlan mode.

 NEWS | 4 ++
 northd/en-global-config.c | 9 -
 northd/northd.c | 84 +--
 northd/northd.h | 6 ++-
 ovn-architecture.7.xml| 16 +---
 ovn-nb.xml| 10 +
 tests/ovn-northd.at | 29 ++
 7 files changed, 104 insertions(+), 54 deletions(-)

--
2.44.0
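As background for item 1 of the v6 change log: the v5 review was about who owns the vxlan_mode value. One alternative raised there, which v6 did not adopt, was to keep the flag on the engine node's data and pass it explicitly to the functions that need it. A minimal Python sketch of that shape follows. The GlobalConfig class and the function name are hypothetical stand-ins, and the check only mirrors the condition visible in the patch (no interconn-ts, VXLAN mode on, requested key >= 1 << 12).

```python
from dataclasses import dataclass

VXLAN_DP_KEY_LIMIT = 1 << 12  # keys at or above this do not fit VXLAN mode


@dataclass(frozen=True)
class GlobalConfig:
    """Hypothetical stand-in for the engine node data (ed_type_global_config)."""
    vxlan_mode: bool


def requested_dp_key_fits(cfg: GlobalConfig, tunnel_key: int,
                          is_interconn_ts: bool = False) -> bool:
    """Return False where the patch would warn "incompatible with VXLAN"."""
    # The dependency on the configuration is visible in the signature,
    # instead of being read from a module-level variable.
    if is_interconn_ts or not cfg.vxlan_mode:
        return True
    return tunnel_key < VXLAN_DP_KEY_LIMIT
```

The trade-off discussed in the thread is visible here: every caller has to carry the configuration object, but it is obvious from the signature which data the check depends on.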
Re: [ovs-dev] [PATCH ovn v5 1/2] northd: Make `vxlan_mode` a global variable.
Thanks Mark for such a detailed answer! I agree with your points, and also was thinking about them, but could not value their importance in terms of I-P logic. You helped with that. I’d prefer to apply my proposal to revert back to “bool is_vxlan_mode()” to make the “global” a true global. Will submit v6. regards, Vladislav Odintsov > On 5 Jun 2024, at 23:13, Mark Michelson wrote: > > On 6/5/24 08:51, Vladislav Odintsov wrote: >> Hi Mark, >> Thanks for the review! >> Please, see below. >> regards, >> Vladislav Odintsov >> -Original Message- >> From: Mark Michelson >> Date: Tuesday, 4 June 2024 at 03:45 >> To: Vladislav Odintsov , >> Subject: Re: [ovs-dev] [PATCH ovn v5 1/2] northd: Make `vxlan_mode` a global >> variable. >> Hi Vladislav, >> Generally speaking, I agree with this change. However, I think that >> setting a global variable from an incremental processing engine node >> runner feels wrong. >> The init_vxlan_mode() is called inside the en_global_config_run() only to >> initialize global value, which is then read by get_ovn_max_dp_key_local() to >> fill the "max_tunid" variable inside incremental processing engine node. >> Which drawbacks do you see of such variable initialization? > > The biggest drawbacks are: > * Reasoning about "ownership" of the vxlan_mode global variable > * Maintenance of the en_global_config I-P engine node. > > On the first point, since vxlan_mode is a global variable in northd.c, it's > not obvious that the owner of this data is the en_global_config engine node. > It's an easy mistake for someone to reference the variable before it has been > initialized, for instance. However, if the boolean is on the > ed_type_global_config struct, then it's clear to see that this data is scoped > to the en_global_config engine node. > > On the second point, if someone were to overhaul the en_global_config engine > node, it would be an easy mistake to make to not notice that vxlan_mode is > being set by the engine node. 
I could see a developer splitting the node into > separate nodes, for instance. In doing so, the developer could easily miss > that the global vxlan_mode is being set by the engine node, since it's hidden > behind an init_ function call. However, placing vxlan_mode on the > ed_type_global_config makes it more clear that the en_global_config engine > node is responsible for setting the value. > >> I think that instead, the "vxlan_mode" variable you have introduced >> should be a field on struct ed_type_global_config. This way, the engine >> node is only modifying data local to itself. >> I guess, that moving this to the struct ed_type_global_config will make the >> code >> a bit more complex: we have to pass this variable through all function calls >> to >> be able to read vxlan_mode value inside >> ovn_datapath_assign_requested_tnl_id(), >> ovn_port_assign_requested_tnl_id() and ovn_port_allocate_key(). > > I think dependency injection makes the code easier to read, understand, and > maintain rather than making it more complex. It's clearer that the data from > the en_global_config engine node is needed in all of the functions you listed > if those functions require an ed_type_global_config argument. > >> Apart of this, the "vxlan_mode" variable has the same "global" meaning as >> "use_ct_inv_match", "check_lsp_is_up", "use_common_zone" and other global >> variables, which configure the global OVN behaviour. The difference is that >> it >> is required to read its value inside the en_global_config_run() to reflect >> the >> max_tunid back to NB_Global. > > Personally, I don't like those global variables either :) > > But those globals are also set within northd.c, and are initialized at the > begining of a DB run. From the perspective of northd processing, they are > truly "global" in their scope. The engine nodes form a dependency tree, and > so it's important that data that is scoped to a particular node is housed in > that engine node's data. 
This way, when nodes are being created, it's clear > to know which other engine nodes they depend on. If engine nodes are setting > global variables, then it becomes harder to understand the dependencies. >> If the global variable setting is totally not acceptable from engine node, I >> can propose another approach here. Revert init_vxlan_mode() back to >> `bool is_vxlan_mode()` and assign global variable outside of global engine >>
Re: [ovs-dev] [PATCH ovn v5 2/2] northd: Add support for disabling vxlan mode.
Hi Mark, Please see below. regards, Vladislav Odintsov -Original Message- From: dev on behalf of Mark Michelson Date: Tuesday, 4 June 2024 at 03:45 To: Vladislav Odintsov , "d...@openvswitch.org" Subject: Re: [ovs-dev] [PATCH ovn v5 2/2] northd: Add support for disabling vxlan mode. Hi Vladislav, I have just one comment below. On 5/3/24 04:13, Vladislav Odintsov wrote: > Commit [1] introduced a "VXLAN mode" concept. It brought a limitation > for available tunnel IDs because of lack of space in VXLAN VNI. > In VXLAN mode OVN is limited by 4095 datapaths (LRs or non-transit LSs) > and 2047 logical ports per datapath. > > Prior to this patch VXLAN mode was enabled automatically if at least one > chassis had encap of VXLAN type. In scenarios where one want to use > VXLAN only for HW VTEP (RAMP) switch, such limitation makes no sence. > > This patch adds support for explicit disabling of VXLAN mode via > Northbound database. > > 1: https://github.com/ovn-org/ovn/commit/b07f1bc3d068 > > Acked-By: Ihar Hrachyshka > Fixes: b07f1bc3d068 ("Add VXLAN support for non-VTEP datapath bindings") > Signed-off-by: Vladislav Odintsov > --- > NEWS | 4 > northd/en-global-config.c | 7 ++- > northd/northd.c | 10 -- > northd/northd.h | 3 ++- > ovn-architecture.7.xml| 6 ++ > ovn-nb.xml| 10 ++ > tests/ovn-northd.at | 29 + > 7 files changed, 65 insertions(+), 4 deletions(-) > > diff --git a/NEWS b/NEWS > index 3b5e93dc9..007b27f3d 100644 > --- a/NEWS > +++ b/NEWS > @@ -17,6 +17,10 @@ Post v24.03.0 > external-ids, the option is no longer needed as it became effectively > "true" for all scenarios. > - Added DHCPv4 relay support. > + - Added new global config option NB_Global:options:disable_vxlan_mode to > +extend available tunnel IDs space for datapaths from 4095 to 16711680 > +when running in "VXLAN mode". For more details see man ovn-nb(5) for > +mentioned option. 
> > OVN v24.03.0 - 01 Mar 2024 > -- > diff --git a/northd/en-global-config.c b/northd/en-global-config.c > index 873649a89..f5e2a8154 100644 > --- a/northd/en-global-config.c > +++ b/northd/en-global-config.c > @@ -115,7 +115,7 @@ en_global_config_run(struct engine_node *node , void *data) >config_data->svc_monitor_mac); > } > > -init_vxlan_mode(sbrec_chassis_table); > +init_vxlan_mode(&nb->options, sbrec_chassis_table); > char *max_tunid = xasprintf("%d", get_ovn_max_dp_key_local()); > smap_replace(options, "max_tunid", max_tunid); > free(max_tunid); > @@ -533,6 +533,11 @@ check_nb_options_out_of_sync(const struct nbrec_nb_global *nb, > return true; > } > > +if (config_out_of_sync(&nb->options, &config_data->nb_options, > + "disable_vxlan_mode", false)) { > +return true; > +} > + > return false; > } > > diff --git a/northd/northd.c b/northd/northd.c > index 0e0ae24db..7bdffe531 100644 > --- a/northd/northd.c > +++ b/northd/northd.c > @@ -886,8 +886,14 @@ join_datapaths(const struct nbrec_logical_switch_table *nbrec_ls_table, > } > > void > -init_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table) > +init_vxlan_mode(const struct smap *nb_options, > +const struct sbrec_chassis_table *sbrec_chassis_table) > { > +if (smap_get_bool(nb_options, "disable_vxlan_mode", false)) { > +vxlan_mode = false; > +return; > +} > + > const struct sbrec_chassis *chassis; > SBREC_CHASSIS_TABLE_FOR_EACH (chassis, sbrec_chassis_table) { > for (int i = 0; i < chassis->n_encaps; i++) { > @@ -17596,7 +17602,7 @@ ovnnb_db_run(struct northd_input *input_data, > use_common_zone = smap_get_bool(input_data->nb_options, "use_common_zone", > false); > > -init_vxlan_mode(input_data->sbrec_chassis_table); > +init_vxlan_mode(input_data->nb_options, input_data->sbrec_chassis_table); >
Re: [ovs-dev] [PATCH ovn v5 1/2] northd: Make `vxlan_mode` a global variable.
Hi Mark,

Thanks for the review! Please, see below.

regards,
Vladislav Odintsov

-Original Message-
From: Mark Michelson
Date: Tuesday, 4 June 2024 at 03:45
To: Vladislav Odintsov ,
Subject: Re: [ovs-dev] [PATCH ovn v5 1/2] northd: Make `vxlan_mode` a global variable.

> Hi Vladislav,
>
> Generally speaking, I agree with this change. However, I think that
> setting a global variable from an incremental processing engine node
> runner feels wrong.

init_vxlan_mode() is called inside en_global_config_run() only to initialize the global value, which is then read by get_ovn_max_dp_key_local() to fill the "max_tunid" variable inside the incremental processing engine node. What drawbacks do you see in initializing the variable this way?

> I think that instead, the "vxlan_mode" variable you have introduced
> should be a field on struct ed_type_global_config. This way, the engine
> node is only modifying data local to itself.

I guess that moving this to struct ed_type_global_config will make the code a bit more complex: we have to pass this variable through all function calls to be able to read the vxlan_mode value inside ovn_datapath_assign_requested_tnl_id(), ovn_port_assign_requested_tnl_id() and ovn_port_allocate_key().

Apart from this, the "vxlan_mode" variable has the same "global" meaning as "use_ct_inv_match", "check_lsp_is_up", "use_common_zone" and other global variables, which configure the global OVN behaviour. The difference is that its value has to be read inside en_global_config_run() to reflect max_tunid back to NB_Global.

If setting the global variable from an engine node is not acceptable at all, I can propose another approach here.
Revert init_vxlan_mode() back to `bool is_vxlan_mode()` and assign global variable outside of global engine node: diff --git a/northd/en-global-config.c b/northd/en-global-config.c index 873649a89..df0f8e58c 100644 --- a/northd/en-global-config.c +++ b/northd/en-global-config.c @@ -115,8 +115,9 @@ en_global_config_run(struct engine_node *node , void *data) config_data->svc_monitor_mac); } -init_vxlan_mode(sbrec_chassis_table); -char *max_tunid = xasprintf("%d", get_ovn_max_dp_key_local()); +char *max_tunid = xasprintf("%d", +get_ovn_max_dp_key_local( +is_vxlan_mode(sbrec_chassis_table))); smap_replace(options, "max_tunid", max_tunid); free(max_tunid); diff --git a/northd/northd.c b/northd/northd.c index 0e0ae24db..9ac608f03 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -885,25 +885,24 @@ join_datapaths(const struct nbrec_logical_switch_table *nbrec_ls_table, } } -void -init_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table) +bool +is_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table) { const struct sbrec_chassis *chassis; SBREC_CHASSIS_TABLE_FOR_EACH (chassis, sbrec_chassis_table) { for (int i = 0; i < chassis->n_encaps; i++) { if (!strcmp(chassis->encaps[i]->type, "vxlan")) { -vxlan_mode = true; -return; +return true; } } } -vxlan_mode = false; +return false; } uint32_t -get_ovn_max_dp_key_local(void) +get_ovn_max_dp_key_local(bool _vxlan_mode) { -if (vxlan_mode) { +if (_vxlan_mode) { /* OVN_MAX_DP_GLOBAL_NUM doesn't apply for VXLAN mode. 
*/ return OVN_MAX_DP_VXLAN_KEY; } @@ -916,9 +915,7 @@ ovn_datapath_allocate_key(struct hmap *datapaths, struct hmap *dp_tnlids, { if (!od->tunnel_key) { od->tunnel_key = ovn_allocate_tnlid(dp_tnlids, "datapath", -OVN_MIN_DP_KEY_LOCAL, -get_ovn_max_dp_key_local(), -hint); +OVN_MIN_DP_KEY_LOCAL, get_ovn_max_dp_key_local(vxlan_mode), hint); if (!od->tunnel_key) { if (od->sb) { sbrec_datapath_binding_delete(od->sb); @@ -17596,7 +17593,7 @@ ovnnb_db_run(struct northd_input *input_data, use_common_zone = smap_get_bool(input_data->nb_options, "use_common_zone", false); -init_vxlan_mode(input_data->sbrec_chassis_table); +vxlan_mode = is_vxlan_mode(input_data->sbrec_chassis_table); build_datapaths(ovnsb_txn, input_data->nbrec_logical_switch_table, diff --git a/northd/northd.h b/northd/northd.h index be480003e..c613652e9 100644 --- a/northd/northd.h +++ b/northd/northd.h @@ -791,9 +791,9 @@ lr_has_multiple_gw_ports(const struct ovn_datapath *od) return od->n_l3dgw_ports > 1 && !od->is_gw_router; } -vo
Re: [ovs-dev] [PATCH] python: idl: Fix index not being updated on row modification.
Hi Ilya,

Thanks for the fix!

Some time ago we noticed a problem with index updates internally. It was not a real issue for us, but I tried your fix and it resolves what we originally observed.

How we observed that issue:

In terminal one:

ovn-nbctl ls-add test

In terminal two, run python, load the IDL and query the name of the created object (ls "test"). We use ovsdbapp, so the example uses it as well:

>>> from ovsdbapp.backend.ovs_idl import connection, idlutils
>>> import ovsdbapp.schema.ovn_northbound.impl_idl as nb_idl
>>>
>>> idl = connection.OvsdbIdl.from_server("tcp:127.0.0.1:6641",
...                                       "OVN_Northbound")
>>> api_idl = nb_idl.OvnNbApiIdlImpl(connection.Connection(idl, 100))
>>>
>>> api_idl.ls_get("test").execute().name
'test'

Then switch back to the first terminal and change the ls 'name' (which is an indexed field):

ovn-nbctl set logical-switch test name=test2

Switch back to the python terminal and try to get the name again. With the affected python-ovs, a lookup by the old name "test" still succeeds and returns the new name "test2", while a lookup by "test2" finds nothing:

>>> api_idl.ls_get("test").execute().name
'test2'
>>> api_idl.ls_get("test2").execute().name
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'name'

With your patch it works as expected:

>>> api_idl.ls_get("test").execute().name
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'name'
>>> api_idl.ls_get("test2").execute().name
'test2'

I just wanted to share our experience with this problem and patch. You can add this to the OVS python tests if you consider it worthwhile.

Thanks again :)

regards,
Vladislav Odintsov

-Original Message-
From: dev on behalf of Ilya Maximets
Date: Tuesday, 28 May 2024 at 00:39
To: "ovs-dev@openvswitch.org"
Cc: Ilya Maximets , Dumitru Ceara
Subject: [ovs-dev] [PATCH] python: idl: Fix index not being updated on row modification.

When a row is modified, the Python IDL doesn't perform any operations on existing client-side indexes. This means that if the column on which an index is created changes, the old value remains in the index and the new one is not added to it. Besides lookup failures, this also makes it impossible to remove modified rows: the new column value doesn't exist in the index, which causes an exception on the attempt to remove it:

    Traceback (most recent call last):
      File "ovsdbapp/backend/ovs_idl/connection.py", line 110, in run
        self.idl.run()
      File "ovs/db/idl.py", line 465, in run
        self.__parse_update(msg.params[2], OVSDB_UPDATE3)
      File "ovs/db/idl.py", line 924, in __parse_update
        self.__do_parse_update(update, version, self.tables)
      File "ovs/db/idl.py", line 964, in __do_parse_update
        changes = self.__process_update2(table, uuid, row_update)
      File "ovs/db/idl.py", line 991, in __process_update2
        del table.rows[uuid]
      File "ovs/db/custom_index.py", line 102, in __delitem__
        index.remove(val)
      File "ovs/db/custom_index.py", line 66, in remove
        self.values.remove(self.index_entry_from_row(row))
      File "sortedcontainers/sortedlist.py", line 2015, in remove
        raise ValueError('{0!r} not in list'.format(value))
    ValueError: Datapath_Binding(
        uuid=UUID('498e66a2-70bc-4587-a66f-0433baf82f60'),
        tunnel_key=16711683, load_balancers=[], external_ids={}) not in list

Fix that by always removing an existing row from indexes before modification and adding it back afterwards. This ensures that old values are removed from the index and new ones are added. This behavior is consistent with the C implementation.

A new test that reproduces the removal issue is added, along with some extra testing infrastructure to handle and print out the 'indexed' table from the idltest schema.
Fixes: 13973bc41524 ("Add multi-column index support for the Python IDL") Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2024-May/053159.html Reported-by: Roberto Bartzen Acosta Signed-off-by: Ilya Maximets --- python/ovs/db/idl.py | 13 -- tests/ovsdb-idl.at | 95 +++- tests/test-ovsdb.c | 43 tests/test-ovsdb.py | 15 +++ 4 files ch
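The failure mode and the fix described in the commit message can be illustrated with a small toy model. This is only a sketch and not the python-ovs code: `Row`, `IndexedTable`, `reindex_on_modify` and `demo` are hypothetical names invented for the illustration, and a plain dict stands in for the sorted index.

```python
class Row:
    """Hypothetical stand-in for an IDL row with one indexed column."""

    def __init__(self, uuid, name):
        self.uuid = uuid
        self.name = name


class IndexedTable:
    """Toy table that keeps a client-side index on the 'name' column."""

    def __init__(self, reindex_on_modify):
        self.rows = {}
        self.by_name = {}
        self.reindex_on_modify = reindex_on_modify

    def insert(self, row):
        self.rows[row.uuid] = row
        self.by_name[row.name] = row

    def modify(self, uuid, new_name):
        row = self.rows[uuid]
        if self.reindex_on_modify:
            # Fixed behaviour: drop the row from the index first ...
            del self.by_name[row.name]
        row.name = new_name
        if self.reindex_on_modify:
            # ... and add it back under the new value afterwards.
            self.by_name[row.name] = row

    def delete(self, uuid):
        row = self.rows.pop(uuid)
        # Raises KeyError when the index still holds the old value.
        del self.by_name[row.name]

    def get(self, name):
        return self.by_name.get(name)


def demo(reindex_on_modify):
    """Rename 'test' to 'test2', then look up both names and delete."""
    table = IndexedTable(reindex_on_modify)
    table.insert(Row("u1", "test"))
    table.modify("u1", "test2")
    old = table.get("test")
    new = table.get("test2")
    try:
        table.delete("u1")
        deleted = True
    except KeyError:
        deleted = False
    return (old.name if old else None, new.name if new else None, deleted)
```

With `reindex_on_modify=False` the lookup by the old name still finds the row (now named "test2"), the lookup by the new name finds nothing, and deletion fails, which mirrors the symptoms reported in this thread. With `reindex_on_modify=True` all three behave as expected.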
Re: [ovs-dev] OVS 3.3.1 release date
Thanks Ilya for the clarification! Looking forward to seeing the new release.

regards,
Vladislav Odintsov

-----Original Message-----
From: dev on behalf of Ilya Maximets
Date: Wednesday, 22 May 2024 at 18:27
To: Vladislav Odintsov, "ovs-dev@openvswitch.org"
Cc: "i.maxim...@ovn.org", David Marchand
Subject: Re: [ovs-dev] OVS 3.3.1 release date

On 5/22/24 16:17, Vladislav Odintsov wrote:
> Hi all,
>
> I’m wondering whether there is a planned date for OVS 3.3.1 release?
>
> Currently there are a lot of useful bugfixes in branch-3.3 above 3.3.1
> tag and latest release was on the 17th of February (>3 months ago).

Hi. I plan to make a series of releases in the next couple of weeks, ideally by the end of May, but maybe the first week of June.

The current plan is to try to incorporate, at least partially, David's fixes:
https://patchwork.ozlabs.org/project/openvswitch/list/?series=403694
and to get the DPDK update to the recently released v23.11.1.

But I think we'll need to make stable releases even if we are not able to incorporate the changes above in time.

Best regards, Ilya Maximets.
[ovs-dev] OVS 3.3.1 release date
Hi all,

I’m wondering whether there is a planned date for the OVS 3.3.1 release?

Currently there are a lot of useful bugfixes in branch-3.3 above 3.3.1 tag, and the latest release was on the 17th of February (>3 months ago).

regards,
Vladislav Odintsov
Re: [ovs-dev] [PATCH ovn v2] controller: Store src_mac, src_ip in svc_monitor struct.
Adding the forgotten tag again:

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2024-April/413198.html

regards,
Vladislav Odintsov

On 22.05.2024, 15:19, "Vladislav Odintsov" wrote:

These structure members are read in pinctrl_handler() in a separate thread. This is unsafe: when the IDL is reconnected, or some IDL objects are freed after the svc_monitors list (whose svc_monitor structures point to sbrec_service_monitor rows) has been initialized, the sb_svc_mon member can point to a wrong address. This then leads to a segmentation fault in svc_monitor_send_tcp_health_check__() and svc_monitor_send_udp_health_check() on accessing svc_mon->sb_svc_mon.

Imagine the following situation:

Main ovn-controller thread:
1. Connects to SB and fills IDL with DB contents.
2. Runs pinctrl_run() with the pinctrl mutex held, and sync_svc_monitors() as a part of it.
3. Releases the mutex.
...
4. Loses the OVNSB connection for any reason (network outage/inactivity probe timeout/etc), starts a new main-loop iteration and re-initializes IDL in ovsdb_idl_run() (which probably destroys the previous IDL objects).
...

pinctrl thread:
5. Wakes up from poll_block().
... runs a new iteration of its main loop ...
6. Acquires the mutex lock.
7. Runs svc_monitors_run(), then svc_monitor_send_tcp_health_check__() or svc_monitor_send_udp_health_check(), which try to access IDL objects through outdated pointers.
8. Segmentation fault:

Stack trace of thread 212406:
    __GI_strlen (libc.so.6)
    inet_pton (libc.so.6)
    ip_parse (ovn-controller)
    svc_monitor_send_tcp_health_check__.part.37 (ovn-controller)
    svc_monitor_send_health_check (ovn-controller)
    pinctrl_handler (ovn-controller)
    ovsthread_wrapper (ovn-controller)
    start_thread (libpthread.so.0)
    __clone (libc.so.6)

This patch removes unsafe access to IDL objects from the pinctrl thread. New svc_monitor structure members are used to store the needed data.

CC: Numan Siddique
Acked-by: Ales Musil
Fixes: 8be01f4a5329 ("Send service monitor health checks")
Signed-off-by: Vladislav Odintsov
---
v1 -> v2:
  - Addressed Ales's comment: replaced ip_parse() & ipv6_parse() with ip46_parse().
---
 controller/pinctrl.c | 37 ++++++++++++++++---------------------
 1 file changed, 16 insertions(+), 21 deletions(-)

diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index 6a2c3dc68..0178ac6cc 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -7307,6 +7307,9 @@ struct svc_monitor {
     long long int timestamp;
     bool is_ip6;

+    struct eth_addr src_mac;
+    struct in6_addr src_ip;
+
     long long int wait_time;
     long long int next_send_time;

@@ -7501,6 +7504,9 @@ sync_svc_monitors(struct ovsdb_idl_txn *ovnsb_idl_txn,
             svc_mon->n_success = 0;
             svc_mon->n_failures = 0;

+            eth_addr_from_string(sb_svc_mon->src_mac, &svc_mon->src_mac);
+            ip46_parse(sb_svc_mon->src_ip, &svc_mon->src_ip);
+
             hmap_insert(&svc_monitors_map, &svc_mon->hmap_node, hash);
             ovs_list_push_back(&svc_monitors, &svc_mon->list_node);
             changed = true;
@@ -8259,19 +8265,14 @@ svc_monitor_send_tcp_health_check__(struct rconn *swconn,
     struct dp_packet packet;
     dp_packet_use_stub(&packet, packet_stub, sizeof packet_stub);

-    struct eth_addr eth_src;
-    eth_addr_from_string(svc_mon->sb_svc_mon->src_mac, &eth_src);
     if (svc_mon->is_ip6) {
-        struct in6_addr ip6_src;
-        ipv6_parse(svc_mon->sb_svc_mon->src_ip, &ip6_src);
-        pinctrl_compose_ipv6(&packet, eth_src, svc_mon->ea,
-                             &ip6_src, &svc_mon->ip, IPPROTO_TCP,
+        pinctrl_compose_ipv6(&packet, svc_mon->src_mac, svc_mon->ea,
+                             &svc_mon->src_ip, &svc_mon->ip, IPPROTO_TCP,
                              63, TCP_HEADER_LEN);
     } else {
-        ovs_be32 ip4_src;
-        ip_parse(svc_mon->sb_svc_mon->src_ip, &ip4_src);
-        pinctrl_compose_ipv4(&packet, eth_src, svc_mon->ea,
-                             ip4_src, in6_addr_get_mapped_ipv4(&svc_mon->ip),
+        pinctrl_compose_ipv4(&packet, svc_mon->src_mac, svc_mon->ea,
+                             in6_addr_get_mapped_ipv4(&svc_mon->src_ip),
+                             in6_addr_get_mapped_ipv4(&svc_mon->ip),
                              IPPROTO_TCP, 63, TCP_HEADER_LEN);
     }

@@ -8327,24 +8328,18 @@ svc_monitor_send_udp_health_check(struct rconn *swconn,
Re: [ovs-dev] [PATCH ovn] controller: Store src_mac, src_ip in svc_monitor struct.
Hi Ales! Thanks for the review. I've addressed requested changes in v2 with your Acked-by added: https://patchwork.ozlabs.org/project/ovn/patch/20240522121913.609332-1-odiv...@gmail.com/ regards, Vladislav Odintsov On 22.05.2024, 10:59, "dev on behalf of Ales Musil" wrote: On Tue, May 14, 2024 at 9:25 PM Vladislav Odintsov wrote: > These structure members are read in pinctrl_handler() in a separare thread. > This is unsafe: when IDL is re-connected or some IDL objects are freed > after svc_monitors list with svc_monitor structures, which point to > sbrec_service_monitor is initialized, sb_svc_mon structure property can > point to wrong address, which then leads to segmentation fault in > svc_monitor_send_tcp_health_check__() and > svc_monitor_send_udp_health_check() on accessing svc_mon->sb_svc_mon. > > Imagine situation: > > Main ovn-controller thread: > 1. Connects to SB and fills IDL with DB contents. > 2. run pinctrl_run() with pinctrl mutex and sync_svc_monitors() as a part >of it. > 3. Release mutex. > > ... > 4. Loss of OVNSB connection for any reason (network outage/inactivity probe >timeout/etc), start new main-loop iteration, re-initialize IDL in >ovsdb_idl_run() (which probably will destroy previous IDL objects). > ... > > pinctrl thread: > 5. Awake from poll_block(). > ... run new iteration in its main loop ... > 6. Aqcuire mutex lock. > 7. Run svc_monitors_run(), run svc_monitor_send_tcp_health_check__() or >svc_monitor_send_udp_health_check(), which try to access IDL objects >with outdated pointers. > > 8. 
Segmentation fault: > > Stack trace of thread 212406: > >> __GI_strlen (libc.so.6) > >> inet_pton (libc.so.6) > >> ip_parse (ovn-controller) > >> svc_monitor_send_tcp_health_check__.part.37 (ovn-controller) > >> svc_monitor_send_health_check (ovn-controller) > >> pinctrl_handler (ovn-controller) > >> ovsthread_wrapper (ovn-controller) > >> start_thread (libpthread.so.0) > >> __clone (libc.so.6) > > This patch removes unsafe access to IDL objects from pinctrl thread. > New svc_monitor structure members are used to store needed data. > > CC: Numan Siddique > Fixes: 8be01f4a5329 ("Send service monitor health checks") > Signed-off-by: Vladislav Odintsov > --- > Hi Vladislav, thank you for the fix. I have one comment down below. > controller/pinctrl.c | 43 ++- > 1 file changed, 22 insertions(+), 21 deletions(-) > > diff --git a/controller/pinctrl.c b/controller/pinctrl.c > index 6a2c3dc68..b843edb35 100644 > --- a/controller/pinctrl.c > +++ b/controller/pinctrl.c > @@ -7307,6 +7307,9 @@ struct svc_monitor { > long long int timestamp; > bool is_ip6; > > +struct eth_addr src_mac; > +struct in6_addr src_ip; > + > long long int wait_time; > long long int next_send_time; > > @@ -7501,6 +7504,15 @@ sync_svc_monitors(struct ovsdb_idl_txn > *ovnsb_idl_txn, > svc_mon->n_success = 0; > svc_mon->n_failures = 0; > > +eth_addr_from_string(sb_svc_mon->src_mac, &svc_mon->src_mac); > +if (is_ipv4) { > +ovs_be32 ip4_src; > +ip_parse(sb_svc_mon->src_ip, &ip4_src); > +svc_mon->src_ip = in6_addr_mapped_ipv4(ip4_src); > +} else { > +ipv6_parse(sb_svc_mon->src_ip, &svc_mon->src_ip); > +} > + > The whole if else block can be replaced with ip46_parse(). 
hmap_insert(&svc_monitors_map, &svc_mon->hmap_node, hash); > ovs_list_push_back(&svc_monitors, &svc_mon->list_node); > changed = true; > @@ -8259,19 +8271,14 @@ svc_monitor_send_tcp_health_check__(struct rconn > *swconn, > struct dp_packet packet; > dp_packet_use_stub(&packet, packet_stub, sizeof packet_stub); > > -struct eth_addr eth_src; > -eth_addr_from_string(svc_mon->sb_svc_mon->src_mac, ð_src); > if (svc_mon->is_ip6) { > -struct in6_addr ip6_src; > -ipv6_parse(svc_mon->sb_svc_mon->src_ip, &ip6_src); > -pinctrl_compose_ipv6(&packet, eth_src, svc_mon->ea, > -
[ovs-dev] [PATCH ovn v2] controller: Store src_mac, src_ip in svc_monitor struct.
These structure members are read in pinctrl_handler() in a separate thread. This is unsafe: when the IDL is reconnected, or some IDL objects are freed after the svc_monitors list (whose svc_monitor structures point to sbrec_service_monitor rows) has been initialized, the sb_svc_mon member can point to a wrong address. This then leads to a segmentation fault in svc_monitor_send_tcp_health_check__() and svc_monitor_send_udp_health_check() on accessing svc_mon->sb_svc_mon.

Imagine the following situation:

Main ovn-controller thread:
1. Connects to SB and fills IDL with DB contents.
2. Runs pinctrl_run() with the pinctrl mutex held, and sync_svc_monitors() as a part of it.
3. Releases the mutex.
...
4. Loses the OVNSB connection for any reason (network outage/inactivity probe timeout/etc), starts a new main-loop iteration and re-initializes IDL in ovsdb_idl_run() (which probably destroys the previous IDL objects).
...

pinctrl thread:
5. Wakes up from poll_block().
... runs a new iteration of its main loop ...
6. Acquires the mutex lock.
7. Runs svc_monitors_run(), then svc_monitor_send_tcp_health_check__() or svc_monitor_send_udp_health_check(), which try to access IDL objects through outdated pointers.
8. Segmentation fault:

Stack trace of thread 212406:
    __GI_strlen (libc.so.6)
    inet_pton (libc.so.6)
    ip_parse (ovn-controller)
    svc_monitor_send_tcp_health_check__.part.37 (ovn-controller)
    svc_monitor_send_health_check (ovn-controller)
    pinctrl_handler (ovn-controller)
    ovsthread_wrapper (ovn-controller)
    start_thread (libpthread.so.0)
    __clone (libc.so.6)

This patch removes unsafe access to IDL objects from the pinctrl thread. New svc_monitor structure members are used to store the needed data.

CC: Numan Siddique
Acked-by: Ales Musil
Fixes: 8be01f4a5329 ("Send service monitor health checks")
Signed-off-by: Vladislav Odintsov
---
v1 -> v2:
  - Addressed Ales's comment: replaced ip_parse() & ipv6_parse() with ip46_parse().
---
 controller/pinctrl.c | 37 ++++++++++++++++---------------------
 1 file changed, 16 insertions(+), 21 deletions(-)

diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index 6a2c3dc68..0178ac6cc 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -7307,6 +7307,9 @@ struct svc_monitor {
     long long int timestamp;
     bool is_ip6;

+    struct eth_addr src_mac;
+    struct in6_addr src_ip;
+
     long long int wait_time;
     long long int next_send_time;

@@ -7501,6 +7504,9 @@ sync_svc_monitors(struct ovsdb_idl_txn *ovnsb_idl_txn,
             svc_mon->n_success = 0;
             svc_mon->n_failures = 0;

+            eth_addr_from_string(sb_svc_mon->src_mac, &svc_mon->src_mac);
+            ip46_parse(sb_svc_mon->src_ip, &svc_mon->src_ip);
+
             hmap_insert(&svc_monitors_map, &svc_mon->hmap_node, hash);
             ovs_list_push_back(&svc_monitors, &svc_mon->list_node);
             changed = true;
@@ -8259,19 +8265,14 @@ svc_monitor_send_tcp_health_check__(struct rconn *swconn,
     struct dp_packet packet;
     dp_packet_use_stub(&packet, packet_stub, sizeof packet_stub);

-    struct eth_addr eth_src;
-    eth_addr_from_string(svc_mon->sb_svc_mon->src_mac, &eth_src);
     if (svc_mon->is_ip6) {
-        struct in6_addr ip6_src;
-        ipv6_parse(svc_mon->sb_svc_mon->src_ip, &ip6_src);
-        pinctrl_compose_ipv6(&packet, eth_src, svc_mon->ea,
-                             &ip6_src, &svc_mon->ip, IPPROTO_TCP,
+        pinctrl_compose_ipv6(&packet, svc_mon->src_mac, svc_mon->ea,
+                             &svc_mon->src_ip, &svc_mon->ip, IPPROTO_TCP,
                              63, TCP_HEADER_LEN);
     } else {
-        ovs_be32 ip4_src;
-        ip_parse(svc_mon->sb_svc_mon->src_ip, &ip4_src);
-        pinctrl_compose_ipv4(&packet, eth_src, svc_mon->ea,
-                             ip4_src, in6_addr_get_mapped_ipv4(&svc_mon->ip),
+        pinctrl_compose_ipv4(&packet, svc_mon->src_mac, svc_mon->ea,
+                             in6_addr_get_mapped_ipv4(&svc_mon->src_ip),
+                             in6_addr_get_mapped_ipv4(&svc_mon->ip),
                              IPPROTO_TCP, 63, TCP_HEADER_LEN);
     }

@@ -8327,24 +8328,18 @@ svc_monitor_send_udp_health_check(struct rconn *swconn,
                                   struct svc_monitor *svc_mon,
                                   ovs_be16 udp_src)
 {
-    struct eth_addr eth_src;
-    eth_addr_from_string(svc_mon->sb_svc_mon->src_mac, &eth_src);
-
     uint64_t packet_stub[128 / 8];
     struct dp_packet packet;
     dp_packet_use_stub(&packet, packet_stub, sizeof packet_stub);

     if (svc_mon->is_ip6) {
-        struct in6_addr ip6_src;
-        ipv6_parse(svc_mon->sb_svc_mon->src_ip, &ip6_src);
-        pinctrl_compose_ipv6(&packet, eth_src, svc_mon->ea,
-
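The idea behind this patch, copying the needed values out of the database row while it is known to be valid instead of keeping a pointer to the row, can be sketched in Python. This is a loose illustration rather than a translation of pinctrl.c: `parse_ip46`, `SvcMonitorSnapshot`, `snapshot_from_row` and `source_for_packet` are hypothetical names, a dict stands in for the Southbound row, and `is_ip6` is derived from the source address here purely to keep the example short.

```python
import ipaddress
from dataclasses import dataclass


def parse_ip46(text):
    """Parse IPv4 or IPv6 text into a single IPv6 value.

    IPv4 input is stored as an IPv4-mapped IPv6 address, similar in
    spirit to what ip46_parse() is used for in the patch.
    """
    addr = ipaddress.ip_address(text)
    if addr.version == 4:
        return ipaddress.IPv6Address("::ffff:" + str(addr))
    return addr


@dataclass(frozen=True)
class SvcMonitorSnapshot:
    """Values copied out of the row; holds no reference to the row itself."""
    src_mac: str
    src_ip: ipaddress.IPv6Address
    is_ip6: bool


def snapshot_from_row(row):
    """Called by the 'main thread' while the row is known to be valid."""
    src_ip = parse_ip46(row["src_ip"])
    return SvcMonitorSnapshot(
        src_mac=row["src_mac"],
        src_ip=src_ip,
        # Assumption for this sketch: a non-mapped address means IPv6.
        is_ip6=src_ip.ipv4_mapped is None,
    )


def source_for_packet(snap):
    """Called by the 'handler thread'; only touches the snapshot."""
    if snap.is_ip6:
        return str(snap.src_ip)
    return str(snap.src_ip.ipv4_mapped)
```

Because the snapshot is built by value, clearing or replacing the original row afterwards (the analogue of the IDL being re-initialized) does not affect what the handler reads.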
Re: [ovs-dev] [PATCH ovn] ovn-ctl: Support for --config-file ovsdb-server option.
Thanks Numan! regards, Vladislav Odintsov > On 15 May 2024, at 23:55, Numan Siddique wrote: > > On Fri, May 3, 2024 at 2:05 AM Ales Musil wrote: >> >>> On Tue, Apr 23, 2024 at 6:43 PM Vladislav Odintsov >>> wrote: >>> >>> Since OVS 3.3.0 ovsdb-server accepts databases and remotes configuration >>> via JSON text file. This patch adds support for such option. >>> >>> Signed-off-by: Vladislav Odintsov > > Thanks for the patch. > > I applied this with the below changes > > - > diff --git a/utilities/ovn-ctl b/utilities/ovn-ctl > index fd1ae12567..a4f410e4f7 100755 > --- a/utilities/ovn-ctl > +++ b/utilities/ovn-ctl > @@ -1242,8 +1242,7 @@ File location options: > --db-sb-relay-sock=SOCKET OVN_IC_Northbound db socket (default: > $DB_SB_RELAY_SOCK) > --db-sb-relay-pidfile=FILE OVN_Southbound relay db pidfile > (default: $DB_SB_RELAY_CTRL_PIDFILE) > --db-sb-relay-ctrl-sock=SOCKET OVN_Southbound relay db control > socket (default: $DB_SB_RELAY_CTRL_SOCK) > - --db-sb-relay-config-file=FILE OVN_IC_Northbound ovsdb-server > configuration file > - Mutually exclusive with > --db-ic-nb-use-remote-in-db=yes. > + --db-sb-relay-config-file=FILE OVN_Southbound relay ovsdb-server > configuration file. Oops, copy-paste typo. 
> --ovn-sb-relay-db-ssl-key=KEY OVN_Southbound DB relay SSL private key file > --ovn-sb-relay-db-ssl-cert=CERT OVN_Southbound DB relay SSL certificate file > --ovn-sb-relay-db-ssl-ca-cert=CERT OVN OVN_Southbound DB relay SSL > CA certificate file > diff --git a/utilities/ovn-ctl.8.xml b/utilities/ovn-ctl.8.xml > index c0fbb0792d..4f21ba4ea3 100644 > --- a/utilities/ovn-ctl.8.xml > +++ b/utilities/ovn-ctl.8.xml > @@ -86,6 +86,11 @@ > --db-ic-sb-schema=FILE > --db-ic-sb-create-insecure-remote=yes|no > --db-ic-nb-create-insecure-remote=yes|no > +--db-nb-config-file=FILE > +--db-sb-config-file=FILE > +--db-ic-nb-config-file=FILE > +--db-ic-sb-config-file=FILE > +--db-sb-relay-config-file=FILE > --ovn-controller-ssl-key=KEY > --ovn-controller-ssl-cert=CERT > --ovn-controller-ssl-ca-cert=CERT > > - > > > Numan > >>> --- >>> NEWS | 1 + >>> utilities/ovn-ctl | 39 +++ >>> 2 files changed, 36 insertions(+), 4 deletions(-) >>> >>> diff --git a/NEWS b/NEWS >>> index 9adf6a31c..39ea88d78 100644 >>> --- a/NEWS >>> +++ b/NEWS >>> @@ -16,6 +16,7 @@ Post v24.03.0 >>> - Remove "ovn-set-local-ip" config option from vswitchd >>> external-ids, the option is no longer needed as it became effectively >>> "true" for all scenarios. >>> + - Add support for ovsdb-server `--config-file` option in ovn-ctl. 
>>> >>> OVN v24.03.0 - 01 Mar 2024 >>> -- >>> diff --git a/utilities/ovn-ctl b/utilities/ovn-ctl >>> index dae5e22f4..fd1ae1256 100755 >>> --- a/utilities/ovn-ctl >>> +++ b/utilities/ovn-ctl >>> @@ -169,6 +169,7 @@ start_ovsdb__() { >>> local sync_from_port >>> local file >>> local schema >>> +local config_file >>> local logfile >>> local log >>> local sock >>> @@ -199,6 +200,7 @@ start_ovsdb__() { >>> eval sync_from_port=\$DB_${DB}_SYNC_FROM_PORT >>> eval file=\$DB_${DB}_FILE >>> eval schema=\$DB_${DB}_SCHEMA >>> +eval config_file=\$DB_${DB}_CONFIG_FILE >>> eval logfile=\$OVN_${DB}_LOGFILE >>> eval log=\$OVN_${DB}_LOG >>> eval sock=\$DB_${DB}_SOCK >>> @@ -281,7 +283,12 @@ $cluster_remote_port >>> >>> set ovsdb-server >>> set "$@" $log --log-file=$logfile >>> -set "$@" --remote=punix:$sock --pidfile=$db_pid_file >>> +set "$@" --pidfile=$db_pid_file >>> +if test X"$config_file" == X; then >>> +set "$@" --remote=punix:$sock >>> +else >>> +set "$@" --config-file=$config_file >>> +fi >>> set "$@" --unixctl=$ctrl_sock >>> >>> [ "$OVN_USER" != "" ] && set "$@" --user "$OVN_USER" >>> @@ -297,7 +304,7 @@ $cluster_remote_port >>> set exec "$@" >>> fi >>>
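The option handling added to start_ovsdb__() in the quoted hunk can be restated as a small Python function. This is only a sketch of the branching shown in the diff: the function name `ovsdb_server_args` and its parameters are hypothetical, and it covers just the arguments visible in the hunk, not the full ovn-ctl command line.

```python
def ovsdb_server_args(log, logfile, pidfile, sock, ctrl_sock, config_file=""):
    """Build the part of the ovsdb-server command line shown in the hunk.

    When a config file is given, --config-file replaces the default
    --remote=punix:SOCK argument; otherwise the unix socket remote is kept.
    """
    args = ["ovsdb-server"]
    args += log.split()                    # $log is expanded unquoted in the shell
    args.append("--log-file=" + logfile)
    args.append("--pidfile=" + pidfile)
    if not config_file:
        args.append("--remote=punix:" + sock)
    else:
        args.append("--config-file=" + config_file)
    args.append("--unixctl=" + ctrl_sock)
    return args
```

The two branches are mutually exclusive by construction: remotes come either from the command line or from the configuration file, never from both.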
Re: [ovs-dev] [PATCH ovn] controller: Store src_mac, src_ip in svc_monitor struct.
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2024-April/413198.html > On 14 May 2024, at 22:25, Vladislav Odintsov wrote: > > These structure members are read in pinctrl_handler() in a separare thread. > This is unsafe: when IDL is re-connected or some IDL objects are freed > after svc_monitors list with svc_monitor structures, which point to > sbrec_service_monitor is initialized, sb_svc_mon structure property can > point to wrong address, which then leads to segmentation fault in > svc_monitor_send_tcp_health_check__() and > svc_monitor_send_udp_health_check() on accessing svc_mon->sb_svc_mon. > > Imagine situation: > > Main ovn-controller thread: > 1. Connects to SB and fills IDL with DB contents. > 2. run pinctrl_run() with pinctrl mutex and sync_svc_monitors() as a part > of it. > 3. Release mutex. > > ... > 4. Loss of OVNSB connection for any reason (network outage/inactivity probe > timeout/etc), start new main-loop iteration, re-initialize IDL in > ovsdb_idl_run() (which probably will destroy previous IDL objects). > ... > > pinctrl thread: > 5. Awake from poll_block(). > ... run new iteration in its main loop ... > 6. Aqcuire mutex lock. > 7. Run svc_monitors_run(), run svc_monitor_send_tcp_health_check__() or > svc_monitor_send_udp_health_check(), which try to access IDL objects > with outdated pointers. > > 8. Segmentation fault: > > Stack trace of thread 212406: >>> __GI_strlen (libc.so.6) >>> inet_pton (libc.so.6) >>> ip_parse (ovn-controller) >>> svc_monitor_send_tcp_health_check__.part.37 (ovn-controller) >>> svc_monitor_send_health_check (ovn-controller) >>> pinctrl_handler (ovn-controller) >>> ovsthread_wrapper (ovn-controller) >>> start_thread (libpthread.so.0) >>> __clone (libc.so.6) > > This patch removes unsafe access to IDL objects from pinctrl thread. > New svc_monitor structure members are used to store needed data. 
> > CC: Numan Siddique > Fixes: 8be01f4a5329 ("Send service monitor health checks") > Signed-off-by: Vladislav Odintsov > --- > controller/pinctrl.c | 43 ++- > 1 file changed, 22 insertions(+), 21 deletions(-) > > diff --git a/controller/pinctrl.c b/controller/pinctrl.c > index 6a2c3dc68..b843edb35 100644 > --- a/controller/pinctrl.c > +++ b/controller/pinctrl.c > @@ -7307,6 +7307,9 @@ struct svc_monitor { > long long int timestamp; > bool is_ip6; > > +struct eth_addr src_mac; > +struct in6_addr src_ip; > + > long long int wait_time; > long long int next_send_time; > > @@ -7501,6 +7504,15 @@ sync_svc_monitors(struct ovsdb_idl_txn *ovnsb_idl_txn, > svc_mon->n_success = 0; > svc_mon->n_failures = 0; > > +eth_addr_from_string(sb_svc_mon->src_mac, &svc_mon->src_mac); > +if (is_ipv4) { > +ovs_be32 ip4_src; > +ip_parse(sb_svc_mon->src_ip, &ip4_src); > +svc_mon->src_ip = in6_addr_mapped_ipv4(ip4_src); > +} else { > +ipv6_parse(sb_svc_mon->src_ip, &svc_mon->src_ip); > +} > + > hmap_insert(&svc_monitors_map, &svc_mon->hmap_node, hash); > ovs_list_push_back(&svc_monitors, &svc_mon->list_node); > changed = true; > @@ -8259,19 +8271,14 @@ svc_monitor_send_tcp_health_check__(struct rconn > *swconn, > struct dp_packet packet; > dp_packet_use_stub(&packet, packet_stub, sizeof packet_stub); > > -struct eth_addr eth_src; > -eth_addr_from_string(svc_mon->sb_svc_mon->src_mac, ð_src); > if (svc_mon->is_ip6) { > -struct in6_addr ip6_src; > -ipv6_parse(svc_mon->sb_svc_mon->src_ip, &ip6_src); > -pinctrl_compose_ipv6(&packet, eth_src, svc_mon->ea, > - &ip6_src, &svc_mon->ip, IPPROTO_TCP, > +pinctrl_compose_ipv6(&packet, svc_mon->src_mac, svc_mon->ea, > + &svc_mon->src_ip, &svc_mon->ip, IPPROTO_TCP, > 63, TCP_HEADER_LEN); > } else { > -ovs_be32 ip4_src; > -ip_parse(svc_mon->sb_svc_mon->src_ip, &ip4_src); > -pinctrl_compose_ipv4(&packet, eth_src, svc_mon->ea, > - ip4_src, in6_addr_get_mapped_ipv4(&svc_mon->ip), > +pinctrl_compose_ipv4(&packet, svc_mon->src_mac, svc_mon->ea, > + 
in6_addr_get_mapped_ipv4(&svc_mon->src_ip), > +
[ovs-dev] [PATCH ovn] controller: Store src_mac, src_ip in svc_monitor struct.
These structure members are read in pinctrl_handler() in a separate thread. This is unsafe: when the IDL is reconnected, or some IDL objects are freed after the svc_monitors list (whose svc_monitor structures point to sbrec_service_monitor rows) has been initialized, the sb_svc_mon member can point to a wrong address. This then leads to a segmentation fault in svc_monitor_send_tcp_health_check__() and svc_monitor_send_udp_health_check() on accessing svc_mon->sb_svc_mon.

Imagine the following situation:

Main ovn-controller thread:
1. Connects to SB and fills IDL with DB contents.
2. Runs pinctrl_run() with the pinctrl mutex held, and sync_svc_monitors() as a part of it.
3. Releases the mutex.
...
4. Loses the OVNSB connection for any reason (network outage/inactivity probe timeout/etc), starts a new main-loop iteration and re-initializes IDL in ovsdb_idl_run() (which probably destroys the previous IDL objects).
...

pinctrl thread:
5. Wakes up from poll_block().
... runs a new iteration of its main loop ...
6. Acquires the mutex lock.
7. Runs svc_monitors_run(), then svc_monitor_send_tcp_health_check__() or svc_monitor_send_udp_health_check(), which try to access IDL objects through outdated pointers.
8. Segmentation fault:

Stack trace of thread 212406:
>> __GI_strlen (libc.so.6)
>> inet_pton (libc.so.6)
>> ip_parse (ovn-controller)
>> svc_monitor_send_tcp_health_check__.part.37 (ovn-controller)
>> svc_monitor_send_health_check (ovn-controller)
>> pinctrl_handler (ovn-controller)
>> ovsthread_wrapper (ovn-controller)
>> start_thread (libpthread.so.0)
>> __clone (libc.so.6)

This patch removes unsafe access to IDL objects from the pinctrl thread. New svc_monitor structure members are used to store the needed data.

CC: Numan Siddique
Fixes: 8be01f4a5329 ("Send service monitor health checks")
Signed-off-by: Vladislav Odintsov
---
 controller/pinctrl.c | 43 ++++++++++++++++++++++---------------------
 1 file changed, 22 insertions(+), 21 deletions(-)

diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index 6a2c3dc68..b843edb35 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -7307,6 +7307,9 @@ struct svc_monitor {
     long long int timestamp;
     bool is_ip6;

+    struct eth_addr src_mac;
+    struct in6_addr src_ip;
+
     long long int wait_time;
     long long int next_send_time;

@@ -7501,6 +7504,15 @@ sync_svc_monitors(struct ovsdb_idl_txn *ovnsb_idl_txn,
             svc_mon->n_success = 0;
             svc_mon->n_failures = 0;

+            eth_addr_from_string(sb_svc_mon->src_mac, &svc_mon->src_mac);
+            if (is_ipv4) {
+                ovs_be32 ip4_src;
+                ip_parse(sb_svc_mon->src_ip, &ip4_src);
+                svc_mon->src_ip = in6_addr_mapped_ipv4(ip4_src);
+            } else {
+                ipv6_parse(sb_svc_mon->src_ip, &svc_mon->src_ip);
+            }
+
             hmap_insert(&svc_monitors_map, &svc_mon->hmap_node, hash);
             ovs_list_push_back(&svc_monitors, &svc_mon->list_node);
             changed = true;
@@ -8259,19 +8271,14 @@ svc_monitor_send_tcp_health_check__(struct rconn *swconn,
     struct dp_packet packet;
     dp_packet_use_stub(&packet, packet_stub, sizeof packet_stub);

-    struct eth_addr eth_src;
-    eth_addr_from_string(svc_mon->sb_svc_mon->src_mac, &eth_src);
     if (svc_mon->is_ip6) {
-        struct in6_addr ip6_src;
-        ipv6_parse(svc_mon->sb_svc_mon->src_ip, &ip6_src);
-        pinctrl_compose_ipv6(&packet, eth_src, svc_mon->ea,
-                             &ip6_src, &svc_mon->ip, IPPROTO_TCP,
+        pinctrl_compose_ipv6(&packet, svc_mon->src_mac, svc_mon->ea,
+                             &svc_mon->src_ip, &svc_mon->ip, IPPROTO_TCP,
                              63, TCP_HEADER_LEN);
     } else {
-        ovs_be32 ip4_src;
-        ip_parse(svc_mon->sb_svc_mon->src_ip, &ip4_src);
-        pinctrl_compose_ipv4(&packet, eth_src, svc_mon->ea,
-                             ip4_src, in6_addr_get_mapped_ipv4(&svc_mon->ip),
+        pinctrl_compose_ipv4(&packet, svc_mon->src_mac, svc_mon->ea,
+                             in6_addr_get_mapped_ipv4(&svc_mon->src_ip),
+                             in6_addr_get_mapped_ipv4(&svc_mon->ip),
                              IPPROTO_TCP, 63, TCP_HEADER_LEN);
     }

@@ -8327,24 +8334,18 @@ svc_monitor_send_udp_health_check(struct rconn *swconn,
                                   struct svc_monitor *svc_mon,
                                   ovs_be16 udp_src)
 {
-    struct eth_addr eth_src;
-    eth_addr_from_string(svc_mon->sb_svc_mon->src_mac, &eth_src);
-
     uint64_t packet_stub[128 / 8];
     struct dp_packet packet;
     dp_packet_use_stub(&packet, packet_stub, sizeof packet_stub);

     if (svc_mon->is_ip6) {
-        struct in6_addr ip6_src;
-
[ovs-dev] [PATCH ovn v5 1/2] northd: Make `vxlan_mode` a global variable.
This simplifies the code, and the subsequent commit, which explicitly disables VXLAN mode, is based on these changes. The term "VXLAN mode" is also introduced in the ovn-architecture man page.

Signed-off-by: Vladislav Odintsov
---
 northd/en-global-config.c | 4 +-
 northd/northd.c | 85 +--
 northd/northd.h | 5 ++-
 ovn-architecture.7.xml | 10 ++---
 4 files changed, 47 insertions(+), 57 deletions(-)

diff --git a/northd/en-global-config.c b/northd/en-global-config.c
index 28c78a12c..873649a89 100644
--- a/northd/en-global-config.c
+++ b/northd/en-global-config.c
@@ -115,8 +115,8 @@ en_global_config_run(struct engine_node *node , void *data)
                      config_data->svc_monitor_mac);
     }

-    char *max_tunid = xasprintf("%d",
-        get_ovn_max_dp_key_local(sbrec_chassis_table));
+    init_vxlan_mode(sbrec_chassis_table);
+    char *max_tunid = xasprintf("%d", get_ovn_max_dp_key_local());
     smap_replace(options, "max_tunid", max_tunid);
     free(max_tunid);

diff --git a/northd/northd.c b/northd/northd.c
index 133cddb69..0e0ae24db 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -90,6 +90,10 @@ static bool use_ct_inv_match = true;
  */
 static bool default_acl_drop;

+/* If this option is 'true' northd will use limited 24-bit space for datapath
+ * and ports tunnel key allocation (12 bits for each instead of default 16).
*/ +static bool vxlan_mode; + #define MAX_OVN_TAGS 4096 @@ -881,40 +885,40 @@ join_datapaths(const struct nbrec_logical_switch_table *nbrec_ls_table, } } -static bool -is_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table) +void +init_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table) { const struct sbrec_chassis *chassis; SBREC_CHASSIS_TABLE_FOR_EACH (chassis, sbrec_chassis_table) { for (int i = 0; i < chassis->n_encaps; i++) { if (!strcmp(chassis->encaps[i]->type, "vxlan")) { -return true; +vxlan_mode = true; +return; } } } -return false; +vxlan_mode = false; } uint32_t -get_ovn_max_dp_key_local(const struct sbrec_chassis_table *sbrec_chassis_table) +get_ovn_max_dp_key_local(void) { -if (is_vxlan_mode(sbrec_chassis_table)) { -/* OVN_MAX_DP_GLOBAL_NUM doesn't apply for vxlan mode. */ +if (vxlan_mode) { +/* OVN_MAX_DP_GLOBAL_NUM doesn't apply for VXLAN mode. */ return OVN_MAX_DP_VXLAN_KEY; } return OVN_MAX_DP_KEY - OVN_MAX_DP_GLOBAL_NUM; } static void -ovn_datapath_allocate_key(const struct sbrec_chassis_table *sbrec_ch_table, - struct hmap *datapaths, struct hmap *dp_tnlids, +ovn_datapath_allocate_key(struct hmap *datapaths, struct hmap *dp_tnlids, struct ovn_datapath *od, uint32_t *hint) { if (!od->tunnel_key) { od->tunnel_key = ovn_allocate_tnlid(dp_tnlids, "datapath", -OVN_MIN_DP_KEY_LOCAL, -get_ovn_max_dp_key_local(sbrec_ch_table), -hint); +OVN_MIN_DP_KEY_LOCAL, +get_ovn_max_dp_key_local(), +hint); if (!od->tunnel_key) { if (od->sb) { sbrec_datapath_binding_delete(od->sb); @@ -927,7 +931,6 @@ ovn_datapath_allocate_key(const struct sbrec_chassis_table *sbrec_ch_table, static void ovn_datapath_assign_requested_tnl_id( -const struct sbrec_chassis_table *sbrec_chassis_table, struct hmap *dp_tnlids, struct ovn_datapath *od) { const struct smap *other_config = (od->nbs @@ -936,8 +939,7 @@ ovn_datapath_assign_requested_tnl_id( uint32_t tunnel_key = smap_get_int(other_config, "requested-tnl-key", 0); if (tunnel_key) { const char 
*interconn_ts = smap_get(other_config, "interconn-ts"); -if (!interconn_ts && is_vxlan_mode(sbrec_chassis_table) && -tunnel_key >= 1 << 12) { +if (!interconn_ts && vxlan_mode && tunnel_key >= 1 << 12) { static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1); VLOG_WARN_RL(&rl, "Tunnel key %"PRIu32" for datapath %s is " "incompatible with VXLAN", tunnel_key, @@ -985,7 +987,6 @@ build_datapaths(struct ovsdb_idl_txn *ovnsb_txn, const struct nbrec_logical_switch_table *nbrec_ls_table, const struct nbrec_logical_router_table *nbrec_lr_table, const struct sbrec_datapath_binding_table *sbrec_dp_table, -const struct sbrec_chassis_table *sbrec_chassis
[ovs-dev] [PATCH ovn v5 2/2] northd: Add support for disabling vxlan mode.
Commit [1] introduced a "VXLAN mode" concept. It limited the available tunnel IDs because of the lack of space in the VXLAN VNI. In VXLAN mode OVN is limited to 4095 datapaths (LRs or non-transit LSs) and 2047 logical ports per datapath.

Prior to this patch VXLAN mode was enabled automatically if at least one chassis had an encap of VXLAN type. In scenarios where one wants to use VXLAN only for a HW VTEP (RAMP) switch, such a limitation makes no sense.

This patch adds support for explicitly disabling VXLAN mode via the Northbound database.

1: https://github.com/ovn-org/ovn/commit/b07f1bc3d068

Acked-By: Ihar Hrachyshka
Fixes: b07f1bc3d068 ("Add VXLAN support for non-VTEP datapath bindings")
Signed-off-by: Vladislav Odintsov
---
 NEWS | 4
 northd/en-global-config.c | 7 ++-
 northd/northd.c | 10 --
 northd/northd.h | 3 ++-
 ovn-architecture.7.xml | 6 ++
 ovn-nb.xml | 10 ++
 tests/ovn-northd.at | 29 +
 7 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/NEWS b/NEWS
index 3b5e93dc9..007b27f3d 100644
--- a/NEWS
+++ b/NEWS
@@ -17,6 +17,10 @@ Post v24.03.0
     external-ids, the option is no longer needed as it became effectively
     "true" for all scenarios.
   - Added DHCPv4 relay support.
+  - Added new global config option NB_Global:options:disable_vxlan_mode to
+    extend available tunnel IDs space for datapaths from 4095 to 16711680
+    when running in "VXLAN mode". For more details see man ovn-nb(5) for
+    mentioned option.
OVN v24.03.0 - 01 Mar 2024 -- diff --git a/northd/en-global-config.c b/northd/en-global-config.c index 873649a89..f5e2a8154 100644 --- a/northd/en-global-config.c +++ b/northd/en-global-config.c @@ -115,7 +115,7 @@ en_global_config_run(struct engine_node *node , void *data) config_data->svc_monitor_mac); } -init_vxlan_mode(sbrec_chassis_table); +init_vxlan_mode(&nb->options, sbrec_chassis_table); char *max_tunid = xasprintf("%d", get_ovn_max_dp_key_local()); smap_replace(options, "max_tunid", max_tunid); free(max_tunid); @@ -533,6 +533,11 @@ check_nb_options_out_of_sync(const struct nbrec_nb_global *nb, return true; } +if (config_out_of_sync(&nb->options, &config_data->nb_options, + "disable_vxlan_mode", false)) { +return true; +} + return false; } diff --git a/northd/northd.c b/northd/northd.c index 0e0ae24db..7bdffe531 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -886,8 +886,14 @@ join_datapaths(const struct nbrec_logical_switch_table *nbrec_ls_table, } void -init_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table) +init_vxlan_mode(const struct smap *nb_options, +const struct sbrec_chassis_table *sbrec_chassis_table) { +if (smap_get_bool(nb_options, "disable_vxlan_mode", false)) { +vxlan_mode = false; +return; +} + const struct sbrec_chassis *chassis; SBREC_CHASSIS_TABLE_FOR_EACH (chassis, sbrec_chassis_table) { for (int i = 0; i < chassis->n_encaps; i++) { @@ -17596,7 +17602,7 @@ ovnnb_db_run(struct northd_input *input_data, use_common_zone = smap_get_bool(input_data->nb_options, "use_common_zone", false); -init_vxlan_mode(input_data->sbrec_chassis_table); +init_vxlan_mode(input_data->nb_options, input_data->sbrec_chassis_table); build_datapaths(ovnsb_txn, input_data->nbrec_logical_switch_table, diff --git a/northd/northd.h b/northd/northd.h index be480003e..d0322e621 100644 --- a/northd/northd.h +++ b/northd/northd.h @@ -792,7 +792,8 @@ lr_has_multiple_gw_ports(const struct ovn_datapath *od) } void -init_vxlan_mode(const struct 
sbrec_chassis_table *sbrec_chassis_table); +init_vxlan_mode(const struct smap *nb_options, +const struct sbrec_chassis_table *sbrec_chassis_table); uint32_t get_ovn_max_dp_key_local(void); diff --git a/ovn-architecture.7.xml b/ovn-architecture.7.xml index 3ecb58933..f4eae340c 100644 --- a/ovn-architecture.7.xml +++ b/ovn-architecture.7.xml @@ -2920,4 +2920,10 @@ the future, gateways that do not support encapsulations with large amounts of metadata may continue to have a reduced feature set. + +VXLAN mode is recommended to be disabled if VXLAN encap at +hypervisors is needed only to support HW VTEP L2 Gateway functionality. +See man ovn-nb(5) for table NB_Global column +options key disable_vxlan_mode for more details. + diff --git a/ovn-nb.xml b/ovn-nb.xml index 5cb6ba640..84f1e07b6 100644 --- a/ovn-nb.xml +++ b/ovn-nb.xml @@ -381,6 +381,16 @@ of SB changes would be very noticeable. + +
[ovs-dev] [PATCH ovn v5 0/2] Add support to disable VXLAN mode.
v5:
  - Addressed Ihar's review comments:
    1. fixed errors introduced by incorrect conflict resolution during rebase;
    2. changed VXLAN mode naming to capitalized;
    3. clarified VXLAN mode in ovn-architecture man page.
v4:
  - Addressed Dumitru's and Ihar's review comments;
  - single patch was split into two:
    1. function call replaced with a global variable `vxlan_mode`;
    2. introduced `disable_vxlan_mode` configuration knob;
  - rebased onto latest main branch.
v3:
  - Removed accidental ovs submodule change.
v2:
  - Added NEWS item.

Vladislav Odintsov (2):
  northd: Make `vxlan_mode` a global variable.
  northd: Add support for disabling vxlan mode.

 NEWS                      | 4 ++
 northd/en-global-config.c | 9 +++-
 northd/northd.c           | 91 ++-
 northd/northd.h           | 6 ++-
 ovn-architecture.7.xml    | 16 ---
 ovn-nb.xml                | 10 +
 tests/ovn-northd.at       | 29 +
 7 files changed, 108 insertions(+), 57 deletions(-)

--
2.44.0
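The behaviour this series introduces can be summarized in a small C sketch. It is not the northd code: `struct encap`, `struct chassis`, and `compute_vxlan_mode()` are simplified, hypothetical stand-ins for the Southbound rows and for `init_vxlan_mode()`, and the sketch returns the result instead of setting a global. It assumes only what the patches state: the `disable_vxlan_mode` knob takes precedence, otherwise any chassis with a `vxlan` encap turns VXLAN mode on.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the Southbound Encap and Chassis
 * rows; the real IDL structures carry many more fields. */
struct encap {
    const char *type;
};

struct chassis {
    const struct encap *encaps;
    size_t n_encaps;
};

/* Returns true when VXLAN mode should be active: the operator has not set
 * the disable knob and at least one chassis has a "vxlan" encap. */
static bool
compute_vxlan_mode(bool disable_vxlan_mode,
                   const struct chassis *chassis, size_t n_chassis)
{
    if (disable_vxlan_mode) {
        /* Explicit opt-out wins over auto-detection. */
        return false;
    }

    for (size_t i = 0; i < n_chassis; i++) {
        for (size_t j = 0; j < chassis[i].n_encaps; j++) {
            if (!strcmp(chassis[i].encaps[j].type, "vxlan")) {
                return true;
            }
        }
    }
    return false;
}
```

With this shape, a deployment that registers a VXLAN encap only for HW VTEP purposes would pass `true` for the first argument and keep the larger tunnel ID space.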
Re: [ovs-dev] [PATCH ovn v4 1/2] northd: Make `vxlan_mode` a global variable.
Hi Ihar, thanks for your review! > On 2 May 2024, at 18:11, Ihar Hrachyshka wrote: > > On Thu, May 2, 2024 at 5:51 AM Vladislav Odintsov <mailto:odiv...@gmail.com>> wrote: > >> This simplifies code and subsequent commit to explicitely disable vxlan >> > > I personally find it debatable that moving from explicit dependency through > a function argument to implicit dependency through a global variable is a > simplification. But I will leave others to chime in. > Here I wanted to mention that in many pieces of code argument which was passed just to find VXLAN encaps was removed and with less code/arguments it looks more simple. > >> mode is based on these changes. >> >> Also `vxlan mode` term is introduced in ovn-architecture man page. >> > > Should the mode name keep VXLAN capitalized? > Dunno. This was inspired by writing in initial commit [1]. I’m fine with both writings. > >> >> Signed-off-by: Vladislav Odintsov >> --- >> northd/en-global-config.c | 4 +- >> northd/northd.c | 94 --- >> northd/northd.h | 5 ++- >> ovn-architecture.7.xml| 11 +++-- >> 4 files changed, 50 insertions(+), 64 deletions(-) >> >> diff --git a/northd/en-global-config.c b/northd/en-global-config.c >> index 28c78a12c..873649a89 100644 >> --- a/northd/en-global-config.c >> +++ b/northd/en-global-config.c >> @@ -115,8 +115,8 @@ en_global_config_run(struct engine_node *node , void >> *data) >> config_data->svc_monitor_mac); >> } >> >> -char *max_tunid = xasprintf("%d", >> -get_ovn_max_dp_key_local(sbrec_chassis_table)); >> +init_vxlan_mode(sbrec_chassis_table); >> +char *max_tunid = xasprintf("%d", get_ovn_max_dp_key_local()); >> smap_replace(options, "max_tunid", max_tunid); >> free(max_tunid); >> >> diff --git a/northd/northd.c b/northd/northd.c >> index 5e12fd1e8..b54219a85 100644 >> --- a/northd/northd.c >> +++ b/northd/northd.c >> @@ -90,6 +90,10 @@ static bool use_ct_inv_match = true; >> */ >> static bool default_acl_drop; >> >> +/* If this option is 'true' northd will use limited 24-bit 
space for >> datapath >> + * and ports tunnel key allocation (12 bits for each instead of default >> 16). */ >> +static bool vxlan_mode; >> + >> #define MAX_OVN_TAGS 4096 >> >> >> @@ -881,24 +885,25 @@ join_datapaths(const struct >> nbrec_logical_switch_table *nbrec_ls_table, >> } >> } >> >> -static bool >> -is_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table) >> +void >> +init_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table) >> { >> const struct sbrec_chassis *chassis; >> SBREC_CHASSIS_TABLE_FOR_EACH (chassis, sbrec_chassis_table) { >> for (int i = 0; i < chassis->n_encaps; i++) { >> if (!strcmp(chassis->encaps[i]->type, "vxlan")) { >> -return true; >> +vxlan_mode = true; >> +return; >> } >> } >> } >> -return false; >> +vxlan_mode = false; >> } >> >> uint32_t >> -get_ovn_max_dp_key_local(const struct sbrec_chassis_table >> *sbrec_chassis_table) >> +get_ovn_max_dp_key_local(void) >> { >> -if (is_vxlan_mode(sbrec_chassis_table)) { >> +if (vxlan_mode) { >> /* OVN_MAX_DP_GLOBAL_NUM doesn't apply for vxlan mode. */ >> return OVN_MAX_DP_VXLAN_KEY; >> } >> @@ -906,15 +911,14 @@ get_ovn_max_dp_key_local(const struct >> sbrec_chassis_table *sbrec_chassis_table) >> } >> >> static void >> -ovn_datapath_allocate_key(const struct sbrec_chassis_table >> *sbrec_ch_table, >> - struct hmap *datapaths, struct hmap *dp_tnlids, >> +ovn_datapath_allocate_key(struct hmap *datapaths, struct hmap *dp_tnlids, >> struct ovn_datapath *od, uint32_t *hint) >> { >> if (!od->tunnel_key) { >> od->tunnel_key = ovn_allocate_tnlid(dp_tnlids, "datapath", >> -OVN_MIN_DP_KEY_LOCAL, >> - >> get_ovn_max_dp_key_local(sbrec_ch_table), >> -hint); >> +OVN_MIN_DP_KEY_LOCAL, >> +get_ovn_max_dp_key_local(), >> +
[ovs-dev] [PATCH ovn v4 1/2] northd: Make `vxlan_mode` a global variable.
This simplifies the code; the subsequent commit, which allows vxlan mode to be explicitly disabled, is based on these changes. Also, the `vxlan mode` term is introduced in the ovn-architecture man page. Signed-off-by: Vladislav Odintsov --- northd/en-global-config.c | 4 +- northd/northd.c | 94 --- northd/northd.h | 5 ++- ovn-architecture.7.xml| 11 +++-- 4 files changed, 50 insertions(+), 64 deletions(-) diff --git a/northd/en-global-config.c b/northd/en-global-config.c index 28c78a12c..873649a89 100644 --- a/northd/en-global-config.c +++ b/northd/en-global-config.c @@ -115,8 +115,8 @@ en_global_config_run(struct engine_node *node , void *data) config_data->svc_monitor_mac); } -char *max_tunid = xasprintf("%d", -get_ovn_max_dp_key_local(sbrec_chassis_table)); +init_vxlan_mode(sbrec_chassis_table); +char *max_tunid = xasprintf("%d", get_ovn_max_dp_key_local()); smap_replace(options, "max_tunid", max_tunid); free(max_tunid); diff --git a/northd/northd.c b/northd/northd.c index 5e12fd1e8..b54219a85 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -90,6 +90,10 @@ static bool use_ct_inv_match = true; */ static bool default_acl_drop; +/* If this option is 'true' northd will use limited 24-bit space for datapath + * and ports tunnel key allocation (12 bits for each instead of default 16). 
*/ +static bool vxlan_mode; + #define MAX_OVN_TAGS 4096 @@ -881,24 +885,25 @@ join_datapaths(const struct nbrec_logical_switch_table *nbrec_ls_table, } } -static bool -is_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table) +void +init_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table) { const struct sbrec_chassis *chassis; SBREC_CHASSIS_TABLE_FOR_EACH (chassis, sbrec_chassis_table) { for (int i = 0; i < chassis->n_encaps; i++) { if (!strcmp(chassis->encaps[i]->type, "vxlan")) { -return true; +vxlan_mode = true; +return; } } } -return false; +vxlan_mode = false; } uint32_t -get_ovn_max_dp_key_local(const struct sbrec_chassis_table *sbrec_chassis_table) +get_ovn_max_dp_key_local(void) { -if (is_vxlan_mode(sbrec_chassis_table)) { +if (vxlan_mode) { /* OVN_MAX_DP_GLOBAL_NUM doesn't apply for vxlan mode. */ return OVN_MAX_DP_VXLAN_KEY; } @@ -906,15 +911,14 @@ get_ovn_max_dp_key_local(const struct sbrec_chassis_table *sbrec_chassis_table) } static void -ovn_datapath_allocate_key(const struct sbrec_chassis_table *sbrec_ch_table, - struct hmap *datapaths, struct hmap *dp_tnlids, +ovn_datapath_allocate_key(struct hmap *datapaths, struct hmap *dp_tnlids, struct ovn_datapath *od, uint32_t *hint) { if (!od->tunnel_key) { od->tunnel_key = ovn_allocate_tnlid(dp_tnlids, "datapath", -OVN_MIN_DP_KEY_LOCAL, -get_ovn_max_dp_key_local(sbrec_ch_table), -hint); +OVN_MIN_DP_KEY_LOCAL, +get_ovn_max_dp_key_local(), +hint); if (!od->tunnel_key) { if (od->sb) { sbrec_datapath_binding_delete(od->sb); @@ -927,7 +931,6 @@ ovn_datapath_allocate_key(const struct sbrec_chassis_table *sbrec_ch_table, static void ovn_datapath_assign_requested_tnl_id( -const struct sbrec_chassis_table *sbrec_chassis_table, struct hmap *dp_tnlids, struct ovn_datapath *od) { const struct smap *other_config = (od->nbs @@ -936,8 +939,7 @@ ovn_datapath_assign_requested_tnl_id( uint32_t tunnel_key = smap_get_int(other_config, "requested-tnl-key", 0); if (tunnel_key) { const char *interconn_ts = 
smap_get(other_config, "interconn-ts"); -if (!interconn_ts && is_vxlan_mode(sbrec_chassis_table) && -tunnel_key >= 1 << 12) { +if (!interconn_ts && vxlan_mode && tunnel_key >= 1 << 12) { static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1); VLOG_WARN_RL(&rl, "Tunnel key %"PRIu32" for datapath %s is " "incompatible with VXLAN", tunnel_key, @@ -985,7 +987,6 @@ build_datapaths(struct ovsdb_idl_txn *ovnsb_txn, const struct nbrec_logical_switch_table *nbrec_ls_table, const struct nbrec_logical_router_table *nbrec_lr_table, const struct sbrec_datapath_binding_table *sbrec_dp_table, -const struct sbrec_chassis_table *sbrec_chassis_table, struct ovn_dat
[ovs-dev] [PATCH ovn v4 2/2] northd: Add support for disabling vxlan mode.
Commit [1] introduced the "vxlan mode" concept. It limits the available tunnel IDs because the VXLAN VNI does not have enough space. In vxlan mode OVN is limited to 4095 datapaths (LRs or non-transit LSs) and 2047 logical switch ports per datapath. Prior to this patch, vxlan mode was enabled automatically if at least one chassis had an encap of vxlan type. In scenarios where one wants to use VXLAN only for a HW VTEP (RAMP) switch, such a limitation makes no sense. This patch adds support for explicitly disabling vxlan mode via the Northbound database. 1: https://github.com/ovn-org/ovn/commit/b07f1bc3d068 CC: Ihar Hrachyshka Fixes: b07f1bc3d068 ("Add VXLAN support for non-VTEP datapath bindings") Signed-off-by: Vladislav Odintsov --- NEWS | 3 +++ northd/en-global-config.c | 7 ++- northd/northd.c | 10 -- northd/northd.h | 3 ++- ovn-architecture.7.xml| 6 ++ ovn-nb.xml| 10 ++ tests/ovn-northd.at | 29 + 7 files changed, 64 insertions(+), 4 deletions(-) diff --git a/NEWS b/NEWS index 3b5e93dc9..43ab05a68 100644 --- a/NEWS +++ b/NEWS @@ -17,6 +17,9 @@ Post v24.03.0 external-ids, the option is no longer needed as it became effectively "true" for all scenarios. - Added DHCPv4 relay support. + - Added new global config option NB_Global:options:disable_vxlan_mode to +extend available tunnel IDs space for datapaths from 4095 to 16711680. +For more details see man ovn-nb(5) for mentioned option. 
OVN v24.03.0 - 01 Mar 2024 -- diff --git a/northd/en-global-config.c b/northd/en-global-config.c index 873649a89..f5e2a8154 100644 --- a/northd/en-global-config.c +++ b/northd/en-global-config.c @@ -115,7 +115,7 @@ en_global_config_run(struct engine_node *node , void *data) config_data->svc_monitor_mac); } -init_vxlan_mode(sbrec_chassis_table); +init_vxlan_mode(&nb->options, sbrec_chassis_table); char *max_tunid = xasprintf("%d", get_ovn_max_dp_key_local()); smap_replace(options, "max_tunid", max_tunid); free(max_tunid); @@ -533,6 +533,11 @@ check_nb_options_out_of_sync(const struct nbrec_nb_global *nb, return true; } +if (config_out_of_sync(&nb->options, &config_data->nb_options, + "disable_vxlan_mode", false)) { +return true; +} + return false; } diff --git a/northd/northd.c b/northd/northd.c index b54219a85..d1535172e 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -886,8 +886,14 @@ join_datapaths(const struct nbrec_logical_switch_table *nbrec_ls_table, } void -init_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table) +init_vxlan_mode(const struct smap *nb_options, +const struct sbrec_chassis_table *sbrec_chassis_table) { +if (smap_get_bool(nb_options, "disable_vxlan_mode", false)) { +vxlan_mode = false; +return; +} + const struct sbrec_chassis *chassis; SBREC_CHASSIS_TABLE_FOR_EACH (chassis, sbrec_chassis_table) { for (int i = 0; i < chassis->n_encaps; i++) { @@ -17593,7 +17599,7 @@ ovnnb_db_run(struct northd_input *input_data, use_common_zone = smap_get_bool(input_data->nb_options, "use_common_zone", false); -init_vxlan_mode(input_data->sbrec_chassis_table); +init_vxlan_mode(input_data->nb_options, input_data->sbrec_chassis_table); build_datapaths(ovnsb_txn, input_data->nbrec_logical_switch_table, diff --git a/northd/northd.h b/northd/northd.h index be480003e..d0322e621 100644 --- a/northd/northd.h +++ b/northd/northd.h @@ -792,7 +792,8 @@ lr_has_multiple_gw_ports(const struct ovn_datapath *od) } void -init_vxlan_mode(const struct 
sbrec_chassis_table *sbrec_chassis_table); +init_vxlan_mode(const struct smap *nb_options, +const struct sbrec_chassis_table *sbrec_chassis_table); uint32_t get_ovn_max_dp_key_local(void); diff --git a/ovn-architecture.7.xml b/ovn-architecture.7.xml index 7abb1fa83..251c9c514 100644 --- a/ovn-architecture.7.xml +++ b/ovn-architecture.7.xml @@ -2919,4 +2919,10 @@ the future, gateways that do not support encapsulations with large amounts of metadata may continue to have a reduced feature set. + +vxlan mode is recommended to be disabled if VXLAN encap at +hypervisors is needed only to support HW VTEP L2 Gateway functionality. +See man ovn-nb(5) for table NB_Global column +options key disable_vxlan_mode for more details. + diff --git a/ovn-nb.xml b/ovn-nb.xml index 5cb6ba640..a99e663e5 100644 --- a/ovn-nb.xml +++ b/ovn-nb.xml @@ -381,6 +381,16 @@ of SB changes would be very noticeable. + +By default if at least one chassis in OVN
[ovs-dev] [PATCH ovn v4 0/2] Add support to disable vxlan mode.
v4:
  - Addressed Dumitru's and Ihar's review comments;
  - single patch was split into two:
    1. function call replaced with a global variable `vxlan_mode`;
    2. introduced `disable_vxlan_mode` configuration knob;
  - rebased onto latest main branch.
v3:
  - Removed accidental ovs submodule change.
v2:
  - Added NEWS item.

Vladislav Odintsov (2):
  northd: Make `vxlan_mode` a global variable.
  northd: Add support for disabling vxlan mode.

 NEWS                      | 3 ++
 northd/en-global-config.c | 9 +++-
 northd/northd.c           | 100 +-
 northd/northd.h           | 6 ++-
 ovn-architecture.7.xml    | 17 ---
 ovn-nb.xml                | 10
 tests/ovn-northd.at       | 29 +++
 7 files changed, 110 insertions(+), 64 deletions(-)

--
2.44.0
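The numbers quoted in this series (4095 datapaths in VXLAN mode, 16711680 otherwise, and the `tunnel_key >= 1 << 12` check) can be tied together with a little arithmetic. The following C sketch is illustrative only: the macro and function names are hypothetical, and the decomposition of 16711680 as (2^24 - 1) - (2^16 - 1) is an assumption that happens to match the NEWS entry, not something the patches state. The `interconn-ts` exemption from the quoted check is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; values chosen to reproduce the figures in the thread. */
#define KEY_BITS_TOTAL     24
/* Assumption: a 16-bit range is set aside and is not available locally. */
#define RESERVED_DP_KEYS   ((1u << 16) - 1)
#define MAX_DP_KEY_DEFAULT (((1u << KEY_BITS_TOTAL) - 1) - RESERVED_DP_KEYS)
/* In VXLAN mode only 12 bits of the VNI are used for the datapath key. */
#define MAX_DP_KEY_VXLAN   ((1u << 12) - 1)

/* Largest datapath tunnel key that may be allocated locally. */
static uint32_t
max_dp_key(bool vxlan_mode)
{
    return vxlan_mode ? MAX_DP_KEY_VXLAN : MAX_DP_KEY_DEFAULT;
}

/* Mirrors the shape of the check quoted in the patch: a requested key of
 * 1 << 12 or above does not fit while VXLAN mode is active. */
static bool
requested_key_fits(bool vxlan_mode, uint32_t tunnel_key)
{
    return !(vxlan_mode && tunnel_key >= (1u << 12));
}
```

Under these assumptions, disabling VXLAN mode raises the ceiling returned by `max_dp_key()` from 4095 to 16711680, which is the change the NEWS item advertises.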
[ovs-dev] [PATCH ovn] ovn-ctl: Support for --config-file ovsdb-server option.
Since OVS 3.3.0 ovsdb-server accepts databases and remotes configuration via JSON text file. This patch adds support for such option. Signed-off-by: Vladislav Odintsov --- NEWS | 1 + utilities/ovn-ctl | 39 +++ 2 files changed, 36 insertions(+), 4 deletions(-) diff --git a/NEWS b/NEWS index 9adf6a31c..39ea88d78 100644 --- a/NEWS +++ b/NEWS @@ -16,6 +16,7 @@ Post v24.03.0 - Remove "ovn-set-local-ip" config option from vswitchd external-ids, the option is no longer needed as it became effectively "true" for all scenarios. + - Add support for ovsdb-server `--config-file` option in ovn-ctl. OVN v24.03.0 - 01 Mar 2024 -- diff --git a/utilities/ovn-ctl b/utilities/ovn-ctl index dae5e22f4..fd1ae1256 100755 --- a/utilities/ovn-ctl +++ b/utilities/ovn-ctl @@ -169,6 +169,7 @@ start_ovsdb__() { local sync_from_port local file local schema +local config_file local logfile local log local sock @@ -199,6 +200,7 @@ start_ovsdb__() { eval sync_from_port=\$DB_${DB}_SYNC_FROM_PORT eval file=\$DB_${DB}_FILE eval schema=\$DB_${DB}_SCHEMA +eval config_file=\$DB_${DB}_CONFIG_FILE eval logfile=\$OVN_${DB}_LOGFILE eval log=\$OVN_${DB}_LOG eval sock=\$DB_${DB}_SOCK @@ -281,7 +283,12 @@ $cluster_remote_port set ovsdb-server set "$@" $log --log-file=$logfile -set "$@" --remote=punix:$sock --pidfile=$db_pid_file +set "$@" --pidfile=$db_pid_file +if test X"$config_file" == X; then +set "$@" --remote=punix:$sock +else +set "$@" --config-file=$config_file +fi set "$@" --unixctl=$ctrl_sock [ "$OVN_USER" != "" ] && set "$@" --user "$OVN_USER" @@ -297,7 +304,7 @@ $cluster_remote_port set exec "$@" fi -if test X"$use_remote_in_db" != Xno; then +if test X"$use_remote_in_db" != Xno && test X"$config_file" == X; then set "$@" --remote=db:$schema_name,$table_name,connections fi @@ -343,6 +350,11 @@ $cluster_remote_port local run_ovsdb_in_bg="no" local process_id= + +if test X$config_file = X; then +set "$@" "$file" +fi + if test X$detach = Xno && test $mode = cluster && test -z "$cluster_remote_addr" ; 
then # When detach is no (for run_nb_ovsdb/run_sb_ovsdb commands) # we want to run ovsdb-server in background rather than running it in @@ -351,10 +363,10 @@ $cluster_remote_port # Note: We run only the ovsdb-server in backgroud which created the # cluster (i.e cluster_remote_addr is not set.). run_ovsdb_in_bg="yes" -"$@" $file & +"$@" & process_id=$! else -start_wrapped_daemon "$wrapper" ovsdb-$db "" "$@" "$file" +start_wrapped_daemon "$wrapper" ovsdb-$db "" "$@" fi # Initialize the database if it's NOT joining a cluster. @@ -776,6 +788,7 @@ set_defaults () { DB_NB_SYNC_FROM_PORT=6641 DB_NB_PROBE_INTERVAL_TO_ACTIVE=6 DB_NB_ELECTION_TIMER= +DB_NB_CONFIG_FILE= DB_SB_SOCK=$OVN_RUNDIR/ovnsb_db.sock DB_SB_PIDFILE=$OVN_RUNDIR/ovnsb_db.pid @@ -788,6 +801,7 @@ set_defaults () { DB_SB_SYNC_FROM_PORT=6642 DB_SB_PROBE_INTERVAL_TO_ACTIVE=6 DB_SB_ELECTION_TIMER= +DB_SB_CONFIG_FILE= DB_IC_NB_SOCK=$OVN_RUNDIR/ovn_ic_nb_db.sock DB_IC_NB_PIDFILE=$OVN_RUNDIR/ovn_ic_nb_db.pid @@ -798,6 +812,7 @@ set_defaults () { DB_IC_NB_SYNC_FROM_PROTO=tcp DB_IC_NB_SYNC_FROM_ADDR= DB_IC_NB_SYNC_FROM_PORT=6645 +DB_IC_NB_CONFIG_FILE= DB_IC_SB_SOCK=$OVN_RUNDIR/ovn_ic_sb_db.sock DB_IC_SB_PIDFILE=$OVN_RUNDIR/ovn_ic_sb_db.pid @@ -808,6 +823,7 @@ set_defaults () { DB_IC_SB_SYNC_FROM_PROTO=tcp DB_IC_SB_SYNC_FROM_ADDR= DB_IC_SB_SYNC_FROM_PORT=6646 +DB_IC_SB_CONFIG_FILE= DB_NB_SCHEMA=$ovn_datadir/ovn-nb.ovsschema DB_SB_SCHEMA=$ovn_datadir/ovn-sb.ovsschema @@ -951,6 +967,7 @@ set_defaults () { OVN_SB_RELAY_DB_SSL_CERT="" OVN_SB_RELAY_DB_SSL_CA_CERT="" DB_SB_RELAY_USE_REMOTE_IN_DB="yes" +DB_SB_RELAY_CONFIG_FILE= DB_CLUSTER_SCHEMA_UPGRADE="yes" } @@ -1124,12 +1141,16 @@ File location options: --db-nb-create-insecure-remote=yes|no Create ptcp OVN Northbound remote (default: $DB_NB_CREATE_INSECURE_REMOTE) --db-nb-probe-interval-to-active Active probe interval from standby to active ovsdb-server remote (default: $DB_NB_PROBE_INTERVAL_TO_ACTIVE) --db-nb-election-timer=MS OVN Northbound RAFT db election timer to 
use on db creation (in
[ovs-dev] [ovn] ovn-controller segmentation fault in svc_monitor_send_tcp_health_check__()
Hi all,

I’m running OVN 22.09 and sometimes see ovn-controller crash with a segmentation fault. The backtrace is as follows:

(gdb) bt
#0 0x7f0742707de1 in __strlen_sse2 () from /lib64/libc.so.6
#1 0x7f0742788c5d in inet_pton () from /lib64/libc.so.6
#2 0x564f45a1c784 in ip_parse (s=, ip=ip@entry=0x7f074040f90c) at lib/packets.c:698
#3 0x564f4594cbfb in svc_monitor_send_tcp_health_check__ (swconn=swconn@entry=0x7f0738000940, svc_mon=svc_mon@entry=0x564f4c2960c0, ctl_flags=ctl_flags@entry=2, tcp_seq=3858078915, tcp_ack=tcp_ack@entry=0, tcp_src=) at controller/pinctrl.c:7513
#4 0x564f4594d47c in svc_monitor_send_tcp_health_check__ (tcp_src=, tcp_ack=0, tcp_seq=, ctl_flags=2, svc_mon=0x564f4c2960c0, swconn=0x7f0738000940) at controller/pinctrl.c:7502
#5 svc_monitor_send_health_check (swconn=swconn@entry=0x7f0738000940, svc_mon=svc_mon@entry=0x564f4c2960c0) at controller/pinctrl.c:7621
#6 0x564f4595869b in svc_monitors_run (svc_monitors_next_run_time=0x564f45dd3970 , swconn=0x7f0738000940) at controller/pinctrl.c:7693
#7 pinctrl_handler (arg_=0x564f45e11240 ) at controller/pinctrl.c:3499
#8 0x564f45a0ad6f in ovsthread_wrapper (aux_=) at lib/ovs-thread.c:422
#9 0x7f074325bea5 in start_thread () from /lib64/libpthread.so.0
#10 0x7f07427798dd in clone () from /lib64/libc.so.6

After moving to frame #3, I can read the actual data from the svc_mon structure (port/protocol/dp_key/port_key). I looked these values up in the SB DB and found the port_binding; it belongs to a logical port that resides on this chassis and has an LB with a health check configured. Everything seems fine here.

But when I inspect the svc_mon->sb_svc_mon structure, it seems to contain garbage: Address 0x564f out of bounds, logical_port == 0, etc. (though I may be wrong):

$1 = (const struct sbrec_service_monitor *) 0x564f54db2b40
(gdb) print *svc_mon->sb_svc_mon
$2 = {header_ = {hmap_node = {hash = 94898726054728, next = 0x0}, uuid = {parts = {0, 0, 0, 0}}, src_arcs = {prev = 0x564f54aae0d0, next = 0x0}, dst_arcs = {prev = 0x564f7f8bd470, next = 0x564f7f8bd540}, table = 0x64, old_datum = 0xf, parsed = 152, reparse_node = {prev = 0x0, next = 0x0}, new_datum = 0x0, prereqs = 0x52eb8916, written = 0x171, txn_node = {hash = 1, next = 0x564f54db2db0}, map_op_written = 0x0, map_op_lists = 0x0, set_op_written = 0x0, set_op_lists = 0x0, change_seqno = {0, 0, 0}, track_node = {prev = 0x564f, next = 0x0}, updated = 0x0, tracked_old_datum = 0x0}, external_ids = {map = {buckets = 0x1, one = 0x564f54db2d90, mask = 0, n = 0}}, ip = 0x564f , logical_port = 0x0, options = {map = {buckets = 0x0, one = 0x0, mask = 1, n = 94898780242768}}, port = 0, protocol = 0x0, src_ip = 0x1 , src_mac = 0x564f54db2d70 "`Ջ\177OV", status = 0x0}
…
(gdb) print svc_mon->state
$8 = SVC_MON_S_ONLINE
(gdb) print svc_mon->status
$9 = SVC_MON_ST_ONLINE
(gdb) print svc_mon->protocol
$10 = SVC_MON_PROTO_TCP
(gdb) print svc_mon->sb_svc_mon

This crash occurred right after the ovsdb SB connection was lost due to an inactivity probe failure. ovn-controller was therefore reconnecting to the SB, and I guess this could somehow have re-initialized the SB IDL objects. I’m not sure I can try to reproduce this behaviour on the latest main branch, so my question is: can this theoretically be connected with re-initialization of the IDL? If yes, what should be done to avoid such behaviour? Should ovn-controller process changes while its IDL is in an inconsistent state?

Any help is appreciated.

Regards, Vladislav Odintsov
Re: [ovs-dev] [PATCH ovn v4] Make tunnel ids exhaustion test trigger the problem.
Hi Ihar, Thanks for cooperation and enhancements in the testcases! The patch looks good to me. > On 5 Apr 2024, at 19:14, Ihar Hrachyshka wrote: > > The original version of the scenario passed with or without the fix. > This is because all LSs were processed in one go, so the allocate > function was never entered with *hint==0. > > Also, added another scenario that will check behavior when *hint is out > of [min;max] bounds but > max (this happens in an obscure scenario where > a vxlan chassis is added to the cluster mid-light, forcing northd to > reduce its effective max value for tunnel ids; which may become lower > than the current *hint for ports.) > > Fixes: a1f165a7b807 ("northd: fix infinite loop in ovn_allocate_tnlid()") > Co-Authored-By: Vladislav Odintsov > Signed-off-by: Vladislav Odintsov > Signed-off-by: Ihar Hrachyshka > --- > v1: initial version. > v2: cover both cases of hint = 0 and hint > max. > v3: reduce the number of ports to create in the hint > max scenario needed to > trigger the problem. > v4: remove spurious lib/ovn-util.c change. 
> --- > tests/ovn-northd.at | 43 --- > 1 file changed, 40 insertions(+), 3 deletions(-) > > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > index be006fb32..1a4e7274d 100644 > --- a/tests/ovn-northd.at > +++ b/tests/ovn-northd.at > @@ -2823,7 +2823,7 @@ AT_CLEANUP > ]) > > OVN_FOR_EACH_NORTHD_NO_HV([ > -AT_SETUP([check tunnel ids exhaustion]) > +AT_SETUP([check datapath tunnel ids exhaustion]) > ovn_start > > # Create a fake chassis with vxlan encap to lower MAX DP tunnel key to 2^12 > @@ -2833,13 +2833,18 @@ ovn-sbctl \ > > cmd="ovn-nbctl --wait=sb" > > -for i in {1..4097}; do > +for i in {1..4095}; do > cmd="${cmd} -- ls-add lsw-${i}" > done > > eval $cmd > > -check_row_count nb:Logical_Switch 4097 > +check_row_count nb:Logical_Switch 4095 > +wait_row_count sb:Datapath_Binding 4095 > + > +ovn-nbctl ls-add lsw-exhausted > + > +check_row_count nb:Logical_Switch 4096 > wait_row_count sb:Datapath_Binding 4095 > > OVS_WAIT_UNTIL([grep "all datapath tunnel ids exhausted" > northd/ovn-northd.log]) > @@ -2847,6 +2852,38 @@ OVS_WAIT_UNTIL([grep "all datapath tunnel ids > exhausted" northd/ovn-northd.log]) > AT_CLEANUP > ]) > > +OVN_FOR_EACH_NORTHD_NO_HV([ > +AT_SETUP([check port tunnel ids exhaustion; vxlan chassis pops up midflight]) > +ovn_start > + > +cmd="ovn-nbctl --wait=sb" > + > +cmd="${cmd} -- ls-add lsw" > +for i in {1..2048}; do > +cmd="${cmd} -- lsp-add lsw lsp-${i}" > +done > + > +eval $cmd > + > +check_row_count nb:Logical_Switch_Port 2048 > +wait_row_count sb:Port_Binding 2048 > + > +# Now create a fake chassis with vxlan encap to lower MAX port tunnel key to > 2^11 > +ovn-sbctl \ > +--id=@e create encap chassis_name=hv1 ip="192.168.0.1" type="vxlan" \ > +-- --id=@c create chassis name=hv1 encaps=@e > + > +ovn-nbctl lsp-add lsw lsp-exhausted > + > +check_row_count nb:Logical_Switch_Port 2049 > +wait_row_count sb:Port_Binding 2048 > + > +OVS_WAIT_UNTIL([grep "all port tunnel ids exhausted" northd/ovn-northd.log]) > + > +AT_CLEANUP > +]) > + > + > > 
OVN_FOR_EACH_NORTHD_NO_HV([ > AT_SETUP([Logical Flow Datapath Groups]) > -- > 2.41.0 > > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev Regards, Vladislav Odintsov ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
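Both scenarios covered by these tests come down to entering the allocator with a hint outside [min, max]. The C sketch below shows one way to make such an allocator terminate regardless of the hint. It is not the actual `ovn_allocate_tnlid()`: `allocate_id()` and its boolean `used` array are hypothetical, and the real function works on an hmap of tunnel IDs. The point illustrated is only that the hint is normalized before the scan starts, so the loop always has a valid starting position to return to.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ID allocator over [min, max], with min >= 1.  'used' is
 * indexed by ID and must have at least max + 1 entries.  Returns the
 * allocated ID, or 0 when every ID in the range is taken. */
static uint32_t
allocate_id(bool *used, uint32_t min, uint32_t max, uint32_t *hint)
{
    /* Normalize the hint first: it may be 0 (nothing allocated yet or a
     * previous failure) or above 'max' (the range shrank at runtime). */
    uint32_t start = (*hint >= min && *hint < max) ? *hint + 1 : min;
    uint32_t id = start;

    do {
        if (!used[id]) {
            used[id] = true;
            *hint = id;
            return id;
        }
        /* Advance with wrap-around inside [min, max]. */
        id = (id == max) ? min : id + 1;
    } while (id != start);

    /* Scanned the whole range once without finding a free ID. */
    return 0;
}
```

Because `start` always lies inside [min, max], the scan visits each ID at most once and then stops, whether the hint was 0 or larger than the current maximum.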
Re: [ovs-dev] [PATCH ovn] Make tunnel ids exhaustion test case trigger the problem.
> On 5 Apr 2024, at 18:35, Ihar Hrachyshka wrote: > > On Thu, Apr 4, 2024 at 3:56 PM Vladislav Odintsov <mailto:odiv...@gmail.com>> wrote: >> Thanks Ihar for the patch. >> >> It definitely triggers the bug mentioned in Fixes commit, but how do you >> like next diff as an alternative? >> It seems a little easier to me, because it shows the real limit and the >> situation where the problem was (separate ls-add): >> > > Ah, I think we are talking about two separate scenarios, both resulting in > *hint out of [min; max] bounds! > > - You are talking about hint=0 with min:max = [1; 4096] - which indeed can be > triggered by creating a new DP *after* tunnel ids are exhausted; > - I am talking about a more obscure case where hint=4097 (because originally > there were no vxlan chassis in the cluster); then a vxlan chassis is created > (reducing max to 4096); then the allocation function is entered with > hint=4097. > > Both scenarios are fixed by your patch. It may be worth having both test > cases, one per scenario, in the test suite then. What do you think? I agree, it’s worth adding both. Thanks for clarification! > > (Side Note: I now find the runtime flip of max cap as a vxlan chassis is > added - that I myself implemented - unfortunate.) 
> > Ihar > >> diff --git a/tests/ovn-northd.at <http://ovn-northd.at/> >> b/tests/ovn-northd.at <http://ovn-northd.at/> >> index 6edb1129e..cef144f10 100644 >> --- a/tests/ovn-northd.at <http://ovn-northd.at/> >> +++ b/tests/ovn-northd.at <http://ovn-northd.at/> >> @@ -2862,13 +2862,18 @@ ovn-sbctl \ >> >> cmd="ovn-nbctl --wait=sb" >> >> -for i in {1..4097}; do >> +for i in {1..4095}; do >> cmd="${cmd} -- ls-add lsw-${i}" >> done >> >> eval $cmd >> >> -check_row_count nb:Logical_Switch 4097 >> +check_row_count nb:Logical_Switch 4095 >> +wait_row_count sb:Datapath_Binding 4095 >> + >> +ovn-nbctl ls-add lsw-exhausted >> + >> +check_row_count nb:Logical_Switch 4096 >> wait_row_count sb:Datapath_Binding 4095 >> >> OVS_WAIT_UNTIL([grep "all datapath tunnel ids exhausted" >> northd/ovn-northd.log]) >> >> >>> On 4 Apr 2024, at 20:13, Ihar Hrachyshka >> <mailto:ihrac...@redhat.com>> wrote: >>> >>> The original version of the scenario passed with or without the fix. >>> >>> Fixes: a1f165a7b807 ("northd: fix infinite loop in ovn_allocate_tnlid()") >>> Signed-off-by: Ihar Hrachyshka >> <mailto:ihrac...@redhat.com>> >>> --- >>> tests/ovn-northd.at <http://ovn-northd.at/> | 17 +++-- >>> 1 file changed, 11 insertions(+), 6 deletions(-) >>> >>> diff --git a/tests/ovn-northd.at <http://ovn-northd.at/> >>> b/tests/ovn-northd.at <http://ovn-northd.at/> >>> index fc2c972a4..e8ea8b050 100644 >>> --- a/tests/ovn-northd.at <http://ovn-northd.at/> >>> +++ b/tests/ovn-northd.at <http://ovn-northd.at/> >>> @@ -2826,11 +2826,6 @@ OVN_FOR_EACH_NORTHD_NO_HV([ >>> AT_SETUP([check tunnel ids exhaustion]) >>> ovn_start >>> >>> -# Create a fake chassis with vxlan encap to lower MAX DP tunnel key to 2^12 >>> -ovn-sbctl \ >>> ---id=@e create encap chassis_name=hv1 ip="192.168.0.1" type="vxlan" \ >>> --- --id=@c create chassis name=hv1 encaps=@e >>> - >>> cmd="ovn-nbctl --wait=sb" >>> >>> for i in {1..4097}; do >>> @@ -2840,7 +2835,17 @@ done >>> eval $cmd >>> >>> check_row_count nb:Logical_Switch 
4097 >>> -wait_row_count sb:Datapath_Binding 4095 >>> +wait_row_count sb:Datapath_Binding 4097 >>> + >>> +# Now create a fake chassis with vxlan encap to lower MAX DP tunnel key to >>> 2^12 >>> +ovn-sbctl \ >>> +--id=@e create encap chassis_name=hv1 ip="192.168.0.1" type="vxlan" \ >>> +-- --id=@c create chassis name=hv1 encaps=@e >>> + >>> +ovn-nbctl --wait=sb ls-add lsw-exhausted >>> + >>> +check_row_count nb:Logical_Switch 4098 >>> +wait_row_count sb:Datapath_Binding 4097 >>> >>> OVS_WAIT_UNTIL([grep "all datapath tunnel ids exhausted" >>> northd/ovn-northd.log]) >>> >>> -- >>> 2.41.0 >>> >>> ___ >>> dev mailing list >>> d...@openvswitch.org <mailto:d...@openvswitch.org> >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> >> >> >> >> Regards, >> Vladislav Odintsov >> Regards, Vladislav Odintsov ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] Make tunnel ids exhaustion test case trigger the problem.
Yes, this diff is from main.

To trigger the original bug, it is enough to create a new LS/LR while all 4095 available datapath tunnel ids are already in use. This is because we need to enter ovn_allocate_tnlid() with *hint=0 to trigger the infinite loop. That is why I suggest simply creating 4095 LSs and then creating one more.

I’ve tested this diff and see that northd goes to 100% CPU and doesn’t print the warning log about tunnel id exhaustion.

> On 4 Apr 2024, at 23:34, Ihar Hrachyshka wrote:
>
> On Thu, Apr 4, 2024 at 3:56 PM Vladislav Odintsov wrote:
>
>> Thanks Ihar for the patch.
>>
>> It definitely triggers the bug mentioned in Fixes commit, but how do you
>> like next diff as an alternative?
>> It seems a little easier to me, because it shows the real limit and the
>> situation where the problem was (separate ls-add):
>>
>
> Is it a diff from main? I don't think it will trigger the issue. The key is
> to trigger northd to change its max cap for tunnel ids AFTER it bumped hint
> beyond the "vxlan mode max tun_id" (which is why I have to create vxlan
> chassis AFTER I create enough LSs to get into unsafe territory.)
>
> Note: I haven't tried your version yet; I may check your version some time
> later. So it's the initial thought only.
> > >> >> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >> index 6edb1129e..cef144f10 100644 >> --- a/tests/ovn-northd.at >> +++ b/tests/ovn-northd.at >> @@ -2862,13 +2862,18 @@ ovn-sbctl \ >> >> cmd="ovn-nbctl --wait=sb" >> >> -for i in {1..4097}; do >> +for i in {1..4095}; do >> cmd="${cmd} -- ls-add lsw-${i}" >> done >> >> eval $cmd >> >> -check_row_count nb:Logical_Switch 4097 >> +check_row_count nb:Logical_Switch 4095 >> +wait_row_count sb:Datapath_Binding 4095 >> + >> +ovn-nbctl ls-add lsw-exhausted >> + >> +check_row_count nb:Logical_Switch 4096 >> wait_row_count sb:Datapath_Binding 4095 >> >> OVS_WAIT_UNTIL([grep "all datapath tunnel ids exhausted" >> northd/ovn-northd.log]) >> >> >> On 4 Apr 2024, at 20:13, Ihar Hrachyshka wrote: >> >> The original version of the scenario passed with or without the fix. >> >> Fixes: a1f165a7b807 ("northd: fix infinite loop in ovn_allocate_tnlid()") >> Signed-off-by: Ihar Hrachyshka >> --- >> tests/ovn-northd.at | 17 +++-- >> 1 file changed, 11 insertions(+), 6 deletions(-) >> >> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >> index fc2c972a4..e8ea8b050 100644 >> --- a/tests/ovn-northd.at >> +++ b/tests/ovn-northd.at >> @@ -2826,11 +2826,6 @@ OVN_FOR_EACH_NORTHD_NO_HV([ >> AT_SETUP([check tunnel ids exhaustion]) >> ovn_start >> >> -# Create a fake chassis with vxlan encap to lower MAX DP tunnel key to >> 2^12 >> -ovn-sbctl \ >> ---id=@e create encap chassis_name=hv1 ip="192.168.0.1" type="vxlan" \ >> --- --id=@c create chassis name=hv1 encaps=@e >> - >> cmd="ovn-nbctl --wait=sb" >> >> for i in {1..4097}; do >> @@ -2840,7 +2835,17 @@ done >> eval $cmd >> >> check_row_count nb:Logical_Switch 4097 >> -wait_row_count sb:Datapath_Binding 4095 >> +wait_row_count sb:Datapath_Binding 4097 >> + >> +# Now create a fake chassis with vxlan encap to lower MAX DP tunnel key >> to 2^12 >> +ovn-sbctl \ >> +--id=@e create encap chassis_name=hv1 ip="192.168.0.1" type="vxlan" \ >> +-- --id=@c create chassis name=hv1 
encaps=@e >> + >> +ovn-nbctl --wait=sb ls-add lsw-exhausted >> + >> +check_row_count nb:Logical_Switch 4098 >> +wait_row_count sb:Datapath_Binding 4097 >> >> OVS_WAIT_UNTIL([grep "all datapath tunnel ids exhausted" >> northd/ovn-northd.log]) >> >> -- >> 2.41.0 >> >> ___ >> dev mailing list >> d...@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> >> >> >> >> >> Regards, >> Vladislav Odintsov >> >> > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev Regards, Vladislav Odintsov ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] northd: fix infinite loop in ovn_allocate_tnlid()
> On 4 Apr 2024, at 22:51, Mark Michelson wrote:
>
> On 4/4/24 12:46, Dumitru Ceara wrote:
>> On 4/4/24 17:52, Vladislav Odintsov wrote:
>>> Thanks Dumitru!
>>> I’m totally fine with your change.
>>> Should I send backport patches with resolved conflicts for remaining
>>> branches at least till 22.03, which is an LTS?
>>>
>> Well, 24.03 is the most recent LTS. We don't really backport patches to
>> 22.03 unless they fix critical issues. I'm not completely convinced
>> that this is such a critical issue though. You need 4K logical
>> datapaths with vxlan enabled before this gets hit. In any case, Mark,
>> what do you think?
>
> I don't think this needs backporting down to 22.03.

I just wanted to mention that, to reproduce this bug, it is enough to have at least one chassis with a vxlan encap and to create 4096 LSs/LRs. Once the problem is triggered, ovn-northd consumes 100% CPU and hangs (it doesn’t process any changes) until the excess LS/LR is removed and northd is restarted.

I can submit backport patches for the older branches if needed (already rebased).

>
>> Regards,
>> Dumitru
>>>> On 4 Apr 2024, at 18:26, Dumitru Ceara wrote:
>>>>
>>>> On 4/1/24 16:27, Mark Michelson wrote:
>>>>> Thanks Vladislav,
>>>>>
>>>>> Acked-by: Mark Michelson
>>>>>
>>>>
>>>> Thanks, Vladislav and Mark! Applied to main and backported down to
>>>> 23.06 with a minor test change, please see below.
>>>>
>>>>> On 4/1/24 08:15, Vladislav Odintsov wrote:
>>>>>> In case if all tunnel ids are exhausted, ovn_allocate_tnlid() function
>>>>>> iterates over tnlids indefinitely when *hint is outside of [min, max].
>>>>>> This is because when tnlid reaches max, next tnlid is min and for-loop
>>>>>> never reaches exit condition for tnlid != *hint.
>>>>>>
>>>>>> This patch fixes mentioned issue and adds a testcase.
>>>>>> >>>>>> Signed-off-by: Vladislav Odintsov >>>>>> --- >>>>>> lib/ovn-util.c | 10 +++--- >>>>>> tests/ovn-northd.at | 26 ++ >>>>>> 2 files changed, 33 insertions(+), 3 deletions(-) >>>>>> >>>>>> diff --git a/lib/ovn-util.c b/lib/ovn-util.c >>>>>> index ee5cbcdc3..9f97ae2ca 100644 >>>>>> --- a/lib/ovn-util.c >>>>>> +++ b/lib/ovn-util.c >>>>>> @@ -693,13 +693,17 @@ uint32_t >>>>>> ovn_allocate_tnlid(struct hmap *set, const char *name, uint32_t min, >>>>>> uint32_t max, uint32_t *hint) >>>>>> { >>>>>> -for (uint32_t tnlid = next_tnlid(*hint, min, max); tnlid != *hint; >>>>>> - tnlid = next_tnlid(tnlid, min, max)) { >>>>>> +/* Normalize hint, because it can be outside of [min, max]. */ >>>>>> +*hint = next_tnlid(*hint, min, max); >>>>>> + >>>>>> +uint32_t tnlid = *hint; >>>>>> +do { >>>>>> if (ovn_add_tnlid(set, tnlid)) { >>>>>> *hint = tnlid; >>>>>> return tnlid; >>>>>> } >>>>>> -} >>>>>> +tnlid = next_tnlid(tnlid, min, max); >>>>>> +} while (tnlid != *hint); >>>>>> static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1); >>>>>> VLOG_WARN_RL(&rl, "all %s tunnel ids exhausted", name); >>>>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >>>>>> index cd53755b2..174dbacda 100644 >>>>>> --- a/tests/ovn-northd.at >>>>>> +++ b/tests/ovn-northd.at >>>>>> @@ -2822,6 +2822,32 @@ AT_CHECK([test $lsp02 = 3 && test $ls1 = 123]) >>>>>> AT_CLEANUP >>>>>> ]) >>>>>> +OVN_FOR_EACH_NORTHD_NO_HV([ >>>>>> +AT_SETUP([check tunnel ids exhaustion]) >>>>>> +ovn_start >>>>>> + >>>>>> +# Create a fake chassis with vxlan encap to lower MAX DP tunnel key >>>>>> to 2^12 >>>>>> +ovn-sbctl \ >>>>>> +--id=@e create encap chassis_name=hv1 ip="192.168.0.1" >>>>>> type="vxlan" \ >>>>>> +-- --id=@c create chassis name=hv1 encaps=@e >>>>>> + >>>>>> +cmd="ovn-nbctl --wait=sb" >>>>>> + >>>>>> +for i in {1..4097..1}; do >>>> >>>> This can be changed to: >>>> >>>> for i in {1..4097}; do >>>> >>>>>> +cmd="${cmd} -- ls-add lsw-${i}" >>>>>> +done >>>>>> + >>>>>> +eval $cmd >>>>>> + >>>>>> 
+check_row_count nb:Logical_Switch 4097 >>>>>> +wait_row_count sb:Datapath_Binding 4095 >>>>>> + >>>>>> +OVS_WAIT_UNTIL([grep "all datapath tunnel ids exhausted" >>>>>> northd/ovn-northd.log]) >>>>>> + >>>>>> +AT_CLEANUP >>>>>> +]) >>>>>> + >>>>>> + >>>>>> OVN_FOR_EACH_NORTHD_NO_HV([ >>>>>> AT_SETUP([Logical Flow Datapath Groups]) >>>>>> ovn_start >>>> >>>> Regards, >>>> Dumitru >>>> >>>> ___ >>>> dev mailing list >>>> d...@openvswitch.org <mailto:d...@openvswitch.org> >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> >>> >>> Regards, >>> Vladislav Odintsov >>> >>> > Regards, Vladislav Odintsov ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
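For readers following the thread, the loop change in the quoted patch can be modelled with a small, self-contained C sketch. This is a model under stated assumptions, not the OVN code: `struct id_set`, `id_set_add()`, `ALLOC_SPIN` and the `budget` parameter are hypothetical stand-ins (the real function works on an hmap and has no step limit), `next_tnlid()` is assumed to step to `tnlid + 1` and wrap to `min` once `max` is passed, as the commit message describes, and returning 0 on exhaustion is also an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_ID 64          /* hypothetical cap for the toy id set */
#define ALLOC_SPIN   UINT32_MAX  /* hypothetical: step budget ran out */

/* Toy replacement for the hmap-backed id set used by the real code. */
struct id_set {
    bool used[MODEL_MAX_ID + 1];
};

/* Returns true if 'id' was free and is now marked as used. */
static bool
id_set_add(struct id_set *set, uint32_t id)
{
    if (id > MODEL_MAX_ID || set->used[id]) {
        return false;
    }
    set->used[id] = true;
    return true;
}

/* Assumed behaviour: step by one, wrap to 'min' after 'max'. */
static uint32_t
next_tnlid(uint32_t tnlid, uint32_t min, uint32_t max)
{
    return tnlid + 1 <= max ? tnlid + 1 : min;
}

/* Loop shape before the patch.  'budget' bounds the walk so that a
 * non-terminating search is reported instead of hanging the model. */
static uint32_t
alloc_old(struct id_set *set, uint32_t min, uint32_t max, uint32_t *hint,
          uint32_t budget)
{
    assert(min <= max);
    for (uint32_t tnlid = next_tnlid(*hint, min, max); tnlid != *hint;
         tnlid = next_tnlid(tnlid, min, max)) {
        if (budget-- == 0) {
            return ALLOC_SPIN;
        }
        if (id_set_add(set, tnlid)) {
            *hint = tnlid;
            return tnlid;
        }
    }
    return 0; /* exhausted */
}

/* Loop shape after the patch: normalize the hint, then walk one lap. */
static uint32_t
alloc_new(struct id_set *set, uint32_t min, uint32_t max, uint32_t *hint,
          uint32_t budget)
{
    assert(min <= max);
    *hint = next_tnlid(*hint, min, max);

    uint32_t tnlid = *hint;
    do {
        if (budget-- == 0) {
            return ALLOC_SPIN;
        }
        if (id_set_add(set, tnlid)) {
            *hint = tnlid;
            return tnlid;
        }
        tnlid = next_tnlid(tnlid, min, max);
    } while (tnlid != *hint);
    return 0; /* exhausted */
}
```

In this model, with ids 1..8 all taken and `*hint = 0`, `alloc_old()` runs out of budget because the walk cycles through 1..8 and never comes back to 0, while `alloc_new()` stops after one lap and reports exhaustion. With an in-range hint such as 8 the old loop terminates as well, which would be consistent with the original test passing with or without the fix.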
Re: [ovs-dev] [PATCH ovn] Make tunnel ids exhaustion test case trigger the problem.
Thanks Ihar for the patch.

It definitely triggers the bug mentioned in the Fixes commit, but how do you like the following diff as an alternative?
It seems a little simpler to me, because it shows the real limit and the situation where the problem occurred (a separate ls-add):

diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index 6edb1129e..cef144f10 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -2862,13 +2862,18 @@ ovn-sbctl \
 
 cmd="ovn-nbctl --wait=sb"
 
-for i in {1..4097}; do
+for i in {1..4095}; do
     cmd="${cmd} -- ls-add lsw-${i}"
 done
 
 eval $cmd
 
-check_row_count nb:Logical_Switch 4097
+check_row_count nb:Logical_Switch 4095
+wait_row_count sb:Datapath_Binding 4095
+
+ovn-nbctl ls-add lsw-exhausted
+
+check_row_count nb:Logical_Switch 4096
 wait_row_count sb:Datapath_Binding 4095
 
 OVS_WAIT_UNTIL([grep "all datapath tunnel ids exhausted" northd/ovn-northd.log])

> On 4 Apr 2024, at 20:13, Ihar Hrachyshka wrote:
>
> The original version of the scenario passed with or without the fix.
>
> Fixes: a1f165a7b807 ("northd: fix infinite loop in ovn_allocate_tnlid()")
> Signed-off-by: Ihar Hrachyshka
> ---
>  tests/ovn-northd.at | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index fc2c972a4..e8ea8b050 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -2826,11 +2826,6 @@ OVN_FOR_EACH_NORTHD_NO_HV([
>  AT_SETUP([check tunnel ids exhaustion])
>  ovn_start
>
> -# Create a fake chassis with vxlan encap to lower MAX DP tunnel key to 2^12
> -ovn-sbctl \
> -    --id=@e create encap chassis_name=hv1 ip="192.168.0.1" type="vxlan" \
> -    -- --id=@c create chassis name=hv1 encaps=@e
> -
>  cmd="ovn-nbctl --wait=sb"
>
>  for i in {1..4097}; do
> @@ -2840,7 +2835,17 @@ done
>
>  eval $cmd
>
>  check_row_count nb:Logical_Switch 4097
> -wait_row_count sb:Datapath_Binding 4095
> +wait_row_count sb:Datapath_Binding 4097
> +
> +# Now create a fake chassis with vxlan encap to lower MAX DP tunnel key to 2^12
> +ovn-sbctl \
> +    --id=@e create encap chassis_name=hv1 ip="192.168.0.1" type="vxlan" \
> +    -- --id=@c create chassis name=hv1 encaps=@e
> +
> +ovn-nbctl --wait=sb ls-add lsw-exhausted
> +
> +check_row_count nb:Logical_Switch 4098
> +wait_row_count sb:Datapath_Binding 4097
>
>  OVS_WAIT_UNTIL([grep "all datapath tunnel ids exhausted" northd/ovn-northd.log])
>
> --
> 2.41.0

Regards,
Vladislav Odintsov
Re: [ovs-dev] [PATCH ovn] northd: fix infinite loop in ovn_allocate_tnlid()
> On 4 Apr 2024, at 21:07, Ihar Hrachyshka wrote:
>
> On Thu, Apr 4, 2024 at 1:46 PM Dumitru Ceara wrote:
>> On 4/4/24 19:17, Ihar Hrachyshka wrote:
>> > I tried to revert the util change and the test case passed just fine.
>> >
>>
>> I had done that before pushing the patch but.. I got tricked by the fact
>> that northd was spinning and using cpu 100% while the switches were
>> added. My bad.
>>
>> > I think the scenario that may get the hint out of bounds is 1) start with
>> > no vxlan chassis; 2) create 4097 DPs; 3) add a vxlan chassis - this makes
>> > northd downgrade its max key to 4096. Now when we create a DP, it will spin
>> > in circles. Posted this here:
>> > https://patchwork.ozlabs.org/project/ovn/patch/20240404171358.54678-1-ihrac...@redhat.com/
>> > Nice catch! Thanks for the patch!
>> > (We can probably discuss in this context whether it's a good idea for a
>> > cluster to change the max tun id value as chassis come and go. Perhaps it
>> > should be initialized once and for all?)
>> >
>> > What I also notice is that with the new patch, *hint is always overridden
>> > at the start of the function, so it's always bumped by 1. I am not sure it
>> > was intended. Comments?
>> >
>>
>> But the actual change in behavior for '*hint' is only for the case in
>> which we run out of IDs, or am I missing something? It didn't seem that
>> big of a deal to me.
>
> Yes, I also don't see a problem, but want the author to confirm if there's a
> reason for that.

I’ve just reviewed the code again and see that for the case where *hint = 0, min = 10 and max = 20 this still will not work. However, I’m not sure whether this must be fixed, since there are no such cases for now. What do you think?

Bumping *hint every time in the normal situation (where we have enough available IDs) should be safe, because it behaves like the previous implementation: first, tnlid was set to *hint + 1, and then *hint was set to the current tnlid. It seems the same to me. Am I missing something?

>
>>
>> > This is all probably relevant to the question of whether any backports
>> > should happen for this patch.
>> >
>> > Ihar
>> >
>>
>> Regards,
>> Dumitru
>>
>> >
>> > On Thu, Apr 4, 2024 at 12:46 PM Dumitru Ceara wrote:
>> >
>> >> On 4/4/24 17:52, Vladislav Odintsov wrote:
>> >>> Thanks Dumitru!
>> >>> I’m totally fine with your change.
>> >>> Should I send backport patches with resolved conflicts for remaining
>> >> branches at least till 22.03, which is an LTS?
>> >>>
>> >>
>> >> Well, 24.03 is the most recent LTS. We don't really backport patches to
>> >> 22.03 unless they fix critical issues. I'm not completely convinced
>> >> that this is such a critical issue though. You need 4K logical
>> >> datapaths with vxlan enabled before this gets hit. In any case, Mark,
>> >> what do you think?
>> >>
>> >> Regards,
>> >> Dumitru
>> >>
>> >>>> On 4 Apr 2024, at 18:26, Dumitru Ceara wrote:
>> >>>>
>> >>>> On 4/1/24 16:27, Mark Michelson wrote:
>> >>>>> Thanks Vladislav,
>> >>>>>
>> >>>>> Acked-by: Mark Michelson
>> >>>>>
>> >>>>
>> >>>> Thanks, Vladislav and Mark! Applied to main and backported down to
>> >>>> 23.06 with a minor test change, please see below.
>> >>>>
>> >>>>> On 4/1/24 08:15, Vladislav Odintsov wrote:
>> >>>>>> In case if all tunnel ids are exhausted, ovn_allocate_tnlid() function
>> >>>>>> iterates over tnlids indefinitely when *hint is outside of [min, max].
>> >>>>>> This is because when tnlid reaches max, next tnlid is min and for-loop
>> >>>>>> never reaches exit condition for tnlid != *hint.
>> >>>>>>
>> >>>>>> This patch fixes mentioned issue and adds a testcase.
>> >>>>>>
>> >>>>>> Signed-off-by: Vladislav Odintsov
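The remaining edge case raised in this message (*hint = 0 with min = 10, max = 20) can be reproduced with a short C model. Everything here is a sketch under assumptions: `struct id_set`, `id_set_add()`, `ALLOC_SPIN` and `budget` are hypothetical stand-ins for the real hmap-based set and do not exist in OVN, `next_tnlid()` is assumed to return `tnlid + 1` unless that exceeds `max`, and `alloc_clamped()` is only one possible normalization, not something proposed in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_ID 64          /* hypothetical cap for the toy id set */
#define ALLOC_SPIN   UINT32_MAX  /* hypothetical: step budget ran out */

struct id_set {
    bool used[MODEL_MAX_ID + 1];
};

/* Like the real set, this does not know about [min, max]. */
static bool
id_set_add(struct id_set *set, uint32_t id)
{
    if (id > MODEL_MAX_ID || set->used[id]) {
        return false;
    }
    set->used[id] = true;
    return true;
}

/* Assumed behaviour: a hint below 'min' is stepped by one, not
 * pulled up into the range. */
static uint32_t
next_tnlid(uint32_t tnlid, uint32_t min, uint32_t max)
{
    return tnlid + 1 <= max ? tnlid + 1 : min;
}

/* Loop shape from the applied patch, with a step budget added. */
static uint32_t
alloc_fixed(struct id_set *set, uint32_t min, uint32_t max, uint32_t *hint,
            uint32_t budget)
{
    assert(min <= max);
    *hint = next_tnlid(*hint, min, max);

    uint32_t tnlid = *hint;
    do {
        if (budget-- == 0) {
            return ALLOC_SPIN;
        }
        if (id_set_add(set, tnlid)) {
            *hint = tnlid;
            return tnlid;
        }
        tnlid = next_tnlid(tnlid, min, max);
    } while (tnlid != *hint);
    return 0;
}

/* Hypothetical variant: always start the lap inside [min, max]. */
static uint32_t
alloc_clamped(struct id_set *set, uint32_t min, uint32_t max, uint32_t *hint,
              uint32_t budget)
{
    assert(min <= max);
    uint32_t start = (*hint < min || *hint >= max) ? min : *hint + 1;

    uint32_t tnlid = start;
    do {
        if (budget-- == 0) {
            return ALLOC_SPIN;
        }
        if (id_set_add(set, tnlid)) {
            *hint = tnlid;
            return tnlid;
        }
        tnlid = next_tnlid(tnlid, min, max);
    } while (tnlid != start);
    return 0;
}
```

In this model `alloc_fixed()` with `*hint = 0`, `min = 10`, `max = 20` first hands out id 1, which lies below `min`; once ids 1..20 are all taken it cycles through 10..20 without ever returning to the normalized hint of 1. The clamped variant starts at `min` and finishes after a single lap.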
[ovs-dev] [PATCH ovn v3] northd: Add support for disabling vxlan mode.
Commit [1] introduced a "vxlan mode" concept. It brought a limitation
for available tunnel IDs because of lack of space in VXLAN VNI.
In vxlan mode OVN is limited to 4095 datapaths (LRs or non-transit LSs)
and 2047 logical switch ports per datapath.

Prior to this patch vxlan mode was enabled automatically if at least one
chassis had encap of vxlan type. In scenarios where one wants to use VXLAN
only for HW VTEP (RAMP) switch, such limitation makes no sense.

This patch adds support for explicit disabling of vxlan mode via
Northbound database.

1: https://github.com/ovn-org/ovn/commit/b07f1bc3d068

CC: Ihar Hrachyshka
Fixes: b07f1bc3d068 ("Add VXLAN support for non-VTEP datapath bindings")
Signed-off-by: Vladislav Odintsov
---
 NEWS                      |  3 ++
 northd/en-global-config.c |  9 +++-
 northd/northd.c           | 90 ++-
 northd/northd.h           |  6 ++-
 ovn-nb.xml                | 12 ++
 tests/ovn-northd.at       | 29 +
 6 files changed, 97 insertions(+), 52 deletions(-)

diff --git a/NEWS b/NEWS
index 141f1831c..fe95391cd 100644
--- a/NEWS
+++ b/NEWS
@@ -13,6 +13,9 @@ Post v24.03.0
     "lflow-stage-to-oftable STAGE_NAME" that converts stage name into OpenFlow
     table id.
   - Rename the ovs-sandbox script to ovn-sandbox.
+  - Added new global config option NB_Global:options:disable_vxlan_mode to
+    extend available tunnel IDs space for datapaths from 4095 to 16711680.
+    For more details see man ovn-nb for mentioned option.
OVN v24.03.0 - 01 Mar 2024 -- diff --git a/northd/en-global-config.c b/northd/en-global-config.c index 34e393b33..9310c4575 100644 --- a/northd/en-global-config.c +++ b/northd/en-global-config.c @@ -115,8 +115,8 @@ en_global_config_run(struct engine_node *node , void *data) config_data->svc_monitor_mac); } -char *max_tunid = xasprintf("%d", -get_ovn_max_dp_key_local(sbrec_chassis_table)); +init_vxlan_mode(&nb->options, sbrec_chassis_table); +char *max_tunid = xasprintf("%d", get_ovn_max_dp_key_local()); smap_replace(options, "max_tunid", max_tunid); free(max_tunid); @@ -523,6 +523,11 @@ check_nb_options_out_of_sync(const struct nbrec_nb_global *nb, return true; } +if (config_out_of_sync(&nb->options, &config_data->nb_options, + "disable_vxlan_mode", false)) { +return true; +} + return false; } diff --git a/northd/northd.c b/northd/northd.c index c568f6360..859b233e8 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -90,6 +90,10 @@ static bool use_ct_inv_match = true; */ static bool default_acl_drop; +/* If this option is 'true' northd will use limited 24-bit space for datapath + * and ports tunnel key allocation (12 bits for each instead of default 16). 
*/ +static bool vxlan_mode; + #define MAX_OVN_TAGS 4096 @@ -875,24 +879,31 @@ join_datapaths(const struct nbrec_logical_switch_table *nbrec_ls_table, } } -static bool -is_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table) +void +init_vxlan_mode(const struct smap *nb_options, +const struct sbrec_chassis_table *sbrec_chassis_table) { +if (smap_get_bool(nb_options, "disable_vxlan_mode", false)) { +vxlan_mode = false; +return; +} + const struct sbrec_chassis *chassis; SBREC_CHASSIS_TABLE_FOR_EACH (chassis, sbrec_chassis_table) { for (int i = 0; i < chassis->n_encaps; i++) { if (!strcmp(chassis->encaps[i]->type, "vxlan")) { -return true; +vxlan_mode = true; +return; } } } -return false; +vxlan_mode = false; } uint32_t -get_ovn_max_dp_key_local(const struct sbrec_chassis_table *sbrec_chassis_table) +get_ovn_max_dp_key_local(void) { -if (is_vxlan_mode(sbrec_chassis_table)) { +if (vxlan_mode) { /* OVN_MAX_DP_GLOBAL_NUM doesn't apply for vxlan mode. */ return OVN_MAX_DP_VXLAN_KEY; } @@ -900,15 +911,14 @@ get_ovn_max_dp_key_local(const struct sbrec_chassis_table *sbrec_chassis_table) } static void -ovn_datapath_allocate_key(const struct sbrec_chassis_table *sbrec_ch_table, - struct hmap *datapaths, struct hmap *dp_tnlids, +ovn_datapath_allocate_key(struct hmap *datapaths, struct hmap *dp_tnlids, struct ovn_datapath *od, uint32_t *hint) { if (!od->tunnel_key) { od->tunnel_key = ovn_allocate_tnlid(dp_tnlids, "datapath", -OVN_MIN_DP_KEY_LOCAL, -get_ovn_max_dp_key_local(sbrec_ch_table), -hint); +OVN_MIN_DP_KEY_LOC
Re: [ovs-dev] [PATCH ovn v2] northd: Add support for disabling vxlan mode.
Oh, my bad. I’ll send out v3. Sorry. > On 4 Apr 2024, at 19:53, Dumitru Ceara wrote: > > On 4/4/24 18:06, Vladislav Odintsov wrote: >> Commit [1] introduced a "vxlan mode" concept. It brought a limitation >> for available tunnel IDs because of lack of space in VXLAN VNI. >> In vxlan mode OVN is limited by 4095 datapaths (LRs or non-transit LSs) >> and 2047 logical switch ports per datapath. >> >> Prior to this patch vxlan mode was enabled automatically if at least one >> chassis had encap of vxlan type. In scenarios where one want to use VXLAN >> only for HW VTEP (RAMP) switch, such limitation makes no sence. >> >> This patch adds support for explicit disabling of vxlan mode via >> Northbound database. >> >> 1: https://github.com/ovn-org/ovn/commit/b07f1bc3d068 >> >> CC: Ihar Hrachyshka >> Fixes: b07f1bc3d068 ("Add VXLAN support for non-VTEP datapath bindings") >> Signed-off-by: Vladislav Odintsov >> --- > > >> diff --git a/ovs b/ovs >> index fe55ce37a..94191b7a4 16 >> --- a/ovs >> +++ b/ovs >> @@ -1 +1 @@ >> -Subproject commit fe55ce37a7b090d09dee5c01ae0797320ad678f6 >> +Subproject commit 94191b7a4926204510931770c52992c9ea24d4e2 > > Looks like you included a submodule change by accident. This causes the > CI to fail: > https://github.com/ovsrobot/ovn/actions/runs/8557944614 > > Regards, > Dumitru Regards, Vladislav Odintsov ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
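The mode selection described in the quoted commit message can be sketched in a few lines of self-contained C. This is an illustration under assumptions rather than the patch itself: `struct chassis_model`, `vxlan_mode_enabled()` and `max_dp_key_local()` are hypothetical names that stand in for the Southbound chassis rows and the northd helpers, and the two limits are simply the values quoted in the NEWS entry (4095 and 16711680).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Limits as quoted in the NEWS entry of the patch. */
#define MODEL_MAX_DP_KEY_VXLAN 4095u      /* 2^12 - 1 */
#define MODEL_MAX_DP_KEY_LOCAL 16711680u  /* 0xFF0000 */

/* Hypothetical stand-in for a chassis row and its encap types. */
struct chassis_model {
    const char *const *encap_types;
    size_t n_encaps;
};

/* vxlan mode is off when explicitly disabled; otherwise it is on as
 * soon as any chassis has an encap of type "vxlan". */
static bool
vxlan_mode_enabled(bool disable_vxlan_mode,
                   const struct chassis_model *chassis, size_t n_chassis)
{
    assert(chassis != NULL || n_chassis == 0);
    if (disable_vxlan_mode) {
        return false;
    }
    for (size_t c = 0; c < n_chassis; c++) {
        for (size_t i = 0; i < chassis[c].n_encaps; i++) {
            if (!strcmp(chassis[c].encap_types[i], "vxlan")) {
                return true;
            }
        }
    }
    return false;
}

/* Largest local datapath tunnel key for the selected mode. */
static uint32_t
max_dp_key_local(bool vxlan_mode)
{
    return vxlan_mode ? MODEL_MAX_DP_KEY_VXLAN : MODEL_MAX_DP_KEY_LOCAL;
}
```

Under these assumptions, a deployment whose hypervisors advertise both geneve and vxlan encaps (vxlan being there only to reach a HW VTEP switch) would fall into vxlan mode unless the option is set, which is the situation the patch is meant to address.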
Re: [ovs-dev] [PATCH ovn] northd: Add support for disabling vxlan mode.
I’ve sent v2: https://patchwork.ozlabs.org/project/ovn/patch/20240404160628.970615-1-odiv...@gmail.com/ > On 4 Apr 2024, at 18:27, Dumitru Ceara wrote: > > On 4/4/24 14:38, Vladislav Odintsov wrote: >> *Patch [1] is >> https://patchwork.ozlabs.org/project/ovn/patch/20240401121510.758326-1-odiv...@gmail.com/ >> >>> On 4 Apr 2024, at 15:33, Vladislav Odintsov wrote: >>> >>> Hi Dumitru, >>> >>> thanks for your attention on this! >>> >>>> On 4 Apr 2024, at 13:06, Dumitru Ceara wrote: >>>> >>>> On 4/3/24 22:05, Vladislav Odintsov wrote: >>>>> re-sending email because ovs list rejected previous its content for some >>>>> reason: >>>>> >>>>> Hi Ihar, >>>>> >>>> >>>> Hi Vladislav, Ihar, >>>> >>>>> thanks for your quick reaction! >>>>> I didn’t see mentioned thread, but I think that it is not safe enough to >>>>> have automatic detection of this scenario here. >>>>> >>>>> Imagine: for VXLAN with HW VTEP scenario besides VXLAN encap one must >>>>> configure also either GENEVE and/or STT encap(s) for HV chassis. >>>>> >>>>> So, detection could be implemented like this: >>>>> Check all non-VTEP chassis' encaps and find "effective encap" for each of >>>>> them. If we detect at least one chassis with "effective encap" == vxlan, >>>>> then enable vxlan mode. Normal mode otherwise. >>>>> "effective encap" means that for 'vxlan,geneve,stt' encaps effective is >>>>> geneve, for 'vxlan,stt' -> stt, for 'vxlan' -> vxlan. >>>>> Such behavior was my first idea. >>>>> >>>>> But I decided that there possible flapping of modes if there is a >>>>> problem/bug in deployment tooling and it is enough to have only one >>>>> chassis with wrong encap set to affect vxlan mode for entire OVN cluster. >>>>> Such mode flapping can result in problems with tunnel ids allocation. >>>> >>>> These are valid points. >>>> >>>>> So it seems that to have an option that statically sets vxlan mode is >>>>> more resilient. >>>> >>>> In general we try to avoid new config knobs. >>>> . >>>>> What do you think? 
>>>>> >>>> >>>> But in this case it make actually be easier if we offload the work of >>>> determining vxlan-mode to the CMS. >>>> >>>>> >>>>>> On 3 Apr 2024, at 20:43, Ihar Hrachyshka wrote: >>>>>> >>>>>> Thank you Vladislav. >>>>>> >>>>>> FYI it was reported in the past in >>>>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2022-July/051931.html >>>>>> but fell through cracks then. Thanks for picking it up! >>>>>> >>>>>> In your patch, you introduce a new config option to disable the >>>>>> 'vxlan-mode' behavior. This will definitely work. But I wonder if we can >>>>>> automatically detect this scenario by ignoring the chassis that are VTEP >>>>>> from consideration? I believe ovn-controller-vtep sets `is-vtep` in >>>>>> other_config, so - would it work if we modify is_vxlan_mode to consider >>>>>> it too? >>>>>> >>>>>> Thanks again for looking into this. >>>>>> Ihar >>>>>> >>>>>> On Wed, Apr 3, 2024 at 6:34 AM Vladislav Odintsov >>>>> <mailto:odiv...@gmail.com>> wrote: >>>>>>> Commit [1] introduced a "vxlan mode" concept. It brought a limitation >>>>>>> for available tunnel IDs because of lack of space in VXLAN VNI. >>>>>>> In vxlan mode OVN is limited by 4095 datapaths (LRs or non-transit LSs) >>>>>>> and 2047 logical switch ports per datapath. >>>>>>> >>>>>>> Prior to this patch vxlan mode was enabled automatically if at least one >>>>>>> chassis had encap of vxlan type. In scenarios where one want to use >>>>>>> VXLAN >>>>>>> only for HW VTEP (RAMP) switch, such limitation makes no sence. >>>>>>> >>>>>>> This patch
[ovs-dev] [PATCH ovn v2] northd: Add support for disabling vxlan mode.
Commit [1] introduced a "vxlan mode" concept. It brought a limitation
for available tunnel IDs because of lack of space in VXLAN VNI.
In vxlan mode OVN is limited to 4095 datapaths (LRs or non-transit LSs)
and 2047 logical switch ports per datapath.

Prior to this patch vxlan mode was enabled automatically if at least one
chassis had encap of vxlan type. In scenarios where one wants to use VXLAN
only for HW VTEP (RAMP) switch, such limitation makes no sense.

This patch adds support for explicit disabling of vxlan mode via
Northbound database.

1: https://github.com/ovn-org/ovn/commit/b07f1bc3d068

CC: Ihar Hrachyshka
Fixes: b07f1bc3d068 ("Add VXLAN support for non-VTEP datapath bindings")
Signed-off-by: Vladislav Odintsov
---
 NEWS                      |  3 ++
 northd/en-global-config.c |  9 +++-
 northd/northd.c           | 90 ++-
 northd/northd.h           |  6 ++-
 ovn-nb.xml                | 12 ++
 ovs                       |  2 +-
 tests/ovn-northd.at       | 29 +
 7 files changed, 98 insertions(+), 53 deletions(-)

diff --git a/NEWS b/NEWS
index 141f1831c..fe95391cd 100644
--- a/NEWS
+++ b/NEWS
@@ -13,6 +13,9 @@ Post v24.03.0
     "lflow-stage-to-oftable STAGE_NAME" that converts stage name into OpenFlow
     table id.
   - Rename the ovs-sandbox script to ovn-sandbox.
+  - Added new global config option NB_Global:options:disable_vxlan_mode to
+    extend available tunnel IDs space for datapaths from 4095 to 16711680.
+    For more details see man ovn-nb for mentioned option.
OVN v24.03.0 - 01 Mar 2024 -- diff --git a/northd/en-global-config.c b/northd/en-global-config.c index 34e393b33..9310c4575 100644 --- a/northd/en-global-config.c +++ b/northd/en-global-config.c @@ -115,8 +115,8 @@ en_global_config_run(struct engine_node *node , void *data) config_data->svc_monitor_mac); } -char *max_tunid = xasprintf("%d", -get_ovn_max_dp_key_local(sbrec_chassis_table)); +init_vxlan_mode(&nb->options, sbrec_chassis_table); +char *max_tunid = xasprintf("%d", get_ovn_max_dp_key_local()); smap_replace(options, "max_tunid", max_tunid); free(max_tunid); @@ -523,6 +523,11 @@ check_nb_options_out_of_sync(const struct nbrec_nb_global *nb, return true; } +if (config_out_of_sync(&nb->options, &config_data->nb_options, + "disable_vxlan_mode", false)) { +return true; +} + return false; } diff --git a/northd/northd.c b/northd/northd.c index c568f6360..859b233e8 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -90,6 +90,10 @@ static bool use_ct_inv_match = true; */ static bool default_acl_drop; +/* If this option is 'true' northd will use limited 24-bit space for datapath + * and ports tunnel key allocation (12 bits for each instead of default 16). 
*/ +static bool vxlan_mode; + #define MAX_OVN_TAGS 4096 @@ -875,24 +879,31 @@ join_datapaths(const struct nbrec_logical_switch_table *nbrec_ls_table, } } -static bool -is_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table) +void +init_vxlan_mode(const struct smap *nb_options, +const struct sbrec_chassis_table *sbrec_chassis_table) { +if (smap_get_bool(nb_options, "disable_vxlan_mode", false)) { +vxlan_mode = false; +return; +} + const struct sbrec_chassis *chassis; SBREC_CHASSIS_TABLE_FOR_EACH (chassis, sbrec_chassis_table) { for (int i = 0; i < chassis->n_encaps; i++) { if (!strcmp(chassis->encaps[i]->type, "vxlan")) { -return true; +vxlan_mode = true; +return; } } } -return false; +vxlan_mode = false; } uint32_t -get_ovn_max_dp_key_local(const struct sbrec_chassis_table *sbrec_chassis_table) +get_ovn_max_dp_key_local(void) { -if (is_vxlan_mode(sbrec_chassis_table)) { +if (vxlan_mode) { /* OVN_MAX_DP_GLOBAL_NUM doesn't apply for vxlan mode. */ return OVN_MAX_DP_VXLAN_KEY; } @@ -900,15 +911,14 @@ get_ovn_max_dp_key_local(const struct sbrec_chassis_table *sbrec_chassis_table) } static void -ovn_datapath_allocate_key(const struct sbrec_chassis_table *sbrec_ch_table, - struct hmap *datapaths, struct hmap *dp_tnlids, +ovn_datapath_allocate_key(struct hmap *datapaths, struct hmap *dp_tnlids, struct ovn_datapath *od, uint32_t *hint) { if (!od->tunnel_key) { od->tunnel_key = ovn_allocate_tnlid(dp_tnlids, "datapath", -OVN_MIN_DP_KEY_LOCAL, -get_ovn_max_dp_key_local(sbrec_ch_table), -hint); +
Re: [ovs-dev] [PATCH ovn] northd: fix infinite loop in ovn_allocate_tnlid()
Thanks Dumitru! I’m totally fine with your change. Should I send backport patches with resolved conflicts for remaining branches at least till 22.03, which is an LTS? > On 4 Apr 2024, at 18:26, Dumitru Ceara wrote: > > On 4/1/24 16:27, Mark Michelson wrote: >> Thanks Vladislav, >> >> Acked-by: Mark Michelson mailto:mmich...@redhat.com>> >> > > Thanks, Vladislav and Mark! Applied to main and backported down to > 23.06 with a minor test change, please see below. > >> On 4/1/24 08:15, Vladislav Odintsov wrote: >>> In case if all tunnel ids are exhausted, ovn_allocate_tnlid() function >>> iterates over tnlids indefinitely when *hint is outside of [min, max]. >>> This is because when tnlid reaches max, next tnlid is min and for-loop >>> never reaches exit condition for tnlid != *hint. >>> >>> This patch fixes mentioned issue and adds a testcase. >>> >>> Signed-off-by: Vladislav Odintsov >>> --- >>> lib/ovn-util.c | 10 +++--- >>> tests/ovn-northd.at | 26 ++ >>> 2 files changed, 33 insertions(+), 3 deletions(-) >>> >>> diff --git a/lib/ovn-util.c b/lib/ovn-util.c >>> index ee5cbcdc3..9f97ae2ca 100644 >>> --- a/lib/ovn-util.c >>> +++ b/lib/ovn-util.c >>> @@ -693,13 +693,17 @@ uint32_t >>> ovn_allocate_tnlid(struct hmap *set, const char *name, uint32_t min, >>> uint32_t max, uint32_t *hint) >>> { >>> -for (uint32_t tnlid = next_tnlid(*hint, min, max); tnlid != *hint; >>> - tnlid = next_tnlid(tnlid, min, max)) { >>> +/* Normalize hint, because it can be outside of [min, max]. 
*/ >>> +*hint = next_tnlid(*hint, min, max); >>> + >>> +uint32_t tnlid = *hint; >>> +do { >>> if (ovn_add_tnlid(set, tnlid)) { >>> *hint = tnlid; >>> return tnlid; >>> } >>> -} >>> +tnlid = next_tnlid(tnlid, min, max); >>> +} while (tnlid != *hint); >>> static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1); >>> VLOG_WARN_RL(&rl, "all %s tunnel ids exhausted", name); >>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >>> index cd53755b2..174dbacda 100644 >>> --- a/tests/ovn-northd.at >>> +++ b/tests/ovn-northd.at >>> @@ -2822,6 +2822,32 @@ AT_CHECK([test $lsp02 = 3 && test $ls1 = 123]) >>> AT_CLEANUP >>> ]) >>> +OVN_FOR_EACH_NORTHD_NO_HV([ >>> +AT_SETUP([check tunnel ids exhaustion]) >>> +ovn_start >>> + >>> +# Create a fake chassis with vxlan encap to lower MAX DP tunnel key >>> to 2^12 >>> +ovn-sbctl \ >>> +--id=@e create encap chassis_name=hv1 ip="192.168.0.1" >>> type="vxlan" \ >>> +-- --id=@c create chassis name=hv1 encaps=@e >>> + >>> +cmd="ovn-nbctl --wait=sb" >>> + >>> +for i in {1..4097..1}; do > > This can be changed to: > > for i in {1..4097}; do > >>> +cmd="${cmd} -- ls-add lsw-${i}" >>> +done >>> + >>> +eval $cmd >>> + >>> +check_row_count nb:Logical_Switch 4097 >>> +wait_row_count sb:Datapath_Binding 4095 >>> + >>> +OVS_WAIT_UNTIL([grep "all datapath tunnel ids exhausted" >>> northd/ovn-northd.log]) >>> + >>> +AT_CLEANUP >>> +]) >>> + >>> + >>> OVN_FOR_EACH_NORTHD_NO_HV([ >>> AT_SETUP([Logical Flow Datapath Groups]) >>> ovn_start > > Regards, > Dumitru > > ___ > dev mailing list > d...@openvswitch.org <mailto:d...@openvswitch.org> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev Regards, Vladislav Odintsov ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] northd: Add support for disabling vxlan mode.
*Patch [1] is https://patchwork.ozlabs.org/project/ovn/patch/20240401121510.758326-1-odiv...@gmail.com/ > On 4 Apr 2024, at 15:33, Vladislav Odintsov wrote: > > Hi Dumitru, > > thanks for your attention on this! > >> On 4 Apr 2024, at 13:06, Dumitru Ceara wrote: >> >> On 4/3/24 22:05, Vladislav Odintsov wrote: >>> re-sending email because ovs list rejected previous its content for some >>> reason: >>> >>> Hi Ihar, >>> >> >> Hi Vladislav, Ihar, >> >>> thanks for your quick reaction! >>> I didn’t see mentioned thread, but I think that it is not safe enough to >>> have automatic detection of this scenario here. >>> >>> Imagine: for VXLAN with HW VTEP scenario besides VXLAN encap one must >>> configure also either GENEVE and/or STT encap(s) for HV chassis. >>> >>> So, detection could be implemented like this: >>> Check all non-VTEP chassis' encaps and find "effective encap" for each of >>> them. If we detect at least one chassis with "effective encap" == vxlan, >>> then enable vxlan mode. Normal mode otherwise. >>> "effective encap" means that for 'vxlan,geneve,stt' encaps effective is >>> geneve, for 'vxlan,stt' -> stt, for 'vxlan' -> vxlan. >>> Such behavior was my first idea. >>> >>> But I decided that there possible flapping of modes if there is a >>> problem/bug in deployment tooling and it is enough to have only one chassis >>> with wrong encap set to affect vxlan mode for entire OVN cluster. Such mode >>> flapping can result in problems with tunnel ids allocation. >> >> These are valid points. >> >>> So it seems that to have an option that statically sets vxlan mode is more >>> resilient. >> >> In general we try to avoid new config knobs. >> . >>> What do you think? >>> >> >> But in this case it make actually be easier if we offload the work of >> determining vxlan-mode to the CMS. >> >>> >>>> On 3 Apr 2024, at 20:43, Ihar Hrachyshka wrote: >>>> >>>> Thank you Vladislav. 
>>>> >>>> FYI it was reported in the past in >>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2022-July/051931.html >>>> but fell through cracks then. Thanks for picking it up! >>>> >>>> In your patch, you introduce a new config option to disable the >>>> 'vxlan-mode' behavior. This will definitely work. But I wonder if we can >>>> automatically detect this scenario by ignoring the chassis that are VTEP >>>> from consideration? I believe ovn-controller-vtep sets `is-vtep` in >>>> other_config, so - would it work if we modify is_vxlan_mode to consider it >>>> too? >>>> >>>> Thanks again for looking into this. >>>> Ihar >>>> >>>> On Wed, Apr 3, 2024 at 6:34 AM Vladislav Odintsov >>> <mailto:odiv...@gmail.com>> wrote: >>>>> Commit [1] introduced a "vxlan mode" concept. It brought a limitation >>>>> for available tunnel IDs because of lack of space in VXLAN VNI. >>>>> In vxlan mode OVN is limited by 4095 datapaths (LRs or non-transit LSs) >>>>> and 2047 logical switch ports per datapath. >>>>> >>>>> Prior to this patch vxlan mode was enabled automatically if at least one >>>>> chassis had encap of vxlan type. In scenarios where one want to use VXLAN >>>>> only for HW VTEP (RAMP) switch, such limitation makes no sence. >>>>> >>>>> This patch adds support for explicit disabling of vxlan mode via >>>>> Northbound database. >>>>> >>>>> 0: https://github.com/ovn-org/ovn/commit/b07f1bc3d068 >>>>> >>>>> CC: Ihar Hrachyshka mailto:ihrac...@redhat.com>> >>>>> Fixes: b07f1bc3d068 ("Add VXLAN support for non-VTEP datapath bindings") >>>>> Signed-off-by: Vladislav Odintsov >>>> <mailto:odiv...@gmail.com>> >>>>> --- >>>>> northd/en-global-config.c | 9 +++- >>>>> northd/northd.c | 90 ++- >>>>> northd/northd.h | 6 ++- >>>>> ovn-nb.xml| 12 ++ >>>>> tests/ovn-northd.at <http://ovn-northd.at/> | 29 + >&g
Re: [ovs-dev] [PATCH ovn] northd: Add support for disabling vxlan mode.
Hi Dumitru, thanks for your attention on this! > On 4 Apr 2024, at 13:06, Dumitru Ceara wrote: > > On 4/3/24 22:05, Vladislav Odintsov wrote: >> re-sending email because ovs list rejected previous its content for some >> reason: >> >> Hi Ihar, >> > > Hi Vladislav, Ihar, > >> thanks for your quick reaction! >> I didn’t see mentioned thread, but I think that it is not safe enough to >> have automatic detection of this scenario here. >> >> Imagine: for VXLAN with HW VTEP scenario besides VXLAN encap one must >> configure also either GENEVE and/or STT encap(s) for HV chassis. >> >> So, detection could be implemented like this: >> Check all non-VTEP chassis' encaps and find "effective encap" for each of >> them. If we detect at least one chassis with "effective encap" == vxlan, >> then enable vxlan mode. Normal mode otherwise. >> "effective encap" means that for 'vxlan,geneve,stt' encaps effective is >> geneve, for 'vxlan,stt' -> stt, for 'vxlan' -> vxlan. >> Such behavior was my first idea. >> >> But I decided that there possible flapping of modes if there is a >> problem/bug in deployment tooling and it is enough to have only one chassis >> with wrong encap set to affect vxlan mode for entire OVN cluster. Such mode >> flapping can result in problems with tunnel ids allocation. > > These are valid points. > >> So it seems that to have an option that statically sets vxlan mode is more >> resilient. > > In general we try to avoid new config knobs. > . >> What do you think? >> > > But in this case it make actually be easier if we offload the work of > determining vxlan-mode to the CMS. > >> >>> On 3 Apr 2024, at 20:43, Ihar Hrachyshka wrote: >>> >>> Thank you Vladislav. >>> >>> FYI it was reported in the past in >>> https://mail.openvswitch.org/pipermail/ovs-discuss/2022-July/051931.html >>> but fell through cracks then. Thanks for picking it up! >>> >>> In your patch, you introduce a new config option to disable the >>> 'vxlan-mode' behavior. This will definitely work. 
But I wonder if we can >>> automatically detect this scenario by ignoring the chassis that are VTEP >>> from consideration? I believe ovn-controller-vtep sets `is-vtep` in >>> other_config, so - would it work if we modify is_vxlan_mode to consider it >>> too? >>> >>> Thanks again for looking into this. >>> Ihar >>> >>> On Wed, Apr 3, 2024 at 6:34 AM Vladislav Odintsov >> <mailto:odiv...@gmail.com>> wrote: >>>> Commit [1] introduced a "vxlan mode" concept. It brought a limitation >>>> for available tunnel IDs because of lack of space in VXLAN VNI. >>>> In vxlan mode OVN is limited by 4095 datapaths (LRs or non-transit LSs) >>>> and 2047 logical switch ports per datapath. >>>> >>>> Prior to this patch vxlan mode was enabled automatically if at least one >>>> chassis had encap of vxlan type. In scenarios where one want to use VXLAN >>>> only for HW VTEP (RAMP) switch, such limitation makes no sence. >>>> >>>> This patch adds support for explicit disabling of vxlan mode via >>>> Northbound database. >>>> >>>> 0: https://github.com/ovn-org/ovn/commit/b07f1bc3d068 >>>> >>>> CC: Ihar Hrachyshka mailto:ihrac...@redhat.com>> >>>> Fixes: b07f1bc3d068 ("Add VXLAN support for non-VTEP datapath bindings") >>>> Signed-off-by: Vladislav Odintsov >>> <mailto:odiv...@gmail.com>> >>>> --- >>>> northd/en-global-config.c | 9 +++- >>>> northd/northd.c | 90 ++- >>>> northd/northd.h | 6 ++- >>>> ovn-nb.xml| 12 ++ >>>> tests/ovn-northd.at <http://ovn-northd.at/> | 29 + >>>> 5 files changed, 94 insertions(+), 52 deletions(-) >>>> >>>> diff --git a/northd/en-global-config.c b/northd/en-global-config.c >>>> index 34e393b33..9310c4575 100644 >>>> --- a/northd/en-global-config.c >>>> +++ b/northd/en-global-config.c >>>> @@ -115,8 +115,8 @@ en_global_config_run(struct engine_node *node , void >>>> *data) >>>> config_data->svc_monitor_mac); >>>> } >>>> >>>> -cha
Re: [ovs-dev] [PATCH ovn] northd: Add support for disabling vxlan mode.
Re-sending this email because the ovs list rejected the previous one because of its content, for some reason:

Hi Ihar,

thanks for your quick reaction! I hadn’t seen the thread you mentioned, but I think automatic detection of this scenario is not safe enough here.

Imagine: in the VXLAN with HW VTEP scenario, besides the VXLAN encap one must also configure GENEVE and/or STT encap(s) on the HV chassis.

So, detection could be implemented like this: check the encaps of all non-VTEP chassis and find the "effective encap" for each of them. If we detect at least one chassis whose "effective encap" == vxlan, enable vxlan mode; otherwise use normal mode. "Effective encap" means that for 'vxlan,geneve,stt' encaps the effective one is geneve, for 'vxlan,stt' -> stt, for 'vxlan' -> vxlan. Such behavior was my first idea.

But I decided that the modes could flap if there is a problem/bug in the deployment tooling: a single chassis with a wrong encap set is enough to affect vxlan mode for the entire OVN cluster. Such mode flapping can result in problems with tunnel id allocation. So an option that statically sets vxlan mode seems more resilient.

What do you think?

> On 3 Apr 2024, at 20:43, Ihar Hrachyshka wrote: > > Thank you Vladislav. > > FYI it was reported in the past in > https://mail.openvswitch.org/pipermail/ovs-discuss/2022-July/051931.html but > fell through cracks then. Thanks for picking it up! > > In your patch, you introduce a new config option to disable the 'vxlan-mode' > behavior. This will definitely work. But I wonder if we can automatically > detect this scenario by ignoring the chassis that are VTEP from > consideration? I believe ovn-controller-vtep sets `is-vtep` in other_config, > so - would it work if we modify is_vxlan_mode to consider it too? > > Thanks again for looking into this. > Ihar > > On Wed, Apr 3, 2024 at 6:34 AM Vladislav Odintsov <mailto:odiv...@gmail.com>> wrote: >> Commit [1] introduced a "vxlan mode" concept. 
It brought a limitation >> for available tunnel IDs because of lack of space in VXLAN VNI. >> In vxlan mode OVN is limited by 4095 datapaths (LRs or non-transit LSs) >> and 2047 logical switch ports per datapath. >> >> Prior to this patch vxlan mode was enabled automatically if at least one >> chassis had encap of vxlan type. In scenarios where one want to use VXLAN >> only for HW VTEP (RAMP) switch, such limitation makes no sence. >> >> This patch adds support for explicit disabling of vxlan mode via >> Northbound database. >> >> 0: https://github.com/ovn-org/ovn/commit/b07f1bc3d068 >> >> CC: Ihar Hrachyshka mailto:ihrac...@redhat.com>> >> Fixes: b07f1bc3d068 ("Add VXLAN support for non-VTEP datapath bindings") >> Signed-off-by: Vladislav Odintsov > <mailto:odiv...@gmail.com>> >> --- >> northd/en-global-config.c | 9 +++- >> northd/northd.c | 90 ++- >> northd/northd.h | 6 ++- >> ovn-nb.xml| 12 ++ >> tests/ovn-northd.at <http://ovn-northd.at/> | 29 + >> 5 files changed, 94 insertions(+), 52 deletions(-) >> >> diff --git a/northd/en-global-config.c b/northd/en-global-config.c >> index 34e393b33..9310c4575 100644 >> --- a/northd/en-global-config.c >> +++ b/northd/en-global-config.c >> @@ -115,8 +115,8 @@ en_global_config_run(struct engine_node *node , void >> *data) >> config_data->svc_monitor_mac); >> } >> >> -char *max_tunid = xasprintf("%d", >> -get_ovn_max_dp_key_local(sbrec_chassis_table)); >> +init_vxlan_mode(&nb->options, sbrec_chassis_table); >> +char *max_tunid = xasprintf("%d", get_ovn_max_dp_key_local()); >> smap_replace(options, "max_tunid", max_tunid); >> free(max_tunid); >> >> @@ -523,6 +523,11 @@ check_nb_options_out_of_sync(const struct >> nbrec_nb_global *nb, >> return true; >> } >> >> +if (config_out_of_sync(&nb->options, &config_data->nb_options, >> + "disable_vxlan_mode", false)) { >> +return true; >> +} >> + >> return false; >> } >> >> diff --git a/northd/northd.c b/northd/northd.c >> index c568f6360..859b233e8 100644 >> --- a/northd/northd.c >> 
+++ b/northd/northd.c >> @@ -90,6 +90,10 @@ static bool use_ct_inv_match = true; >> */ >> static bool default_acl_drop; >> >>
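For illustration only, the "effective encap" detection described in the message above could be sketched roughly as follows. This is not code from the patch or from OVN: the struct, the enum and all function names are hypothetical, and the only thing taken from the thread is the preference order (geneve over stt over vxlan) and the rule that VTEP chassis are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Preference order from the email: geneve beats stt, stt beats vxlan. */
enum encap_rank { ENCAP_NONE = 0, ENCAP_VXLAN, ENCAP_STT, ENCAP_GENEVE };

/* Hypothetical, simplified view of a chassis record (not the real
 * sbrec_chassis structure). */
struct chassis_sketch {
    bool is_vtep;               /* Chassis registered by a VTEP controller. */
    const char *const *encaps;  /* Encap type names, e.g. "vxlan". */
    size_t n_encaps;
};

static enum encap_rank
rank_of(const char *type)
{
    if (!strcmp(type, "geneve")) {
        return ENCAP_GENEVE;
    } else if (!strcmp(type, "stt")) {
        return ENCAP_STT;
    } else if (!strcmp(type, "vxlan")) {
        return ENCAP_VXLAN;
    }
    return ENCAP_NONE;
}

/* Returns the most preferred encap configured on the chassis. */
static enum encap_rank
effective_encap(const struct chassis_sketch *ch)
{
    assert(ch->encaps || !ch->n_encaps);

    enum encap_rank best = ENCAP_NONE;
    for (size_t i = 0; i < ch->n_encaps; i++) {
        enum encap_rank r = rank_of(ch->encaps[i]);
        if (r > best) {
            best = r;
        }
    }
    return best;
}

/* vxlan mode is needed if any non-VTEP chassis can only use vxlan. */
static bool
detect_vxlan_mode(const struct chassis_sketch *chassis, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!chassis[i].is_vtep
            && effective_encap(&chassis[i]) == ENCAP_VXLAN) {
            return true;
        }
    }
    return false;
}
```

The sketch also makes the concern from the message visible: one misconfigured non-VTEP chassis with only a vxlan encap flips the result for the whole cluster, which is the argument for a static option instead.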
Re: [ovs-dev] [PATCH ovn] northd: Add support for disabling vxlan mode.
The failed new testcase assumes that patch [1] is applied. Should I resend them both as a single patchset? 1: https://patchwork.ozlabs.org/project/ovn/patch/20240401121510.758326-1-odiv...@gmail.com/ > On 3 Apr 2024, at 13:34, Vladislav Odintsov wrote: > > Commit [1] introduced a "vxlan mode" concept. It brought a limitation > for available tunnel IDs because of lack of space in VXLAN VNI. > In vxlan mode OVN is limited by 4095 datapaths (LRs or non-transit LSs) > and 2047 logical switch ports per datapath. > > Prior to this patch vxlan mode was enabled automatically if at least one > chassis had encap of vxlan type. In scenarios where one want to use VXLAN > only for HW VTEP (RAMP) switch, such limitation makes no sence. > > This patch adds support for explicit disabling of vxlan mode via > Northbound database. > > 0: https://github.com/ovn-org/ovn/commit/b07f1bc3d068 > > CC: Ihar Hrachyshka > Fixes: b07f1bc3d068 ("Add VXLAN support for non-VTEP datapath bindings") > Signed-off-by: Vladislav Odintsov > --- > northd/en-global-config.c | 9 +++- > northd/northd.c | 90 ++- > northd/northd.h | 6 ++- > ovn-nb.xml| 12 ++ > tests/ovn-northd.at | 29 + > 5 files changed, 94 insertions(+), 52 deletions(-) > > diff --git a/northd/en-global-config.c b/northd/en-global-config.c > index 34e393b33..9310c4575 100644 > --- a/northd/en-global-config.c > +++ b/northd/en-global-config.c > @@ -115,8 +115,8 @@ en_global_config_run(struct engine_node *node , void > *data) > config_data->svc_monitor_mac); > } > > -char *max_tunid = xasprintf("%d", > -get_ovn_max_dp_key_local(sbrec_chassis_table)); > +init_vxlan_mode(&nb->options, sbrec_chassis_table); > +char *max_tunid = xasprintf("%d", get_ovn_max_dp_key_local()); > smap_replace(options, "max_tunid", max_tunid); > free(max_tunid); > > @@ -523,6 +523,11 @@ check_nb_options_out_of_sync(const struct > nbrec_nb_global *nb, > return true; > } > > +if (config_out_of_sync(&nb->options, &config_data->nb_options, > + "disable_vxlan_mode", 
false)) { > +return true; > +} > + > return false; > } > > diff --git a/northd/northd.c b/northd/northd.c > index c568f6360..859b233e8 100644 > --- a/northd/northd.c > +++ b/northd/northd.c > @@ -90,6 +90,10 @@ static bool use_ct_inv_match = true; > */ > static bool default_acl_drop; > > +/* If this option is 'true' northd will use limited 24-bit space for datapath > + * and ports tunnel key allocation (12 bits for each instead of default 16). > */ > +static bool vxlan_mode; > + > #define MAX_OVN_TAGS 4096 > > > @@ -875,24 +879,31 @@ join_datapaths(const struct nbrec_logical_switch_table > *nbrec_ls_table, > } > } > > -static bool > -is_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table) > +void > +init_vxlan_mode(const struct smap *nb_options, > +const struct sbrec_chassis_table *sbrec_chassis_table) > { > +if (smap_get_bool(nb_options, "disable_vxlan_mode", false)) { > +vxlan_mode = false; > +return; > +} > + > const struct sbrec_chassis *chassis; > SBREC_CHASSIS_TABLE_FOR_EACH (chassis, sbrec_chassis_table) { > for (int i = 0; i < chassis->n_encaps; i++) { > if (!strcmp(chassis->encaps[i]->type, "vxlan")) { > -return true; > +vxlan_mode = true; > +return; > } > } > } > -return false; > +vxlan_mode = false; > } > > uint32_t > -get_ovn_max_dp_key_local(const struct sbrec_chassis_table > *sbrec_chassis_table) > +get_ovn_max_dp_key_local(void) > { > -if (is_vxlan_mode(sbrec_chassis_table)) { > +if (vxlan_mode) { > /* OVN_MAX_DP_GLOBAL_NUM doesn't apply for vxlan mode. 
*/ > return OVN_MAX_DP_VXLAN_KEY; > } > @@ -900,15 +911,14 @@ get_ovn_max_dp_key_local(const struct > sbrec_chassis_table *sbrec_chassis_table) > } > > static void > -ovn_datapath_allocate_key(const struct sbrec_chassis_table *sbrec_ch_table, > - struct hmap *datapaths, struct hmap *dp_tnlids, > +ovn_datapath_allocate_key(struct hmap *datapaths, struct hmap *dp_tnlids, > struct ovn_datapath *od, uint32_t *hint) > { > if (!od->tunnel_key) { > od->tunnel_key = ovn_allocate_tnlid(dp_tnlids, &quo
[ovs-dev] [PATCH ovn] northd: Add support for disabling vxlan mode.
Commit [1] introduced a "vxlan mode" concept.  It brought a limitation
for available tunnel IDs because of lack of space in VXLAN VNI.
In vxlan mode OVN is limited to 4095 datapaths (LRs or non-transit LSs)
and 2047 logical switch ports per datapath.

Prior to this patch vxlan mode was enabled automatically if at least one
chassis had encap of vxlan type.  In scenarios where one wants to use VXLAN
only for HW VTEP (RAMP) switch, such limitation makes no sense.

This patch adds support for explicit disabling of vxlan mode via
Northbound database.

1: https://github.com/ovn-org/ovn/commit/b07f1bc3d068

CC: Ihar Hrachyshka
Fixes: b07f1bc3d068 ("Add VXLAN support for non-VTEP datapath bindings")
Signed-off-by: Vladislav Odintsov
---
 northd/en-global-config.c |  9 +++-
 northd/northd.c           | 90 ++-
 northd/northd.h           |  6 ++-
 ovn-nb.xml                | 12 ++
 tests/ovn-northd.at       | 29 +
 5 files changed, 94 insertions(+), 52 deletions(-)

diff --git a/northd/en-global-config.c b/northd/en-global-config.c
index 34e393b33..9310c4575 100644
--- a/northd/en-global-config.c
+++ b/northd/en-global-config.c
@@ -115,8 +115,8 @@ en_global_config_run(struct engine_node *node , void *data)
                      config_data->svc_monitor_mac);
     }

-    char *max_tunid = xasprintf("%d",
-        get_ovn_max_dp_key_local(sbrec_chassis_table));
+    init_vxlan_mode(&nb->options, sbrec_chassis_table);
+    char *max_tunid = xasprintf("%d", get_ovn_max_dp_key_local());
     smap_replace(options, "max_tunid", max_tunid);
     free(max_tunid);

@@ -523,6 +523,11 @@ check_nb_options_out_of_sync(const struct nbrec_nb_global *nb,
         return true;
     }

+    if (config_out_of_sync(&nb->options, &config_data->nb_options,
+                           "disable_vxlan_mode", false)) {
+        return true;
+    }
+
     return false;
 }

diff --git a/northd/northd.c b/northd/northd.c
index c568f6360..859b233e8 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -90,6 +90,10 @@ static bool use_ct_inv_match = true;
  */
 static bool default_acl_drop;

+/* If this option is 'true' northd will use limited 24-bit space for datapath
+ * and ports tunnel key allocation (12 bits for each instead of default 16). */
+static bool vxlan_mode;
+
 #define MAX_OVN_TAGS 4096


@@ -875,24 +879,31 @@ join_datapaths(const struct nbrec_logical_switch_table *nbrec_ls_table,
     }
 }

-static bool
-is_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table)
+void
+init_vxlan_mode(const struct smap *nb_options,
+                const struct sbrec_chassis_table *sbrec_chassis_table)
 {
+    if (smap_get_bool(nb_options, "disable_vxlan_mode", false)) {
+        vxlan_mode = false;
+        return;
+    }
+
     const struct sbrec_chassis *chassis;
     SBREC_CHASSIS_TABLE_FOR_EACH (chassis, sbrec_chassis_table) {
         for (int i = 0; i < chassis->n_encaps; i++) {
             if (!strcmp(chassis->encaps[i]->type, "vxlan")) {
-                return true;
+                vxlan_mode = true;
+                return;
             }
         }
     }
-    return false;
+    vxlan_mode = false;
 }

 uint32_t
-get_ovn_max_dp_key_local(const struct sbrec_chassis_table *sbrec_chassis_table)
+get_ovn_max_dp_key_local(void)
 {
-    if (is_vxlan_mode(sbrec_chassis_table)) {
+    if (vxlan_mode) {
         /* OVN_MAX_DP_GLOBAL_NUM doesn't apply for vxlan mode. */
         return OVN_MAX_DP_VXLAN_KEY;
     }
@@ -900,15 +911,14 @@ get_ovn_max_dp_key_local(const struct sbrec_chassis_table *sbrec_chassis_table)
 }

 static void
-ovn_datapath_allocate_key(const struct sbrec_chassis_table *sbrec_ch_table,
-                          struct hmap *datapaths, struct hmap *dp_tnlids,
+ovn_datapath_allocate_key(struct hmap *datapaths, struct hmap *dp_tnlids,
                           struct ovn_datapath *od, uint32_t *hint)
 {
     if (!od->tunnel_key) {
         od->tunnel_key = ovn_allocate_tnlid(dp_tnlids, "datapath",
-                                    OVN_MIN_DP_KEY_LOCAL,
-                                    get_ovn_max_dp_key_local(sbrec_ch_table),
-                                    hint);
+                                            OVN_MIN_DP_KEY_LOCAL,
+                                            get_ovn_max_dp_key_local(),
+                                            hint);
         if (!od->tunnel_key) {
             if (od->sb) {
                 sbrec_datapath_binding_delete(od->sb);
@@ -921,7 +931,6 @@ ovn_datapath_allocate_key(const struct sbrec_chassis_table *sbrec_ch_table,
 static void
 ovn_datapath_assign_requested_tnl_id(
-    const struct sbrec_chassis_table *sbrec_chassis_table,
     struct hmap *dp_tnlids, struct ovn_datapath *od)
 {
     const struct smap *other_config = (od->n
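As a side note, the limits quoted in the commit message follow from simple bit arithmetic. The sketch below is only an illustration under stated assumptions: the bit widths come from the patch comment (24-bit VNI, 12 bits each for datapath and port keys), while the names and the assumption that key 0 is never handed out are mine, not OVN's.

```c
#include <assert.h>
#include <stdint.h>

/* Bit widths taken from the patch comment: a 24-bit VXLAN VNI shared
 * between the datapath key and the port key, 12 bits each. */
enum {
    VNI_BITS = 24,
    DP_KEY_BITS = 12,
    PORT_KEY_BITS = 12,
};

/* Largest key that fits in 'bits' bits.  Assuming key 0 is never
 * allocated, this is also the number of usable keys. */
static uint32_t
max_key(unsigned int bits)
{
    assert(bits >= 1 && bits <= 31);
    return (UINT32_C(1) << bits) - 1;
}

/* Returns 1 if 'n_datapaths' datapaths can each get a distinct non-zero
 * key of DP_KEY_BITS bits, 0 otherwise. */
static int
datapaths_fit(uint32_t n_datapaths)
{
    return n_datapaths <= max_key(DP_KEY_BITS);
}
```

Here max_key(12) gives the 4095 datapath limit, and max_key(11) gives 2047, which matches the per-datapath port limit if only half of the 12-bit port space is available to logical switch ports (an assumption; the thread does not say why).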
[ovs-dev] [PATCH ovn] northd: fix infinite loop in ovn_allocate_tnlid()
If all tunnel ids are exhausted, the ovn_allocate_tnlid() function
iterates over tnlids indefinitely when *hint is outside of [min, max].
This is because when tnlid reaches max, the next tnlid is min, so the
for-loop condition tnlid != *hint never becomes false.

This patch fixes the mentioned issue and adds a testcase.

Signed-off-by: Vladislav Odintsov
---
 lib/ovn-util.c      | 10 +++---
 tests/ovn-northd.at | 26 ++
 2 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/lib/ovn-util.c b/lib/ovn-util.c
index ee5cbcdc3..9f97ae2ca 100644
--- a/lib/ovn-util.c
+++ b/lib/ovn-util.c
@@ -693,13 +693,17 @@ uint32_t
 ovn_allocate_tnlid(struct hmap *set, const char *name, uint32_t min,
                    uint32_t max, uint32_t *hint)
 {
-    for (uint32_t tnlid = next_tnlid(*hint, min, max); tnlid != *hint;
-         tnlid = next_tnlid(tnlid, min, max)) {
+    /* Normalize hint, because it can be outside of [min, max]. */
+    *hint = next_tnlid(*hint, min, max);
+
+    uint32_t tnlid = *hint;
+    do {
         if (ovn_add_tnlid(set, tnlid)) {
             *hint = tnlid;
             return tnlid;
         }
-    }
+        tnlid = next_tnlid(tnlid, min, max);
+    } while (tnlid != *hint);

     static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
     VLOG_WARN_RL(&rl, "all %s tunnel ids exhausted", name);
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index cd53755b2..174dbacda 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -2822,6 +2822,32 @@ AT_CHECK([test $lsp02 = 3 && test $ls1 = 123])
 AT_CLEANUP
 ])

+OVN_FOR_EACH_NORTHD_NO_HV([
+AT_SETUP([check tunnel ids exhaustion])
+ovn_start
+
+# Create a fake chassis with vxlan encap to lower MAX DP tunnel key to 2^12
+ovn-sbctl \
+    --id=@e create encap chassis_name=hv1 ip="192.168.0.1" type="vxlan" \
+    -- --id=@c create chassis name=hv1 encaps=@e
+
+cmd="ovn-nbctl --wait=sb"
+
+for i in {1..4097..1}; do
+    cmd="${cmd} -- ls-add lsw-${i}"
+done
+
+eval $cmd
+
+check_row_count nb:Logical_Switch 4097
+wait_row_count sb:Datapath_Binding 4095
+
+OVS_WAIT_UNTIL([grep "all datapath tunnel ids exhausted" northd/ovn-northd.log])
+
+AT_CLEANUP
+])
+
+
 OVN_FOR_EACH_NORTHD_NO_HV([
 AT_SETUP([Logical Flow Datapath Groups])
 ovn_start
--
2.44.0
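To make the loop fix easier to follow outside the OVN tree, here is a minimal standalone sketch of the same idea: normalize the hint first, then walk the range with a do/while so the loop is guaranteed to come back to its starting value. The bitmap set and all names are hypothetical stand-ins (OVN uses an hmap), and next_id() is my own clamping variant rather than a copy of next_tnlid().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the hmap-backed id set: a small bitmap. */
#define SKETCH_MAX_IDS 64
struct id_set {
    bool used[SKETCH_MAX_IDS];
};

/* Marks 'id' as used; returns false if it was already taken. */
static bool
id_set_add(struct id_set *set, uint32_t id)
{
    assert(id < SKETCH_MAX_IDS);
    if (set->used[id]) {
        return false;
    }
    set->used[id] = true;
    return true;
}

/* Returns the id after 'id' inside [min, max], wrapping to 'min' when
 * 'id' is at 'max' or outside the range altogether. */
static uint32_t
next_id(uint32_t id, uint32_t min, uint32_t max)
{
    return (id >= min && id < max) ? id + 1 : min;
}

/* Allocates a free id in [min, max], or returns 0 if none is left.
 * 'min' must be at least 1 so that 0 can signal exhaustion. */
static uint32_t
allocate_id(struct id_set *set, uint32_t min, uint32_t max, uint32_t *hint)
{
    assert(min >= 1 && min <= max && max < SKETCH_MAX_IDS);

    /* Normalize the hint so the start value lies inside [min, max];
     * otherwise the walk below could never return to it. */
    *hint = next_id(*hint, min, max);

    uint32_t id = *hint;
    do {
        if (id_set_add(set, id)) {
            *hint = id;
            return id;
        }
        id = next_id(id, min, max);
    } while (id != *hint);

    return 0;  /* Every id in [min, max] is taken. */
}
```

With a range of [1, 3] and a hint of 100, three calls return 1, 2 and 3, and a fourth call returns 0 after exactly one pass over the range instead of spinning.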
Re: [ovs-dev] [PATCH v2] netdev-offload-tc: Fix offload of tunnel key tp_src.
Thanks Ilya for the clarification. I didn’t know about check_logs logic. Tested-by: Vladislav Odintsov > On 4 Dec 2023, at 15:59, Ilya Maximets wrote: > > On 12/2/23 16:16, Vladislav Odintsov wrote: >> Hi Ilya, thanks for the added test! >> >> I’ve got one question about it, psb. >> >>> On 2 Dec 2023, at 02:06, Ilya Maximets wrote: >>> >>> There is no TCA_TUNNEL_KEY_ENC_SRC_PORT in the kernel, so the offload >>> should not be attempted if OVS_TUNNEL_KEY_ATTR_TP_SRC is requested >>> by OVS. Current code just ignores the attribute in the tunnel(set()) >>> action leading to a flow mismatch and potential incorrect datapath >>> behavior: >>> >>> |tc(handler21)|DBG|tc flower compare failed action compare >>> ... >>> Action 0 mismatch: >>> - Expected Action: >>> 0010 01 00 00 00 00 00 00 00-00 00 00 00 00 ff 00 11 >>> 0020 c0 5b 17 c1 00 40 00 00-0a 01 00 6d 0a 01 01 12 >>> 0050 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >>> 0060 01 02 80 01 00 18 00 0b-00 00 00 00 00 00 00 00 >>> ... >>> - Received Action: >>> 0010 01 00 00 00 00 00 00 00-00 00 00 00 00 ff 00 11 >>> 0020 00 00 17 c1 00 40 00 00-0a 01 00 6d 0a 01 01 12 >>> 0050 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >>> 0060 01 02 80 01 00 18 00 0b-00 00 00 00 00 00 00 00 >>> ... >>> >>> In the tc_action dump above we can see the difference on the second >>> line. The action dumped from a kernel is missing 'c0 5b' - source >>> port for a tunnel(set()) action on the second line. >>> >>> Removing the field from the tc_action_encap structure entirely to >>> avoid any potential confusion. >>> >>> Note: In general, the source port number in the tunnel(set()) action >>> is not particularly useful for most tunnels, because they may just >>> ignore the value. Specs for Geneve and VXLAN suggest using a value >>> based on the headers of the inner packet as a source port. >>> In vast majority of scenarios the source port doesn't actually end >>> up in the action itself. 
>>> Having a mismatch between the userspace and TC leads to constant >>> revalidation of the flow and warnings in the log. >>> >>> Adding a test case that demonstrates a scenario where the issue >>> occurs - bridging of two tunnels. >>> >>> Fixes: 8f283af89298 ("netdev-tc-offloads: Implement netdev flow put using >>> tc interface") >>> Reported-at: >>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-October/052744.html >>> Reported-by: Vladislav Odintsov >>> Signed-off-by: Ilya Maximets >>> --- >>> >>> Version 2: >>> * Slightly adjusted a commit message now that we understand the >>>scenario better. >>> * Added a test case that reproduces the issue. >>> >>> >>> lib/netdev-offload-tc.c | 4 ++- >>> lib/tc.h| 3 +- >>> tests/system-traffic.at | 77 + >>> 3 files changed, 82 insertions(+), 2 deletions(-) >>> >>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c >>> index b846a63c2..164c7eef6 100644 >>> --- a/lib/netdev-offload-tc.c >>> +++ b/lib/netdev-offload-tc.c >>> @@ -1627,7 +1627,9 @@ parse_put_flow_set_action(struct tc_flower *flower, >>> struct tc_action *action, >>> } >>> break; >>> case OVS_TUNNEL_KEY_ATTR_TP_SRC: { >>> -action->encap.tp_src = nl_attr_get_be16(tun_attr); >>> +/* There is no corresponding attribute in TC. */ >>> +VLOG_DBG_RL(&rl, "unsupported tunnel key attribute TP_SRC"); >>> +return EOPNOTSUPP; >>> } >>> break; >>> case OVS_TUNNEL_KEY_ATTR_TP_DST: { >>> diff --git a/lib/tc.h b/lib/tc.h >>> index 06707ffa4..fdbcf4b7c 100644 >>> --- a/lib/tc.h >>> +++ b/lib/tc.h >>> @@ -213,7 +213,8 @@ enum nat_type { >>> struct tc_action_encap { >>> bool id_present; >>> ovs_be64 id; >>> -ovs_be16 tp_src; >>> +/* ovs_be16 tp_src; Could have been here, but there is no >>> + * TCA_TUNNEL_KEY_ENC_ attribute for it in the kernel. */ >>> ovs_be16 tp_dst; >>> uint8_t tos; >>> uint8_t ttl; >>> diff --git a/tests/system-t
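The approach in the quoted patch (refuse to offload rather than silently drop an attribute the kernel cannot express) can be illustrated with a small standalone sketch. The enum, struct and function below are hypothetical and much simpler than the real netdev-offload-tc code; only the idea of returning EOPNOTSUPP for the source port is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical subset of tunnel key attributes a flow may request. */
enum tun_attr {
    TUN_ATTR_ID,
    TUN_ATTR_TP_SRC,
    TUN_ATTR_TP_DST,
    TUN_ATTR_TTL,
};

/* Hypothetical encap action.  There is deliberately no tp_src field,
 * mirroring the removal of tp_src from struct tc_action_encap. */
struct encap_action {
    uint64_t id;
    uint16_t tp_dst;
    uint8_t ttl;
};

/* Applies one attribute to the action.  Returns 0 on success, or
 * EOPNOTSUPP if the attribute has no hardware counterpart, so that the
 * caller falls back to the software datapath instead of installing a
 * flow that differs from what was requested. */
static int
apply_tun_attr(struct encap_action *encap, enum tun_attr attr, uint64_t value)
{
    assert(encap);

    switch (attr) {
    case TUN_ATTR_ID:
        encap->id = value;
        return 0;
    case TUN_ATTR_TP_DST:
        encap->tp_dst = (uint16_t) value;
        return 0;
    case TUN_ATTR_TTL:
        encap->ttl = (uint8_t) value;
        return 0;
    case TUN_ATTR_TP_SRC:
        /* No corresponding attribute to set a tunnel source port. */
        return EOPNOTSUPP;
    }
    return EINVAL;
}
```

Failing early keeps the userspace view and the offloaded flow identical, which is what avoids the constant revalidation and log warnings described in the commit message.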
[ovs-dev] HWOL with Nvidia ConnectX-6 Dx v2: OVN NAT functionality
,dst=0.0.0.0/128.0.0.0,proto=1,frag=no), packets:200, bytes:15854, used:0.010s, actions:set(ipv4(src=X.X.X.X)),5
--

So, I’ve got some questions here:
1. Should these scenarios work with HW offload?
2. If yes, what could I have misconfigured so that offload is only partial or not working?
3. How does conntrack in the kernel interact with the conntrack module inside SmartNICs? Is there any documentation on this?

As always, I’m ready to provide any additional information or do some extra checks.

Thanks in advance!

0: https://mail.openvswitch.org/pipermail/ovs-discuss/2023-October/052744.html
1: https://patchwork.ozlabs.org/project/openvswitch/patch/20231201230836.3093792-1-i.maxim...@ovn.org/
2: https://patchwork.ozlabs.org/project/openvswitch/patch/20231201210523.3085560-1-i.maxim...@ovn.org/

Regards, Vladislav Odintsov
Re: [ovs-dev] [PATCH] tunnel: Do not carry source port from a previous tunnel.
Hi Ilya, thanks for the patch!

I’ve validated it against my setup with a geneve tunnel hairpin. With [0] applied, tc offloading works as expected: there are no warnings/errors in ovs-vswitchd.log, and only the first packets of a flow reach the system CPU and are visible in tcpdump. Also, ovs-appctl dpctl/dump-flows type=offloaded outputs the related flows.

I’ve got some more test results where offloading is not working (snat/dnat_and_snat use cases), but I’ll start a new thread for that discussion.

Tested-by: Vladislav Odintsov

> On 2 Dec 2023, at 00:04, Ilya Maximets wrote: > > If a packet is received from a UDP tunnel, it has a source port > populated in the tunnel metadata. This field cannot be read or > changed with OpenFlow or the tunnel configuration. However, while > sending this packet to a different tunnel, the value remains in > the metadata and is being sent to the datapath to use as a source > port for this new tunnel. Tunnel implementations largely ignore > this value, and it is a random value from a different tunnel > anyway. > > Clear it while sending to a different tunnel, so the unnecessary > information is not being passed to the datapath. This additionally > allows traffic from one tunnel to another to be offloaded to TC, > as TC doesn't allow setting the source port at all. > > Signed-off-by: Ilya Maximets > --- > > Vladislav, it would be great if you can test this change on your > setup along with the previous offloading fix: > > https://patchwork.ozlabs.org/project/openvswitch/patch/20231030140031.75157-1-i.maxim...@ovn.org/ > > ofproto/tunnel.c | 1 + > tests/tunnel.at | 44 > 2 files changed, 45 insertions(+) > > diff --git a/ofproto/tunnel.c b/ofproto/tunnel.c > index 3455ed233..80ddee78a 100644 > --- a/ofproto/tunnel.c > +++ b/ofproto/tunnel.c > @@ -432,6 +432,7 @@ tnl_port_send(const struct ofport_dpif *ofport, struct > flow *flow, > flow->tunnel.ipv6_dst = in6addr_any; > } > } > +flow->tunnel.tp_src = 0; /* Do not carry from a previous tunnel. 
*/ > flow->tunnel.tp_dst = cfg->dst_port; > if (!cfg->out_key_flow) { > flow->tunnel.tun_id = cfg->out_key; > diff --git a/tests/tunnel.at b/tests/tunnel.at > index 05613bcc3..282651ac7 100644 > --- a/tests/tunnel.at > +++ b/tests/tunnel.at > @@ -333,6 +333,50 @@ > set(tunnel(tun_id=0x5,dst=4.4.4.4,ttl=64,flags(df|key))),1 > OVS_VSWITCHD_STOP > AT_CLEANUP > > +AT_SETUP([tunnel - set_tunnel VXLAN]) > +OVS_VSWITCHD_START([dnl > +add-port br0 p1 -- set Interface p1 type=vxlan options:key=flow \ > +options:remote_ip=1.1.1.1 ofport_request=1 \ > +-- add-port br0 p2 -- set Interface p2 type=vxlan options:key=flow \ > +options:remote_ip=2.2.2.2 ofport_request=2 \ > +-- add-port br0 p3 -- set Interface p3 type=vxlan options:key=flow \ > +options:remote_ip=3.3.3.3 ofport_request=3 \ > +-- add-port br0 p4 -- set Interface p4 type=vxlan options:key=flow \ > +options:remote_ip=4.4.4.4 ofport_request=4]) > +AT_DATA([flows.txt], [dnl > +actions=set_tunnel:1,output:1,set_tunnel:2,output:2,set_tunnel:3,output:3,set_tunnel:5,output:4 > +]) > + > +OVS_VSWITCHD_DISABLE_TUNNEL_PUSH_POP > +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) > + > +AT_CHECK([ovs-appctl dpif/show | tail -n +3], [0], [dnl > +br0 65534/100: (dummy-internal) > +p1 1/4789: (vxlan: key=flow, remote_ip=1.1.1.1) > +p2 2/4789: (vxlan: key=flow, remote_ip=2.2.2.2) > +p3 3/4789: (vxlan: key=flow, remote_ip=3.3.3.3) > +p4 4/4789: (vxlan: key=flow, remote_ip=4.4.4.4) > +]) > + > +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy > 'in_port(100),eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=1,tos=0,ttl=128,frag=no),icmp(type=8,code=0)'], > [0], [stdout]) > +AT_CHECK([tail -1 stdout], [0], [Datapath actions: dnl > +set(tunnel(tun_id=0x1,dst=1.1.1.1,ttl=64,tp_dst=4789,flags(df|key))),4789,dnl > +set(tunnel(tun_id=0x2,dst=2.2.2.2,ttl=64,tp_dst=4789,flags(df|key))),4789,dnl > +set(tunnel(tun_id=0x3,dst=3.3.3.3,ttl=64,tp_dst=4789,flags(df|key))),4789,dnl > 
+set(tunnel(tun_id=0x5,dst=4.4.4.4,ttl=64,tp_dst=4789,flags(df|key))),4789 > +]) > + > +dnl With pre-existing tunnel metadata. > +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy > 'tunnel(tun_id=0x1,src=1.1.1.1,dst=5.5.5.5,tp_src=12345,tp_dst=4789,ttl=64,flags(key)),in_port(4789),eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=1,tos=0,ttl=128,frag=no),icmp(type=8,code=0)'], > [0], [stdout]) > +AT_CHECK([tail -1 stdout], [0], [Datapath actions: dnl > +set(tunnel(tun_id=
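For readers following the thread, the core of the quoted change can be sketched in a few lines of standalone C. This is only an illustration of the idea, not OVS code: `struct tnl_md`, `struct tnl_cfg` and `tnl_send()` are hypothetical, simplified stand-ins for `flow->tunnel`, the tunnel port configuration and `tnl_port_send()`. Only the `tp_src = 0` reset mirrors the patch.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the tunnel metadata kept in a flow. */
struct tnl_md {
    uint16_t tp_src;   /* UDP source port seen on the ingress tunnel. */
    uint16_t tp_dst;   /* UDP destination port. */
    uint64_t tun_id;   /* Tunnel key (e.g. VNI). */
};

/* Hypothetical stand-in for the configuration of the egress tunnel port. */
struct tnl_cfg {
    uint16_t dst_port;
    uint64_t out_key;
};

/* Prepare metadata for output on a (possibly different) tunnel.  The source
 * port learned from the ingress tunnel is cleared so that it is not handed
 * to the datapath as part of the set(tunnel(...)) action. */
static void
tnl_send(struct tnl_md *md, const struct tnl_cfg *cfg)
{
    md->tp_src = 0;              /* Do not carry from a previous tunnel. */
    md->tp_dst = cfg->dst_port;
    md->tun_id = cfg->out_key;
}
```

With metadata such as `tp_src=12345` from the "pre-existing tunnel metadata" test case, the value is zeroed before output, which matches the expected datapath actions in the test above that contain `tp_dst` but no `tp_src`.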
Re: [ovs-dev] [PATCH v2] netdev-offload-tc: Fix offload of tunnel key tp_src.
Hi Ilya,

thanks for the added test! I’ve got one question about it, please see below.

> On 2 Dec 2023, at 02:06, Ilya Maximets wrote:
>
> There is no TCA_TUNNEL_KEY_ENC_SRC_PORT in the kernel, so the offload
> should not be attempted if OVS_TUNNEL_KEY_ATTR_TP_SRC is requested
> by OVS. Current code just ignores the attribute in the tunnel(set())
> action leading to a flow mismatch and potential incorrect datapath
> behavior:
>
> |tc(handler21)|DBG|tc flower compare failed action compare
> ...
> Action 0 mismatch:
> - Expected Action:
> 0010 01 00 00 00 00 00 00 00-00 00 00 00 00 ff 00 11
> 0020 c0 5b 17 c1 00 40 00 00-0a 01 00 6d 0a 01 01 12
> 0050 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0060 01 02 80 01 00 18 00 0b-00 00 00 00 00 00 00 00
> ...
> - Received Action:
> 0010 01 00 00 00 00 00 00 00-00 00 00 00 00 ff 00 11
> 0020 00 00 17 c1 00 40 00 00-0a 01 00 6d 0a 01 01 12
> 0050 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0060 01 02 80 01 00 18 00 0b-00 00 00 00 00 00 00 00
> ...
>
> In the tc_action dump above we can see the difference on the second
> line. The action dumped from a kernel is missing 'c0 5b' - source
> port for a tunnel(set()) action on the second line.
>
> Removing the field from the tc_action_encap structure entirely to
> avoid any potential confusion.
>
> Note: In general, the source port number in the tunnel(set()) action
> is not particularly useful for most tunnels, because they may just
> ignore the value. Specs for Geneve and VXLAN suggest using a value
> based on the headers of the inner packet as a source port.
> In the vast majority of scenarios the source port doesn't actually end
> up in the action itself.
> Having a mismatch between the userspace and TC leads to constant
> revalidation of the flow and warnings in the log.
>
> Adding a test case that demonstrates a scenario where the issue
> occurs - bridging of two tunnels.
>
> Fixes: 8f283af89298 ("netdev-tc-offloads: Implement netdev flow put using tc interface")
> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2023-October/052744.html
> Reported-by: Vladislav Odintsov
> Signed-off-by: Ilya Maximets
> ---
>
> Version 2:
> * Slightly adjusted a commit message now that we understand the
>   scenario better.
> * Added a test case that reproduces the issue.
>
> lib/netdev-offload-tc.c | 4 ++-
> lib/tc.h | 3 +-
> tests/system-traffic.at | 77 +
> 3 files changed, 82 insertions(+), 2 deletions(-)
>
> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
> index b846a63c2..164c7eef6 100644
> --- a/lib/netdev-offload-tc.c
> +++ b/lib/netdev-offload-tc.c
> @@ -1627,7 +1627,9 @@ parse_put_flow_set_action(struct tc_flower *flower, struct tc_action *action,
>          }
>          break;
>          case OVS_TUNNEL_KEY_ATTR_TP_SRC: {
> -            action->encap.tp_src = nl_attr_get_be16(tun_attr);
> +            /* There is no corresponding attribute in TC. */
> +            VLOG_DBG_RL(&rl, "unsupported tunnel key attribute TP_SRC");
> +            return EOPNOTSUPP;
>          }
>          break;
>          case OVS_TUNNEL_KEY_ATTR_TP_DST: {
> diff --git a/lib/tc.h b/lib/tc.h
> index 06707ffa4..fdbcf4b7c 100644
> --- a/lib/tc.h
> +++ b/lib/tc.h
> @@ -213,7 +213,8 @@ enum nat_type {
>  struct tc_action_encap {
>      bool id_present;
>      ovs_be64 id;
> -    ovs_be16 tp_src;
> +    /* ovs_be16 tp_src; Could have been here, but there is no
> +     * TCA_TUNNEL_KEY_ENC_ attribute for it in the kernel. */
>      ovs_be16 tp_dst;
>      uint8_t tos;
>      uint8_t ttl;
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index a7d4ed83b..fd55fdee1 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -903,6 +903,83 @@ ovs-pcap p0.pcap
> AT_CHECK([ovs-pcap p0.pcap | grep -Eq "^[[[:xdigit:]]]{24}86dd603a1140fc000100fc01[[[:xdigit:]]]{4}17c1003a[[[:xdigit:]]]{4}6558fa163e949d8008060001080006040001[[[:xdigit:]]]{12}0af40afe$"])
> AT_CLEANUP
>
> +AT_SETUP([datapath - bridging two geneve tunnels])
> +OVS_CHECK_TUNNEL_TSO()
> +OVS_CHECK_GENEVE()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-underlay-0])
> +ADD_BR([br-underlay-1])
> +
> +ADD_NAMESPACES(at_ns0)
> +ADD_NAMESPACES(at_ns1)
> +
> +dnl Set up underlay link from host into the namespaces using veth pairs.
> +ADD_VETH(p0, at_ns0, br-underlay-0, "172.31.1.1/24")
> +AT_CHECK([ip addr add
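The hex dump in the quoted commit message can be read by hand, and a small standalone C sketch may make the mismatch easier to see. The two arrays below are copied from the `0020` rows of the "Expected" and "Received" dumps; the helper names `read_be16()` and `first_mismatch()` are made up for this illustration and are not OVS functions. Treating bytes 2-3 of the row as the destination port is an assumption, although it is consistent with 0x17c1 being 6081, the Geneve port.

```c
#include <stddef.h>
#include <stdint.h>

/* Row 0020 of the "Expected Action" dump (what userspace asked for). */
static const uint8_t expected_row[16] = {
    0xc0, 0x5b, 0x17, 0xc1, 0x00, 0x40, 0x00, 0x00,
    0x0a, 0x01, 0x00, 0x6d, 0x0a, 0x01, 0x01, 0x12,
};

/* Row 0020 of the "Received Action" dump (what the kernel reported back). */
static const uint8_t received_row[16] = {
    0x00, 0x00, 0x17, 0xc1, 0x00, 0x40, 0x00, 0x00,
    0x0a, 0x01, 0x00, 0x6d, 0x0a, 0x01, 0x01, 0x12,
};

/* Read a 16-bit value stored in network (big-endian) byte order. */
static uint16_t
read_be16(const uint8_t *p)
{
    return (uint16_t) ((p[0] << 8) | p[1]);
}

/* Return the index of the first differing byte, or 'len' if none differ. */
static size_t
first_mismatch(const uint8_t *a, const uint8_t *b, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (a[i] != b[i]) {
            break;
        }
    }
    return i;
}
```

Here `first_mismatch(expected_row, received_row, 16)` is 0, `read_be16(&expected_row[0])` is 0xc05b (49243, the source port that TC dropped), and `read_be16(&expected_row[2])` is 0x17c1 (6081). The trailing bytes `0a 01 00 6d` and `0a 01 01 12` read as 10.1.0.109 and 10.1.1.18.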
Re: [ovs-dev] [PATCH] netdev-offload-tc: Fix offload of tunnel key tp_src.
Hi Ilya,

> On 15 Nov 2023, at 21:51, Ilya Maximets wrote:
>
> On 11/15/23 14:13, Vladislav Odintsov wrote:
>> Hi Ilya,
>>
>>> On 13 Nov 2023, at 22:25, Ilya Maximets wrote:
>>>
>>> On 11/13/23 13:13, Eelco Chaudron wrote:
>>>>
>>>> On 13 Nov 2023, at 12:43, Vladislav Odintsov wrote:
>>>>
>>>>>> On 13 Nov 2023, at 14:17, Eelco Chaudron wrote:
>>>>>>
>>>>>> On 8 Nov 2023, at 14:39, Vladislav Odintsov wrote:
>>>>>>
>>>>>>> Hi Ilya, Eelco,
>>>>>>>
>>>>>>> I’ve tried this patch against 3.1 and latest master branch. There are no warnings anymore,
>>>>>>> but it seems that in my installation it has broken offload capability.
>>>>>>
>>>>>> Yes, this is expected, this specific flow cannot be offloaded with TC
>>>>>> and therefore will be processed by the kernel datapath.
>>>>>
>>>>> But why did it work before the patch? The traffic was offloaded and it was
>>>>> flowing correctly.
>>>>
>>>> It seemed to work, but your rule had an action to set the source port to
>>>> 59507; however, this is not happening with tc.
>>>>
>>>> actions:set(tunnel(tun_id=0xff0011,src=10.1.0.109,dst=10.1.1.18,ttl=64,tp_src=59507,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x18000b}),flags(df|csum|key))),4
>>>>
>>>> If you still want to offload this flow, you should not configure the
>>>> tp_src for this action, and it will be offloaded.
>>>>
>>>> Ilya, is this done by OVN? If so, it might need a change there also.
>>>
>>> I don't think there is a direct way to force the tp_src into the action.
>>> OVS is making the decision based on the set of flows it has, but I'm not
>>> sure why exactly.
>>>
>>> Vladislav, could you try running ofproto/trace for the packet of interest
>>> on your setup? This may shed some light on why exactly OVS wants this
>>> field to be part of the action.
>>
>> There were no DP flows with tp_src in the action in the ovs-appctl dpctl/dump-flows
>> output, so I grabbed such a flow (with tp_src set in the action) from the ovs-vswitchd
>> logs with the dpif_netlink dbg log level enabled:
>>
>> 2023-11-15T12:52:36.279Z|00056|dpif_netlink(handler3)|DBG|added flow
>> 2023-11-15T12:52:36.279Z|00057|dpif_netlink(handler3)|DBG|system@ovs-system: put[create] ufid:23548c16-67c6-47af-a710-2d80cab0a361 recirc_id(0),dp_hash(0/0),skb_priority(0/0),tunnel(tun_id=0x6,src=10.1.0.103,dst=10.1.0.109,ttl=64/0,tp_src=3275/0,tp_dst=6081/0,geneve({class=0x102,type=0x80,len=4,0x40003}),flags(-df+csum+key)),in_port(5),skb_mark(0/0),ct_state(0/0x2f),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x3),eth(src=00:00:c9:99:bd:0e,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=169.254.100.1/0.0.0.0,tip=169.254.100.3,op=1,sha=00:00:c9:99:bd:0e/00:00:00:00:00:00,tha=00:00:00:00:00:00/00:00:00:00:00:00), actions:set(tunnel(tun_id=0xff0002,src=10.1.0.109,dst=10.1.1.18,ttl=64,tp_src=3275,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x55008e}),flags(df|csum|key))),5
>>
>> and tracing:
>>
>> # ovs-appctl ofproto/trace 'recirc_id(0),dp_hash(0/0),skb_priority(0/0),tunnel(tun_id=0x6,src=10.1.0.103,dst=10.1.0.109,ttl=64/0,tp_src=3275/0,tp_dst=6081/0,geneve({class=0x102,type=0x80,len=4,0x40003}),flags(-df+csum+key)),in_port(5),skb_mark(0/0),ct_state(0/0x2f),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x3),eth(src=00:00:c9:99:bd:0e,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=169.254.100.1/0.0.0.0,tip=169.254.100.3,op=1,sha=00:00:c9:99:bd:0e/00:00:00:00:00:00,tha=00:00:00:00:00:00/00:00:00:00:00:00)'
>> Flow: arp,tun_id=0x6,tun_src=10.1.0.103,tun_dst=10.1.0.109,tun_ipv6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=0,tun_ttl=64,tun_erspan_ver=0,gtpu_flags=0,gtpu_msgtype=0,tun_flags=csum|key,tun_metadata0=0x40003,in_port=17,vlan_tci=0x,dl_src=00:00:c9:99:bd:0e,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=169.254.100.1,arp_tpa=169.254.100.3,arp_op=1,arp_sha=00:00:c9:99:bd:0e,arp_tha=00:00:00:00:00:00
>>
>> bridge("br-int")
>>
>> 0. in_port=17, priority 100
>> move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23]
>> -> OXM_OF_METADATA[0..23] is now 0x6
>> move:NXM_NX_TUN_METADATA0[16..30]->NXM_NX_REG14[0..14]
>> -> NXM_NX_REG14[0..14] is now 0x4
>> move
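Eelco's point above, that a datapath action carrying `tp_src` cannot be expressed in TC, suggests a quick way to triage flows pulled from the logs. The following standalone C helper is a rough text heuristic written for this thread only; `action_sets_tp_src()` is a hypothetical name and this is not how OVS itself decides offloadability.

```c
#include <stdbool.h>
#include <string.h>

/* Heuristic: does a datapath flow string (as printed by dpctl/dump-flows or
 * the dpif_netlink debug log) contain a set(tunnel(...)) action that carries
 * tp_src?  The search starts at "set(tunnel(" so that a tp_src in the match
 * part of the flow, which uses plain "tunnel(", is not counted. */
static bool
action_sets_tp_src(const char *flow)
{
    const char *set_tunnel = strstr(flow, "set(tunnel(");

    if (!set_tunnel) {
        return false;
    }
    return strstr(set_tunnel, "tp_src=") != NULL;
}
```

Applied to the logged flow above, whose action contains `tp_src=3275`, the helper returns true. It returns false for an action such as `set(tunnel(tun_id=0x1,dst=1.1.1.1,ttl=64,tp_dst=4789,flags(df|key))),4789`.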
Re: [ovs-dev] [PATCH] netdev-offload-tc: Fix offload of tunnel key tp_src.
Hi Marcelo,

please see below.

> On 16 Nov 2023, at 15:02, Marcelo Ricardo Leitner wrote:
>
> Hi Vladislav,
>
> On Wed, Nov 15, 2023 at 04:13:13PM +0300, Vladislav Odintsov wrote:
> ...
>> Final flow: arp,reg11=0x2,reg12=0x6,reg14=0x4,reg15=0x2,tun_id=0x6,tun_src=10.1.0.103,tun_dst=10.1.0.109,tun_ipv6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=0,tun_ttl=64,tun_erspan_ver=0,gtpu_flags=0,gtpu_msgtype=0,tun_flags=csum|key,tun_metadata0=0x40003,metadata=0x6,in_port=17,vlan_tci=0x,dl_src=00:00:c9:99:bd:0e,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=169.254.100.1,arp_tpa=169.254.100.3,arp_op=1,arp_sha=00:00:c9:99:bd:0e,arp_tha=00:00:00:00:00:00
>> Megaflow: recirc_id=0,ct_state=-new-est-rel-rpl-trk,ct_label=0/0x3,eth,arp,tun_id=0x6,tun_src=10.1.0.103,tun_dst=10.1.0.109,tun_tos=0,tun_flags=-df+csum+key,tun_metadata0=0x40003,in_port=17,dl_src=00:00:c9:99:bd:0e,dl_dst=ff:ff:ff:ff:ff:ff,arp_tpa=169.254.100.3,arp_op=1
>> Datapath actions: set(tunnel(tun_id=0xff0002,src=10.1.0.109,dst=10.1.1.18,ttl=64,tp_src=3275,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x55008e}),flags(df|csum|key))),5
>>
>> The traffic is an OVN interconnection use case, where there is a transit
>> switch and two LRs are connected to it in different Availability Zones.
>> The LR in AZ1 tries to resolve ARP for the LR in another AZ.
>
> In the flow here I see in_port=17 and output to port 5. Can you please
> confirm which ports are those? The 2 ports of the CX6, maybe?
> Asking because I'm a bit surprised that hairpin traffic with such
> encap operations actually works.

Port 5 is a Geneve tunnel OVS port.

The following flow works fine on my setup (the first packet reaches the system and the others don’t; traffic flows offloaded):

2023-11-20T10:29:25.220Z|00034|dpif_netlink(handler1)|DBG|system@ovs-system: put[create] ufid:417d13c0-d5d9-4832-a155-8e83c99c1fe4 recirc_id(0),dp_hash(0/0),skb_priority(0/0),tunnel(tun_id=0x6,src=10.1.0.107,dst=10.1.0.109,ttl=64/0,tp_src=19915/0,tp_dst=6081/0,geneve({class=0x102,type=0x80,len=4,0x50003}),flags(-df+csum+key)),in_port(4),skb_mark(0/0),ct_state(0/0x2f),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x3),eth(src=00:00:c9:99:bd:0e,dst=00:01:c9:99:bd:0e),eth_type(0x0800),ipv4(src=172.31.32.6/0.0.0.0,dst=172.31.0.6/0.0.0.0,proto=1,tos=0/0x3,ttl=63/0,frag=no),icmp(type=8/0,code=0/0), actions:set(tunnel(tun_id=0xff0002,src=10.1.0.109,dst=10.1.1.18,ttl=64,tp_src=19915,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x55008e}),flags(df|csum|key))),4

(port 4 is genev_sys_6081 here).

>
> Thanks,
> Marcelo

Regards,
Vladislav Odintsov