On 15 Jun 2023, at 21:19, Vladislav Odintsov wrote:
> Also, it is probably worth detecting a routing loop on the OVN side and not
> creating the logical flows associated with the affected routes in the SB.
> But it may not be easy to build a full map of all interconnected routes
> and find the routes that create a loop...

I’ll leave this for the OVN people on this mailing list ;)

//Eelco

>> On 15 Jun 2023, at 18:26, Mike Pattrick <m...@redhat.com> wrote:
>>
>> On Thu, Jun 15, 2023 at 11:11 AM Eelco Chaudron <echau...@redhat.com
>> <mailto:echau...@redhat.com>> wrote:
>>>
>>>
>>>
>>> On 15 Jun 2023, at 17:07, Vladislav Odintsov wrote:
>>>
>>>>> On 15 Jun 2023, at 16:16, Eelco Chaudron via discuss
>>>>> <ovs-discuss@openvswitch.org> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 15 Jun 2023, at 14:36, Vladislav Odintsov via discuss wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I’ve run into a condition in flow lookup where OVS crashes with a
>>>>>> segmentation violation because the stack size limit of the
>>>>>> ovs-vswitchd daemon is insufficient.
>>>>>> Below is the reproducer:
>>>>>>
>>>>>> # --->
>>>>>> # Ensure the ovs-vswitchd.service file that OVS runs with has the
>>>>>> # default LimitSTACK (should be 2M):
>>>>>> grep LimitSTACK /usr/lib/systemd/system/ovs-vswitchd.service
>>>>>>
>>>>>> # create 2 LRs and connect them via ls
>>>>>> ovn-nbctl lr-add lr1
>>>>>> ovn-nbctl lr-add lr2
>>>>>> ovn-nbctl lrp-add lr1 lrp1 00:00:00:00:00:01 10.0.0.1/24
>>>>>> ovn-nbctl lrp-add lr2 lrp2 00:00:00:00:00:02 10.0.0.2/24
>>>>>> ovn-nbctl ls-add ls
>>>>>> ovn-nbctl lsp-add ls ls-lrp1 -- lsp-set-type ls-lrp1 router -- \
>>>>>>     lsp-set-addresses ls-lrp1 router -- \
>>>>>>     lsp-set-options ls-lrp1 router-port=lrp1
>>>>>> ovn-nbctl lsp-add ls ls-lrp2 -- lsp-set-type ls-lrp2 router -- \
>>>>>>     lsp-set-addresses ls-lrp2 router -- \
>>>>>>     lsp-set-options ls-lrp2 router-port=lrp2
>>>>>>
>>>>>> # create routes to the same CIDR that loop between the two routers
>>>>>> ovn-nbctl lr-route-add lr1 1.1.1.1/32 10.0.0.2
>>>>>> ovn-nbctl lr-route-add lr2 1.1.1.1/32 10.0.0.1
>>>>>>
>>>>>> # create vif lport and configure it
>>>>>> ovn-nbctl lsp-add ls lp1 -- lsp-set-addresses lp1 00:00:00:00:00:f1
>>>>>> ovs-vsctl add-port br-int lp1 -- set int lp1 type=internal \
>>>>>>     external_ids:iface-id=lp1
>>>>>> ip li set lp1 addr 00:00:00:00:00:f1
>>>>>> ip a add 10.0.0.200/24 dev lp1
>>>>>> ip li set lp1 up
>>>>>> ip r add 1.1.1.1/32 via 10.0.0.1
>>>>>> ping 1.1.1.1 -c1
>>>>>>
>>>>>> # <---
>>>>>>
>>>>>> This problem was first described in [1] and continued in [2].
>>>>>> I wonder whether [2] was discussed somewhere else or had no
>>>>>> resolution.
>>>>>>
>>>>>> The OVS crash reproduces on different versions: 2.13, 2.17, 3.1.
>>>>>> The default stack limit shipped with OVS does not seem to be large
>>>>>> enough for the lookup to reach the 'Recursion too deep' error before
>>>>>> crashing. In my tests, this reproducer needs at least 2293K to work
>>>>>> properly.
>>>>>>
>>>>>> I understand that such a configuration should be validated and
>>>>>> avoided on the CMS side, but I think it should not be possible to
>>>>>> bring the system to a crashed state so easily.
>>>>>>
>>>>>> Should the default LimitSTACK in the OVS systemd unit be increased?
>>>>>> Or, maybe, should OVN document that users need to increase the OVS
>>>>>> stack limit manually?
>>>>>> Or, should OVN supply a systemd drop-in unit to override the default
>>>>>> OVS LimitSTACK?
>>>>>>
>>>>>> 1: https://bugzilla.redhat.com/show_bug.cgi?id=1821185#c3
>>>>>> 2: https://mail.openvswitch.org/pipermail/ovs-dev/2020-April/369776.html
>>>>>
>>>>> There is a recent patch series sent out by Mike; you might want to give
>>>>> that a try.
>>>>>
>>>>> https://patchwork.ozlabs.org/project/openvswitch/list/?submitter=82705
>>>>>
>>>>>
>>>>> It would be interesting to see if that solves the crash part.
>>>>
>>>> Hi Eelco,
>>>>
>>>> thank you for pointing out this patch.
>>>>
>>>> I’ve tried it and it seems to work with the default ovs-vswitchd stack
>>>> limit (2M)! That’s cool.
>>>> Also, I’ve tried to find the new watermark for the described scenario
>>>> (the stack limit value below which ovs-vswitchd crashes with a segfault).
>>>> It decreased from 2293K to 1633K.
>>>
>>> Thanks for testing, this is good news :)
>>>
>>>> Thanks @Mike for your improvement!
>>>>
>>>> Can this patch be considered for backporting after it is merged
>>>> upstream, as it fixes this issue?
>>>
>>> I guess this is up to the maintainers to decide; maybe Mike can tell how
>>> impactful the change is.
>>> I still need to review the latest revision, so I can’t comment on this
>>> off the top of my head.
>>
>> I just tested applying it back to 2.15. It applied relatively cleanly
>> with only minor changes required. So I think a backport is reasonable,
>> but as Eelco said this is up to the maintainers.
>>
>>
>> -M
>>
>>>
>>> //Eelco
>
>
> Regards,
> Vladislav Odintsov

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
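One option raised in the thread is a systemd drop-in that overrides the default LimitSTACK of ovs-vswitchd. The fragment below is a minimal sketch of such an override. It assumes the unit is named ovs-vswitchd.service, as in the reproducer's grep path. The 4M value is an illustrative assumption chosen above the 2293K minimum measured here without Mike's patch; it is not an upstream default. A larger limit only gives the recursion enough headroom to reach the 'Recursion too deep' error; it does not remove the routing loop itself.

```ini
# /etc/systemd/system/ovs-vswitchd.service.d/override.conf
# (the file that `systemctl edit ovs-vswitchd.service` creates; a package
# would more likely ship it under /usr/lib/systemd/system/ovs-vswitchd.service.d/)
[Service]
# Illustrative value only: above the ~2293K this thread measured as the
# minimum for the reproducer without the patch; not an upstream recommendation.
LimitSTACK=4M
```

If you write the file by hand, run `systemctl daemon-reload` first. Then restart ovs-vswitchd and check the "Max stack size" line in /proc/<pid>/limits of the running daemon to confirm that the new limit applies.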