Also, probably it’s worth to detect on the OVN side a routing loop and not to 
create associated to affected routes logical flows to SB.
But it could be not so easy to build full map of all interconnected routes and 
find routes which create a loop...

> On 15 Jun 2023, at 18:26, Mike Pattrick <m...@redhat.com> wrote:
> 
> On Thu, Jun 15, 2023 at 11:11 AM Eelco Chaudron <echau...@redhat.com 
> <mailto:echau...@redhat.com>> wrote:
>> 
>> 
>> 
>> On 15 Jun 2023, at 17:07, Vladislav Odintsov wrote:
>> 
>>>> On 15 Jun 2023, at 16:16, Eelco Chaudron via discuss 
>>>> <ovs-discuss@openvswitch.org> wrote:
>>>> 
>>>> 
>>>> 
>>>> On 15 Jun 2023, at 14:36, Vladislav Odintsov via discuss wrote:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I’ve faced condition in flow lookup where OVS crashes with segmentation 
>>>>> violation because of insufficient stack limit size for ovs-vswitchd 
>>>>> daemon.
>>>>> Below is the reproducer:
>>>>> 
>>>>> # --->
>>>>> # Ensure there is a default LimitSTACK in ovs-vswitchd.service file with 
>>>>> which OVS is run (should be 2M):
>>>>> grep LimitSTACK /usr/lib/systemd/system/ovs-vswitchd.service
>>>>> 
>>>>> # create 2 LRs and connect them via ls
>>>>> ovn-nbctl lr-add lr1
>>>>> ovn-nbctl lr-add lr2
>>>>> ovn-nbctl lrp-add lr1 lrp1 00:00:00:00:00:01 10.0.0.1/24
>>>>> ovn-nbctl lrp-add lr2 lrp2 00:00:00:00:00:02 10.0.0.2/24
>>>>> ovn-nbctl ls-add ls
>>>>> ovn-nbctl lsp-add ls ls-lrp1 -- lsp-set-type ls-lrp1 router -- 
>>>>> lsp-set-addresses ls-lrp1 router -- lsp-set-options ls-lrp1 
>>>>> router-port=lrp1
>>>>> ovn-nbctl lsp-add ls ls-lrp2 -- lsp-set-type ls-lrp2 router -- 
>>>>> lsp-set-addresses ls-lrp2 router -- lsp-set-options ls-lrp2 
>>>>> router-port=lrp2
>>>>> 
>>>>> # create route to same cidr looping routing
>>>>> ovn-nbctl lr-route-add lr1 1.1.1.1/32 10.0.0.2
>>>>> ovn-nbctl lr-route-add lr2 1.1.1.1/32 10.0.0.1
>>>>> 
>>>>> # create vif lport and configure it
>>>>> ovn-nbctl lsp-add ls lp1 -- lsp-set-addresses lp1 00:00:00:00:00:f1
>>>>> ovs-vsctl add-port br-int lp1 -- set int lp1 type=internal 
>>>>> external_ids:iface-id=lp1
>>>>> ip li set lp1 addr 00:00:00:00:00:f1
>>>>> ip a add 10.0.0.200/24 dev lp1
>>>>> ip li set lp1 up
>>>>> ip r add 1.1.1.1/32 via 10.0.0.1
>>>>> ping 1.1.1.1 -c1
>>>>> 
>>>>> # <---
>>>>> 
>>>>> This problem was first described in [1] and continued in [2].
>>>>> I’m wonder whether [2] was discussed somewhere in another place or had no 
>>>>> resolution.
>>>>> 
>>>>> OVS crash reproduces on different versions: 2.13, 2.17, 3.1.
>>>>> Default stack limit shipped with OVS looks not enough to reach 'Recursion 
>>>>> too deep'. In my tests for this reproducer it is needed at least 2293K to 
>>>>> work properly.
>>>>> 
>>>>> I understand, that such configuration should be validated and avoided 
>>>>> from the CMS side, but I think that there should be no possibility so 
>>>>> easily bring system to crashed state.
>>>>> 
>>>>> Should the default OVS StackLimit in systemd.unit be increased?
>>>>> Or, maybe, OVN should document the need to increase OVS stack limit 
>>>>> manually by users?
>>>>> Or, should OVN supply systemd drop-in unit to override default OVS 
>>>>> StackLimit?
>>>>> 
>>>>> 1: https://bugzilla.redhat.com/show_bug.cgi?id=1821185#c3
>>>>> 2: https://mail.openvswitch.org/pipermail/ovs-dev/2020-April/369776.html
>>>> 
>>>> There is a recent patch series sent out by Mike, you might want to give 
>>>> that a try.
>>>> 
>>>> https://patchwork.ozlabs.org/project/openvswitch/list/?submitter=82705
>>>> 
>>>> 
>>>> Would be interesting to see if that solves the crash part.
>>> 
>>> Hi Eelco,
>>> 
>>> thank you for pointing out to this patch.
>>> 
>>> I’ve tried it and it seems working with default ovs-vswitchd stack limit 
>>> (2M)! That’s cool.
>>> Also, I’ve tried to find new watermark for the described scenario (to find 
>>> a new stack limit value, where ovs-vswitchd will crash with segv).
>>> So, its value decreased from 2293KB to 1633K.
>> 
>> Thanks for testing, this is good news :)
>> 
>>> Thanks @Mike for your improvement!
>>> 
>>> Can this patch be considered for backporting after merge to upstream as it 
>>> fixes this issue?
>> 
>> Guess this is up to the maintainers to decide, maybe Mike cant tell how 
>> impactful the change is.
>> I still need to review the latest revision, so can’t comment on this from 
>> the top of my head.
> 
> I just tested applying it back to 2.15. It applied relatively cleanly
> with only minor changes required. So I think a backport is reasonable,
> but as Eelco said this is up to the maintainers.
> 
> 
> -M
> 
>> 
>> //Eelco


Regards,
Vladislav Odintsov

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

Reply via email to