On 15 Jun 2023, at 21:19, Vladislav Odintsov wrote:

> Also, probably it’s worth to detect on the OVN side a routing loop and not to 
> create associated to affected routes logical flows to SB.
> But it could be not so easy to build full map of all interconnected routes 
> and find routes which create a loop...

I’ll leave this for the OVN people on this mailing list ;)

//Eelco

>> On 15 Jun 2023, at 18:26, Mike Pattrick <m...@redhat.com> wrote:
>>
>> On Thu, Jun 15, 2023 at 11:11 AM Eelco Chaudron <echau...@redhat.com 
>> <mailto:echau...@redhat.com>> wrote:
>>>
>>>
>>>
>>> On 15 Jun 2023, at 17:07, Vladislav Odintsov wrote:
>>>
>>>>> On 15 Jun 2023, at 16:16, Eelco Chaudron via discuss 
>>>>> <ovs-discuss@openvswitch.org> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 15 Jun 2023, at 14:36, Vladislav Odintsov via discuss wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I’ve faced condition in flow lookup where OVS crashes with segmentation 
>>>>>> violation because of insufficient stack limit size for ovs-vswitchd 
>>>>>> daemon.
>>>>>> Below is the reproducer:
>>>>>>
>>>>>> # --->
>>>>>> # Ensure there is a default LimitSTACK in ovs-vswitchd.service file with 
>>>>>> which OVS is run (should be 2M):
>>>>>> grep LimitSTACK /usr/lib/systemd/system/ovs-vswitchd.service
>>>>>>
>>>>>> # create 2 LRs and connect them via ls
>>>>>> ovn-nbctl lr-add lr1
>>>>>> ovn-nbctl lr-add lr2
>>>>>> ovn-nbctl lrp-add lr1 lrp1 00:00:00:00:00:01 10.0.0.1/24
>>>>>> ovn-nbctl lrp-add lr2 lrp2 00:00:00:00:00:02 10.0.0.2/24
>>>>>> ovn-nbctl ls-add ls
>>>>>> ovn-nbctl lsp-add ls ls-lrp1 -- lsp-set-type ls-lrp1 router -- 
>>>>>> lsp-set-addresses ls-lrp1 router -- lsp-set-options ls-lrp1 
>>>>>> router-port=lrp1
>>>>>> ovn-nbctl lsp-add ls ls-lrp2 -- lsp-set-type ls-lrp2 router -- 
>>>>>> lsp-set-addresses ls-lrp2 router -- lsp-set-options ls-lrp2 
>>>>>> router-port=lrp2
>>>>>>
>>>>>> # create route to same cidr looping routing
>>>>>> ovn-nbctl lr-route-add lr1 1.1.1.1/32 10.0.0.2
>>>>>> ovn-nbctl lr-route-add lr2 1.1.1.1/32 10.0.0.1
>>>>>>
>>>>>> # create vif lport and configure it
>>>>>> ovn-nbctl lsp-add ls lp1 -- lsp-set-addresses lp1 00:00:00:00:00:f1
>>>>>> ovs-vsctl add-port br-int lp1 -- set int lp1 type=internal 
>>>>>> external_ids:iface-id=lp1
>>>>>> ip li set lp1 addr 00:00:00:00:00:f1
>>>>>> ip a add 10.0.0.200/24 dev lp1
>>>>>> ip li set lp1 up
>>>>>> ip r add 1.1.1.1/32 via 10.0.0.1
>>>>>> ping 1.1.1.1 -c1
>>>>>>
>>>>>> # <---
>>>>>>
>>>>>> This problem was first described in [1] and continued in [2].
>>>>>> I’m wonder whether [2] was discussed somewhere in another place or had 
>>>>>> no resolution.
>>>>>>
>>>>>> OVS crash reproduces on different versions: 2.13, 2.17, 3.1.
>>>>>> Default stack limit shipped with OVS looks not enough to reach 
>>>>>> 'Recursion too deep'. In my tests for this reproducer it is needed at 
>>>>>> least 2293K to work properly.
>>>>>>
>>>>>> I understand, that such configuration should be validated and avoided 
>>>>>> from the CMS side, but I think that there should be no possibility so 
>>>>>> easily bring system to crashed state.
>>>>>>
>>>>>> Should the default OVS StackLimit in systemd.unit be increased?
>>>>>> Or, maybe, OVN should document the need to increase OVS stack limit 
>>>>>> manually by users?
>>>>>> Or, should OVN supply systemd drop-in unit to override default OVS 
>>>>>> StackLimit?
>>>>>>
>>>>>> 1: https://bugzilla.redhat.com/show_bug.cgi?id=1821185#c3
>>>>>> 2: https://mail.openvswitch.org/pipermail/ovs-dev/2020-April/369776.html
>>>>>
>>>>> There is a recent patch series sent out by Mike, you might want to give 
>>>>> that a try.
>>>>>
>>>>> https://patchwork.ozlabs.org/project/openvswitch/list/?submitter=82705
>>>>>
>>>>>
>>>>> Would be interesting to see if that solves the crash part.
>>>>
>>>> Hi Eelco,
>>>>
>>>> thank you for pointing out to this patch.
>>>>
>>>> I’ve tried it and it seems working with default ovs-vswitchd stack limit 
>>>> (2M)! That’s cool.
>>>> Also, I’ve tried to find new watermark for the described scenario (to find 
>>>> a new stack limit value, where ovs-vswitchd will crash with segv).
>>>> So, its value decreased from 2293KB to 1633K.
>>>
>>> Thanks for testing, this is good news :)
>>>
>>>> Thanks @Mike for your improvement!
>>>>
>>>> Can this patch be considered for backporting after merge to upstream as it 
>>>> fixes this issue?
>>>
>>> Guess this is up to the maintainers to decide, maybe Mike cant tell how 
>>> impactful the change is.
>>> I still need to review the latest revision, so can’t comment on this from 
>>> the top of my head.
>>
>> I just tested applying it back to 2.15. It applied relatively cleanly
>> with only minor changes required. So I think a backport is reasonable,
>> but as Eelco said this is up to the maintainers.
>>
>>
>> -M
>>
>>>
>>> //Eelco
>
>
> Regards,
> Vladislav Odintsov

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

Reply via email to