On 5/7/26 10:11 PM, Dumitru Ceara wrote:
> On 5/7/26 4:31 PM, Dumitru Ceara wrote:
>> Hi all,
>>
>> It seems that the test added by this patch of mine sometimes fails in
>> GitHub CI when ovs-vswitchd is stopped; the failure is due to:
>>
>> ./ovn.at:10193: check_logs "
>> $error
>> /connection failed (No such file or directory)/d
>> /has no network name*/d
>> /receive tunnel port not found*/d
>> /Failed to locate tunnel to reach main chassis/d
>> /Transaction causes multiple rows.*MAC_Binding/d
>> /Transaction causes multiple rows.*FDB/d
>> " $sbox
>> --- /dev/null 2026-05-06 14:47:18.250105001 +0000
>> +++ /workspace/ovn-tmp/tests/testsuite.dir/at-groups/155/stdout
>> 2026-05-06 14:54:35.770811381 +0000
>> @@ -0,0 +1,2 @@
>> +2026-05-06T14:54:35.548Z|00471|ofproto_dpif_rid|ERR|recirc_id 4 left
>> allocated when ofproto (br-int) is destructed
>> +2026-05-06T14:54:35.548Z|00472|ofproto_dpif_rid|ERR|recirc_id 2 left
>> allocated when ofproto (br-int) is destructed
>>
>> https://github.com/ovsrobot/ovn/actions/runs/25442425151/job/74637759422#step:12:5664
>>
>> I'm failing to reproduce the issue locally but I'll keep investigating.
>>
>
> +Eelco as we were discussing offline about this earlier.
>
> So, if I understand correctly, it's just a race when ovs-vswitchd exits:
>
> ofproto-dpif.c:destruct() is called, it removes all OF rules and then
> calls recirc_free_ofproto() which assumes there's no remaining recirc_id.
>
> However, close_dpif_backer() which in turn calls udpif_destroy() and
> deletes all the ukeys (including recirc_ids they might have) is only
> called at the end of destruct().
The destruction of recirculation ids is first happening in dpif_flush()
while stopping the threads and running revalidator purge, before calling
the destruct() function. But there are two races with this:
1. The flush will remove ukeys, but their actual destruction and the
freeing of recirculation node references is postponed, so if the RCU
thread is not fast enough, they will still be allocated at the time
of ofproto-dpif destruction.
2. Flush restarts the threads, so technically new ukeys can be created
after the flush and before the destruction. Though this should not
generally be happening in unit tests where all the traffic is controlled.
>
> static void
> destruct(struct ofproto *ofproto_, bool del)
> {
> ...
> ofproto->backer->need_revalidate = REV_RECONFIGURE;
> xlate_txn_start();
> xlate_remove_ofproto(ofproto);
> xlate_txn_commit();
>
> hmap_remove(&all_ofproto_dpifs_by_name,
> &ofproto->all_ofproto_dpifs_by_name_node);
>
> OFPROTO_FOR_EACH_TABLE (table, &ofproto->up) {
> CLS_FOR_EACH (rule, up.cr, &table->cls) {
> ofproto_rule_delete(&ofproto->up, &rule->up);
> }
> }
> ...
>
> recirc_free_ofproto(ofproto, ofproto->up.name);
> ...
> /* Wait for all the meter destroy work to finish. */
> ovsrcu_barrier();
> close_dpif_backer(ofproto->backer, del);
> }
>
> Is this something that's worth addressing in OVS?
There should be an RCU barrier before the leak check (recirc_free_ofproto)
to avoid the first race. But we also need to check after the final close
in order to avoid the second race. Both can be solved by moving the check
to the end of the function where we already have the barrier. Second barrier
after the backer closing doesn't seem to be necessary because udpif_destroy
doesn't postpone the ukey destruction. I can try and send a patch for this.
There is also some dead code to be removed (rule->recirc_id seem to be unused).
> I understand it's
> benign because we're exiting anyway but I am a bit surprised we don't
> hit it more often in tests (OVN tests too).
I wonder if there is something special about this test that make OVS starve
for CPU resources and not being able to run RCU callbacks fast enough. Maybe
some OVN processes taking too much CPU for one reason or another?
>
> Alternatively, would it be an option to change OVN's OVN_CLEANUP_VSWITCH
> macro to call revalidator/purge before stopping vswitchd? I.e.:
>
> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
> index 39f03ba62b..b0b22e2fe6 100644
> --- a/tests/ovn-macros.at
> +++ b/tests/ovn-macros.at
> @@ -223,6 +223,7 @@ m4_define([OVN_CLEANUP_VSWITCH],[
> echo
> echo "$1: clean up vswitch"
> as $1
> + ovs-appctl revalidator/purge
> OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
> OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> ])
>
> Thanks,
> Dumitru
>
>> Regards,
>> Dumitru
>>
>> On 5/6/26 1:49 PM, Dumitru Ceara wrote:
>>> On 5/6/26 10:33 AM, Mairtin O'Loingsigh wrote:
>>>> On Mon, May 04, 2026 at 09:05:40AM +0200, Dumitru Ceara wrote:
>>>>> Hi Mairtin,
>>>>>
>>>>> Thanks for the review!
>>>>>
>>>>> On 4/29/26 10:28 AM, Mairtin O'Loingsigh wrote:
>>>>>> On Fri, Apr 24, 2026 at 05:35:58PM +0200, Dumitru Ceara via dev wrote:
>>>>>>> The ARP/ND responder stage (ls_in_arp_rsp) unconditionally
>>>>>>> bypassed all traffic arriving from localnet ports via a
>>>>>>> priority-100 "next;" flow. This caused broadcast ARP/ND
>>>>>>> requests from the physical network to be flooded to every
>>>>>>> logical switch port instead of being handled by proxy
>>>>>>> ARP/ND. On switches with ~200+ ports the resulting
>>>>>>> multicast replication exceeded the OVS 4K resubmit limit,
>>>>>>> dropping the packets and breaking connectivity.
>>>>>>>
>>>>>>> Replace the bypass with a targeted mechanism:
>>>>>>>
>>>>>>> - In ls_in_lookup_fdb, set flags.localnet = 1 for
>>>>>>> packets arriving from localnet ports (P50 fallback;
>>>>>>> the existing P100 FDB-learning flow already sets this
>>>>>>> flag when FDB learning is enabled).
>>>>>>>
>>>>>>> - In the P50 ARP/ND reply flows, append the condition
>>>>>>> "((flags.localnet == 1 && is_chassis_resident(port))
>>>>>>> || flags.localnet == 0)" on switches that have
>>>>>>> localnet ports.
>>>>>>>
>>>>>>> This ensures that ARP/ND requests from localnet are only
>>>>>>> answered on the chassis hosting the target VIF, preventing
>>>>>>> both the flood and duplicate replies from multiple
>>>>>>> hypervisors. VIF-to-VIF proxy ARP/ND is unchanged because
>>>>>>> flags.localnet is 0 for non-localnet-sourced traffic.
>>>>>>>
>>>>>>> Fixes: f763a3273b84 ("ovn: Avoid ARP responder for packets from
>>>>>>> localnet port")
>>>>>>> Reported-at: https://redhat.atlassian.net/browse/FDP-3436
>>>>>>> Assisted-by: Claude Opus 4.6, Claude Code
>>>>>>> Signed-off-by: Dumitru Ceara <[email protected]>
>>>>>>> ---
>>>>>
>>>>> [...]
>>>>>
>>>>>>>
>>>>>>> +/* On switches with localnet ports, restrict ARP/ND replies for
>>>>>>> + * localnet-sourced requests to the chassis hosting the target VIF
>>>>>>> + * (preventing duplicate replies from every hypervisor). Non-localnet
>>>>>>> + * requests (VIF-to-VIF) are answered unconditionally as before. */
>>>>>>> +static void
>>>>>>> +build_lswitch_arp_nd_local_resp_match(struct ds *match,
>>>>>>> + const struct ovn_port *op)
>>>>>>> +{
>>>>>>> + if (!ls_has_localnet_port(op->od)) {
>>>>>>> + return;
>>>>>>> + }
>>>>>>> +
>>>>>>> + ds_put_format(match,
>>>>>>> + " && ((flags.localnet == 1 && is_chassis_resident(%s))"
>>>>>>> + " || flags.localnet == 0)", op->json_key);
>>>>>> nit: spacing
>>>>>
>>>>> I had actually done this on purpose to make it a bit more visible that "
>>>>> || flags.localnet == 0" is part of the condition in parenthesis. But I
>>>>> have no strong preference in the end. Please let me know if you still
>>>>> would like me to change it.
>>>>>
>>>>>>> +}
>>>>>>> +
>>>>>
>>>>> [...]
>>>>>
>>>>>>>
>>>>>> LGTM. Just one small nit.
>>>>>>
>>>>>> Acked-by: Mairtin O'Loingsigh <[email protected]>
>>>>>>
>>>>>
>>>>> Regards,
>>>>> Dumitru
>>>>>
>>>>
>>>> This spacing does look more readable. No need to change.
>>>>
>>>
>>> Hi Mairtin,
>>>
>>> Thanks for the confirmation! Applied to main and all stable branches
>>> down to 24.03.
>>>
>>> Regards,
>>> Dumitru
>>>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev