On 9/25/25 3:10 PM, Lorenzo Bianconi wrote:
>> Hi Lorenzo,
>>
>> Clearing CT in this scenario fixes the sampling issue, but I fear it
>> could break some ACLs. "allow-related" ACLs that are expected to be
>> applied in this scenario may no longer be applied since conntrack
>> state is being cleared. Would it be possible to just suppress the
>> sampling in this scenario instead of clearing CT completely?
> 
> Hi Mark,
> 

Hi Lorenzo, Mark,

> thx for the review. The sampling decision is based only on ct state info; the
> logical flows do not depend on the input or output port used (e.g., the
> ingress ACL sampling flows):
> 
> table=10(ls_in_acl_sample   ), priority=1000 , match=(ip && ct.trk && (ct.est || ct.rel) && ct_label.obs_unused == 0 && !ct.rpl && ct_mark.obs_collector_id == 1 && ct_mark.obs_stage == 0), action=(sample(probability=65535,collector_set=100,obs_domain=43,obs_point=ct_label.obs_point_id); next;)
> table=10(ls_in_acl_sample   ), priority=1000 , match=(ip && ct.trk && (ct.est || ct.rel) && ct_label.obs_unused == 0 && ct.rpl && ct_mark.obs_collector_id == 2), action=(sample(probability=65535,collector_set=200,obs_domain=43,obs_point=ct_label.obs_point_id); next;)
> 
> IIUC the only way to avoid sampling is to clear ct state, right? Another
> approach would be to just avoid sampling on all possible patch ports connected
> to a router, but this would probably introduce new lflows.
> Do you think that would be better? @Dumitru: what do you think?
> 

I'm afraid there are valid cases (e.g., no stateful ACLs) when ACLs are
correctly applied to traffic being sent out from logical switches to
logical routers (via OVN patch ports).  It would be bad if we couldn't
sample that kind of traffic.

I have a different suggestion: can't we use a bit in the per-packet
"flags" register to track whether a packet has been sampled in the
current conntrack zone?  The "ct_clear" action would have to be changed
to zero this flag.
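
For example (just a sketch; the flag name is made up, there's no such bit
today), the sample stage would only match packets that have not been sampled
yet and would set the bit as part of its action, something like:

table=10(ls_in_acl_sample   ), priority=1000 ,
  match=(ip && ct.trk && (ct.est || ct.rel) && flags.obs_sampled == 0 &&
         ct_label.obs_unused == 0 && !ct.rpl &&
         ct_mark.obs_collector_id == 1 && ct_mark.obs_stage == 0),
  action=(flags.obs_sampled = 1; sample(probability=65535,collector_set=100,obs_domain=43,obs_point=ct_label.obs_point_id); next;)

On the ovn-controller side, the encoding of "ct_clear" would then also have
to reset that bit, roughly like this (again a sketch: MLF_OBS_SAMPLED_BIT
doesn't exist today and would have to be added to the MFF_LOG_FLAGS bits;
I'm assuming the existing put_load() helper in actions.c here):

static void
encode_CT_CLEAR(const struct ovnact_null *null OVS_UNUSED,
                const struct ovnact_encode_params *ep OVS_UNUSED,
                struct ofpbuf *ofpacts)
{
    ofpact_put_CT_CLEAR(ofpacts);
    /* Hypothetical new MFF_LOG_FLAGS bit: mark the packet as "not sampled
     * yet" again once its conntrack state is cleared. */
    put_load(0, MFF_LOG_FLAGS, MLF_OBS_SAMPLED_BIT, 1, ofpacts);
}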

Would that work to address the problem this patch is trying to fix?

> Moreover, the patch clears ct_state just for patch ports on distributed
> routers.
> Is it correct to use the LSP ct zone in this case, or is it just an 'abuse' of
> the previous behaviour?
> 

I agree with Mark on this one.  It feels weird to only change behavior
for one type of router port.

Regards,
Dumitru

>>
>> I have a couple of other questions inline below
>>
>> On Mon, Sep 22, 2025 at 8:58 AM Lorenzo Bianconi via dev
>> <[email protected]> wrote:
>>>
>>>> Clear connection tracking state in the logical switch egress pipeline if
>>>> the packet is sent to a distributed logical router via a patch port.
>>>> This patch will fix the oversampling issue reported in
>>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2025-May/053626.html
>>>>
>>>> Reported-at: https://issues.redhat.com/browse/FDP-1408
>>>> Signed-off-by: Lorenzo Bianconi <[email protected]>
>>>
>>> Hi Oscar,
>>>
>>> any chance you could test whether this patch fixes the reported problem?
>>> Thanks in advance.
>>>
>>> Regards,
>>> Lorenzo
>>>
>>>> ---
>>>>  northd/northd.c     |  16 ++++--
>>>>  tests/ovn-northd.at |   4 +-
>>>>  tests/system-ovn.at | 129 +++++++++++++++++++++++++++++++++++++++++++-
>>>>  3 files changed, 141 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>> index 8b5413ef3..2e433bfdd 100644
>>>> --- a/northd/northd.c
>>>> +++ b/northd/northd.c
>>>> @@ -5832,7 +5832,7 @@ build_lswitch_output_port_sec_od(struct ovn_datapath *od,
>>>>
>>>>  static void
>>>>  skip_port_from_conntrack(const struct ovn_datapath *od, struct ovn_port *op,
>>>> -                         bool has_stateful_acl, enum ovn_stage in_stage,
>>>> +                         bool apply_ct, enum ovn_stage in_stage,
>>>>                           enum ovn_stage out_stage, uint16_t priority,
>>>>                           struct lflow_table *lflows,
>>>>                           struct lflow_ref *lflow_ref)
>>>> @@ -5848,9 +5848,17 @@ skip_port_from_conntrack(const struct ovn_datapath *od, struct ovn_port *op,
>>>>       * conntrack state across all chassis. */
>>>>
>>>>      const char *ingress_action = "next;";
>>>> -    const char *egress_action = has_stateful_acl
>>>> -                                ? "next;"
>>>> -                                : "ct_clear; next;";
>>>> +    struct ovn_port *peer = op->peer;
>>>> +    if (peer && lsp_is_router(op->nbsp)) {
>>>> +        struct ovn_datapath *peer_od = peer->od;
>>>> +
>>>> +        if (peer_od &&
>>>> +            !peer_od->is_gw_router && vector_is_empty(&peer_od->l3dgw_ports)) {
>>>> +            /* We can clear ct for distributed routers. */
>>>> +            apply_ct = false;
>>>> +        }
>>
>> I'm a bit confused about why apply_ct remains true for gateway routers
>> and routers with DGPs. Wouldn't this lead to the same double sampling
>> issue as before? Consider a scenario with a gateway router where both
>> VMs are bound to the gateway chassis. The traffic pattern is identical
>> to what is done in the system test you've added, but in this case CT is
>> not cleared because the router is a gateway router. And even if the VMs
>> are bound to a different chassis than the gateway router, why would we
>> not clear CT in that case? Wouldn't the double sampling still happen
>> then?
>>
>> I think this reinforces the suggestion I had that instead of clearing
>> CT, we should just suppress the sampling on "router" type logical
>> switch ports, no matter what attributes the router has.
>>
>>>> +    }
>>>> +    const char *egress_action = apply_ct ? "next;" : "ct_clear; next;";
>>>>
>>>>      char *ingress_match = xasprintf("ip && inport == %s", op->json_key);
>>>>      char *egress_match = xasprintf("ip && outport == %s", op->json_key);
>>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
>>>> index c9e998129..8eb4bae1c 100644
>>>> --- a/tests/ovn-northd.at
>>>> +++ b/tests/ovn-northd.at
>>>> @@ -4819,7 +4819,7 @@ check_stateful_flows() {
>>>>   table=??(ls_out_pre_lb      ), priority=100  , match=(ip), action=(reg0[[2]] = 1; next;)
>>>>   table=??(ls_out_pre_lb      ), priority=110  , match=(eth.mcast), action=(next;)
>>>>   table=??(ls_out_pre_lb      ), priority=110  , match=(eth.src == \$svc_monitor_mac), action=(next;)
>>>> -  table=??(ls_out_pre_lb      ), priority=110  , match=(ip && outport == "sw0-lr0"), action=($action)
>>
>> The $action variable is only used in this one line of
>> check_stateful_flows(). If the variable is not going to be used
>> anymore, then you can remove it entirely and clean up the calls to
>> check_stateful_flows() so they do not pass an argument any longer.
>>
>> Depending on what is done for v2, this comment may not be relevant though.
>>
>>>> +  table=??(ls_out_pre_lb      ), priority=110  , match=(ip && outport == "sw0-lr0"), action=(ct_clear; next;)
>>>>   table=??(ls_out_pre_lb      ), priority=110  , match=(nd || nd_rs || nd_ra || mldv1 || mldv2), action=(next;)
>>>>   table=??(ls_out_pre_lb      ), priority=110  , match=(reg0[[16]] == 1), action=(next;)
>>>>  ])
>>>> @@ -4886,7 +4886,7 @@ AT_CHECK([grep "ls_out_pre_lb" sw0flows | ovn_strip_lflows], [0], [dnl
>>>>   table=??(ls_out_pre_lb      ), priority=0    , match=(1), action=(next;)
>>>>   table=??(ls_out_pre_lb      ), priority=110  , match=(eth.mcast), action=(next;)
>>>>   table=??(ls_out_pre_lb      ), priority=110  , match=(eth.src == $svc_monitor_mac), action=(next;)
>>>> -  table=??(ls_out_pre_lb      ), priority=110  , match=(ip && outport == "sw0-lr0"), action=(next;)
>>>> +  table=??(ls_out_pre_lb      ), priority=110  , match=(ip && outport == "sw0-lr0"), action=(ct_clear; next;)
>>>>   table=??(ls_out_pre_lb      ), priority=110  , match=(nd || nd_rs || nd_ra || mldv1 || mldv2), action=(next;)
>>>>   table=??(ls_out_pre_lb      ), priority=110  , match=(reg0[[16]] == 1), action=(next;)
>>>>  ])
>>>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
>>>> index 8e356df6f..3e4ff32d1 100644
>>>> --- a/tests/system-ovn.at
>>>> +++ b/tests/system-ovn.at
>>>> @@ -8542,8 +8542,8 @@ check ovn-nbctl lsp-set-addresses sw1p1 "50:54:00:00:00:03 20.0.0.3"
>>>>
>>>>  check ovn-nbctl --apply-after-lb acl-add sw1 from-lport 1001 'inport == "sw1p1" && ip4' drop
>>>>
>>>> -check ovn-nbctl acl-add sw1 to-lport 1002 'ip4 && tcp && tcp.dst == 80' allow-related
>>>> -check ovn-nbctl acl-add sw1 to-lport 1001 'ip4' drop
>>>> +check ovn-nbctl acl-add sw1 to-lport 1002 'outport == "sw1p1" && ip4 && tcp && tcp.dst == 80' allow-related
>>>> +check ovn-nbctl acl-add sw1 to-lport 1001 'outport == "sw1p1" && ip4' drop
>>
>> Why are these changes necessary?
> 
> Because the test was relying on the previous ct zone assumption. After
> discussing with Dumitru, we agreed this is wrong.
> @Dumitru: any input on it?
> 
> Regards,
> Lorenzo
> 
>>
>>>>
>>>>  ADD_NAMESPACES(sw0p1)
>>>>  ADD_VETH(sw0p1, sw0p1, br-int, "10.0.0.3/24", "50:54:00:00:00:02", \
>>>> @@ -13969,6 +13969,131 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
>>>>  AT_CLEANUP
>>>>  ])
>>>>
>>>> +OVN_FOR_EACH_NORTHD([
>>>> +AT_SETUP([ovn -- ACL Sampling - Stateful ACL - routed traffic])
>>>> +AT_KEYWORDS([ACL])
>>>> +
>>>> +CHECK_CONNTRACK()
>>>> +CHECK_CONNTRACK_NAT()
>>>> +ovn_start
>>>> +OVS_TRAFFIC_VSWITCHD_START()
>>>> +ADD_BR([br-int])
>>>> +
>>>> +dnl Set external-ids in br-int needed for ovn-controller
>>>> +check ovs-vsctl \
>>>> +        -- set Open_vSwitch . external-ids:system-id=hv1 \
>>>> +        -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
>>>> +        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
>>>> +        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
>>>> +        -- set bridge br-int fail-mode=secure other-config:disable-in-band=true
>>>> +
>>>> +dnl Start ovn-controller
>>>> +start_daemon ovn-controller
>>>> +
>>>> +dnl Logical network:
>>>> +dnl 1 router
>>>> +dnl 2 logical switch
>>>> +dnl 2 VIF
>>>> +
>>>> +check ovn-nbctl                                      \
>>>> +  -- lr-add lr                                       \
>>>> +  -- lrp-add lr lrp1 00:00:00:00:01:00 42.42.42.1/24 \
>>>> +  -- lrp-add lr lrp2 00:00:00:00:02:00 43.43.43.1/24 \
>>>> +  -- ls-add ls1                                      \
>>>> +  -- ls-add ls2                                      \
>>>> +  -- lsp-add ls1 vm1                                 \
>>>> +  -- lsp-add ls2 vm2                                 \
>>>> +  -- lsp-set-addresses vm1 00:00:00:00:00:01         \
>>>> +  -- lsp-set-addresses vm2 00:00:00:00:00:02         \
>>>> +  -- lsp-add ls1 ls1-lr                              \
>>>> +  -- lsp-add ls2 ls2-lr                              \
>>>> +  -- lsp-set-type ls1-lr router                      \
>>>> +  -- lsp-set-type ls2-lr router                      \
>>>> +  -- lsp-set-options ls1-lr router-port=lrp1         \
>>>> +  -- lsp-set-options ls2-lr router-port=lrp2         \
>>>> +  -- lsp-set-addresses ls1-lr router                 \
>>>> +  -- lsp-set-addresses ls2-lr router
>>>> +check ovn-nbctl --wait=hv sync
>>>> +
>>>> +ADD_NAMESPACES(vm1)
>>>> +ADD_VETH(vm1, vm1, br-int, "42.42.42.2/24", "00:00:00:00:00:01", "42.42.42.1")
>>>> +ADD_NAMESPACES(vm2)
>>>> +ADD_VETH(vm2, vm2, br-int, "43.43.43.2/24", "00:00:00:00:00:02", "43.43.43.1")
>>>> +
>>>> +wait_for_ports_up
>>>> +
>>>> +NS_CHECK_EXEC([vm1], [ping -q -c 3 -i 0.3 -w 2 43.43.43.2 | FORMAT_PING], \
>>>> +[0], [dnl
>>>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>>>> +])
>>>> +
>>>> +collector1=$(ovn-nbctl create Sample_Collector id=1 name=c1 probability=65535 set_id=100)
>>>> +collector2=$(ovn-nbctl create Sample_Collector id=2 name=c2 probability=65535 set_id=200)
>>>> +check_row_count nb:Sample_Collector 2
>>>> +
>>>> +check_uuid ovn-nbctl create Sampling_App type="acl-est" id="43"
>>>> +check_row_count nb:Sampling_App 1
>>>> +
>>>> +check ovn-nbctl pg-add pg vm1 vm2
>>>> +check_uuid ovn-nbctl                                                                        \
>>>> +    -- --id=@sample1 create Sample collector="$collector1" metadata=1001                    \
>>>> +    -- --sample-est=@sample1 acl-add pg from-lport 200 "inport == @pg && udp" allow-related  \
>>>> +    -- --id=@sample2 create Sample collector="$collector2" metadata=1004                    \
>>>> +    -- --sample-est=@sample2 acl-add pg to-lport 200 "outport == @pg && udp" allow-related
>>>> +check ovn-nbctl --wait=hv sync
>>>> +check_row_count nb:ACL 2
>>>> +check_row_count nb:Sample 2
>>>> +
>>>> +NETNS_DAEMONIZE([vm1], [nc -l -u 1234], [udp1.pid])
>>>> +NETNS_DAEMONIZE([vm2], [nc -l -u 1235], [udp2.pid])
>>>> +
>>>> +dnl Start an IPFIX collector.
>>>> +DAEMONIZE([nfcapd -B 1024000 -w . -p 4242 2> collector.err], [collector.pid])
>>>> +dnl Wait for the collector to be up.
>>>> +OVS_WAIT_UNTIL([grep -q 'Startup nfcapd.' collector.err])
>>>> +
>>>> +dnl Configure the OVS flow sample collector.
>>>> +ovs-vsctl --id=@br get Bridge br-int \
>>>> +    -- --id=@ipfix1 create IPFIX targets=\"127.0.0.1:4242\" template_interval=1   \
>>>> +    -- --id=@cs1 create Flow_Sample_Collector_Set id=100 bridge=@br ipfix=@ipfix1 \
>>>> +    -- --id=@ipfix2 create IPFIX targets=\"127.0.0.1:4242\" template_interval=1   \
>>>> +    -- --id=@cs2 create Flow_Sample_Collector_Set id=200 bridge=@br ipfix=@ipfix2
>>>> +
>>>> +check ovn-nbctl --wait=hv sync
>>>> +dnl And wait for it to be up and running.
>>>> +OVS_WAIT_UNTIL([ovs-ofctl dump-ipfix-flow br-int | grep -q '2 ids'])
>>>> +
>>>> +# Create the connection in the CT table.
>>>> +echo "Hello" > Hello
>>>> +NS_EXEC([vm1], [nc -p 1234 -u 43.43.43.2 1235 < Hello])
>>>> +NS_EXEC([vm2], [nc -p 1235 -u 42.42.42.2 1234 < Hello])
>>>> +# Send some more packets.
>>>> +for i in $(seq 10); do
>>>> +NS_EXEC([vm1], [nc -p 1234 -u 43.43.43.2 1235 < Hello])
>>>> +NS_EXEC([vm2], [nc -p 1235 -u 42.42.42.2 1234 < Hello])
>>>> +done
>>>> +
>>>> +OVS_WAIT_UNTIL([ovs-ofctl dump-ipfix-flow br-int | grep 'id 100' | grep -q 'sampled pkts=21'])
>>>> +OVS_WAIT_UNTIL([ovs-ofctl dump-ipfix-flow br-int | grep 'id 200' | grep -q 'sampled pkts=21'])
>>>> +
>>>> +OVN_CLEANUP_CONTROLLER([hv1])
>>>> +
>>>> +as ovn-sb
>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>>>> +
>>>> +as ovn-nb
>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>>>> +
>>>> +as northd
>>>> +OVS_APP_EXIT_AND_WAIT([ovn-northd])
>>>> +
>>>> +as
>>>> +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
>>>> +/connection dropped.*/d"])
>>>> +
>>>> +AT_CLEANUP
>>>> +])
>>>> +
>>>>  OVN_FOR_EACH_NORTHD([
>>>>  AT_SETUP([SB Disconnect - MAC_Binding])
>>>>  ovn_start
>>>> --
>>>> 2.51.0
>>>>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
