> On 9/25/25 3:10 PM, Lorenzo Bianconi wrote:
> >> Hi Lorenzo,
> >>
> >> Clearing CT in this scenario fixes the sampling issue, but I fear it
> >> could break some ACLs. "allow-related" ACLs that are expected to be
> >> applied in this scenario may no longer be applied since conntrack
> >> state is being cleared. Would it be possible to just suppress the
> >> sampling in this scenario instead of clearing CT completely?
> > 
> > Hi Mark,
> > 
> 
> Hi Lorenzo, Mark,
> 
> > thx for the review. The sampling decision is based only on ct state info;
> > the logical flows do not depend on the input or output port used (e.g. the
> > ingress ACL sampling flows):
> > 
> > table=10(ls_in_acl_sample   ), priority=1000 , match=(ip && ct.trk && (ct.est || ct.rel) && ct_label.obs_unused == 0 && !ct.rpl && ct_mark.obs_collector_id == 1 && ct_mark.obs_stage == 0), action=(sample(probability=65535,collector_set=100,obs_domain=43,obs_point=ct_label.obs_point_id); next;)
> > table=10(ls_in_acl_sample   ), priority=1000 , match=(ip && ct.trk && (ct.est || ct.rel) && ct_label.obs_unused == 0 && ct.rpl && ct_mark.obs_collector_id == 2), action=(sample(probability=65535,collector_set=200,obs_domain=43,obs_point=ct_label.obs_point_id); next;)
> > 
> > IIUC the only way to avoid the sampling is to clear the ct state, right?
> > Another approach would be to just skip sampling on all the patch ports
> > connected to a router, but that will probably introduce new lflows (rough
> > sketch below). Do you think that would be better? @Dumitru: what do you
> > think?
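> >
> > Something like this is what I have in mind (just a sketch to show the
> > shape of the new per-port lflows; the stage names and the ovn_lflow_add()
> > variant used here are from memory and untested, so take them with a grain
> > of salt):
> >
> >     /* Sketch: make the ACL sample stages a no-op for traffic entering or
> >      * leaving the switch through a router-type LSP, instead of clearing
> >      * the ct state. */
> >     static void
> >     skip_acl_sampling_for_router_port(struct ovn_port *op,
> >                                       struct lflow_table *lflows,
> >                                       struct lflow_ref *lflow_ref)
> >     {
> >         if (!op->nbsp || !lsp_is_router(op->nbsp)) {
> >             return;
> >         }
> >
> >         char *ingress_match = xasprintf("ip && inport == %s", op->json_key);
> >         char *egress_match = xasprintf("ip && outport == %s", op->json_key);
> >
> >         /* Higher priority than the priority-1000 sampling flows above. */
> >         ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ACL_SAMPLE, 2000,
> >                       ingress_match, "next;", lflow_ref);
> >         ovn_lflow_add(lflows, op->od, S_SWITCH_OUT_ACL_SAMPLE, 2000,
> >                       egress_match, "next;", lflow_ref);
> >
> >         free(ingress_match);
> >         free(egress_match);
> >     }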
> > 
> 
> I'm afraid there are valid cases (e.g., no stateful ACLs) when ACLs are
> correctly applied to traffic being sent out from logical switches to
> logical routers (via OVN patch ports).  It would be bad if we couldn't
> sample that kind of traffic.
> 
> I have a different suggestion: can't we use a bit in the per-packet
> "flags" register to track whether a packet has been sampled in the
> current conntrack zone?  The "ct_clear" action would have to be changed
> to zero this flag.
> 
> Would that work to address the problem this patch is trying to fix?

Hi Dumitru,

I will investigate this approach, thx for pointing this out.
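
Just to double-check that I understood the idea correctly, something along
these lines is what I have in mind (rough sketch only; "flags.obs_sampled"
is a made-up logical field name and the match/action values are trimmed
down from the lflow dump above, so please don't read it as the actual
implementation):

    /* northd side: only sample packets that have not been sampled yet in
     * the current conntrack zone, and set the bit in the same action. */
    struct ds match = DS_EMPTY_INITIALIZER;
    struct ds actions = DS_EMPTY_INITIALIZER;

    ds_put_cstr(&match,
                "ip && ct.trk && (ct.est || ct.rel) && !ct.rpl && "
                "ct_mark.obs_collector_id == 1 && flags.obs_sampled == 0");
    ds_put_cstr(&actions,
                "flags.obs_sampled = 1; "
                "sample(probability=65535,collector_set=100,obs_domain=43,"
                "obs_point=ct_label.obs_point_id); next;");
    /* ... add the lflow exactly as we do today ... */
    ds_destroy(&match);
    ds_destroy(&actions);

    /* ovn-controller side: extend the "ct_clear" action to also clear the
     * new flags bit, as you suggested. */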

Regards,
Lorenzo

> 
> > Moreover, the patch clears ct_state just for patch ports on distributed
> > routers. Is it correct to use the LSP ct zone in this case, or is it just
> > an 'abuse' of the previous behaviour?
> > 
> 
> I agree with Mark on this one.  It feels weird to change the behavior for
> only one type of router port.
> 
> Regards,
> Dumitru
> 
> >>
> >> I have a couple of other questions inline below
> >>
> >> On Mon, Sep 22, 2025 at 8:58 AM Lorenzo Bianconi via dev
> >> <[email protected]> wrote:
> >>>
> >>>> Clear connection tracking state in the logical switch egress pipeline if
> >>>> the packet is sent to a distributed logical router via a patch port.
> >>>> This patch will fix the oversampling issue reported in
> >>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2025-May/053626.html
> >>>>
> >>>> Reported-at: https://issues.redhat.com/browse/FDP-1408
> >>>> Signed-off-by: Lorenzo Bianconi <[email protected]>
> >>>
> >>> Hi Oscar,
> >>>
> >>> any chance you could test whether this patch fixes the reported problem?
> >>> Thanks in advance.
> >>>
> >>> Regards,
> >>> Lorenzo
> >>>
> >>>> ---
> >>>>  northd/northd.c     |  16 ++++--
> >>>>  tests/ovn-northd.at |   4 +-
> >>>>  tests/system-ovn.at | 129 +++++++++++++++++++++++++++++++++++++++++++-
> >>>>  3 files changed, 141 insertions(+), 8 deletions(-)
> >>>>
> >>>> diff --git a/northd/northd.c b/northd/northd.c
> >>>> index 8b5413ef3..2e433bfdd 100644
> >>>> --- a/northd/northd.c
> >>>> +++ b/northd/northd.c
> >>>> @@ -5832,7 +5832,7 @@ build_lswitch_output_port_sec_od(struct ovn_datapath *od,
> >>>>
> >>>>  static void
> >>>>  skip_port_from_conntrack(const struct ovn_datapath *od, struct ovn_port *op,
> >>>> -                         bool has_stateful_acl, enum ovn_stage in_stage,
> >>>> +                         bool apply_ct, enum ovn_stage in_stage,
> >>>>                           enum ovn_stage out_stage, uint16_t priority,
> >>>>                           struct lflow_table *lflows,
> >>>>                           struct lflow_ref *lflow_ref)
> >>>> @@ -5848,9 +5848,17 @@ skip_port_from_conntrack(const struct ovn_datapath *od, struct ovn_port *op,
> >>>>       * conntrack state across all chassis. */
> >>>>
> >>>>      const char *ingress_action = "next;";
> >>>> -    const char *egress_action = has_stateful_acl
> >>>> -                                ? "next;"
> >>>> -                                : "ct_clear; next;";
> >>>> +    struct ovn_port *peer = op->peer;
> >>>> +    if (peer && lsp_is_router(op->nbsp)) {
> >>>> +        struct ovn_datapath *peer_od = peer->od;
> >>>> +
> >>>> +        if (peer_od &&
> >>>> +            !peer_od->is_gw_router && vector_is_empty(&peer_od->l3dgw_ports)) {
> >>>> +            /* We can clear ct for distributed routers. */
> >>>> +            apply_ct = false;
> >>>> +        }
> >>
> >> I'm a bit confused about why apply_ct remains true for gateway routers
> >> and routers with DGPs. Wouldn't this lead to the same double sampling
> >> issue as before? Consider a scenario with a gateway router where both
> >> VMs are bound to the gateway chassis. The traffic pattern is identical
> >> to what is done in the system test you've added, but in this case, CT
> >> is not cleared because the router is a gateway router. But even if the
> >> VMs are bound to a different chassis than the gateway router, then why
> >> would we not clear CT in this case? Wouldn't the double sampling still
> >> happen then?
> >>
> >> I think this reinforces the suggestion I had that instead of clearing
> >> CT, we should just suppress the sampling on "router" type logical
> >> switch ports, no matter what attributes the router has.
> >>
> >>>> +    }
> >>>> +    const char *egress_action = apply_ct ? "next;" : "ct_clear; next;";
> >>>>
> >>>>      char *ingress_match = xasprintf("ip && inport == %s", op->json_key);
> >>>>      char *egress_match = xasprintf("ip && outport == %s", op->json_key);
> >>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> >>>> index c9e998129..8eb4bae1c 100644
> >>>> --- a/tests/ovn-northd.at
> >>>> +++ b/tests/ovn-northd.at
> >>>> @@ -4819,7 +4819,7 @@ check_stateful_flows() {
> >>>>   table=??(ls_out_pre_lb      ), priority=100  , match=(ip), action=(reg0[[2]] = 1; next;)
> >>>>   table=??(ls_out_pre_lb      ), priority=110  , match=(eth.mcast), action=(next;)
> >>>>   table=??(ls_out_pre_lb      ), priority=110  , match=(eth.src == \$svc_monitor_mac), action=(next;)
> >>>> -  table=??(ls_out_pre_lb      ), priority=110  , match=(ip && outport == "sw0-lr0"), action=($action)
> >>
> >> The $action variable is only used in this one line of
> >> check_stateful_flows(). If the variable is not going to be used
> >> anymore, then you can remove it entirely and clean up the calls to
> >> check_stateful_flows() so they do not pass an argument any longer.
> >>
> >> Depending on what is done for v2, this comment may not be relevant though.
> >>
> >>>> +  table=??(ls_out_pre_lb      ), priority=110  , match=(ip && outport == "sw0-lr0"), action=(ct_clear; next;)
> >>>>   table=??(ls_out_pre_lb      ), priority=110  , match=(nd || nd_rs || nd_ra || mldv1 || mldv2), action=(next;)
> >>>>   table=??(ls_out_pre_lb      ), priority=110  , match=(reg0[[16]] == 1), action=(next;)
> >>>>  ])
> >>>> @@ -4886,7 +4886,7 @@ AT_CHECK([grep "ls_out_pre_lb" sw0flows | ovn_strip_lflows], [0], [dnl
> >>>>   table=??(ls_out_pre_lb      ), priority=0    , match=(1), action=(next;)
> >>>>   table=??(ls_out_pre_lb      ), priority=110  , match=(eth.mcast), action=(next;)
> >>>>   table=??(ls_out_pre_lb      ), priority=110  , match=(eth.src == $svc_monitor_mac), action=(next;)
> >>>> -  table=??(ls_out_pre_lb      ), priority=110  , match=(ip && outport == "sw0-lr0"), action=(next;)
> >>>> +  table=??(ls_out_pre_lb      ), priority=110  , match=(ip && outport == "sw0-lr0"), action=(ct_clear; next;)
> >>>>   table=??(ls_out_pre_lb      ), priority=110  , match=(nd || nd_rs || nd_ra || mldv1 || mldv2), action=(next;)
> >>>>   table=??(ls_out_pre_lb      ), priority=110  , match=(reg0[[16]] == 1), action=(next;)
> >>>>  ])
> >>>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> >>>> index 8e356df6f..3e4ff32d1 100644
> >>>> --- a/tests/system-ovn.at
> >>>> +++ b/tests/system-ovn.at
> >>>> @@ -8542,8 +8542,8 @@ check ovn-nbctl lsp-set-addresses sw1p1 "50:54:00:00:00:03 20.0.0.3"
> >>>>
> >>>>  check ovn-nbctl --apply-after-lb acl-add sw1 from-lport 1001 'inport == "sw1p1" && ip4' drop
> >>>>
> >>>> -check ovn-nbctl acl-add sw1 to-lport 1002 'ip4 && tcp && tcp.dst == 80' allow-related
> >>>> -check ovn-nbctl acl-add sw1 to-lport 1001 'ip4' drop
> >>>> +check ovn-nbctl acl-add sw1 to-lport 1002 'outport == "sw1p1" && ip4 && tcp && tcp.dst == 80' allow-related
> >>>> +check ovn-nbctl acl-add sw1 to-lport 1001 'outport == "sw1p1" && ip4' drop
> >>
> >> Why are these changes necessary?
> > 
> > Because the test was relying on the previous ct zone assumption. Discussing
> > this with Dumitru, we agreed that assumption is wrong.
> > @Dumitru: any input on it?
> > 
> > Regards,
> > Lorenzo
> > 
> >>
> >>>>
> >>>>  ADD_NAMESPACES(sw0p1)
> >>>>  ADD_VETH(sw0p1, sw0p1, br-int, "10.0.0.3/24", "50:54:00:00:00:02", \
> >>>> @@ -13969,6 +13969,131 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
> >>>>  AT_CLEANUP
> >>>>  ])
> >>>>
> >>>> +OVN_FOR_EACH_NORTHD([
> >>>> +AT_SETUP([ovn -- ACL Sampling - Stateful ACL - routed traffic])
> >>>> +AT_KEYWORDS([ACL])
> >>>> +
> >>>> +CHECK_CONNTRACK()
> >>>> +CHECK_CONNTRACK_NAT()
> >>>> +ovn_start
> >>>> +OVS_TRAFFIC_VSWITCHD_START()
> >>>> +ADD_BR([br-int])
> >>>> +
> >>>> +dnl Set external-ids in br-int needed for ovn-controller
> >>>> +check ovs-vsctl \
> >>>> +        -- set Open_vSwitch . external-ids:system-id=hv1 \
> >>>> +        -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
> >>>> +        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
> >>>> +        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
> >>>> +        -- set bridge br-int fail-mode=secure other-config:disable-in-band=true
> >>>> +
> >>>> +dnl Start ovn-controller
> >>>> +start_daemon ovn-controller
> >>>> +
> >>>> +dnl Logical network:
> >>>> +dnl 1 router
> >>>> +dnl 2 logical switch
> >>>> +dnl 2 VIF
> >>>> +
> >>>> +check ovn-nbctl                                      \
> >>>> +  -- lr-add lr                                       \
> >>>> +  -- lrp-add lr lrp1 00:00:00:00:01:00 42.42.42.1/24 \
> >>>> +  -- lrp-add lr lrp2 00:00:00:00:02:00 43.43.43.1/24 \
> >>>> +  -- ls-add ls1                                      \
> >>>> +  -- ls-add ls2                                      \
> >>>> +  -- lsp-add ls1 vm1                                 \
> >>>> +  -- lsp-add ls2 vm2                                 \
> >>>> +  -- lsp-set-addresses vm1 00:00:00:00:00:01         \
> >>>> +  -- lsp-set-addresses vm2 00:00:00:00:00:02         \
> >>>> +  -- lsp-add ls1 ls1-lr                              \
> >>>> +  -- lsp-add ls2 ls2-lr                              \
> >>>> +  -- lsp-set-type ls1-lr router                      \
> >>>> +  -- lsp-set-type ls2-lr router                      \
> >>>> +  -- lsp-set-options ls1-lr router-port=lrp1         \
> >>>> +  -- lsp-set-options ls2-lr router-port=lrp2         \
> >>>> +  -- lsp-set-addresses ls1-lr router                 \
> >>>> +  -- lsp-set-addresses ls2-lr router
> >>>> +check ovn-nbctl --wait=hv sync
> >>>> +
> >>>> +ADD_NAMESPACES(vm1)
> >>>> +ADD_VETH(vm1, vm1, br-int, "42.42.42.2/24", "00:00:00:00:00:01", "42.42.42.1")
> >>>> +ADD_NAMESPACES(vm2)
> >>>> +ADD_VETH(vm2, vm2, br-int, "43.43.43.2/24", "00:00:00:00:00:02", "43.43.43.1")
> >>>> +
> >>>> +wait_for_ports_up
> >>>> +
> >>>> +NS_CHECK_EXEC([vm1], [ping -q -c 3 -i 0.3 -w 2 43.43.43.2 | FORMAT_PING], \
> >>>> +[0], [dnl
> >>>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> >>>> +])
> >>>> +
> >>>> +collector1=$(ovn-nbctl create Sample_Collector id=1 name=c1 probability=65535 set_id=100)
> >>>> +collector2=$(ovn-nbctl create Sample_Collector id=2 name=c2 probability=65535 set_id=200)
> >>>> +check_row_count nb:Sample_Collector 2
> >>>> +
> >>>> +check_uuid ovn-nbctl create Sampling_App type="acl-est" id="43"
> >>>> +check_row_count nb:Sampling_App 1
> >>>> +
> >>>> +check ovn-nbctl pg-add pg vm1 vm2
> >>>> +check_uuid ovn-nbctl                                                                         \
> >>>> +    -- --id=@sample1 create Sample collector="$collector1" metadata=1001                     \
> >>>> +    -- --sample-est=@sample1 acl-add pg from-lport 200 "inport == @pg && udp" allow-related  \
> >>>> +    -- --id=@sample2 create Sample collector="$collector2" metadata=1004                     \
> >>>> +    -- --sample-est=@sample2 acl-add pg to-lport 200 "outport == @pg && udp" allow-related
> >>>> +check ovn-nbctl --wait=hv sync
> >>>> +check_row_count nb:ACL 2
> >>>> +check_row_count nb:Sample 2
> >>>> +
> >>>> +NETNS_DAEMONIZE([vm1], [nc -l -u 1234], [udp1.pid])
> >>>> +NETNS_DAEMONIZE([vm2], [nc -l -u 1235], [udp2.pid])
> >>>> +
> >>>> +dnl Start an IPFIX collector.
> >>>> +DAEMONIZE([nfcapd -B 1024000 -w . -p 4242 2> collector.err], [collector.pid])
> >>>> +dnl Wait for the collector to be up.
> >>>> +OVS_WAIT_UNTIL([grep -q 'Startup nfcapd.' collector.err])
> >>>> +
> >>>> +dnl Configure the OVS flow sample collector.
> >>>> +ovs-vsctl --id=@br get Bridge br-int \
> >>>> +    -- --id=@ipfix1 create IPFIX targets=\"127.0.0.1:4242\" template_interval=1   \
> >>>> +    -- --id=@cs1 create Flow_Sample_Collector_Set id=100 bridge=@br ipfix=@ipfix1 \
> >>>> +    -- --id=@ipfix2 create IPFIX targets=\"127.0.0.1:4242\" template_interval=1   \
> >>>> +    -- --id=@cs2 create Flow_Sample_Collector_Set id=200 bridge=@br ipfix=@ipfix2
> >>>> +
> >>>> +check ovn-nbctl --wait=hv sync
> >>>> +dnl And wait for it to be up and running.
> >>>> +OVS_WAIT_UNTIL([ovs-ofctl dump-ipfix-flow br-int | grep -q '2 ids'])
> >>>> +
> >>>> +# Create the connection in the CT table.
> >>>> +echo "Hello" > Hello
> >>>> +NS_EXEC([vm1], [nc -p 1234 -u 43.43.43.2 1235 < Hello])
> >>>> +NS_EXEC([vm2], [nc -p 1235 -u 42.42.42.2 1234 < Hello])
> >>>> +# Send some more packets.
> >>>> +for i in $(seq 10); do
> >>>> +NS_EXEC([vm1], [nc -p 1234 -u 43.43.43.2 1235 < Hello])
> >>>> +NS_EXEC([vm2], [nc -p 1235 -u 42.42.42.2 1234 < Hello])
> >>>> +done
> >>>> +
> >>>> +OVS_WAIT_UNTIL([ovs-ofctl dump-ipfix-flow br-int | grep 'id 100' | grep -q 'sampled pkts=21'])
> >>>> +OVS_WAIT_UNTIL([ovs-ofctl dump-ipfix-flow br-int | grep 'id 200' | grep -q 'sampled pkts=21'])
> >>>> +
> >>>> +OVN_CLEANUP_CONTROLLER([hv1])
> >>>> +
> >>>> +as ovn-sb
> >>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> >>>> +
> >>>> +as ovn-nb
> >>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> >>>> +
> >>>> +as northd
> >>>> +OVS_APP_EXIT_AND_WAIT([ovn-northd])
> >>>> +
> >>>> +as
> >>>> +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
> >>>> +/connection dropped.*/d"])
> >>>> +
> >>>> +AT_CLEANUP
> >>>> +])
> >>>> +
> >>>>  OVN_FOR_EACH_NORTHD([
> >>>>  AT_SETUP([SB Disconnect - MAC_Binding])
> >>>>  ovn_start
> >>>> --
> >>>> 2.51.0
> >>>>
> >>
> 
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
