Hi Lorenzo, Mark Thanks for the patch. Just to try to clarify the case which would not work anymore if we apply this patch: see below.
Thanks Xavier On Wed, Sep 24, 2025 at 7:22 PM Mark Michelson via dev < [email protected]> wrote: > Hi Lorenzo, > > Clearing CT in this scenario fixes the sampling issue, but I fear it > could break some ACLs. "allow-related" ACLs that are expected to be > applied in this scenario may no longer be applied since conntrack > state is being cleared With this patch, ACL applied on 'full' switches (i.e. w/o outport == ..) ACL will behave differently than before in some cases. This is highlighted by the system test change. In the following configuration sw0p1 - sw0 - lr0 - sw1 - sw1p1 (all running on same hv1) With the following ACLs [0] acl-add sw1 to-lport 1002 ip4 && tcp && tcp.dst == 80' allow-related [1] acl-add sw1 to-lport 1001 ip4' drop [2] acl-add sw1 from-lport 1001 'inport == "sw1p1" && ip4' drop Without the patch, wget from sw0p1 towards sw1p1 was working (despite looking at a "wrong" zone). With the patch, replies from sw1p1 are dropped in sw1p1 egress pipeline (as ct is skipped as towards router port, so allow-related rule does nor work). In more details: Sending packets from sw0p1 towards sw1p1 port 80. sw0 -> lr0 -> sw1 - (sw1 ingress) received from sw1-lr0 no ct (skipped as from router port) - (sw1 egress) sent to sw1p1. sent to ct (zone=sw1-p1) -> hits ACL [0] Reply (SYN/ACK) from sw1p1 - (sw1 ingress) received from sw1p1 sent to ct (zone=sw1-p1) -> hits 'related' part of [0] - (sw1 egress) sent to sw1-lr0 With this patch: not sent to ct => -trk -> allow-related does not apply -> ACL [1] is applied, packet is dropped. Without the patch, packet was sent to ct (zone=zone=sw1-p1), packet was +trk and hits 'related' part of [0] . Would it be possible to just suppress the > sampling in this scenario instead of clearing CT completely? > > I have a couple of other questions inline below > > On Mon, Sep 22, 2025 at 8:58 AM Lorenzo Bianconi via dev > <[email protected]> wrote: > > > > > Clear connection tracking state in the logical switch egress pipeline > if > > > the packet is sent to a distributed logical router via a patch port. > > > This patch will fix the oversampling issue reported in > > > > https://mail.openvswitch.org/pipermail/ovs-discuss/2025-May/053626.html > > > > > > Reported-at: https://issues.redhat.com/browse/FDP-1408 > > > Signed-off-by: Lorenzo Bianconi <[email protected]> > > > > Hi Oscar, > > > > any chance to test if this patch fixes the reported problem? Thanks in > advance. > > > > Regards, > > Lorenzo > > > > > --- > > > northd/northd.c | 16 ++++-- > > > tests/ovn-northd.at | 4 +- > > > tests/system-ovn.at | 129 > +++++++++++++++++++++++++++++++++++++++++++- > > > 3 files changed, 141 insertions(+), 8 deletions(-) > > > > > > diff --git a/northd/northd.c b/northd/northd.c > > > index 8b5413ef3..2e433bfdd 100644 > > > --- a/northd/northd.c > > > +++ b/northd/northd.c > > > @@ -5832,7 +5832,7 @@ build_lswitch_output_port_sec_od(struct > ovn_datapath *od, > > > > > > static void > > > skip_port_from_conntrack(const struct ovn_datapath *od, struct > ovn_port *op, > > > - bool has_stateful_acl, enum ovn_stage > in_stage, > > > + bool apply_ct, enum ovn_stage in_stage, > > > enum ovn_stage out_stage, uint16_t priority, > > > struct lflow_table *lflows, > > > struct lflow_ref *lflow_ref) > > > @@ -5848,9 +5848,17 @@ skip_port_from_conntrack(const struct > ovn_datapath *od, struct ovn_port *op, > > > * conntrack state across all chassis. */ > > > > > > const char *ingress_action = "next;"; > > > - const char *egress_action = has_stateful_acl > > > - ? "next;" > > > - : "ct_clear; next;"; > > > + struct ovn_port *peer = op->peer; > > > + if (peer && lsp_is_router(op->nbsp)) { > > > + struct ovn_datapath *peer_od = peer->od; > > > + > > > + if (peer_od && > > > + !peer_od->is_gw_router && > vector_is_empty(&peer_od->l3dgw_ports)) { > > > + /* We can clear ct for distributed routers. */ > > > + apply_ct = false; > > > + } > > I'm a bit confused about why apply_ct remains true for gateway routers > and routers with DGPs. Wouldn't this lead to the same double sampling > issue as before? Consider a scenario with a gateway router where both > vms are bound to the gateway chassis. The traffic pattern is identical > to what is done in the system test you've added, but in this case, CT > is not cleared because the router is a gateway router. But even if the > vms are bound to a different chassis than the gateway router, then why > would we not clear CT in this case? Wouldn't the double sampling still > happen then? > > I think this reinforces the suggestion I had that instead of clearing > CT, we should just suppress the sampling on "router" type logical > switch ports, no matter what attributes the router has. > > > > + } > > > + const char *egress_action = apply_ct ? "next;" : "ct_clear; > next;"; > > > > > > char *ingress_match = xasprintf("ip && inport == %s", > op->json_key); > > > char *egress_match = xasprintf("ip && outport == %s", > op->json_key); > > > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > > > index c9e998129..8eb4bae1c 100644 > > > --- a/tests/ovn-northd.at > > > +++ b/tests/ovn-northd.at > > > @@ -4819,7 +4819,7 @@ check_stateful_flows() { > > > table=??(ls_out_pre_lb ), priority=100 , match=(ip), > action=(reg0[[2]] = 1; next;) > > > table=??(ls_out_pre_lb ), priority=110 , match=(eth.mcast), > action=(next;) > > > table=??(ls_out_pre_lb ), priority=110 , match=(eth.src == > \$svc_monitor_mac), action=(next;) > > > - table=??(ls_out_pre_lb ), priority=110 , match=(ip && outport > == "sw0-lr0"), action=($action) > > The $action variable is only used in this one line of > check_stateful_flows(). If the variable is not going to be used > anymore, then you can remove it entirely and clean up the calls to > check_stateful_flows() so they do not pass an argument any longer. > > Depending on what is done for v2, this comment may not be relevant though. > > > > + table=??(ls_out_pre_lb ), priority=110 , match=(ip && outport > == "sw0-lr0"), action=(ct_clear; next;) > > > table=??(ls_out_pre_lb ), priority=110 , match=(nd || nd_rs > || nd_ra || mldv1 || mldv2), action=(next;) > > > table=??(ls_out_pre_lb ), priority=110 , match=(reg0[[16]] == > 1), action=(next;) > > > ]) > > > @@ -4886,7 +4886,7 @@ AT_CHECK([grep "ls_out_pre_lb" sw0flows | > ovn_strip_lflows], [0], [dnl > > > table=??(ls_out_pre_lb ), priority=0 , match=(1), > action=(next;) > > > table=??(ls_out_pre_lb ), priority=110 , match=(eth.mcast), > action=(next;) > > > table=??(ls_out_pre_lb ), priority=110 , match=(eth.src == > $svc_monitor_mac), action=(next;) > > > - table=??(ls_out_pre_lb ), priority=110 , match=(ip && outport > == "sw0-lr0"), action=(next;) > > > + table=??(ls_out_pre_lb ), priority=110 , match=(ip && outport > == "sw0-lr0"), action=(ct_clear; next;) > > > table=??(ls_out_pre_lb ), priority=110 , match=(nd || nd_rs > || nd_ra || mldv1 || mldv2), action=(next;) > > > table=??(ls_out_pre_lb ), priority=110 , match=(reg0[[16]] == > 1), action=(next;) > > > ]) > > > diff --git a/tests/system-ovn.at b/tests/system-ovn.at > > > index 8e356df6f..3e4ff32d1 100644 > > > --- a/tests/system-ovn.at > > > +++ b/tests/system-ovn.at > > > @@ -8542,8 +8542,8 @@ check ovn-nbctl lsp-set-addresses sw1p1 > "50:54:00:00:00:03 20.0.0.3" > > > > > > check ovn-nbctl --apply-after-lb acl-add sw1 from-lport 1001 'inport > == "sw1p1" && ip4' drop > > > > > > -check ovn-nbctl acl-add sw1 to-lport 1002 'ip4 && tcp && tcp.dst == > 80' allow-related > > > -check ovn-nbctl acl-add sw1 to-lport 1001 'ip4' drop > > > +check ovn-nbctl acl-add sw1 to-lport 1002 'outport == "sw1p1" && ip4 > && tcp && tcp.dst == 80' allow-related > > > +check ovn-nbctl acl-add sw1 to-lport 1001 'outport == "sw1p1" && ip4' > drop > > Why are these changes necessary? > Applying ACL on the switch (w/ outport=..) applies ACL also on the switch ports, and the patch is breaking this use case. > > > > > > > ADD_NAMESPACES(sw0p1) > > > ADD_VETH(sw0p1, sw0p1, br-int, "10.0.0.3/24", "50:54:00:00:00:02", \ > > > @@ -13969,6 +13969,131 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query > port patch-.*/d > > > AT_CLEANUP > > > ]) > > > > > > +OVN_FOR_EACH_NORTHD([ > > > +AT_SETUP([ovn -- ACL Sampling - Stateful ACL - routed traffic]) > > > +AT_KEYWORDS([ACL]) > > > + > > > +CHECK_CONNTRACK() > > > +CHECK_CONNTRACK_NAT() > > > +ovn_start > > > +OVS_TRAFFIC_VSWITCHD_START() > > > +ADD_BR([br-int]) > > > + > > > +dnl Set external-ids in br-int needed for ovn-controller > > > +check ovs-vsctl \ > > > + -- set Open_vSwitch . external-ids:system-id=hv1 \ > > > + -- set Open_vSwitch . > external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ > > > + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \ > > > + -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \ > > > + -- set bridge br-int fail-mode=secure > other-config:disable-in-band=true > > > + > > > +dnl Start ovn-controller > > > +start_daemon ovn-controller > > > + > > > +dnl Logical network: > > > +dnl 1 router > > > +dnl 2 logical switch > > > +dnl 2 VIF > > > + > > > +check ovn-nbctl \ > > > + -- lr-add lr \ > > > + -- lrp-add lr lrp1 00:00:00:00:01:00 42.42.42.1/24 \ > > > + -- lrp-add lr lrp2 00:00:00:00:02:00 43.43.43.1/24 \ > > > + -- ls-add ls1 \ > > > + -- ls-add ls2 \ > > > + -- lsp-add ls1 vm1 \ > > > + -- lsp-add ls2 vm2 \ > > > + -- lsp-set-addresses vm1 00:00:00:00:00:01 \ > > > + -- lsp-set-addresses vm2 00:00:00:00:00:02 \ > > > + -- lsp-add ls1 ls1-lr \ > > > + -- lsp-add ls2 ls2-lr \ > > > + -- lsp-set-type ls1-lr router \ > > > + -- lsp-set-type ls2-lr router \ > > > + -- lsp-set-options ls1-lr router-port=lrp1 \ > > > + -- lsp-set-options ls2-lr router-port=lrp2 \ > > > + -- lsp-set-addresses ls1-lr router \ > > > + -- lsp-set-addresses ls2-lr router > > > +check ovn-nbctl --wait=hv sync > > > + > > > +ADD_NAMESPACES(vm1) > > > +ADD_VETH(vm1, vm1, br-int, "42.42.42.2/24", "00:00:00:00:00:01", > "42.42.42.1") > > > +ADD_NAMESPACES(vm2) > > > +ADD_VETH(vm2, vm2, br-int, "43.43.43.2/24", "00:00:00:00:00:02", > "43.43.43.1") > > > + > > > +wait_for_ports_up > > > + > > > +NS_CHECK_EXEC([vm1], [ping -q -c 3 -i 0.3 -w 2 43.43.43.2 | > FORMAT_PING], \ > > > +[0], [dnl > > > +3 packets transmitted, 3 received, 0% packet loss, time 0ms > > > +]) > > > + > > > +collector1=$(ovn-nbctl create Sample_Collector id=1 name=c1 > probability=65535 set_id=100) > > > +collector2=$(ovn-nbctl create Sample_Collector id=2 name=c2 > probability=65535 set_id=200) > > > +check_row_count nb:Sample_Collector 2 > > > + > > > +check_uuid ovn-nbctl create Sampling_App type="acl-est" id="43" > > > +check_row_count nb:Sampling_App 1 > > > + > > > +check ovn-nbctl pg-add pg vm1 vm2 > > > +check_uuid ovn-nbctl > \ > > > + -- --id=@sample1 create Sample collector="$collector1" > metadata=1001 \ > > > + -- --sample-est=@sample1 acl-add pg from-lport 200 "inport == @pg > && udp" allow-related \ > > > + -- --id=@sample2 create Sample collector="$collector2" > metadata=1004 \ > > > + -- --sample-est=@sample2 acl-add pg to-lport 200 "outport == @pg > && udp" allow-related > > > +check ovn-nbctl --wait=hv sync > > > +check_row_count nb:ACL 2 > > > +check_row_count nb:Sample 2 > > > + > > > +NETNS_DAEMONIZE([vm1], [nc -l -u 1234], [udp1.pid]) > > > +NETNS_DAEMONIZE([vm2], [nc -l -u 1235], [udp2.pid]) > > > + > > > +dnl Start an IPFIX collector. > > > +DAEMONIZE([nfcapd -B 1024000 -w . -p 4242 2> collector.err], > [collector.pid]) > > > +dnl Wait for the collector to be up. > > > +OVS_WAIT_UNTIL([grep -q 'Startup nfcapd.' collector.err]) > > > + > > > +dnl Configure the OVS flow sample collector. > > > +ovs-vsctl --id=@br get Bridge br-int \ > > > + -- --id=@ipfix1 create IPFIX targets=\"127.0.0.1:4242\" > template_interval=1 \ > > > + -- --id=@cs1 create Flow_Sample_Collector_Set id=100 bridge=@br > ipfix=@ipfix1 \ > > > + -- --id=@ipfix2 create IPFIX targets=\"127.0.0.1:4242\" > template_interval=1 \ > > > + -- --id=@cs2 create Flow_Sample_Collector_Set id=200 bridge=@br > ipfix=@ipfix2 > > > + > > > +check ovn-nbctl --wait=hv sync > > > +dnl And wait for it to be up and running. > > > +OVS_WAIT_UNTIL([ovs-ofctl dump-ipfix-flow br-int | grep -q '2 ids']) > > > + > > > +# Create the connection in the CT table. > > > +echo "Hello" > Hello > > > +NS_EXEC([vm1], [nc -p 1234 -u 43.43.43.2 1235 < Hello]) > > > +NS_EXEC([vm2], [nc -p 1235 -u 42.42.42.2 1234 < Hello]) > > > +# Send some more packets. > > > +for i in $(seq 10); do > > > +NS_EXEC([vm1], [nc -p 1234 -u 43.43.43.2 1235 < Hello]) > > > +NS_EXEC([vm2], [nc -p 1235 -u 42.42.42.2 1234 < Hello]) > > > +done > > > + > > > +OVS_WAIT_UNTIL([ovs-ofctl dump-ipfix-flow br-int | grep 'id 100' | > grep -q 'sampled pkts=21']) > > > +OVS_WAIT_UNTIL([ovs-ofctl dump-ipfix-flow br-int | grep 'id 200' | > grep -q 'sampled pkts=21']) > > > + > > > +OVN_CLEANUP_CONTROLLER([hv1]) > > > + > > > +as ovn-sb > > > +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) > > > + > > > +as ovn-nb > > > +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) > > > + > > > +as northd > > > +OVS_APP_EXIT_AND_WAIT([ovn-northd]) > > > + > > > +as > > > +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d > > > +/connection dropped.*/d"]) > > > + > > > +AT_CLEANUP > > > +]) > > > + > > > OVN_FOR_EACH_NORTHD([ > > > AT_SETUP([SB Disconnect - MAC_Binding]) > > > ovn_start > > > -- > > > 2.51.0 > > > > > _______________________________________________ > > dev mailing list > > [email protected] > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > _______________________________________________ > dev mailing list > [email protected] > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
