> Hi Lorenzo, Mark
> 
> Thanks for the patch.
> Just to clarify the case that would no longer work if we apply this
> patch: see below.
> 
> Thanks
> Xavier

Hi Xavier,

> 
> On Wed, Sep 24, 2025 at 7:22 PM Mark Michelson via dev <
> [email protected]> wrote:
> 
> > Hi Lorenzo,
> >
> > Clearing CT in this scenario fixes the sampling issue, but I fear it
> > could break some ACLs. "allow-related" ACLs that are expected to be
> > applied in this scenario may no longer be applied since conntrack
> > state is being cleared.
> 
> With this patch, ACLs applied to the 'full' switch (i.e. without an
> 'outport == ...' match) will behave differently than before in some cases.
> This is highlighted by the system test change.
> 
> In the following configuration:
> sw0p1 - sw0 - lr0 - sw1 - sw1p1 (all running on the same hv1)
> With the following ACLs:
> [0] acl-add sw1 to-lport 1002 'ip4 && tcp && tcp.dst == 80' allow-related
> [1] acl-add sw1 to-lport 1001 'ip4' drop
> [2] acl-add sw1 from-lport 1001 'inport == "sw1p1" && ip4' drop
> 
> Without the patch, wget from sw0p1 towards sw1p1 was working (despite
> looking at a "wrong" zone).
> With the patch, replies from sw1p1 are dropped in the sw1 egress pipeline
> (ct is skipped towards the router port, so the allow-related rule does not
> work).
> 
> In more details:
> Sending packets from sw0p1 towards sw1p1 port 80.
> sw0 -> lr0 -> sw1
> - (sw1 ingress) received from sw1-lr0
>   no ct (skipped as from router port)
> - (sw1 egress) sent to sw1p1.
>    sent to ct (zone=sw1-p1) -> hits ACL [0]
> 
> Reply (SYN/ACK) from sw1p1
> - (sw1 ingress) received from sw1p1
>   sent to ct (zone=sw1-p1) -> hits 'related' part of [0]
> - (sw1 egress) sent to sw1-lr0
>    With this patch: not sent to ct => -trk, so allow-related does not
>    apply -> ACL [1] is applied and the packet is dropped.
>    Without the patch, the packet was sent to ct (zone=sw1-p1), was +trk
>    and hit the 'related' part of [0].

Yes, I was aware of this possible problem (in fact I modified the system
test :)), but I assumed it was not a valid use-case. I will investigate the
solution suggested by Dumitru.
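
For reference, a quick way to see the difference Xavier describes is to look
at the egress flows that skip_port_from_conntrack() generates for the router
port. A rough sketch (table names, priorities and exact flow text are from
memory and may differ):

  ovn-sbctl lflow-list sw1 | grep 'outport == "sw1-lr0"'
  # without the patch: ... match=(ip && outport == "sw1-lr0"), action=(next;)
  # with the patch:    ... match=(ip && outport == "sw1-lr0"), action=(ct_clear; next;)

With ct_clear the SYN/ACK toward sw1-lr0 reaches the egress ACL stage as
-trk, so the allow-related ACL [0] cannot match and the priority-1001 drop
[1] wins, as described above.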

Regards,
Lorenzo

> 
> > Would it be possible to just suppress the
> > sampling in this scenario instead of clearing CT completely?
> >
> > I have a couple of other questions inline below
> >
> > On Mon, Sep 22, 2025 at 8:58 AM Lorenzo Bianconi via dev
> > <[email protected]> wrote:
> > >
> > > > Clear connection tracking state in the logical switch egress pipeline
> > if
> > > > the packet is sent to a distributed logical router via a patch port.
> > > > This patch will fix the oversampling issue reported in
> > > >
> > https://mail.openvswitch.org/pipermail/ovs-discuss/2025-May/053626.html
> > > >
> > > > Reported-at: https://issues.redhat.com/browse/FDP-1408
> > > > Signed-off-by: Lorenzo Bianconi <[email protected]>
> > >
> > > Hi Oscar,
> > >
> > > any chance to test if this patch fixes the reported problem? Thanks in
> > advance.
> > >
> > > Regards,
> > > Lorenzo
> > >
> > > > ---
> > > >  northd/northd.c     |  16 ++++--
> > > >  tests/ovn-northd.at |   4 +-
> > > >  tests/system-ovn.at | 129
> > +++++++++++++++++++++++++++++++++++++++++++-
> > > >  3 files changed, 141 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/northd/northd.c b/northd/northd.c
> > > > index 8b5413ef3..2e433bfdd 100644
> > > > --- a/northd/northd.c
> > > > +++ b/northd/northd.c
> > > > @@ -5832,7 +5832,7 @@ build_lswitch_output_port_sec_od(struct
> > ovn_datapath *od,
> > > >
> > > >  static void
> > > >  skip_port_from_conntrack(const struct ovn_datapath *od, struct
> > ovn_port *op,
> > > > -                         bool has_stateful_acl, enum ovn_stage
> > in_stage,
> > > > +                         bool apply_ct, enum ovn_stage in_stage,
> > > >                           enum ovn_stage out_stage, uint16_t priority,
> > > >                           struct lflow_table *lflows,
> > > >                           struct lflow_ref *lflow_ref)
> > > > @@ -5848,9 +5848,17 @@ skip_port_from_conntrack(const struct
> > ovn_datapath *od, struct ovn_port *op,
> > > >       * conntrack state across all chassis. */
> > > >
> > > >      const char *ingress_action = "next;";
> > > > -    const char *egress_action = has_stateful_acl
> > > > -                                ? "next;"
> > > > -                                : "ct_clear; next;";
> > > > +    struct ovn_port *peer = op->peer;
> > > > +    if (peer && lsp_is_router(op->nbsp)) {
> > > > +        struct ovn_datapath *peer_od = peer->od;
> > > > +
> > > > +        if (peer_od &&
> > > > +            !peer_od->is_gw_router &&
> > vector_is_empty(&peer_od->l3dgw_ports)) {
> > > > +            /* We can clear ct for distributed routers. */
> > > > +            apply_ct = false;
> > > > +        }
> >
> > I'm a bit confused about why apply_ct remains true for gateway routers
> > and routers with DGPs. Wouldn't this lead to the same double sampling
> > issue as before? Consider a scenario with a gateway router where both
> > VMs are bound to the gateway chassis. The traffic pattern is identical
> > to what is done in the system test you've added, but in this case CT
> > is not cleared because the router is a gateway router. And even if the
> > VMs are bound to a different chassis than the gateway router, why
> > would we not clear CT in that case? Wouldn't the double sampling still
> > happen then?
> >
> > I think this reinforces the suggestion I had that instead of clearing
> > CT, we should just suppress the sampling on "router" type logical
> > switch ports, no matter what attributes the router has.
> >
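
Good point, I will double-check the gateway router case. I think the new
system test can be extended to cover it with something along these lines
(untested sketch, reusing the names from the test below):

  # turn lr into a gateway router bound to hv1 and re-run the traffic
  check ovn-nbctl set logical_router lr options:chassis=hv1
  check ovn-nbctl --wait=hv sync
  # the sampled packet counters should still stop at 21 for both collectors
  OVS_WAIT_UNTIL([ovs-ofctl dump-ipfix-flow br-int | grep 'id 100' | grep -q 'sampled pkts=21'])
  OVS_WAIT_UNTIL([ovs-ofctl dump-ipfix-flow br-int | grep 'id 200' | grep -q 'sampled pkts=21'])
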
> > > > +    }
> > > > +    const char *egress_action = apply_ct ? "next;" : "ct_clear;
> > next;";
> > > >
> > > >      char *ingress_match = xasprintf("ip && inport == %s",
> > op->json_key);
> > > >      char *egress_match = xasprintf("ip && outport == %s",
> > op->json_key);
> > > > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> > > > index c9e998129..8eb4bae1c 100644
> > > > --- a/tests/ovn-northd.at
> > > > +++ b/tests/ovn-northd.at
> > > > @@ -4819,7 +4819,7 @@ check_stateful_flows() {
> > > >    table=??(ls_out_pre_lb      ), priority=100  , match=(ip),
> > action=(reg0[[2]] = 1; next;)
> > > >    table=??(ls_out_pre_lb      ), priority=110  , match=(eth.mcast),
> > action=(next;)
> > > >    table=??(ls_out_pre_lb      ), priority=110  , match=(eth.src ==
> > \$svc_monitor_mac), action=(next;)
> > > > -  table=??(ls_out_pre_lb      ), priority=110  , match=(ip && outport
> > == "sw0-lr0"), action=($action)
> >
> > The $action variable is only used in this one line of
> > check_stateful_flows(). If the variable is not going to be used
> > anymore, then you can remove it entirely and clean up the calls to
> > check_stateful_flows() so they do not pass an argument any longer.
> >
> > Depending on what is done for v2, this comment may not be relevant though.
> >
> > > > +  table=??(ls_out_pre_lb      ), priority=110  , match=(ip && outport
> > == "sw0-lr0"), action=(ct_clear; next;)
> > > >    table=??(ls_out_pre_lb      ), priority=110  , match=(nd || nd_rs
> > || nd_ra || mldv1 || mldv2), action=(next;)
> > > >    table=??(ls_out_pre_lb      ), priority=110  , match=(reg0[[16]] ==
> > 1), action=(next;)
> > > >  ])
> > > > @@ -4886,7 +4886,7 @@ AT_CHECK([grep "ls_out_pre_lb" sw0flows |
> > ovn_strip_lflows], [0], [dnl
> > > >    table=??(ls_out_pre_lb      ), priority=0    , match=(1),
> > action=(next;)
> > > >    table=??(ls_out_pre_lb      ), priority=110  , match=(eth.mcast),
> > action=(next;)
> > > >    table=??(ls_out_pre_lb      ), priority=110  , match=(eth.src ==
> > $svc_monitor_mac), action=(next;)
> > > > -  table=??(ls_out_pre_lb      ), priority=110  , match=(ip && outport
> > == "sw0-lr0"), action=(next;)
> > > > +  table=??(ls_out_pre_lb      ), priority=110  , match=(ip && outport
> > == "sw0-lr0"), action=(ct_clear; next;)
> > > >    table=??(ls_out_pre_lb      ), priority=110  , match=(nd || nd_rs
> > || nd_ra || mldv1 || mldv2), action=(next;)
> > > >    table=??(ls_out_pre_lb      ), priority=110  , match=(reg0[[16]] ==
> > 1), action=(next;)
> > > >  ])
> > > > diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> > > > index 8e356df6f..3e4ff32d1 100644
> > > > --- a/tests/system-ovn.at
> > > > +++ b/tests/system-ovn.at
> > > > @@ -8542,8 +8542,8 @@ check ovn-nbctl lsp-set-addresses sw1p1
> > "50:54:00:00:00:03 20.0.0.3"
> > > >
> > > >  check ovn-nbctl --apply-after-lb acl-add sw1 from-lport 1001 'inport
> > == "sw1p1" && ip4' drop
> > > >
> > > > -check ovn-nbctl acl-add sw1 to-lport 1002 'ip4 && tcp && tcp.dst ==
> > 80' allow-related
> > > > -check ovn-nbctl acl-add sw1 to-lport 1001 'ip4' drop
> > > > +check ovn-nbctl acl-add sw1 to-lport 1002 'outport == "sw1p1" && ip4
> > && tcp && tcp.dst == 80' allow-related
> > > > +check ovn-nbctl acl-add sw1 to-lport 1001 'outport == "sw1p1" && ip4'
> > drop
> >
> > Why are these changes necessary?
> >
> Applying an ACL on the whole switch (i.e. without an 'outport == ...'
> match) also applies it to every port of the switch, including the router
> port, and the patch breaks this use case.
> 
> >
> > > >
> > > >  ADD_NAMESPACES(sw0p1)
> > > >  ADD_VETH(sw0p1, sw0p1, br-int, "10.0.0.3/24", "50:54:00:00:00:02", \
> > > > @@ -13969,6 +13969,131 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query
> > port patch-.*/d
> > > >  AT_CLEANUP
> > > >  ])
> > > >
> > > > +OVN_FOR_EACH_NORTHD([
> > > > +AT_SETUP([ovn -- ACL Sampling - Stateful ACL - routed traffic])
> > > > +AT_KEYWORDS([ACL])
> > > > +
> > > > +CHECK_CONNTRACK()
> > > > +CHECK_CONNTRACK_NAT()
> > > > +ovn_start
> > > > +OVS_TRAFFIC_VSWITCHD_START()
> > > > +ADD_BR([br-int])
> > > > +
> > > > +dnl Set external-ids in br-int needed for ovn-controller
> > > > +check ovs-vsctl \
> > > > +        -- set Open_vSwitch . external-ids:system-id=hv1 \
> > > > +        -- set Open_vSwitch .
> > external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
> > > > +        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
> > > > +        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
> > > > +        -- set bridge br-int fail-mode=secure
> > other-config:disable-in-band=true
> > > > +
> > > > +dnl Start ovn-controller
> > > > +start_daemon ovn-controller
> > > > +
> > > > +dnl Logical network:
> > > > +dnl 1 router
> > > > +dnl 2 logical switch
> > > > +dnl 2 VIF
> > > > +
> > > > +check ovn-nbctl                                      \
> > > > +  -- lr-add lr                                       \
> > > > +  -- lrp-add lr lrp1 00:00:00:00:01:00 42.42.42.1/24 \
> > > > +  -- lrp-add lr lrp2 00:00:00:00:02:00 43.43.43.1/24 \
> > > > +  -- ls-add ls1                                      \
> > > > +  -- ls-add ls2                                      \
> > > > +  -- lsp-add ls1 vm1                                 \
> > > > +  -- lsp-add ls2 vm2                                 \
> > > > +  -- lsp-set-addresses vm1 00:00:00:00:00:01         \
> > > > +  -- lsp-set-addresses vm2 00:00:00:00:00:02         \
> > > > +  -- lsp-add ls1 ls1-lr                              \
> > > > +  -- lsp-add ls2 ls2-lr                              \
> > > > +  -- lsp-set-type ls1-lr router                      \
> > > > +  -- lsp-set-type ls2-lr router                      \
> > > > +  -- lsp-set-options ls1-lr router-port=lrp1         \
> > > > +  -- lsp-set-options ls2-lr router-port=lrp2         \
> > > > +  -- lsp-set-addresses ls1-lr router                 \
> > > > +  -- lsp-set-addresses ls2-lr router
> > > > +check ovn-nbctl --wait=hv sync
> > > > +
> > > > +ADD_NAMESPACES(vm1)
> > > > +ADD_VETH(vm1, vm1, br-int, "42.42.42.2/24", "00:00:00:00:00:01",
> > "42.42.42.1")
> > > > +ADD_NAMESPACES(vm2)
> > > > +ADD_VETH(vm2, vm2, br-int, "43.43.43.2/24", "00:00:00:00:00:02",
> > "43.43.43.1")
> > > > +
> > > > +wait_for_ports_up
> > > > +
> > > > +NS_CHECK_EXEC([vm1], [ping -q -c 3 -i 0.3 -w 2 43.43.43.2 |
> > FORMAT_PING], \
> > > > +[0], [dnl
> > > > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > > > +])
> > > > +
> > > > +collector1=$(ovn-nbctl create Sample_Collector id=1 name=c1
> > probability=65535 set_id=100)
> > > > +collector2=$(ovn-nbctl create Sample_Collector id=2 name=c2
> > probability=65535 set_id=200)
> > > > +check_row_count nb:Sample_Collector 2
> > > > +
> > > > +check_uuid ovn-nbctl create Sampling_App type="acl-est" id="43"
> > > > +check_row_count nb:Sampling_App 1
> > > > +
> > > > +check ovn-nbctl pg-add pg vm1 vm2
> > > > +check_uuid ovn-nbctl
> >                        \
> > > > +    -- --id=@sample1 create Sample collector="$collector1"
> > metadata=1001                     \
> > > > +    -- --sample-est=@sample1 acl-add pg from-lport 200 "inport == @pg
> > && udp" allow-related  \
> > > > +    -- --id=@sample2 create Sample collector="$collector2"
> > metadata=1004                     \
> > > > +    -- --sample-est=@sample2 acl-add pg to-lport 200 "outport == @pg
> > && udp" allow-related
> > > > +check ovn-nbctl --wait=hv sync
> > > > +check_row_count nb:ACL 2
> > > > +check_row_count nb:Sample 2
> > > > +
> > > > +NETNS_DAEMONIZE([vm1], [nc -l -u 1234], [udp1.pid])
> > > > +NETNS_DAEMONIZE([vm2], [nc -l -u 1235], [udp2.pid])
> > > > +
> > > > +dnl Start an IPFIX collector.
> > > > +DAEMONIZE([nfcapd -B 1024000 -w . -p 4242 2> collector.err],
> > [collector.pid])
> > > > +dnl Wait for the collector to be up.
> > > > +OVS_WAIT_UNTIL([grep -q 'Startup nfcapd.' collector.err])
> > > > +
> > > > +dnl Configure the OVS flow sample collector.
> > > > +ovs-vsctl --id=@br get Bridge br-int \
> > > > +    -- --id=@ipfix1 create IPFIX targets=\"127.0.0.1:4242\"
> > template_interval=1   \
> > > > +    -- --id=@cs1 create Flow_Sample_Collector_Set id=100 bridge=@br
> > ipfix=@ipfix1 \
> > > > +    -- --id=@ipfix2 create IPFIX targets=\"127.0.0.1:4242\"
> > template_interval=1   \
> > > > +    -- --id=@cs2 create Flow_Sample_Collector_Set id=200 bridge=@br
> > ipfix=@ipfix2
> > > > +
> > > > +check ovn-nbctl --wait=hv sync
> > > > +dnl And wait for it to be up and running.
> > > > +OVS_WAIT_UNTIL([ovs-ofctl dump-ipfix-flow br-int | grep -q '2 ids'])
> > > > +
> > > > +# Create the connection in the CT table.
> > > > +echo "Hello" > Hello
> > > > +NS_EXEC([vm1], [nc -p 1234 -u 43.43.43.2 1235 < Hello])
> > > > +NS_EXEC([vm2], [nc -p 1235 -u 42.42.42.2 1234 < Hello])
> > > > +# Send some more packets.
> > > > +for i in $(seq 10); do
> > > > +NS_EXEC([vm1], [nc -p 1234 -u 43.43.43.2 1235 < Hello])
> > > > +NS_EXEC([vm2], [nc -p 1235 -u 42.42.42.2 1234 < Hello])
> > > > +done
> > > > +
> > > > +OVS_WAIT_UNTIL([ovs-ofctl dump-ipfix-flow br-int | grep 'id 100' |
> > grep -q 'sampled pkts=21'])
> > > > +OVS_WAIT_UNTIL([ovs-ofctl dump-ipfix-flow br-int | grep 'id 200' |
> > grep -q 'sampled pkts=21'])
> > > > +
> > > > +OVN_CLEANUP_CONTROLLER([hv1])
> > > > +
> > > > +as ovn-sb
> > > > +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> > > > +
> > > > +as ovn-nb
> > > > +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> > > > +
> > > > +as northd
> > > > +OVS_APP_EXIT_AND_WAIT([ovn-northd])
> > > > +
> > > > +as
> > > > +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
> > > > +/connection dropped.*/d"])
> > > > +
> > > > +AT_CLEANUP
> > > > +])
> > > > +
> > > >  OVN_FOR_EACH_NORTHD([
> > > >  AT_SETUP([SB Disconnect - MAC_Binding])
> > > >  ovn_start
> > > > --
> > > > 2.51.0
> > > >
> > > _______________________________________________
> > > dev mailing list
> > > [email protected]
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
> > _______________________________________________
> > dev mailing list
> > [email protected]
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
