> 
> Hi, Salem.  This patch looks clearly AI-generated.  You need to disclose
> that fact by providing the Assisted-by tag, see the contribution guide.
> 
> Also, while code seems fine at the first glance, most of the comments are
> (as typical with LLMs) way too long, some only make sense in the context
> of the patch and not in the context of the file they are in, a lot of
> comments, especially in the test, are obvious.  Please, clean this up.
> 
> I didn't review the code in details, but see a couple comments below.
Hi Ilya, thank you for the review. I gave addressed all your comments and
adapted accordingly in V3 for your review at your convenience.
> 
>> 
>> However, OVN also produces OFPP_NONE in the frozen state at logical
>> datapath crossings by explicitly clearing in_port mid-pipeline
>> (load:0xffff->in_port) before a ct() freeze.  In this case the
>> megaflow still has a real datapath in_port.
>> 
>> When that port is deleted, the revalidator re-translates these
>> megaflows.  The shortcut bypasses port validation, the actions look
>> unchanged, and the megaflow is kept.
>> Stale megaflows then accumulate until max-idle fires,
> 
> The commeit message and the comments fixate on the word 'megaflow'
> which is a wrong word to use in this context, IMO.  These are datapath
> flows.  Megaflow is the match criteria with the mask, which is not
> really related to what is going on here.
> 
>> which is problematic in high-churn environments like Kubernetes and
>> especially in offload setups where matching datapath flows are held
>> in hardware.
>> 
>> Fix this by verifying that the megaflow's datapath in_port still maps
>> to an OF port via odp_port_to_ofport() in the shortcut path.  Skip
>> the check when in_port.odp_port == ODPP_NONE so the original
>> controller packet-out scenarios continue to work.
>> 
>> A regression test reproducing the bug on the dummy datapath is added.
> 
> 
> 
>> 
>> [1]: 323ae1e808e6 ("ofproto-dpif-xlate: Fix recirculation when in_port is 
>> OFPP_CONTROLLER.")
> 
> Don't duplicate the commit line, if you need to reference the commit in
> the fixes tag, just say so or reference it as 'cited commit'.
> 
>> 
>> Fixes: 323ae1e808e6 ("ofproto-dpif-xlate: Fix recirculation when in_port is 
>> OFPP_CONTROLLER.")
>> Signed-off-by: Salem Sol <[email protected]>
>> ---
>> ofproto/ofproto-dpif-xlate.c | 33 ++++++++++++++++++++-
>> tests/ofproto-dpif.at        | 56 ++++++++++++++++++++++++++++++++++++
>> 2 files changed, 88 insertions(+), 1 deletion(-)
>> 
>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>> index c1ba447e93..972df39935 100644
>> --- a/ofproto/ofproto-dpif-xlate.c
>> +++ b/ofproto/ofproto-dpif-xlate.c
>> @@ -1689,7 +1689,10 @@ xlate_lookup_ofproto_(const struct dpif_backer 
>> *backer,
>>             * the packet originated from OFPP_CONTROLLER passed
>>             * through a patch port.
>>             *
>> -             * OFPP_NONE can also indicate that a bond caused 
>> recirculation. */
>> +             * OFPP_NONE can also indicate that a bond caused recirculation,
>> +             * or that an OpenFlow action (e.g. 
>> load:0xffff->NXM_OF_IN_PORT[])
>> +             * cleared the OF in_port mid-pipeline before a freeze (as OVN
>> +             * does at logical-datapath crossings). */
>>            struct uuid uuid = recirc_id_node->state.ofproto_uuid;
>>            const struct xbridge *bridge = xbridge_lookup_by_uuid(xcfg, 
>> &uuid);
>> 
>> @@ -1698,6 +1701,34 @@ xlate_lookup_ofproto_(const struct dpif_backer 
>> *backer,
>>                    !get_ofp_port(bridge, in_port)) {
>>                    goto xport_lookup;
>>                }
>> +
>> +                /* Even when the frozen state's OF in_port is OFPP_NONE or
>> +                 * OFPP_CONTROLLER, the megaflow under revalidation still 
>> has
>> +                 * a concrete datapath in_port in its key
>> +                 * (flow->in_port.odp_port).  If that ingress can no longer
>> +                 * be resolved -- e.g. the underlying port has been deleted,
>> +                 * or the tunnel removed -- then this megaflow is 
>> unreachable
>> +                 * (no future packet can match a non-existent ingress) and
>> +                 * must be evicted.  Mirror the xport_lookup logic below so
>> +                 * tunnel arrivals are handled the same way, and fall 
>> through
>> +                 * to xport_lookup when the ingress is gone so we return
>> +                 * NULL/ENODEV.
>> +                 *
>> +                 * ODPP_NONE means there is no datapath ingress at all
>> +                 * (legitimate for controller-originated packet-outs); skip
>> +                 * the check there. */
>> +                if (flow->in_port.odp_port != ODPP_NONE) {
>> +                    const struct ofport_dpif *ingress;
>> +
>> +                    ingress = tnl_port_should_receive(flow)
>> +                              ? tnl_port_receive(flow)
>> +                              : odp_port_to_ofport(backer,
>> +                                                   flow->in_port.odp_port);
>> +                    if (!ingress) {
>> +                        goto xport_lookup;
>> +                    }
>> +                }
>> +
>>                if (errorp) {
>>                    *errorp = NULL;
>>                }
>> diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
>> index dcab7bba45..b15457b06f 100644
>> --- a/tests/ofproto-dpif.at
>> +++ b/tests/ofproto-dpif.at
>> @@ -9010,6 +9010,62 @@ packets:2, bytes:68, used:0.001s, actions:drop
>> OVS_VSWITCHD_STOP(["/sending to collector failed/d"])
>> AT_CLEANUP
>> 
>> +dnl Reproducer for stale recirculated megaflows that survive port deletion
>> +dnl when the OpenFlow pipeline clears in_port 
>> (load:0xffff->NXM_OF_IN_PORT[])
>> +dnl before triggering a freeze with ct(table=N).  This is the exact pattern
>> +dnl OVN uses at logical-datapath crossings, and the resulting frozen state
>> +dnl with metadata.in_port == OFPP_NONE makes xlate_lookup_ofproto_() bypass
>> +dnl any validation of the datapath in_port that the megaflow keys on, so the
>> +dnl megaflow is revalidated as UKEY_KEEP for max-idle even
>> +dnl though the underlying port is gone.
>> +AT_SETUP([ofproto-dpif - stale recirc megaflow after in_port deletion 
>> (OFPP_NONE frozen in_port)])
> 
> The name is too long.
> 
>> +AT_KEYWORDS([megaflow revalidator recirc])
>> +OVS_VSWITCHD_START
>> +add_of_ports br0 1 2 3
> 
> Why 3 ports?
> 
>> +
>> +dnl Pipeline mirrors OVN's logical-datapath-crossing pattern:
>> +dnl   table=0: receive on p3, clear OF in_port, then ct(table=1) which 
>> freezes
>> +dnl   table=1: post-recirc, output to p2
>> +AT_DATA([flows.txt], [dnl
>> +table=0,in_port=3,tcp 
>> actions=load:0xffff->NXM_OF_IN_PORT[[]],ct(zone=1,table=1)
>> +table=1,tcp,ct_state=+trk actions=output:2
>> +])
>> +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
>> +
>> +dnl Send a packet to install both the recirc_id=0 and the recirc_id=0x1 
>> ukeys.
>> +AT_CHECK([ovs-appctl netdev-dummy/receive p3 \
>> +  
>> 'in_port(3),eth(src=50:54:00:00:00:08,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=192.168.0.2,dst=192.168.0.1,proto=6,tos=0,ttl=64,frag=no),tcp(src=1234,dst=80)'])
>> +
> 
> Use an m4_define for a more compact definition for the packet, see TRACE_PKT
> as an example.
> 
>> +dnl Sanity: both the pre-CT (recirc_id=0,in_port=3) and the post-CT
>> +dnl (recirc_id=0x1,in_port=3) megaflows must be installed.  Use
>> +dnl OVS_WAIT_UNTIL so the test does not depend on revalidator exec speed.
>> +OVS_WAIT_UNTIL([test $(ovs-appctl dpctl/dump-flows | dnl
>> +  grep -c 'recirc_id(0),in_port(3)') -eq 1], [ovs-appctl dpctl/dump-flows])
>> +OVS_WAIT_UNTIL([test $(ovs-appctl dpctl/dump-flows | dnl
>> +  grep -c 'recirc_id(0x1),in_port(3)') -eq 1], [ovs-appctl 
>> dpctl/dump-flows])
> 
> Don't use OVS_WAIT_UNTIL.  Use revalidator/wait (when needed) + AC_CHECK.  
> Also,
> we know exactly how all the datapath flows here should look like, so just 
> match
> on that (with all the appropriate strip_* options).
> 
>> +
>> +dnl Delete the ingress port.  This triggers REV_RECONFIGURE, after which
>> +dnl the revalidator must drop any datapath megaflows that key on the
>> +dnl now-deleted port.
>> +
>> +AT_CHECK([ovs-vsctl del-port br0 p3])
>> +
>> +dnl The recirc_id=0 flow is correctly deleted (no frozen state involved).
>> +OVS_WAIT_UNTIL([test $(ovs-appctl dpctl/dump-flows | dnl
>> +  grep -c 'recirc_id(0),in_port(3)') -eq 0], [ovs-appctl dpctl/dump-flows])
>> +
>> +dnl The recirc_id>0 megaflow MUST also be deleted: its datapath key is
>> +dnl in_port(3) which no longer exists.  Without the fix this wait times
>> +dnl out, because xlate_lookup_ofproto_() takes the frozen-in_port=OFPP_NONE
>> +dnl shortcut and returns SUCCESS without verifying the datapath in_port,
>> +dnl so revalidate_ukey__() decides UKEY_KEEP and the megaflow only goes
>> +dnl away when max-idle fires.
>> +OVS_WAIT_UNTIL([test $(ovs-appctl dpctl/dump-flows | dnl
>> +  grep -c 'recirc_id(0x1),in_port(3)') -eq 0], [ovs-appctl 
>> dpctl/dump-flows])
>> +
>> +OVS_VSWITCHD_STOP
>> +AT_CLEANUP
>> +
>> dnl Bridge IPFIX statistics check
>> AT_SETUP([ofproto-dpif - Bridge IPFIX statistics check])
>> OVS_VSWITCHD_START


_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to