On 2 Jun 2026, at 22:50, [email protected] wrote:

> From: Numan Siddique <[email protected]>
>
> Hello,
>
> Below is a side-by-side trace of the same OVN-driven datapath pipeline,
> before and after a flush. In our prod deployment we are seeing
> intermittent offload issues. All the datapath flows of a chain get
> offloaded except the last one, which is installed in the kernel
> datapath (dp:ovs) and kills performance. If I run
> `ovs-appctl dpctl/del-flows`, the problematic flow gets offloaded.
>
> The issue can be reproduced again by running a script that deletes
> the dp flows in a loop with a sleep of 5 seconds.  It generally
> surfaces after 5-6 flushes.
>
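For anyone who wants to try this, a minimal sketch of the loop described
above. The helper names, the round limit and the "dp:ovs" string check are
my own assumptions, not the script Numan actually ran; only the two
ovs-appctl commands are real. Other flows on the host may legitimately stay
in dp:ovs, so the result needs to be filtered for the chain of interest.

```python
import subprocess
import time


def run_ovs_appctl(*args):
    """Run ovs-appctl with the given arguments and return its stdout."""
    result = subprocess.run(
        ["ovs-appctl", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def stranded_flows(dump):
    """Return the dumped flow lines that are still in the kernel datapath."""
    return [line for line in dump.splitlines() if "dp:ovs" in line]


def reproduce(max_rounds=20, interval=5.0, run=run_ovs_appctl, sleep=time.sleep):
    """Flush the datapath flows every `interval` seconds.

    Stops at the first round where the dump taken after the sleep still
    shows a dp:ovs flow.  Returns (round_number, stranded_lines), or
    (None, []) if nothing was stranded within `max_rounds`.
    """
    for round_no in range(1, max_rounds + 1):
        run("dpctl/del-flows")
        sleep(interval)  # let live traffic re-install the flows
        stranded = stranded_flows(run("dpctl/dump-flows", "-m"))
        if stranded:
            return round_no, stranded
    return None, []

# Usage on a compute node (needs a running ovs-vswitchd and live traffic):
#   round_no, flows = reproduce()
```

The `run` and `sleep` parameters exist only so the loop can be exercised
without a live switch.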
> Below are the datapath flows when the issue is seen.
>
> The traffic is destined to the public IPs of the VM (there are 2
> public IPs in our setup). It enters the compute node through the PF
> and reaches the OVN pipeline via br-ex.
>
> -------------------------------------------------------------------------------
> 1. BEFORE FLUSH
> -------------------------------------------------------------------------------
>
> Two upstream branches merge into the stranded chain `0x314610`.  Each
> branch is exactly two TC-offloaded stages followed by the umbrella that
> is stuck in dp:ovs.
>
>    +------------------------------+      +------------------------------+
>    | recirc_id(0)        BRANCH A |      | recirc_id(0)        BRANCH B |
>    | ufid:b81b9ab4                |      | ufid:29ac3fbf                |
>    | in_port=enp210s0f0np0  (PF)  |      | in_port=enp210s0f0np0  (PF)  |
>    | eth_type=0x8100, VLAN 120    |      | eth_type=0x8100, VLAN 26     |
>    | eth(src=b0:cf:0e:b1:31:ff,   |      | eth(src=b0:cf:0e:b1:31:ff,   |
>    |     dst=ae:ad:c9:2a:9d:0f)   |      |     dst=ae:ad:c9:2a:9d:0f)   |
>    | ipv4(dst=AA.BB.CC.DD,        |      | ipv4(dst=XX.YY.ZZ.AA,        |
>    |      src=32.0.0.0/224.0.0.0, |      |      src=8.0.0.0/248.0.0.0,  |
>    |      ttl=119)                |      |      ttl=62)                 |
>    | ct_state(0/0x2b)             |      | ct_state(0/0x2b)             |
>    | ct_mark(0/0x2)               |      | ct_mark(0/0x2)               |
>    |                              |      |                              |
>    | actions:                     |      | actions:                     |
>    |   pop_vlan,                  |      |   pop_vlan,                  |
>    |   ct(zone=6,nat),            |      |   ct(zone=5,nat),            |
>    |   recirc(0x320213)           |      |   recirc(0x321f17)           |
>    |                              |      |                              |
>    | pkts=565   bytes=44636       |      | pkts=26,867,289              |
>    | used=1.640s                  |      | bytes=241,881,487,034        |
>    | offloaded:yes, dp:tc         |      | used=0.620s                  |
>    +------------------------------+      | offloaded:yes, dp:tc         |
>                   |                      +------------------------------+
>                   | post-DNAT in zone 6                  |
>                   v                                     | post-DNAT in zone 5
>    +------------------------------+                     v
>    | recirc_id(0x320213)          |      +------------------------------+
>    | ufid:3413a279                |      | recirc_id(0x321f17)          |
>    | in_port=enp210s0f0np0  (PF)  |      | ufid:9f638cd3                |
>    | ct_state(0x2a/0x3e)          |      | in_port=enp210s0f0np0  (PF)  |
>    | ct_mark(0/0x43)              |      | ct_state(0x2a/0x3e)          |
>    | eth(src=b0:cf:0e:b1:31:ff,   |      | ct_mark(0/0x43)              |
>    |     dst=ae:ad:c9:2a:9d:0f)   |      | eth(src=b0:cf:0e:b1:31:ff,   |
>    | ipv4(src=0.0.0.0/128.0.0.0,  |      |     dst=ae:ad:c9:2a:9d:0f)   |
>    |      dst=172.27.61.7,        |      | ipv4(src=8.0.0.0/248.0.0.0,  |
>    |      proto=6, ttl=119)       |      |      dst=172.27.61.7,        |
>    |                              |      |      proto=6, ttl=62)        |
>    | actions:                     |      |                              |
>    |   ct_clear,                  |      | actions:                     |
>    |   set(eth src=               |      |   ct_clear,                  |
>    |        1a:83:58:7b:a8:ed),   |      |   set(eth src=               |
>    |   set(ipv4 ttl=118),         |      |        1a:83:58:7b:a8:ed),   |
>    |   ct(zone=11,nat),           |      |   set(ipv4 ttl=60),          |
>    |   recirc(0x314610)           |      |   ct(zone=11,nat),           |
>    |                              |      |   recirc(0x314610)           |
>    | pkts=565   bytes=44636       |      |                              |
>    | used=1.640s                  |      | pkts=26,867,289              |
>    | offloaded:yes, dp:tc         |      | bytes=241,881,487,034        |
>    +------------------------------+      | used=0.620s                  |
>                   |                      | offloaded:yes, dp:tc         |
>                   |                      +------------------------------+
>                   |                                     |
>                   +--------------+        +-------------+
>                                  |        |
>                                  v        v
>                   +--------------------------------------+
>                   | recirc_id(0x314610)        STAGE 2   |
>                   | ufid:1ee350bf                        |
>                   | in_port=enp210s0f0np0   (PF)         |
>                   | ct_state(0x2a/0x3f)  <-- mask 0x3f   |
>                   | ct_mark(0/0x41)                      |
>                   | eth(src=*, dst=ae:ad:c9:2a:9d:0f)    |
>                   | ipv4(src=*, dst=172.27.61.7,         |
>                   |      proto=0/0, ttl=0/0)             |
>                   |                                      |
>                   | actions: enp210s0f0_1   (VF)         |
>                   |                                      |
>                   | pkts=41,192,879                      |
>                   | bytes=2,502,536,363,732              |
>                   | used=0.020s, flags=SFPR.             |
>                   |                                      |
>                   | dp:ovs   <-- STRANDED, NOT OFFLOADED |
>                   +--------------------------------------+
>
>
> -------------------------------------------------------------------------------
> 2. AFTER FLUSH (ovs-appctl dpctl/del-flows)
> -------------------------------------------------------------------------------
>
> After `ovs-appctl dpctl/del-flows` everything is re-installed in the
> natural pipeline order, so the chain check passes for every stage.
> The megaflow masks have not been re-aggregated yet, so we see a
> "fanned out" pipeline:
>
>    +----------------+    +----------------+    +----------------+
>    | recirc_id(0)   |    | recirc_id(0)   |    | (parent for    |
>    |    BRANCH A    |    |    BRANCH B    |    |  chain         |
>    | 5 sub-megaflows|    | 1 megaflow     |    |  0x3229d9 had  |
>    | vlan 120       |    | vlan 26        |    |  aged out at   |
>    | zone 6 NAT     |    | zone 5 NAT     |    |  dump time --  |
>    |                |    |                |    |  the two       |
>    | dst=           |    | dst=           |    |  stage-1       |
>    |  AA.BB.CC.DD   |    |  XX.YY.ZZ.AA   |    |  flows below   |
>    | by src/ttl:    |    | src=8.0.0.0/5  |    |  had pkts=0)   |
>    |  104/5 ttl=56  |    |  ttl=62        |    |                |
>    |  32/3  ttl=119 |    |                |    |  ufid:1b6d210e |
>    |  124/7 ttl=234 |    | pkts=14,326,765|    |  -- not        |
>    |  32/3  ttl=122 |    | bytes=128.7 GB |    |     captured   |
>    |  192/3 ttl=243 |    | used=0.660s    |    |     for branch |
>    |                |    |                |    |     C          |
>    | actions:       |    | actions:       |    |                |
>    |  pop_vlan,     |    |  pop_vlan,     |    |                |
>    |  ct(zone=6,    |    |  ct(zone=5,    |    |                |
>    |     nat),      |    |     nat),      |    |                |
>    |  recirc(       |    |  recirc(       |    |                |
>    |   0x320213)    |    |   0x321f17)    |    |                |
>    | offloaded:yes  |    | offloaded:yes  |    |                |
>    | dp:tc          |    | dp:tc          |    |                |
>    +----------------+    +----------------+    +----------------+
>             |                    |                       :
>             v                    v                       v
>    +----------------+    +----------------+    +----------------+
>    | recirc_id      |    | recirc_id      |    | recirc_id      |
>    |  (0x320213)    |    |  (0x321f17)    |    |  (0x3229d9)    |
>    |                |    |                |    |                |
>    | 3 sub-megaflows|    | 1 megaflow     |    | 2 megaflows    |
>    | ct_state(      |    | ct_state(      |    | ct_state(      |
>    |  0x2a/0x3e)    |    |  0x2a/0x3e)    |    |  0x21/0x3f)    |
>    | (+est+rpl+trk) |    | (+est+rpl+trk) |    | (+new+trk)     |
>    |                |    |                |    |                |
>    | ttl 119 -> 118 |    | ttl 62  -> 60  |    | ttl 234 -> 233 |
>    | ttl 56  -> 55  |    |                |    | ttl 243 -> 242 |
>    | ttl 122 -> 121 |    | pkts=14,326,690|    |                |
>    |                |    | bytes=128.7 GB |    | pkts=0  (new   |
>    | pkts=68+9+1=78 |    | used=0.660s    |    |   conn attempts|
>    |                |    |                |    |   in flight)   |
>    | actions:       |    | actions:       |    |                |
>    |  ct_clear,     |    |  ct_clear,     |    | actions:       |
>    |  set(eth src=  |    |  set(eth src=  |    |  (same shape   |
>    |   1a:83:..),   |    |   1a:83:..),   |    |   as branch    |
>    |  set(ipv4 ttl  |    |  set(ipv4 ttl  |    |   A/B stage 1) |
>    |   -1),         |    |   -1),         |    |  recirc(       |
>    |  ct(zone=11,   |    |  ct(zone=11,   |    |   0x314610)    |
>    |   nat),        |    |   nat),        |    |                |
>    |  recirc(       |    |  recirc(       |    | offloaded:yes  |
>    |   0x314610)    |    |   0x314610)    |    | dp:tc          |
>    | offloaded:yes  |    | offloaded:yes  |    |                |
>    | dp:tc          |    | dp:tc          |    |                |
>    +----------------+    +----------------+    +----------------+
>             |                    |                       |
>             +--------+           |          +------------+
>                      |           |          |
>                      v           v          v
>             +-----------------------------------------+
>             | recirc_id(0x314610)     STAGE 2         |
>             |                                         |
>             | Three flows now (all offloaded:yes,     |
>             | dp:tc):                                 |
>             |                                         |
>             | 1. ufid:c51ef89d   <-- the umbrella     |
>             |    ct_state(0x2a/0x3e)  <-- mask 0x3e   |
>             |    ct_mark(0/0x41)                      |
>             |    eth(src=*, dst=ae:ad:c9:2a:9d:0f)    |
>             |    ipv4(dst=172.27.61.7)                |
>             |    actions: enp210s0f0_1   (VF)         |
>             |    pkts=14,326,720                      |
>             |    bytes=128,674,265,194                |
>             |    used=0.660s                          |
>             |                                         |
>             | 2. ufid:d6f6c8c3   (DROP, new conn ACL) |
>             |    ct_state(0x21/0x3f)  (+new+trk)      |
>             |    eth(src=1a:83:58:7b:a8:ed,           |
>             |        dst=ae:ad:00:00:00:00/           |
>             |            ff:ff:00:00:00:00)           |
>             |    dst=172.27.60.0/23,                  |
>             |    tcp ports w/ submask                 |
>             |    actions: drop                        |
>             |    pkts=0                               |
>             |                                         |
>             | 3. ufid:0b52d8bd   (DROP, new conn ACL) |
>             |    same shape, different tcp submask    |
>             |    actions: drop                        |
>             |    pkts=0                               |
>             +-----------------------------------------+
>
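The diagrams call out the ct_state mask on the umbrella flow (0x3f while
stranded, 0x3e after the flush). A small decoder makes the difference
readable. The bit values are the OVS_CS_F_* flags as I remember them from
the datapath uapi header, so treat the table as an assumption; it does
agree with the (+est+rpl+trk) and (+new+trk) annotations in the trace.
Whether the extra bit is related to the offload failure is not established
here.

```python
# ct_state flag bits (assumed to match OVS_CS_F_* in the datapath header).
CT_STATE_BITS = {
    0x01: "new",
    0x02: "est",
    0x04: "rel",
    0x08: "rpl",
    0x10: "inv",
    0x20: "trk",
    0x40: "snat",
    0x80: "dnat",
}


def decode_ct_state(value, mask):
    """Render a ct_state(value/mask) match as +flag/-flag for each masked bit."""
    parts = []
    for bit, name in CT_STATE_BITS.items():
        if mask & bit:
            parts.append(("+" if value & bit else "-") + name)
    return "".join(parts)


def mask_difference(mask_a, mask_b):
    """Names of the flags matched by one mask but not by the other."""
    diff = mask_a ^ mask_b
    return [name for bit, name in CT_STATE_BITS.items() if diff & bit]


print(decode_ct_state(0x2A, 0x3F))   # stranded umbrella flow
print(decode_ct_state(0x2A, 0x3E))   # umbrella flow after the flush
print(mask_difference(0x3F, 0x3E))   # the bit that differs
```

With this table, the stranded flow additionally requires "-new", while the
re-installed one leaves the new bit unmatched.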
>
> (Note: the ASCII diagrams above were generated by Claude.)
>
>
> In the OVS logs we also see the following message
> (https://github.com/openvswitch/ovs/blob/main/lib/dpif-offload-tc-netdev.c#L2363)
>
> ```
> 2026-06-01T21:15:33.774Z|10763|netdev_offload_tc(handler18)|DBG|
>   match for chain 3229200 failed due to non-existing goto chain action
> ```
>
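Note that the chain in this log line is printed in decimal, while the
recirc IDs in the flow dump are in hex: 3229200 is 0x314610, i.e. the
stranded chain. A quick sketch for correlating the two, assuming the TC
chain number is the recirc_id and that the message keeps this wording (the
regex and the function name are mine):

```python
import re

# Matches the DBG message quoted above; the exact wording is an assumption
# based on that single log line.
CHAIN_RE = re.compile(r"match for chain (\d+) failed")


def failed_chain_as_recirc_id(log_line):
    """Return the failed chain formatted like a recirc_id, or None."""
    match = CHAIN_RE.search(log_line)
    if match is None:
        return None
    return f"0x{int(match.group(1)):x}"


line = ("match for chain 3229200 failed due to "
        "non-existing goto chain action")
print(failed_chain_as_recirc_id(line))  # 0x314610
```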
> There seems to be a race condition around the 'used_chains' ccmap.
>
> As per Claude, the issue seems to have been introduced by commit
> `273a4fce951a` ("netdev-offload-tc: Only install recirc flows if the
> parent is present."), and there is a possible race window in
> netdev_tc_flow_put() between
> https://github.com/openvswitch/ovs/blob/main/lib/dpif-offload-tc-netdev.c#L2695
> and
> https://github.com/openvswitch/ovs/blob/main/lib/dpif-offload-tc-netdev.c#L2730
>
> @Eelco @Ilya - Do you have any idea what could be going on here?

Hi Numan,

Sorry for the late response, but this message ended up in my
spam folder, which I was just cleaning up :( I've also put Ilya
on the To line, in case it ended up in his spam as well.

I'm on PTO on Monday, so I will try to take a look at this later
in the week.

//Eelco

> Let me know if you need more information.  I'll try to debug further.
>
> Thanks
> Numan

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
