From: Numan Siddique <[email protected]>

Hello,

In our prod deployment we are seeing intermittent offload issues. All
the datapath flows of a chain get offloaded except the last one. That
flow is installed in the kernel datapath (dp:ovs) instead, which kills
performance. If I run `ovs-appctl dpctl/del-flows`, the problematic
flow gets offloaded.

The issue can be reproduced by running a script that deletes the dp
flows in a loop with a sleep of 5 seconds. The issue generally
surfaces after 5-6 flushes.
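
A minimal sketch of such a loop, for anyone who wants to try this. It
assumes that `ovs-appctl dpctl/dump-flows -m` prints one flow per line
with `recirc_id(...)` and `dp:ovs` / `dp:tc` fields. The helper names
(`stranded_flows`, `flush_until_stranded`) and the round limit are made
up for illustration; this is not the exact script we run.

```python
import re
import subprocess
import time

# Matches recirc_id(0) as well as recirc_id(0x314610).
RECIRC_RE = re.compile(r"recirc_id\((0x[0-9a-fA-F]+|0)\)")


def stranded_flows(dump_text):
    """Return dump lines for recirculated flows that sit in dp:ovs."""
    stranded = []
    for line in dump_text.splitlines():
        m = RECIRC_RE.search(line)
        # Only flows with a non-zero recirc_id are part of a chain.
        if m and int(m.group(1), 16) != 0 and "dp:ovs" in line:
            stranded.append(line.strip())
    return stranded


def flush_until_stranded(max_rounds=20, interval=5.0):
    """Flush the dp flows every `interval` seconds until one is stranded.

    Returns (round_number, stranded_lines), or (None, []) if the issue
    did not show up within max_rounds.
    """
    for round_no in range(1, max_rounds + 1):
        subprocess.run(["ovs-appctl", "dpctl/del-flows"], check=True)
        time.sleep(interval)  # let traffic re-install the flows
        dump = subprocess.run(
            ["ovs-appctl", "dpctl/dump-flows", "-m"],
            check=True, capture_output=True, text=True,
        ).stdout
        hits = stranded_flows(dump)
        if hits:
            return round_no, hits
    return None, []
```

Calling `flush_until_stranded()` as root on the compute node should
return the round in which a chained flow first stayed in dp:ovs, along
with the matching dump lines.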

Below is a trace of the datapath flows of the same pipeline, first
when the issue is seen (before a flush) and then after a flush.

The traffic is destined to the public IPs of the VM (there are 2 public
IPs in our setup). It enters the compute node through the PF and goes
via br-ex into the OVN pipeline.

-------------------------------------------------------------------------------
1. BEFORE FLUSH
-------------------------------------------------------------------------------

Two upstream branches merge into the stranded chain `0x314610`.  Each
branch is exactly two TC-offloaded stages followed by the umbrella that
is stuck in dp:ovs.

   +------------------------------+      +------------------------------+
   | recirc_id(0)        BRANCH A |      | recirc_id(0)        BRANCH B |
   | ufid:b81b9ab4                |      | ufid:29ac3fbf                |
   | in_port=enp210s0f0np0  (PF)  |      | in_port=enp210s0f0np0  (PF)  |
   | eth_type=0x8100, VLAN 120    |      | eth_type=0x8100, VLAN 26     |
   | eth(src=b0:cf:0e:b1:31:ff,   |      | eth(src=b0:cf:0e:b1:31:ff,   |
   |     dst=ae:ad:c9:2a:9d:0f)   |      |     dst=ae:ad:c9:2a:9d:0f)   |
   | ipv4(dst=AA.BB.CC.DD,        |      | ipv4(dst=XX.YY.ZZ.AA,        |
   |      src=32.0.0.0/224.0.0.0, |      |      src=8.0.0.0/248.0.0.0,  |
   |      ttl=119)                |      |      ttl=62)                 |
   | ct_state(0/0x2b)             |      | ct_state(0/0x2b)             |
   | ct_mark(0/0x2)               |      | ct_mark(0/0x2)               |
   |                              |      |                              |
   | actions:                     |      | actions:                     |
   |   pop_vlan,                  |      |   pop_vlan,                  |
   |   ct(zone=6,nat),            |      |   ct(zone=5,nat),            |
   |   recirc(0x320213)           |      |   recirc(0x321f17)           |
   |                              |      |                              |
   | pkts=565   bytes=44636       |      | pkts=26,867,289              |
   | used=1.640s                  |      | bytes=241,881,487,034        |
   | offloaded:yes, dp:tc         |      | used=0.620s                  |
   +------------------------------+      | offloaded:yes, dp:tc         |
                  |                      +------------------------------+
                  | post-DNAT in zone 6                  |
                  v                                     | post-DNAT in zone 5
   +------------------------------+                     v
   | recirc_id(0x320213)          |      +------------------------------+
   | ufid:3413a279                |      | recirc_id(0x321f17)          |
   | in_port=enp210s0f0np0  (PF)  |      | ufid:9f638cd3                |
   | ct_state(0x2a/0x3e)          |      | in_port=enp210s0f0np0  (PF)  |
   | ct_mark(0/0x43)              |      | ct_state(0x2a/0x3e)          |
   | eth(src=b0:cf:0e:b1:31:ff,   |      | ct_mark(0/0x43)              |
   |     dst=ae:ad:c9:2a:9d:0f)   |      | eth(src=b0:cf:0e:b1:31:ff,   |
   | ipv4(src=0.0.0.0/128.0.0.0,  |      |     dst=ae:ad:c9:2a:9d:0f)   |
   |      dst=172.27.61.7,        |      | ipv4(src=8.0.0.0/248.0.0.0,  |
   |      proto=6, ttl=119)       |      |      dst=172.27.61.7,        |
   |                              |      |      proto=6, ttl=62)        |
   | actions:                     |      |                              |
   |   ct_clear,                  |      | actions:                     |
   |   set(eth src=               |      |   ct_clear,                  |
   |        1a:83:58:7b:a8:ed),   |      |   set(eth src=               |
   |   set(ipv4 ttl=118),         |      |        1a:83:58:7b:a8:ed),   |
   |   ct(zone=11,nat),           |      |   set(ipv4 ttl=60),          |
   |   recirc(0x314610)           |      |   ct(zone=11,nat),           |
   |                              |      |   recirc(0x314610)           |
   | pkts=565   bytes=44636       |      |                              |
   | used=1.640s                  |      | pkts=26,867,289              |
   | offloaded:yes, dp:tc         |      | bytes=241,881,487,034        |
   +------------------------------+      | used=0.620s                  |
                  |                      | offloaded:yes, dp:tc         |
                  |                      +------------------------------+
                  |                                     |
                  +--------------+        +-------------+
                                 |        |
                                 v        v
                  +--------------------------------------+
                  | recirc_id(0x314610)        STAGE 2   |
                  | ufid:1ee350bf                        |
                  | in_port=enp210s0f0np0   (PF)         |
                  | ct_state(0x2a/0x3f)  <-- mask 0x3f   |
                  | ct_mark(0/0x41)                      |
                  | eth(src=*, dst=ae:ad:c9:2a:9d:0f)    |
                  | ipv4(src=*, dst=172.27.61.7,         |
                  |      proto=0/0, ttl=0/0)             |
                  |                                      |
                  | actions: enp210s0f0_1   (VF)         |
                  |                                      |
                  | pkts=41,192,879                      |
                  | bytes=2,502,536,363,732              |
                  | used=0.020s, flags=SFPR.             |
                  |                                      |
                  | dp:ovs   <-- STRANDED, NOT OFFLOADED |
                  +--------------------------------------+


-------------------------------------------------------------------------------
2. AFTER FLUSH (ovs-appctl dpctl/del-flows)
-------------------------------------------------------------------------------

After `ovs-appctl dpctl/del-flows` everything is re-installed in the
natural pipeline order, so the chain check passes for every stage.
The megaflow masks have not been re-aggregated yet, so we see a
"fanned out" pipeline:

   +----------------+    +----------------+    +----------------+
   | recirc_id(0)   |    | recirc_id(0)   |    | (parent for    |
   |    BRANCH A    |    |    BRANCH B    |    |  chain         |
   | 5 sub-megaflows|    | 1 megaflow     |    |  0x3229d9 had  |
   | vlan 120       |    | vlan 26        |    |  aged out at   |
   | zone 6 NAT     |    | zone 5 NAT     |    |  dump time --  |
   |                |    |                |    |  the two       |
   | dst=           |    | dst=           |    |  stage-1       |
   |  AA.BB.CC.DD   |    |  XX.YY.ZZ.AA   |    |  flows below   |
   | by src/ttl:    |    | src=8.0.0.0/5  |    |  had pkts=0)   |
   |  104/5 ttl=56  |    |  ttl=62        |    |                |
   |  32/3  ttl=119 |    |                |    |  ufid:1b6d210e |
   |  124/7 ttl=234 |    | pkts=14,326,765|    |  -- not        |
   |  32/3  ttl=122 |    | bytes=128.7 GB |    |     captured   |
   |  192/3 ttl=243 |    | used=0.660s    |    |     for branch |
   |                |    |                |    |     C          |
   | actions:       |    | actions:       |    |                |
   |  pop_vlan,     |    |  pop_vlan,     |    |                |
   |  ct(zone=6,    |    |  ct(zone=5,    |    |                |
   |     nat),      |    |     nat),      |    |                |
   |  recirc(       |    |  recirc(       |    |                |
   |   0x320213)    |    |   0x321f17)    |    |                |
   | offloaded:yes  |    | offloaded:yes  |    |                |
   | dp:tc          |    | dp:tc          |    |                |
   +----------------+    +----------------+    +----------------+
            |                    |                       :
            v                    v                       v
   +----------------+    +----------------+    +----------------+
   | recirc_id      |    | recirc_id      |    | recirc_id      |
   |  (0x320213)    |    |  (0x321f17)    |    |  (0x3229d9)    |
   |                |    |                |    |                |
   | 3 sub-megaflows|    | 1 megaflow     |    | 2 megaflows    |
   | ct_state(      |    | ct_state(      |    | ct_state(      |
   |  0x2a/0x3e)    |    |  0x2a/0x3e)    |    |  0x21/0x3f)    |
   | (+est+rpl+trk) |    | (+est+rpl+trk) |    | (+new+trk)     |
   |                |    |                |    |                |
   | ttl 119 -> 118 |    | ttl 62  -> 60  |    | ttl 234 -> 233 |
   | ttl 56  -> 55  |    |                |    | ttl 243 -> 242 |
   | ttl 122 -> 121 |    | pkts=14,326,690|    |                |
   |                |    | bytes=128.7 GB |    | pkts=0  (new   |
   | pkts=68+9+1=78 |    | used=0.660s    |    |   conn attempts|
   |                |    |                |    |   in flight)   |
   | actions:       |    | actions:       |    |                |
   |  ct_clear,     |    |  ct_clear,     |    | actions:       |
   |  set(eth src=  |    |  set(eth src=  |    |  (same shape   |
   |   1a:83:..),   |    |   1a:83:..),   |    |   as branch    |
   |  set(ipv4 ttl  |    |  set(ipv4 ttl  |    |   A/B stage 1) |
   |   -1),         |    |   -1),         |    |  recirc(       |
   |  ct(zone=11,   |    |  ct(zone=11,   |    |   0x314610)    |
   |   nat),        |    |   nat),        |    |                |
   |  recirc(       |    |  recirc(       |    | offloaded:yes  |
   |   0x314610)    |    |   0x314610)    |    | dp:tc          |
   | offloaded:yes  |    | offloaded:yes  |    |                |
   | dp:tc          |    | dp:tc          |    |                |
   +----------------+    +----------------+    +----------------+
            |                    |                       |
            +--------+           |          +------------+
                     |           |          |
                     v           v          v
            +-----------------------------------------+
            | recirc_id(0x314610)     STAGE 2         |
            |                                         |
            | Three flows now (all offloaded:yes,     |
            | dp:tc):                                 |
            |                                         |
            | 1. ufid:c51ef89d   <-- the umbrella     |
            |    ct_state(0x2a/0x3e)  <-- mask 0x3e   |
            |    ct_mark(0/0x41)                      |
            |    eth(src=*, dst=ae:ad:c9:2a:9d:0f)    |
            |    ipv4(dst=172.27.61.7)                |
            |    actions: enp210s0f0_1   (VF)         |
            |    pkts=14,326,720                      |
            |    bytes=128,674,265,194                |
            |    used=0.660s                          |
            |                                         |
            | 2. ufid:d6f6c8c3   (DROP, new conn ACL) |
            |    ct_state(0x21/0x3f)  (+new+trk)      |
            |    eth(src=1a:83:58:7b:a8:ed,           |
            |        dst=ae:ad:00:00:00:00/           |
            |            ff:ff:00:00:00:00)           |
            |    dst=172.27.60.0/23,                  |
            |    tcp ports w/ submask                 |
            |    actions: drop                        |
            |    pkts=0                               |
            |                                         |
            | 3. ufid:0b52d8bd   (DROP, new conn ACL) |
            |    same shape, different tcp submask    |
            |    actions: drop                        |
            |    pkts=0                               |
            +-----------------------------------------+
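
To make the ct_state value/mask pairs in the two diagrams easier to
compare, here is a small decoder. It is only a sketch: the bit values
are the datapath OVS_CS_F_* flags as I understand them, and the
function name is made up for illustration.

```python
# Datapath ct_state bits (OVS_CS_F_*), lowest bit first.
CT_STATE_BITS = [
    (0x01, "new"),
    (0x02, "est"),
    (0x04, "rel"),
    (0x08, "rpl"),
    (0x10, "inv"),
    (0x20, "trk"),
    (0x40, "snat"),
    (0x80, "dnat"),
]


def decode_ct_state(value, mask):
    """Render value/mask as +flag/-flag, skipping bits the mask ignores."""
    parts = []
    for bit, name in CT_STATE_BITS:
        if mask & bit:
            parts.append(("+" if value & bit else "-") + name)
    return "".join(parts)


print(decode_ct_state(0x2a, 0x3f))  # -new+est-rel+rpl-inv+trk (stranded flow)
print(decode_ct_state(0x2a, 0x3e))  # +est-rel+rpl-inv+trk     (after flush)
print(decode_ct_state(0x21, 0x3f))  # +new-est-rel-rpl-inv+trk (drop flows)
```

The only difference between the stranded umbrella flow (mask 0x3f) and
its offloaded replacement (mask 0x3e) is that the former also matches
on -new.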


(Note: The above ASCII diagrams were generated by Claude.)


In the OVS logs we also see the message below
(https://github.com/openvswitch/ovs/blob/main/lib/dpif-offload-tc-netdev.c#L2363):

```
2026-06-01T21:15:33.774Z|10763|netdev_offload_tc(handler18)|DBG|
  match for chain 3229200 failed due to non-existing goto chain action
```
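
The chain number in this message is printed in decimal. Assuming the
TC chain number is simply the recirc_id (which the numbers here are
consistent with), a quick conversion shows that the failing chain is
the stranded 0x314610 one. The helper name is made up for illustration.

```python
import re

LOG_LINE = ("match for chain 3229200 failed due to "
            "non-existing goto chain action")


def chain_to_recirc_id(log_line):
    """Extract the decimal chain number and return it as a hex recirc_id."""
    m = re.search(r"match for chain (\d+) failed", log_line)
    if m is None:
        return None
    return hex(int(m.group(1)))


print(chain_to_recirc_id(LOG_LINE))  # 0x314610
```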

There seems to be a race condition around the ccmap 'used_chains'.

As per Claude, the issue seems to have been introduced by commit
`273a4fce951a` ("netdev-offload-tc: Only install recirc flows if the
parent is present."), and there is a possible race window in the
function netdev_tc_flow_put() between
https://github.com/openvswitch/ovs/blob/main/lib/dpif-offload-tc-netdev.c#L2695
and
https://github.com/openvswitch/ovs/blob/main/lib/dpif-offload-tc-netdev.c#L2730
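
To illustrate the suspected ordering problem, here is a toy model. It
is NOT the OVS code: `ToyOffload`, `flow_put` and `replay` are invented
names, and the only behaviour modelled is the assumption that a flow
on a non-zero chain is offloaded only if some already-installed flow
has a goto action to that chain (tracked in a counter standing in for
'used_chains').

```python
from collections import Counter

# name -> (chain the flow matches on, chain it recirculates to)
FLOWS = {
    "branch_b": (0, 0x321f17),
    "stage1_b": (0x321f17, 0x314610),
    "umbrella": (0x314610, None),
}


class ToyOffload:
    """Toy stand-in for the parent-present check; not OVS code."""

    def __init__(self):
        self.used_chains = Counter()  # chain -> installed gotos to it
        self.placement = {}

    def flow_put(self, name, chain, goto_chain=None):
        # Assumed rule: a recirculated flow needs an installed parent.
        if chain != 0 and self.used_chains[chain] == 0:
            self.placement[name] = "dp:ovs"
            return False
        if goto_chain is not None:
            self.used_chains[goto_chain] += 1
        self.placement[name] = "dp:tc"
        return True


def replay(order):
    """Install FLOWS in the given order and report where each one lands."""
    toy = ToyOffload()
    for name in order:
        chain, goto_chain = FLOWS[name]
        toy.flow_put(name, chain, goto_chain)
    return toy.placement


# Natural pipeline order: every stage is offloaded.
print(replay(["branch_b", "stage1_b", "umbrella"]))
# The last stage is handled before its parent is counted: it stays in dp:ovs.
print(replay(["branch_b", "umbrella", "stage1_b"]))
```

In the second ordering only the umbrella flow ends up in dp:ovs while
both of its parents are in dp:tc, which is the same shape as the
"BEFORE FLUSH" dump. Whether this is what actually happens between the
two lines linked above still needs to be confirmed.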

@Eelco @Ilya - Do you have any idea what could be going on here?

Let me know if you need more information.  I'll try to debug further.

Thanks
Numan

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
