Hi!

We've run into a strange issue while backporting patches to the 24.03 
branch (OVS v3.3.4) and running the tests. Here is the situation:
I took the upstream 24.03 branch, added a stage at the beginning of the 
switch pipeline, and added a 'match all' flow with a 'next;' action to 
it. Example commit: 
https://github.com/Sashhkaa/ovn/commit/f20295315c327addfeb6fe455c3b3c655d6b3666
 
After this change, OVN userspace tests 79-82 (ECMP symmetric reply) 
started failing.
According to the test logs, the test expects the conntrack state 
ct_state(+new-est-rpl+trk) in the datapath flow but gets 
ct_state(+new-rpl+trk); that is, -est disappears. More detailed dumps 
are linked below.
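
To make the difference between the two ct_state strings explicit, here 
is a minimal Python sketch that parses them and reports which flags are 
missing. The helper names parse_ct_state and ct_state_diff are 
hypothetical and are not part of OVS or OVN; the two strings are the 
ones quoted from the test logs above.

```python
import re


def parse_ct_state(spec):
    """Parse a string such as '+new-est-rpl+trk' into {flag: is_set}."""
    return {name: sign == "+"
            for sign, name in re.findall(r"([+-])([a-z]+)", spec)}


def ct_state_diff(expected, actual):
    """Return (missing, extra, changed) flags between two ct_state strings."""
    exp, act = parse_ct_state(expected), parse_ct_state(actual)
    missing = {f: v for f, v in exp.items() if f not in act}
    extra = {f: v for f, v in act.items() if f not in exp}
    changed = {f: (exp[f], act[f])
               for f in exp if f in act and exp[f] != act[f]}
    return missing, extra, changed


# Strings taken from the test logs quoted above.
missing, extra, changed = ct_state_diff("+new-est-rpl+trk", "+new-rpl+trk")
print(missing)  # {'est': False}, i.e. the '-est' match is absent
```

The only difference is the absent '-est' flag: the datapath flow no 
longer matches on the est bit at all.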

The expected state should result from matching one of these OpenFlow 
rules in table 17 (in OVN this is router pipeline table 9, ECMP stateful):

  cookie=0xdda3b0a7, duration=2.635s, table=17, n_packets=6, n_bytes=636, idle_age=1, priority=100,ct_state=+new-rpl+trk,ipv6,reg14=0x2,metadata=0x1,ipv6_dst=fd01::/126 actions=ct(commit,zone=NXM_NX_REG11[0..15],nat(src),exec(move:NXM_OF_ETH_SRC[]->NXM_NX_CT_LABEL[32..79],load:0x2->NXM_NX_CT_MARK[16..31])),resubmit(,18)

  cookie=0xdda3b0a7, duration=2.635s, table=17, n_packets=14, n_bytes=1396, idle_age=0, priority=100,ct_state=+est-rpl+trk,ipv6,reg14=0x2,metadata=0x1,ipv6_dst=fd01::/126 actions=ct(commit,zone=NXM_NX_REG11[0..15],nat(src),exec(move:NXM_OF_ETH_SRC[]->NXM_NX_CT_LABEL[32..79],load:0x2->NXM_NX_CT_MARK[16..31])),resubmit(,18)

I found two logical flow changes that make the tests pass, though it's 
not clear why:
1) Adding a router table before ECMP processing:
Inserting just one table at the very beginning of the router pipeline, 
before the ECMP stateful handling, is enough to make the test pass (for 
example, 
https://github.com/odivlad/ovn/commit/eb6d0d7409ff78f1fc0908a28225d0a2a47daa29). 
The mechanism isn't clear: packets now match the default flow in table 
17 and only hit the proper ECMP rule in table 18, yet this somehow 
resolves the issue.
2) Modifying ACL evaluation rules:
The second solution is even stranger. Since this test case doesn't use 
ACLs or load balancers, northd adds a 'match all' flow with a 'next;' 
action and priority 65535 to the acl_eval table (logical table 9 in the 
switch pipeline, OpenFlow table 17). When we lower the priority of these 
flows below 100 (that is, below the priority of the ECMP rules), the 
test starts passing. This suggests some hidden interaction between 
router and switch pipeline rules, despite their different metadata 
match criteria.
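
To illustrate why this is surprising, here is a toy Python model of a 
priority-based lookup in OpenFlow table 17. It is only a sketch under 
stated assumptions: it is not the OVS classifier, it does not model 
megaflow wildcarding in the datapath, it compares ct_state as a plain 
string, and it omits the ipv6 and ipv6_dst matches. The Flow class, the 
lookup function and the switch metadata value 0x2 are hypothetical; 
metadata=0x1, reg14=0x2 and the priorities come from the dumps above.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    priority: int
    match: dict
    name: str


def lookup(table, packet):
    """Return the highest-priority flow whose match fields all equal the
    packet's fields, or None if nothing matches."""
    candidates = [f for f in table
                  if all(packet.get(k) == v for k, v in f.match.items())]
    return max(candidates, key=lambda f: f.priority, default=None)


ROUTER_DP = 0x1  # metadata of the router datapath, from the flow dump
SWITCH_DP = 0x2  # hypothetical metadata value for the switch datapath

table17 = [
    # acl_eval 'match all' flow of the switch pipeline (priority 65535).
    Flow(65535, {"metadata": SWITCH_DP}, "acl_eval match-all"),
    # ECMP stateful flows of the router pipeline (priority 100).
    Flow(100, {"metadata": ROUTER_DP, "reg14": 0x2,
               "ct_state": "+new-rpl+trk"}, "ecmp new"),
    Flow(100, {"metadata": ROUTER_DP, "reg14": 0x2,
               "ct_state": "+est-rpl+trk"}, "ecmp est"),
]

router_pkt = {"metadata": ROUTER_DP, "reg14": 0x2,
              "ct_state": "+new-rpl+trk"}
switch_pkt = {"metadata": SWITCH_DP}

hit_router = lookup(table17, router_pkt)
hit_switch = lookup(table17, switch_pkt)
print(hit_router.name)  # ecmp new
print(hit_switch.name)  # acl_eval match-all
```

In this simplified model the priority of the acl_eval flow has no 
effect on which flow a router packet hits, because the metadata values 
differ. That is why the observed dependency on that priority is 
unexpected.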

I compared the OVS traces for both cases: the initial failing test with 
just the stage addition, and the working version where we also changed 
the ACL eval flow priority to 0. The packet's path through the tables is 
identical except for two things: the rule matched in ACL eval (OpenFlow 
table 17), and the resulting datapath flow, where the -est state 
unexpectedly disappears. Only the rule priorities in table 17 changed, 
yet this somehow affects the connection tracking state. Both traces, the 
contents of table 17, and the diff between the traces are available 
here: 
https://gist.github.com/Sashhkaa/58b2c616e7d46fc2dafb898ed832960f

I've verified that this behavior persists in newer versions of Open 
vSwitch as well.
Does anyone understand what could be causing this? I'd appreciate any 
insights or suggestions for a proper fix. Thank you!

-- 
regards,
Alexandra.
