> On 8 Oct 2025, at 2:59 PM, Dumitru Ceara <[email protected]> wrote: > > !-------------------------------------------------------------------| > CAUTION: External Email > > |-------------------------------------------------------------------! > > On 10/7/25 8:36 PM, Sragdhara Datta Chaudhuri wrote: >> Hi Dumitru, >> > > Hi Sragdhara, > >> Thanks so much for your detailed review over multiple iterations. >> >> Numan, thank you as well for your valuable inputs. >> >> Regarding the points you mentioned in this email: >> >> * >> Agree that dedicated registers would be good. I picked reg5 as no >> other registers were completely free. The lack of registers would >> hamper other features in the future as well. >> * >> We’ll take a look at the system test tcpdump issue and fix it in a >> future patch. >> * >> We noticed the test 622 (Network function packet flow >> outbound) failure but couldn’t reproduce it in-house even after >> running in a loop. We’ll look into it. >> > > Thank you! Let me know if you need help reproducing this in CI.
Hi Dumitru, I tried reproducing the issue locally in CI with the latest main but was not able to, even after multiple runs. I sent a patch to fix the tcpdump issue for the NF system test. Thanks, Naveen > > Regards, > Dumitru > >> >> Thanks, >> Sragdhara >> >> *From: *Dumitru Ceara <[email protected]> >> *Date: *Friday, October 3, 2025 at 2:01 AM >> *To: *Sragdhara Datta Chaudhuri <[email protected]>, ovs- >> [email protected] <[email protected]> >> *Cc: *Numan Siddique <[email protected]>, Naveen Yerramneni >> <[email protected]>, Karthik Chandrashekar >> <[email protected]>, Ilya Maximets <[email protected]> >> *Subject: *Re: [ovs-dev] [PATCH OVN v9 0/5] Network Function Insertion. >> >> On 9/15/25 9:34 PM, Sragdhara Datta Chaudhuri wrote: >>> RFC: NETWORK FUNCTION INSERTION IN OVN >>> >>> 1. Introduction >>> ================ >>> The objective is to insert a Network Function (NF) in the path of >> outbound/inbound traffic from/to a port-group. The use case is to >> integrate a 3rd party service in the path of traffic. An example of such >> a service would be a layer-7 firewall. The NF VM will be like a bump in the >> wire and should not modify the packet, i.e. the IP header, the MAC >> addresses, VLAN tag, sequence numbers remain unchanged. >>> >>> Here are some of the highlights: >>> - A new entity network-function (NF) has been introduced. It contains >> a pair of LSPs. The CMS would designate one as “inport” and the other as >> “outport”. >>> - For high-availability, a network function group (NFG) entity >> consists of a group of NFs. Only one NF in an NFG has an active role >> based on health monitoring. >>> - ACL would accept NFG as a parameter and traffic matching the ACL >> would be redirected to the associated active NF’s port. NFG is accepted >> for the stateful allow action only. 
>>> - The ACL’s port-group is the point of reference when defining the >> role of the NF ports. The “inport” is the port closer to the port-group >> and “outport” is the one away from it. For from-lport ACLs, the request >> packets would be redirected to the NF “inport” and for to-lport ACLs, >> the request packets would be redirected to the NF “outport”. When the same >> packet comes out of the other NF port, it is simply forwarded. >>> - Statefulness will be maintained, i.e. the response traffic will also >> go through the same pair of NF ports but in reverse order. >>> - For the NF ports we need to disable the port security check, fdb >> learning and multicast/broadcast forwarding. >>> - Health monitoring involves ovn-controller periodically injecting >> ICMP probe packets into the NF inport and monitoring the same packet coming >> out of the NF outport. >>> - If the traffic redirection involves cross-host traffic (e.g. for a >> from-lport ACL, if the source VM and NF VM are on different hosts), >> packets would be tunneled to and from the NF VM's host. >>> - If the port-group to which the ACL is being applied has members >> spread across multiple LSs, the CMS needs to create child ports for the NF >> ports on each of these LSs. The redirection rules in each LS will use >> the child ports on that LS. >>> >>> 2. NB tables >>> ============= >>> New NB tables >>> —------------ >>> Network_Function: Each row contains {inport, outport, health_check} >>> Network_Function_Group: Each row contains a list of Network_Function >> entities. It also contains a unique id (between 1 and 255) and a >> reference to the current active NF. 
>>> Network_Function_Health_Check: Each row contains configuration for >> probes in options field: {interval, timeout, success_count, failure_count} >>> >>> "Network_Function_Health_Check": { >>> "columns": { >>> "name": {"type": "string"}, >>> "options": { >>> "type": {"key": "string", >>> "value": "string", >>> "min": 0, >>> "max": "unlimited"}}, >>> "external_ids": { >>> "type": {"key": "string", "value": "string", >>> "min": 0, "max": "unlimited"}}}, >>> "isRoot": true}, >>> "Network_Function": { >>> "columns": { >>> "name": {"type": "string"}, >>> "outport": {"type": {"key": {"type": "uuid", >>> "refTable": >> "Logical_Switch_Port", >>> "refType": "strong"}, >>> "min": 1, "max": 1}}, >>> "inport": {"type": {"key": {"type": "uuid", >>> "refTable": >> "Logical_Switch_Port", >>> "refType": "strong"}, >>> "min": 1, "max": 1}}, >>> "health_check": {"type": { >>> "key": {"type": "uuid", >>> "refTable": "Network_Function_Health_Check", >>> "refType": "strong"}, >>> "min": 0, "max": 1}}, >>> "external_ids": { >>> "type": {"key": "string", "value": "string", >>> "min": 0, "max": "unlimited"}}}, >>> "isRoot": true}, >>> "Network_Function_Group": { >>> "columns": { >>> "name": {"type": "string"}, >>> "network_function": {"type": >>> {"key": {"type": "uuid", >>> "refTable": "Network_Function", >>> "refType": "strong"}, >>> "min": 0, "max": "unlimited"}}, >>> "mode": {"type": {"key": {"type": "string", >>> "enum": ["set", ["inline"]]}}}, >>> "network_function_active": {"type": >>> {"key": {"type": "uuid", >>> "refTable": "Network_Function", >>> "refType": "strong"}, >>> "min": 0, "max": 1}}, >>> "id": { >>> "type": {"key": {"type": "integer", >>> "minInteger": 0, >>> "maxInteger": 255}}}, >>> "external_ids": { >>> "type": {"key": "string", "value": "string", >>> "min": 0, "max": "unlimited"}}}, >>> "isRoot": true}, >>> >>> >>> Modified NB table >>> —---------------- >>> ACL: The ACL entity would have a new optional field that is a >> reference to a Network_Function_Group 
entity. This field can be present >> only for stateful allow ACLs. >>> >>> "ACL": { >>> "columns": { >>> "network_function_group": {"type": {"key": {"type": >> "uuid", >>> "refTable": >> "Network_Function_Group", >>> "refType": "strong"}, >>> "min": 0, >>> "max": 1}}, >>> >>> New options for Logical_Switch_Port >>> —---------------------------------- >>> receive_multicast=<boolean>: Default true. If set to false, LS will >> not forward broadcast/multicast traffic to this port. This is to prevent >> looping of such packets. >>> >>> lsp_learn_mac=<boolean>: Default true. If set to false, fdb learning >> will be skipped for packets coming out of this port. Redirected packets >> from the NF port would be carrying the originating VM’s MAC in source, >> and so learning should not happen. >>> >>> CMS needs to set both the above options to false for NF ports, in >> addition to disabling port security. >>> >>> nf-linked-port=<lsp-name>: Each NF port needs to have this set to the >> other NF port of the pair. >>> >>> is-nf=<boolean>: Each NF port needs to have this set to true. 
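Putting the options above together, a minimal NF-port setup might look like the following sketch. This is only an illustration assembled from the options documented above; the port and switch names are illustrative, and disabling port security is done by clearing the port_security column since there is no dedicated option for it:

```shell
# Hypothetical NF port pair on logical switch ls1; names are illustrative.
ovn-nbctl lsp-add ls1 nfp1
ovn-nbctl lsp-add ls1 nfp2

# Disable port security by leaving the port_security column empty.
ovn-nbctl lsp-set-port-security nfp1
ovn-nbctl lsp-set-port-security nfp2

# Options the text requires for NF ports: no mcast/bcast delivery,
# no FDB learning, link each port to its peer, mark the port as NF.
ovn-nbctl set Logical_Switch_Port nfp1 \
    options:receive_multicast=false options:lsp_learn_mac=false \
    options:nf-linked-port=nfp2 options:is-nf=true
ovn-nbctl set Logical_Switch_Port nfp2 \
    options:receive_multicast=false options:lsp_learn_mac=false \
    options:nf-linked-port=nfp1 options:is-nf=true
```

Note that the sample configuration in the next section sets receive_multicast, lsp_learn_mac and nf-linked-port the same way; this sketch only adds the is-nf and port-security steps.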
>>> >>> New NB_global options >>> —-------------------- >>> svc_monitor_mac_dst: destination MAC of probe packets (svc_monitor_mac >> is already there and will be used as source MAC) >>> svc_monitor_ip: source IP of probe packets >>> svc_monitor_ip_dst: destination IP of probe packets >>> >>> Sample configuration >>> —------------------- >>> ovn-nbctl ls-add ls1 >>> ovn-nbctl lsp-add ls1 nfp1 >>> ovn-nbctl lsp-add ls1 nfp2 >>> ovn-nbctl set logical_switch_port nfp1 options:receive_multicast=false >> options:lsp_learn_mac=false options:nf-linked-port=nfp2 >>> ovn-nbctl set logical_switch_port nfp2 options:receive_multicast=false >> options:lsp_learn_mac=false options:nf-linked-port=nfp1 >>> ovn-nbctl nf-add nf1 nfp1 nfp2 >>> ovn-nbctl nfg-add nfg1 123 inline nf1 >>> ovn-nbctl lsp-add ls1 p1 -- lsp-set-addresses p1 "50:6b:8d:3e:ed:c4 >> 10.1.1.4" >>> ovn-nbctl pg-add pg1 p1 >>> ovn-nbctl create Address_Set name=as1 addresses=10.1.1.4 >>> ovn-nbctl lsp-add ls1 p2 -- lsp-set-addresses p2 "50:6b:8d:3e:ed:c5 >> 10.1.1.5" >>> ovn-nbctl create Address_Set name=as2 addresses=10.1.1.5 >>> ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2' >> allow-related nfg1 >>> ovn-nbctl acl-add pg1 to-lport 100 'outport==@pg1 && ip4.src == $as2' >> allow-related nfg1 >>> >>> 3. SB tables >>> ============ >>> Service_Monitor: >>> This is currently used by Load balancer. New fields are: “type” - to >> indicate LB or NF, “mac” - the destination MAC address for monitor >> packets, “logical_input_port” - the LSP to which the probe packet would >> be sent. Also, “icmp” has been added as a protocol type, used only for NF. 
>>> >>> "Service_Monitor": { >>> "columns": { >>> "type": {"type": {"key": { >>> "type": "string", >>> "enum": ["set", ["load-balancer", "network- >> function"]]}}}, >>> "mac": {"type": "string"}, >>> "protocol": { >>> "type": {"key": {"type": "string", >>> "enum": ["set", ["tcp", "udp", "icmp"]]}, >>> "min": 0, "max": 1}}, >>> "logical_input_port": {"type": "string"}, >>> >>> northd would create one Service_Monitor entity for each NF. The >> logical_input_port and logical_port would be populated from the NF >> inport and outport fields respectively. The probe packets would be >> injected into the logical_input_port and would be monitored out of >> logical_port. >>> >>> 4. Logical Flows >>> ================ >>> Logical Switch ingress pipeline: >>> - in_network_function added after in_stateful. >>> - Modifications to in_acl_eval, in_stateful and in_l2_lookup. >>> Logical Switch egress pipeline: >>> - out_network_function added after out_stateful. >>> - Modifications to out_pre_acl, out_acl_eval and out_stateful. >>> >>> 4.1 from-lport ACL >>> ------------------ >>> The diagram shows the request path for packets from VM1 port p1, which >> is a member of the pg to which ACL is applied. The response would follow >> the reverse path, i.e. packet would be redirected to nfp2 and come out >> of nfp1 and be forwarded to p1. >>> Also, p2 does not need to be on the same LS. Only the p1, nfp1, nfp2 >> are on the same LS. 
>>> >>> ----- ------- ----- >>> | VM1 | | NF VM | | VM2 | >>> ----- ------- ----- >>> | /\ | / \ >>> | | | | >>> \ / | \ / | >>> ------------------------------------------------------------ >>> | p1 nfp1 nfp2 p2 | >>> | | >>> | Logical Switch | >>> ------------------------------------------------------------- >>> pg1: [p1] as2: [p2-ip] >>> ovn-nbctl nf-add nf1 nfp1 nfp2 >>> ovn-nbctl nfg-add nfg1 123 inline nf1 >>> ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2' >> allow-related nfg1 >>> >>> The request packets from p1 matching a from-lport ACL with NFG are >> redirected to nfp1 and the NFG id is committed to the ct label in p1's >> zone. When the same packet comes out of nfp2 it gets forwarded the >> normal way. >>> Response packets have p1's MAC as destination. Ingress processing sets >> the outport to p1 and the CT lookup in the egress pipeline (in p1's ct zone) >> yields the NFG id and the packet is injected back to the ingress pipeline after >> setting the outport to nfp2. >>> >>> Below are the changes in detail. >>> >>> 4.1.1 Request processing >>> ------------------------ >>> >>> in_acl_eval: For from-lport ACLs with NFG, the existing rule's action >> has been enhanced to set: >>> - reg8[21] = 1: to indicate that packet has matched a rule with NFG >>> - reg0[22..29] = <NFG-unique-id> >>> - reg8[22] = <direction> (1: request, 0: response) >>> >>> table=8 (ls_in_acl_eval), priority=1200 , match=(reg0[7] == 1 && >> (inport==@pg1 && ip4.dst == $as2)), action=(reg8[16] = 1; reg0[1] = 1; >> reg8[21] = 1; reg8[22] = 1; reg0[22..29] = 123; next;) >>> table=8 (ls_in_acl_eval), priority=1200 , match=(reg0[8] == 1 && >> (inport==@pg1 && ip4.dst == $as2)), action=(reg8[16] = 1; reg8[21] = 1; >> reg8[22] = 1; reg0[22..29] = 123; next;) >>> >>> in_stateful: Priority 110: set NFG id in CT label if reg8[21] is set. >>> - bit 7 (ct_label.network_function_group): Set to 1 to indicate NF >> insertion. 
>>> - bits 17 to 24 (ct_label.network_function_group_id): Stores the 8 >> bit NFG id >>> >>> table=21(ls_in_stateful ), priority=110 , match=(reg0[1] == 1 >> && reg0[13] == 0 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked >> = 0; ct_mark.allow_established = reg0[20]; ct_label.acl_id = >> reg2[16..31]; ct_label.network_function_group = 1; >> ct_label.network_function_group_id = reg0[22..29]; }; next;) >>> table=21(ls_in_stateful ), priority=110 , match=(reg0[1] == 1 >> && reg0[13] == 1 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked >> = 0; ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = >> reg8[19..20]; ct_mark.obs_collector_id = reg8[8..15]; >> ct_label.obs_point_id = reg9; ct_label.acl_id = reg2[16..31]; >> ct_label.network_function_group = 1; ct_label.network_function_group_id >> = reg0[22..29]; }; next;) >>> table=21(ls_in_stateful ), priority=100 , match=(reg0[1] == 1 >> && reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0; >> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; >> ct_label.network_function_group = 0; ct_label.network_function_group_id >> = 0; }; next;) >>> table=21(ls_in_stateful ), priority=100 , match=(reg0[1] == 1 >> && reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0; >> ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20]; >> ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9; >> ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 0; >> ct_label.network_function_group_id = 0; }; next;) >>> table=21(ls_in_stateful ), priority=0 , match=(1), action=(next;) >>> >>> >>> For non-NFG cases, the existing priority 100 rules will be hit. An >> additional action has been added there to clear the NFG bits in the ct label. >>> >>> in_network_function: A new stage with priority 99 rules to redirect >> packets by setting outport to the NF “inport” (or its child port) based >> on the NFG id set by the prior ACL stage. 
>>> Priority 100 rules ensure that when the same packets come out of the >> NF ports, they are not redirected again (the setting of reg5 here >> relates to the cross-host packet tunneling and will be explained later). >>> Priority 1 rule: if reg8[21] is set, but the NF port (or child port) >> is not present on this LS, drop packets. >>> >>> table=22(ls_in_network_function), priority=100 , match=(inport == >> "nfp1"), action=(reg5[16..31] = ct_label.tun_if_id; next;) >>> table=22(ls_in_network_function), priority=100 , match=(inport == >> "nfp2"), action=(reg5[16..31] = ct_label.tun_if_id; next;) >>> table=22(ls_in_network_function), priority=100 , match=(reg8[21] == >> 1 && eth.mcast), action=(next;) >>> table=22(ls_in_network_function), priority=99 , match=(reg8[21] == >> 1 && reg8[22] == 1 && reg0[22..29] == 123), action=(outport = "nfp1"; >> output;) >>> table=22(ls_in_network_function), priority=1 , match=(reg8[21] == >> 1), action=(drop;) >>> table=22(ls_in_network_function), priority=0 , match=(1), >> action=(next;) >>> >>> >>> 4.1.2 Response processing >>> ------------------------- >>> out_acl_eval: High priority rules that allow response and related >> packets to go through have been enhanced to also copy the CT label NFG bit >> into reg8[21]. >>> >>> table=6(ls_out_acl_eval), priority=65532, match=(!ct.est && ct.rel >> && !ct.new && !ct.inv && ct_mark.blocked == 0), action=(reg8[21] = >> ct_label.network_function_group; reg8[16] = 1; ct_commit_nat;) >>> table=6(ls_out_acl_eval), priority=65532, match=(ct.est && !ct.rel >> && !ct.new && !ct.inv && ct.rpl && ct_mark.blocked == 0), >> action=(reg8[21] = ct_label.network_function_group; reg8[16] = 1; next;) >>> >>> out_network_function: Priority 99 rule matches on the nfg_id in >> ct_label and sets the outport to the NF “outport”. It also sets >> reg8[23]=1 and injects the packet into the ingress pipeline (in_l2_lookup). >>> Priority 100 rules forward all packets destined to NF ports to the next table. 
>>> >>> table=11 (ls_out_network_function), priority=100 , match=(outport >> == "nfp1"), action=(next;) >>> table=11 (ls_out_network_function), priority=100 , match=(outport >> == "nfp2"), action=(next;) >>> table=11(ls_out_network_function), priority=100 , match=(reg8[21] >> == 1 && eth.mcast), action=(next;) >>> table=11 (ls_out_network_function), priority=99 , match=(reg8[21] >> == 1 && reg8[22] == 0 && ct_label.network_function_group_id == 123), >> action=(outport = "nfp2"; reg8[23] = 1; next(pipeline=ingress, table=29);) >>> table=11 (ls_out_network_function), priority=1 , match=(reg8[21] >> == 1), action=(drop;) >>> table=11 (ls_out_network_function), priority=0 , match=(1), >> action=(next;) >>> >>> in_l2_lkup: if reg8[23] == 1 (packet has come back from egress), >> simply forward such packets as outport is already set. >>> >>> table=29(ls_in_l2_lkup), priority=100 , match=(reg8[23] == 1), >> action=(output;) >>> >>> The above set of rules ensures that the response packet is sent to >> nfp2. When the same packet comes out of nfp1, the ingress pipeline would >> set the outport to p1 and it enters the egress pipeline. >>> >>> out_pre_acl: If the packet is coming from the NF inport, skip the >> egress pipeline up to the out_network_function stage, as the packet has already gone >> through it and we don't want the same packet to be processed by CT twice. >>> table=2 (ls_out_pre_acl ), priority=110 , match=(inport == >> "nfp1"), action=(next(pipeline=egress, table=12);) >>> >>> >>> 4.2 to-lport ACL >>> ---------------- >>> ----- -------- ----- >>> | VM1 | | NF VM | | VM2 | >>> ----- -------- ----- >>> / \ | / \ | >>> | | | | >>> | \ / | \ / >>> ------------------------------------------------------------- >>> | p1 nfp1 nfp2 p2 | >>> | | >>> | Logical Switch | >>> ------------------------------------------------------------- >>> ovn-nbctl acl-add pg1 to-lport 100 'outport==@pg1 && ip4.src == $as2' >> allow-related nfg1 >>> The diagram shows the request traffic path. 
The response will follow a reverse >> path. >>> >>> Ingress pipeline sets the outport to p1 based on destination MAC >> lookup. The packet enters the egress pipeline. There the to-lport ACL >> with NFG gets evaluated and the NFG id gets committed to the CT label. >> Then the outport is set to nfp2 and then the packet is injected back to >> ingress. When the same packet comes out of nfp1, it gets forwarded to p1 >> the normal way. >>> For the response packet from p1, the ingress pipeline gets the NFG id >> from the CT label and accordingly redirects it to nfp1. When it comes out of >> nfp2 it is forwarded the normal way. >>> >>> 4.2.1 Request processing >>> ------------------------ >>> out_acl_eval: For to-lport ACLs with NFG, the existing rule's action >> has been enhanced to set: >>> - reg8[21] = 1: to indicate that packet has matched a rule with NFG >>> - reg0[22..29] = <NFG-unique-id> >>> - reg8[22] = <direction> (1: request, 0: response) >>> >>> table=6 (ls_out_acl_eval ), priority=1100 , match=(reg0[7] == 1 >> && (outport==@pg1 && ip4.src == $as2)), action=(reg8[16] = 1; reg0[1] = >> 1; reg8[21] = 1; reg8[22] = 1; reg0[22..29] = 123; next;) >>> table=6 (ls_out_acl_eval ), priority=1100 , match=(reg0[8] == 1 >> && (outport==@pg1 && ip4.src == $as2)), action=(reg8[16] = 1; reg0[1] = >> 1; reg8[21] = 1; reg8[22] = 1; reg0[22..29] = 123; next;) >>> >>> >>> >>> out_stateful: Priority 110: set NFG id in CT label if reg8[21] is set. 
>>> >>> table=10(ls_out_stateful ), priority=110 , match=(reg0[1] == 1 >> && reg0[13] == 0 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked >> = 0; ct_mark.allow_established = reg0[20]; ct_label.acl_id = >> reg2[16..31]; ct_label.network_function_group = 1; >> ct_label.network_function_group_id = reg0[22..29]; }; next;) >>> table=10(ls_out_stateful ), priority=110 , match=(reg0[1] == 1 >> && reg0[13] == 1 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked >> = 0; ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = >> reg8[19..20]; ct_mark.obs_collector_id = reg8[8..15]; >> ct_label.obs_point_id = reg9; ct_label.acl_id = reg2[16..31]; >> ct_label.network_function_group = 1; ct_label.network_function_group_id >> = reg0[22..29]; }; next;) >>> table=10(ls_out_stateful ), priority=100 , match=(reg0[1] == 1 >> && reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0; >> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; >> ct_label.network_function_group = 0; ct_label.network_function_group_id >> = 0; }; next;) >>> table=10(ls_out_stateful ), priority=100 , match=(reg0[1] == 1 >> && reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0; >> ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20]; >> ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9; >> ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 0; >> ct_label.network_function_group_id = 0; }; next;) >>> table=10(ls_out_stateful ), priority=0 , match=(1), action=(next;) >>> >>> out_network_function: A new stage that has priority 99 rules to >> redirect packets by setting outport to the NF “outport” (or its child >> port) based on the NFG id set by the prior ACL stage, and then injecting >> them back to ingress. Priority 100 rules ensure that when the packets are >> going to NF ports, they are not redirected again. >>> Priority 1 rule: if reg8[21] is set, but the NF port (or child port) >> is not present on this LS, drop packets. 
>>> >>> table=11(ls_out_network_function), priority=100 , match=(outport == >> "nfp1"), action=(next;) >>> table=11(ls_out_network_function), priority=100 , match=(outport == >> "nfp2"), action=(next;) >>> table=11(ls_out_network_function), priority=100 , match=(reg8[21] >> == 1 && eth.mcast), action=(next;) >>> table=11(ls_out_network_function), priority=99 , match=(reg8[21] >> == 1 && reg8[22] == 1 && reg0[22..29] == 123), action=(outport = "nfp2"; >> reg8[23] = 1; next(pipeline=ingress, table=29);) >>> table=11(ls_out_network_function), priority=1 , match=(reg8[21] >> == 1), action=(drop;) >>> table=11(ls_out_network_function), priority=0 , match=(1), >> action=(next;) >>> >>> >>> in_l2_lkup: As described earlier, the priority 100 rule will forward >> these packets. >>> >>> Then the same packet comes out of nfp1 and goes through the ingress >> processing where the outport gets set to p1. The egress pipeline >> out_pre_acl priority 110 rule described earlier matches against inport >> as nfp1 and directly jumps to the stage after out_network_function. Thus >> the packet is not redirected again. >>> >>> 4.2.2 Response processing >>> ------------------------- >>> in_acl_eval: High priority rules that allow response and related >> packets to go through have been enhanced to also copy the CT label NFG bit >> into reg8[21]. >>> >>> table=8(ls_in_acl_eval), priority=65532, match=(!ct.est && ct.rel >> && !ct.new && !ct.inv && ct_mark.blocked == 0), action=(reg0[17] = 1; >> reg8[21] = ct_label.network_function_group; reg8[16] = 1; ct_commit_nat;) >>> table=8 (ls_in_acl_eval), priority=65532, match=(ct.est && !ct.rel >> && !ct.new && !ct.inv && ct.rpl && ct_mark.blocked == 0), >> action=(reg0[9] = 0; reg0[10] = 0; reg0[17] = 1; reg8[21] = >> ct_label.network_function_group; reg8[16] = 1; next;) >>> >>> in_network_function: Priority 99 rule matches on the nfg_id in >> ct_label and sets the outport to the NF “inport”. 
>>> Priority 100 rules forward all packets destined to NF ports to the next table. >>> table=22(ls_in_network_function), priority=99 , match=(reg8[21] == >> 1 && reg8[22] == 0 && ct_label.network_function_group_id == 123), >> action=(outport = "nfp1"; output;) >>> >>> >>> 5. Cross-host Traffic for VLAN Network >>> ====================================== >>> For overlay subnets, all cross-host traffic exchanges are tunneled. In >> the case of VLAN subnets, there needs to be special handling to >> selectively tunnel only the traffic to or from the NF ports. >>> Take the example of a from-lport ACL. Packets from p1 to p2 get >> redirected to nfp1 on host1. If this packet is simply sent out from >> host1, the physical network will directly forward it to host2 where VM2 >> is. So, we need to tunnel the redirected packets from host1 to host3. >> Now, once the packets come out of nfp2, if host3 sends the packets out, >> the physical network would learn p1's MAC coming from host3. So, these >> packets need to be tunneled back to host1. From there the packet would >> be forwarded to VM2 via the physical network. >>> >>> ----- ----- -------- >>> | VM2 | | VM1 | | NF VM | >>> ----- ----- -------- >>> / \ | / \ | >>> | (7) | (1) (3)| |(4) >>> | \ / | \ / >>> -------------- -------------- (2) --------------- >>> | p2 | (6) | p1 |______\ | nfp1 nfp2 | >>> | |/____ | |------/ | | >>> | host2 |\ | host1 |/______ | host3 | >>> | | | |\------ | | >>> -------------- -------------- (5) -------------- >>> >>> The above figure shows the request packet path for a from-lport ACL. >> Response would follow the same path in reverse direction. >>> >>> To achieve this, the following would be done: >>> >>> On host where the ACL port group members are present (host1) >>> —----------------------------------------------------------- >>> REMOTE_OUTPUT (table 42): >>> Currently, it tunnels traffic destined to all non-local overlay ports >> to their associated hosts. 
The same rule is now also added for traffic >> to non-local NF ports. Thus the packets from p1 get tunneled to host3. >>> >>> On host with NF (host3) forward packet to nfp1 >>> —---------------------------------------------- >>> Upon reaching host3, the following rules come into play: >>> PHY_TO_LOG (table 0): >>> Priority 100: Existing rule - for each geneve tunnel interface on the >> chassis, copies info from the header to the inport, outport, metadata registers. >> Now the same rule also stores the tunnel interface id in a register (reg5[16..31]). >>> >>> CHECK_LOOPBACK (table 44) >>> This table has a rule that clears all the registers. The change is to >> skip the clearing of reg5[16..31]. >>> >>> Logical egress pipeline: >>> >>> ls_out_stateful priority 120: If the outport is an NF port, copy >> reg5[16..31] (table 0 had set it) to ct_label.tun_if_id. >>> >>> table=10(ls_out_stateful ), priority=120 , match=(outport == >> "nfp1" && reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0; >> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; >> ct_label.tun_if_id = reg5[16..31]; }; next;) >>> table=10(ls_out_stateful ), priority=120 , match=(outport == >> "nfp1" && reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0; >> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; >> ct_mark.obs_stage = reg8[19..20]; ct_mark.obs_collector_id = >> reg8[8..15]; ct_label.obs_point_id = reg9; ct_label.tun_if_id = >> reg5[16..31]; }; next;) >>> >>> The above sequence of flows ensures that if a packet is received via >> a tunnel on host3, with outport as nfp1, the tunnel interface id is >> committed to the ct entry in nfp1's zone. 
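As a debugging aid, the NF fields that these flows commit into the CT label can be decoded from a raw label value with plain shell arithmetic. This is only an illustration of the bit positions documented earlier in this RFC (bit 7: network_function_group flag, bits 17 to 24: the 8-bit NFG id); the label value below is fabricated, and the tun_if_id position is not shown since the text does not specify it:

```shell
# Fabricated ct_label value: NF flag (bit 7) set and NFG id 123
# stored in bits 17..24, per the layout described in the text.
label=$(( (1 << 7) | (123 << 17) ))

nf_flag=$(( (label >> 7) & 1 ))     # ct_label.network_function_group
nfg_id=$(( (label >> 17) & 0xff ))  # ct_label.network_function_group_id

echo "nf_flag=$nf_flag nfg_id=$nfg_id"   # prints: nf_flag=1 nfg_id=123
```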
>>> >>> On host with NF (host3) tunnel packets from nfp2 back to host1 >>> —-------------------------------------------------------------- >>> When the same packet comes out of nfp2 on host3: >>> >>> LOCAL_OUTPUT (table 43) >>> When the packet comes out of the other NF port (nfp2), the following two >> rules send it back to the host that it originally came from: >>> >>> Priority 110: For each NF port local to this host, the following rule >> processes the >>> packet through the CT of the linked port (for nfp2, it is nfp1): >>> match: inport==nfp2 && RECIRC_BIT==0 >>> action: RECIRC_BIT = 1, ct(zone=nfp1’s zone, table=LOCAL), resubmit >> to table 43 >>> >>> Priority 109: For each {tunnel_id, nf port} on this host, if the >> tun_if_id in ct_label matches the tunnel_id, send the recirculated >> packet using tunnel_id: >>> match: inport==nfp1 && RECIRC_BIT==1 && ct_label.tun_if_id==<tun-id> >>> action: tunnel packet using tun-id >>> >>> If p1 and nfp1 happen to be on the same host, the tun_if_id would not >> be set and thus none of the priority 109 rules would match. It would be >> forwarded the usual way matching the existing priority 100 rules in >> LOCAL_OUTPUT. >>> >>> Special handling of the case where the NF responds back on nfp1, instead >> of forwarding the packet out of nfp2: >>> For example, a SYN packet from p1 got redirected to nfp1. Then the NF, >> which is a firewall VM, drops the SYN and sends an RST back on port nfp1. >> In this case, looking up in the linked port's (nfp2) ct zone will not give >> anything. The following rule uses ct.inv to identify such scenarios and >> uses nfp1’s CT zone to send the packet back. To achieve this, the following >> 2 rules are installed: >>> >>> in_network_function: >>> The priority 100 rule that allows packets incoming from NF type ports is >> enhanced with an additional action to store the tun_if_id from ct_label >> into reg5[16..31]. 
>>> table=22(ls_in_network_function), priority=100 , match=(inport == >> "nfp1"), action=(reg5[16..31] = ct_label.tun_if_id; next;) >>> >>> LOCAL_OUTPUT (table 43) >>> Priority 110 rule: for recirculated packets, if ct (of the linked >> port) is invalid, use the tun id from reg5[16..31] to tunnel the packet >> back to host1 (as CT zone info has been overwritten in the above priority >> 110 rule in table 43). >>> match: inport==nfp1 && RECIRC_BIT==1 && ct.inv && >> MFF_LOG_TUN_OFPORT==<tun-id> >>> action: tunnel packet using tun-id >>> >>> >>> 6. NF insertion across logical switches >>> ======================================= >>> If the port-group where the ACL is being applied has members across >> multiple logical switches, there needs to be an NF port pair on each of >> these switches. >>> The NF VM will have only one inport and one outport. The CMS is >> expected to create child ports linked to these ports on each logical >> switch where port-group members are present. >>> The network-function entity would be configured with the parent ports >> only. When the CMS creates the child ports, it does not need to change any >> of the NF, NFG or ACL config tables. >>> When northd configures the redirection rules for a specific LS, it >> will use the parent or child port depending on what it finds on that LS. >>> -------- >>> | NF VM | >>> -------- >>> | | >>> ----- | | ----- >>> | VM1 | nfp1 nfp2 | VM2 | >>> ---- - | | -------------- ----- | >> | >>> | | | | SVC LS | | | >> | >>> p1| nfp1_ch1 nfp2_ch1 -------------- p3| >> nfp1_ch2 nfp2_ch2 >>> -------------------- >> -------------------- >>> | LS1 | | >> LS2 | >>> -------------------- >> -------------------- >>> >>> In this example, the CMS created the parent ports for the NF VM on the LS >> named SVC LS. The ports are nfp1 and nfp2. 
The CMS configures the NF >> using these ports: >>> ovn-nbctl nf-add nf1 nfp1 nfp2 >>> ovn-nbctl nfg-add nfg1 123 inline nf1 >>> ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2' >> allow-related nfg1 >>> >>> The port group to which the ACL is applied is pg1 and pg1 has two >> ports: p1 on LS1 and p3 on LS2. >>> The CMS needs to create child ports for the NF ports on LS1 and LS2. >> On LS1: nfp1_ch1 and nfp2_ch1. On LS2: nfp1_ch2 and nfp2_ch2. >>> >>> When northd creates rules on LS1, it would use nfp1_ch1 and nfp2_ch1. >>> >>> table=22(ls_in_network_function), priority=100 , match=(inport == >> "nfp2_ch1"), action=(reg5[16..31] = ct_label.tun_if_id; next;) >>> table=22(ls_in_network_function), priority=99 , match=(reg8[21] == >> 1 && reg8[22] == 1 && reg0[22..29] == 123), action=(outport = >> "nfp1_ch1"; output;) >>> >>> When northd creates rules on LS2, it would use nfp1_ch2 and nfp2_ch2. >>> table=22(ls_in_network_function), priority=100 , match=(inport == >> "nfp2_ch2"), action=(reg5[16..31] = ct_label.tun_if_id; next;) >>> table=22(ls_in_network_function), priority=99 , match=(reg8[21] == >> 1 && reg8[22] == 1 && reg0[22..29] == 123), action=(outport = >> "nfp1_ch2"; output;) >>> >>> >>> 7. Health Monitoring >>> ==================== >>> The LB health monitoring functionality has been extended to support >> NFs. Network_Function_Group has a list of Network_Functions, each of >> which has a reference to a Network_Function_Health_Check that has the >> monitoring config. There is a corresponding SB service_monitor >> maintaining the online/offline status. When the status changes, northd picks >> one of the “online” NFs and sets it in the network_function_active field >> of the NFG. The redirection rule in the LS uses the ports from this NF. 
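The interplay of the success_count and failure_count options from Network_Function_Health_Check can be sketched as a tiny online/offline state machine. This is only an illustrative sketch, not the actual ovn-controller logic; it assumes the common monitoring semantics that an NF is marked online after success_count consecutive probe successes and offline after failure_count consecutive failures:

```shell
# Hypothetical sketch of probe bookkeeping; thresholds are illustrative.
success_count=3
failure_count=2
state=offline
successes=0
failures=0

record_probe() {            # $1 = "success" or "failure"
    if [ "$1" = success ]; then
        successes=$((successes + 1)); failures=0
        [ "$successes" -ge "$success_count" ] && state=online
    else
        failures=$((failures + 1)); successes=0
        [ "$failures" -ge "$failure_count" ] && state=offline
    fi
}

# Three successes bring the NF online; two consecutive failures
# then take it offline again.
for probe in success success success failure failure; do
    record_probe "$probe"
done
echo "$state"   # prints: offline
```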
>>>
>>> ovn-controller performs the health monitoring by sending an ICMP echo request with the source IP and MAC taken from the NB global options "svc_monitor_ip4" and "svc_monitor_mac", and the destination IP and MAC from the new NB global options "svc_monitor_ip4_dst" and "svc_monitor_mac_dst". The sequence number and id are randomly generated and stored in service_monitor. The NF VM forwards the same packet out of the other port. When it comes out, ovn-controller matches the sequence number and id against the stored values and marks the NF online if they match.
>>>
>>> V1:
>>> - First patch.
>>>
>>> V2:
>>> - Rebased code.
>>> - Added "mode" field in the Network_Function_Group table, with the only allowed value being "inline". This is for future expansion to include "mirror" mode.
>>> - Added a flow in the in_network_function and out_network_function tables to skip redirection of multicast traffic.
>>>
>>> V3:
>>> - Rebased code.
>>>
>>> V4:
>>> - Rebased code.
>>>
>>> V5:
>>> - Fixed an issue of packet duplication for NF on overlay. Added a check in physical/controller.c so that the new flows in table 44 (that tunnel packets back to the source host based on the tunnel interface stored in CT) get installed only for the localnet case.
>>> - Fixed a packet drop issue for NF on overlay. This happens when port1 is on host1, and port2 & NF are on host2; packets sent from port2 to port1 are redirected to the NF port as intended but do not reach host2. This is because OVS drops packets that are sent out on the same port on which they were received (in this case the packet is received on host1 on a tunnel interface and forwarded to the same tunnel). Updated the table 43 rule for remote overlay ports to clear in_port to get around this.
>>> - Added unit tests in ovn.at to cover various multi-chassis scenarios for both VLAN and overlay cases.
>>>
>>> V6:
>>> - Addressed review comments from Numan Siddique.
>>> - Updated ovn-northd.8.xml and ovn-nbctl.8.xml.
>>> - Unit test added for health monitoring of network function.
>>> - System test added for the single chassis case. Multi-chassis system tests to be added later; the multi-chassis cases were already covered in ovn.at.
>>>
>>> V7:
>>> - Rebased.
>>> - Fixed unit test failures resulting from the new IC SB service monitor feature.
>>>
>>> V8:
>>> - Addressed review comments from Dumitru Ceara. Main changes are listed below.
>>> - NFG unique id is no longer generated internally, but is configured by the CMS.
>>> - NFG register changed from reg5[0..7] to reg0[22..29].
>>> - IPv6 support for health probes.
>>> - Updates to NEWS and various xml files for proper documentation.
>>>
>>> V9:
>>> - Addressed new review comments from Dumitru Ceara. Main changes are as below.
>>> - Changed isRoot from true to false for Network_Function_Health_Check.
>>> - Changed the behaviour of --may-exist from no-op to update.
>>> - In nbctl commands, replaced "network-function" with "nf" and "network-function-group" with "nfg".
>>> - The change related to skipping the clearing of the tunnel id part of reg5 is now done only for NF datapaths.
>>> - Fixed the intermittent test failures earlier caused by patch 5, by sorting the svc_macs.
>>> - Fixed backward compatibility with respect to the "type" field in service_monitor.
>>> - New system test added for the case without health_check.
>>> - Replaced the scoring mechanism for NF election with a deterministic method.
>>> - Moved some test and xml changes to other patches to align with the changes.
>>> - Added a few TODO items. Split the NEWS change across multiple patches.
>>>
>>
>> Hi Sragdhara, Naveen, Karthik, Numan,
>>
>> Thank you for this new revision and for the reviews. Also thanks for bearing with us while trying to get this new feature into OVN! I know it's been a challenge.
>>
>> Overall I didn't see any major issues in the series.
>> There's quite a lot of code being added/changed, so I'm quite sure we missed some things. However, we have quite a lot of time to address any of those in the remainder of the 26.03 development cycle.
>>
>> I did apply some small changes to the last 3 patches of the series, mostly style issues, and then pushed the series to main.
>>
>> But I also have a few items I'd like to follow up on. It would be great if you could set some time aside to look into them. Please see below, listed for the corresponding patches.
>>
>>> Sragdhara Datta Chaudhuri (5):
>>>   ovn-nb: Network Function insertion OVN-NB schema changes.
>>>   ovn-nbctl: Network Function insertion commands.
>>>   northd, tests: Network Function insertion logical flow programming.
>>>   controller, tests: Network Function insertion tunneling of cross-host VLAN traffic.
>>
>> This patch changes the PHY_TO_LOG() table to automatically load the "tun intf id" for decapsulated packets into a "random" register, reg5[16..31]. I know we also define a logical field for these 16 bits of reg5:
>>
>> #define MFF_LOG_TUN_OFPORT MFF_REG5 /* 16..31 of the 32 bits */
>>
>> But that seems a bit error prone. We also (for NF traffic) skip clearing these bits when going from the logical ingress pipeline to the logical egress one.
>>
>> I think it might make sense to have a dedicated (set of) registers that are preserved between logical pipelines. The problem is we are currently out of OVS registers that we can use:
>>
>> /* Logical fields.
>>  *
>>  * These values are documented in ovn-architecture(7), please update the
>>  * documentation if you change any of them. */
>> #define MFF_LOG_DATAPATH  MFF_METADATA /* Logical datapath (64 bits). */
>> #define MFF_LOG_FLAGS     MFF_REG10    /* One of MLF_* (32 bits). */
>> #define MFF_LOG_DNAT_ZONE MFF_REG11    /* conntrack dnat zone for gateway
>>                                         * router (32 bits). */
>> #define MFF_LOG_SNAT_ZONE MFF_REG12    /* conntrack snat zone for gateway
>>                                         * router (32 bits). */
>> #define MFF_LOG_CT_ZONE   MFF_REG13    /* Logical conntrack zone for lports
>>                                         * (0..15 of the 32 bits). */
>> #define MFF_LOG_ENCAP_ID  MFF_REG13    /* Encap ID for lports
>>                                         * (16..31 of the 32 bits). */
>> #define MFF_LOG_INPORT    MFF_REG14    /* Logical input port (32 bits). */
>> #define MFF_LOG_OUTPORT   MFF_REG15    /* Logical output port (32 bits). */
>>
>> I had a quick glance at OVS' code and I _think_ there should be no major issue if we add another xxreg (i.e., 2 extra xregs / 4 extra regs).
>>
>> We're also running low on available bits in the logical flags register (which is also preserved between ingress and egress), so having some extra register storage space to work with would make our lives easier. The impact of adding new registers should be relatively minimal (I guess) as it would only potentially increase the upcall handling time, and likely by a marginal amount.
>>
>> I'm CC-ing Ilya to see if such a change would be acceptable in OVS.
>>
>>> northd, controller, tests: Network Function Health monitoring.
>>
>> The new system test in this patch blindly kills any "random" tcpdump instance that might be running on the host where the tests are being executed. We should probably improve this test to use NETNS_START_TCPDUMP() or something similar.
>>
>> Also, with this last patch, I saw the "622: Network function packet flow - outbound" test fail in GitHub CI a couple of times. I couldn't reproduce the problem locally and after a rebase it seems to pass. But it would be great if you could monitor this test and fix it in the future if it starts failing again.
>>
>> Thanks again for all the hard work on this new feature!
>>
>> Best regards,
>> Dumitru
>>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
