On 9/15/25 9:34 PM, Sragdhara Datta Chaudhuri wrote:
> RFC: NETWORK FUNCTION INSERTION IN OVN
> 
> 1. Introduction
> ================
> The objective is to insert a Network Function (NF) in the path of 
> outbound/inbound traffic from/to a port-group. The use case is to integrate a 
> 3rd party service in the path of traffic. An example of such a service would 
> be a layer 7 firewall. The NF VM acts like a bump in the wire and must not 
> modify the packet, i.e. the IP header, MAC addresses, VLAN tag and sequence 
> numbers remain unchanged.
> 
> Here are some of the highlights:
> - A new entity network-function (NF) has been introduced. It contains a pair 
> of LSPs. The CMS would designate one as “inport” and the other as “outport”.
> - For high-availability, a network function group (NFG) entity consists of a 
> group of NFs. Only one NF in a NFG has an active role based on health 
> monitoring.
> - ACL would accept NFG as a parameter and traffic matching the ACL would be 
> redirected to the associated active NF’s port. NFG is accepted for stateful 
> allow action only.
> - The ACL’s port-group is the point of reference when defining the role of 
> the NF ports. The “inport” is the port closer to the port-group and “outport” 
> is the one away from it. For from-lport ACLs, the request packets would be 
> redirected to the NF “inport” and for to-lport ACLs, the request packets 
> would be redirected to NF “outport”. When the same packet comes out of the 
> other NF port, it gets simply forwarded.
> - Statefulness will be maintained, i.e. the response traffic will also go 
> through the same pair of NF ports but in reverse order.
> - For the NF ports we need to disable port security check, fdb learning and 
> multicast/broadcast forwarding.
> - Health monitoring involves ovn-controller periodically injecting ICMP probe 
> packets into the NF inport and monitoring the same packet coming out of the NF 
> outport.
> - If the traffic redirection involves cross-host traffic (e.g. for a 
> from-lport ACL, if the source VM and NF VM are on different hosts), packets 
> would be tunneled to and from the NF VM's host.
> - If the port-group to which the ACL is being applied has members spread 
> across multiple LSs, CMS needs to create child ports for the NF ports on each 
> of these LSs. The redirection rules in each LS will use the child ports on 
> that LS.
> 
> 2. NB tables
> =============
> New NB tables
> -------------
> Network_Function: Each row contains {inport, outport, health_check}
> Network_Function_Group: Each row contains a list of Network_Function 
> entities. It also contains a unique id (between 1 and 255) and a reference to 
> the current active NF.
> Network_Function_Health_Check: Each row contains configuration for probes in 
> options field: {interval, timeout, success_count, failure_count}
> 
>         "Network_Function_Health_Check": {
>             "columns": {
>                 "name": {"type": "string"},
>                 "options": {
>                      "type": {"key": "string",
>                               "value": "string",
>                               "min": 0,
>                               "max": "unlimited"}},
>                 "external_ids": {
>                     "type": {"key": "string", "value": "string",
>                              "min": 0, "max": "unlimited"}}},
>             "isRoot": true},
>         "Network_Function": {
>             "columns": {
>                 "name": {"type": "string"},
>                 "outport": {"type": {"key": {"type": "uuid",
>                                              "refTable": 
> "Logical_Switch_Port",
>                                              "refType": "strong"},
>                                      "min": 1, "max": 1}},
>                 "inport": {"type": {"key": {"type": "uuid",
>                                             "refTable": "Logical_Switch_Port",
>                                             "refType": "strong"},
>                                     "min": 1, "max": 1}},
>                 "health_check": {"type": {
>                     "key": {"type": "uuid",
>                             "refTable": "Network_Function_Health_Check",
>                             "refType": "strong"},
>                     "min": 0, "max": 1}},
>                 "external_ids": {
>                     "type": {"key": "string", "value": "string",
>                              "min": 0, "max": "unlimited"}}},
>             "isRoot": true},
>         "Network_Function_Group": {
>             "columns": {
>                 "name": {"type": "string"},
>                 "network_function": {"type":
>                                   {"key": {"type": "uuid",
>                                            "refTable": "Network_Function",
>                                            "refType": "strong"},
>                                            "min": 0, "max": "unlimited"}},
>                 "mode": {"type": {"key": {"type": "string",
>                                           "enum": ["set", ["inline"]]}}},
>                 "network_function_active": {"type":
>                                   {"key": {"type": "uuid",
>                                            "refTable": "Network_Function",
>                                            "refType": "strong"},
>                                            "min": 0, "max": 1}},
>                 "id": {
>                      "type": {"key": {"type": "integer",
>                                       "minInteger": 0,
>                                       "maxInteger": 255}}},
>                 "external_ids": {
>                     "type": {"key": "string", "value": "string",
>                              "min": 0, "max": "unlimited"}}},
>             "isRoot": true},
> 
> 
> Modified NB table
> -----------------
> ACL: The ACL entity would have a new optional field that is a reference to a 
> Network_Function_Group entity. This field can be present only for stateful 
> allow ACLs.
> 
>         "ACL": {
>             "columns": {
>                 "network_function_group": {"type": {"key": {"type": "uuid",
>                                            "refTable": 
> "Network_Function_Group",
>                                            "refType": "strong"},
>                                            "min": 0,
>                                            "max": 1}},
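> 
> For an existing ACL, the reference can be set with the generic "set"
> command (a sketch; the extended acl-add syntax shown later is the
> intended interface):
> 
>   nfg=$(ovn-nbctl --bare --columns=_uuid find Network_Function_Group name=nfg1)
>   ovn-nbctl set ACL <acl-uuid> network_function_group=$nfg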
> 
> New options for Logical_Switch_Port
> -----------------------------------
> receive_multicast=<boolean>: Default true. If set to false, LS will not 
> forward broadcast/multicast traffic to this port. This is to prevent looping 
> of such packets.
> 
> lsp_learn_mac=<boolean>: Default true. If set to false, fdb learning will be 
> skipped for packets coming out of this port. Redirected packets from the NF 
> port would be carrying the originating VM’s MAC in source, and so learning 
> should not happen.
> 
> CMS needs to set both the above options to false for NF ports, in addition to 
> disabling port security.
> 
> nf-linked-port=<lsp-name>: Each NF port needs to have this set to the other 
> NF port of the pair.
> 
> is-nf=<boolean>: Each NF port needs to have this set to true.
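> 
> Port security, mentioned above, can be cleared by setting an empty
> port-security list (a sketch):
> 
>   ovn-nbctl lsp-set-port-security nfp1
>   ovn-nbctl lsp-set-port-security nfp2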
> 
> New NB_global options
> ---------------------
> svc_monitor_mac_dst: destination MAC of probe packets (svc_monitor_mac is 
> already there and will be used as source MAC)
> svc_monitor_ip: source IP of probe packets
> svc_monitor_ip_dst: destination IP of probe packets
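> 
> These would be set the same way as the existing svc_monitor_mac, e.g.
> (the addresses below are placeholders):
> 
>   ovn-nbctl set NB_Global . options:svc_monitor_mac_dst="0a:00:00:00:00:01" \
>       options:svc_monitor_ip="10.0.0.1" options:svc_monitor_ip_dst="10.0.0.2"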
> 
> Sample configuration
> --------------------
> ovn-nbctl ls-add ls1
> ovn-nbctl lsp-add ls1 nfp1
> ovn-nbctl lsp-add ls1 nfp2
> ovn-nbctl set logical_switch_port nfp1 options:receive_multicast=false 
> options:lsp_learn_mac=false options:nf-linked-port=nfp2
> ovn-nbctl set logical_switch_port nfp2 options:receive_multicast=false 
> options:lsp_learn_mac=false options:nf-linked-port=nfp1
> ovn-nbctl nf-add nf1 nfp1 nfp2
> ovn-nbctl nfg-add nfg1 123 inline nf1
> ovn-nbctl lsp-add ls1 p1 -- lsp-set-addresses p1 "50:6b:8d:3e:ed:c4 10.1.1.4"
> ovn-nbctl pg-add pg1 p1
> ovn-nbctl create Address_Set name=as1 addresses=10.1.1.4
> ovn-nbctl lsp-add ls1 p2 -- lsp-set-addresses p2 "50:6b:8d:3e:ed:c5 10.1.1.5"
> ovn-nbctl create Address_Set name=as2 addresses=10.1.1.5
> ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2' 
> allow-related nfg1
> ovn-nbctl acl-add pg1 to-lport 100 'outport==@pg1 && ip4.src == $as2' 
> allow-related nfg1
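> 
> The resulting rows can be inspected with the generic db commands, e.g.:
> 
>   ovn-nbctl find Network_Function_Group name=nfg1
>   ovn-nbctl find Network_Function name=nf1
>   ovn-nbctl acl-list pg1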
> 
> 3. SB tables
> ============
> Service_Monitor:
> This is currently used by Load balancer. New fields are: “type” - to indicate 
> LB or NF, “mac” - the destination MAC address for monitor packets, 
> “logical_input_port” - the LSP to which the probe packet would be sent. Also, 
> “icmp” has been added as a protocol type, used only for NF.
> 
>          "Service_Monitor": {
>              "columns": {
>                "type": {"type": {"key": {
>                           "type": "string",
>                           "enum": ["set", ["load-balancer", 
> "network-function"]]}}},
>                "mac": {"type": "string"},
>                  "protocol": {
>                      "type": {"key": {"type": "string",
>                             "enum": ["set", ["tcp", "udp", "icmp"]]},
>                               "min": 0, "max": 1}},
>                "logical_input_port": {"type": "string"},
> 
> northd would create one Service_Monitor entity for each NF. The 
> logical_input_port and logical_port would be populated from the NF inport and 
> outport fields respectively. The probe packets would be injected into the 
> logical_input_port and would be monitored out of logical_port.
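> 
> The resulting SB rows can be checked with, e.g. (a sketch):
> 
>   ovn-sbctl list Service_Monitor
>   ovn-sbctl find Service_Monitor type=network-function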
> 
> 4. Logical Flows
> ================
> Logical Switch ingress pipeline:
> - in_network_function added after in_stateful.
> - Modifications to in_acl_eval, in_stateful and in_l2_lookup.
> Logical Switch egress pipeline:
> - out_network_function added after out_stateful.
> - Modifications to out_pre_acl, out_acl_eval and out_stateful.
> 
> 4.1 from-lport ACL
> ------------------
> The diagram shows the request path for packets from VM1 port p1, which is a 
> member of the port group to which the ACL is applied. The response follows the 
> reverse path, i.e. the packet is redirected to nfp2, comes out of nfp1 and is 
> forwarded to p1.
> Also, p2 does not need to be on the same LS; only p1, nfp1 and nfp2 must be 
> on the same LS.
> 
>       -----                  -------                  -----
>      | VM1 |                | NF VM |                | VM2 |
>       -----                  -------                  -----
>         |                    /\    |                   / \
>         |                    |     |                    |
>        \ /                   |    \ /                   |
>    ------------------------------------------------------------
>   |     p1                 nfp1  nfp2                   p2     |
>   |                                                            |
>   |                      Logical Switch                        |
>    -------------------------------------------------------------
> pg1: [p1]         as2: [p2-ip]
> ovn-nbctl nf-add nf1 nfp1 nfp2
> ovn-nbctl nfg-add nfg1 123 inline nf1
> ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2' 
> allow-related nfg1
> 
> Request packets from p1 matching a from-lport ACL with NFG are redirected to 
> nfp1, and the NFG id is committed to the CT label in p1's zone. When the same 
> packet comes out of nfp2 it gets forwarded the normal way.
> Response packets carry p1's MAC as destination. Ingress processing sets the 
> outport to p1, the CT lookup in the egress pipeline (in p1's CT zone) yields 
> the NFG id, and the packet is injected back into the ingress pipeline after 
> setting the outport to nfp2.
> 
> Below are the changes in detail.
> 
> 4.1.1 Request processing
> ------------------------
> 
> in_acl_eval: For from-lport ACLs with NFG, the existing rule's action has 
> been enhanced to set:
>  - reg8[21] = 1: to indicate that packet has matched a rule with NFG
>  - reg0[22..29] = <NFG-unique-id>
>  - reg8[22] = <direction> (1: request, 0: response)
> 
>   table=8 (ls_in_acl_eval), priority=1200 , match=(reg0[7] == 1 && 
> (inport==@pg1 && ip4.dst == $as2)), action=(reg8[16] = 1; reg0[1] = 1; 
> reg8[21] = 1; reg8[22] = 1; reg0[22..29] = 123; next;)
>   table=8 (ls_in_acl_eval), priority=1200 , match=(reg0[8] == 1 && 
> (inport==@pg1 && ip4.dst == $as2)), action=(reg8[16] = 1; reg8[21] = 1; 
> reg8[22] = 1; reg0[22..29] = 123; next;)
> 
> in_stateful: Priority 110: set NFG id in CT label if reg8[21] is set.
>  - bit 7 (ct_label.network_function_group): Set to 1 to indicate NF insertion.
>  - bits 17 to 24 (ct_label.network_function_group_id): Stores the 8-bit NFG id.
> 
>   table=21(ls_in_stateful     ), priority=110  , match=(reg0[1] == 1 && 
> reg0[13] == 0 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0; 
> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
> ct_label.network_function_group = 1; ct_label.network_function_group_id = 
> reg0[22..29]; }; next;)
>   table=21(ls_in_stateful     ), priority=110  , match=(reg0[1] == 1 && 
> reg0[13] == 1 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0; 
> ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20]; 
> ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9; 
> ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 1; 
> ct_label.network_function_group_id = reg0[22..29]; }; next;)
>   table=21(ls_in_stateful     ), priority=100  , match=(reg0[1] == 1 && 
> reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0; 
> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
> ct_label.network_function_group = 0; ct_label.network_function_group_id = 0; 
> }; next;)
>   table=21(ls_in_stateful     ), priority=100  , match=(reg0[1] == 1 && 
> reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0; 
> ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20]; 
> ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9; 
> ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 0; 
> ct_label.network_function_group_id = 0; }; next;)
>   table=21(ls_in_stateful     ), priority=0    , match=(1), action=(next;)
> 
> 
> For non-NFG cases, the existing priority 100 rules are hit. An additional 
> action has been added there to clear the NFG bits in the CT label.
> 
> in_network_function: A new stage with priority 99 rules to redirect packets 
> by setting outport to the NF “inport” (or its child port) based on the NFG id 
> set by the prior ACL stage.
> Priority 100 rules ensure that when the same packets come out of the NF 
> ports, they are not redirected again (the setting of reg5 here relates to the 
> cross-host packet tunneling and will be explained later).
> Priority 1 rule: if reg8[21] is set, but the NF port (or child port) is not 
> present on this LS, drop packets.
> 
>   table=22(ls_in_network_function), priority=100  , match=(inport == "nfp1"), 
> action=(reg5[16..31] = ct_label.tun_if_id; next;)
>   table=22(ls_in_network_function), priority=100  , match=(inport == "nfp2"), 
> action=(reg5[16..31] = ct_label.tun_if_id; next;)
>   table=22(ls_in_network_function), priority=100  , match=(reg8[21] == 1 && 
> eth.mcast), action=(next;)
>   table=22(ls_in_network_function), priority=99   , match=(reg8[21] == 1 && 
> reg8[22] == 1 && reg0[22..29] == 123), action=(outport = "nfp1"; output;)
>   table=22(ls_in_network_function), priority=1    , match=(reg8[21] == 1), 
> action=(drop;)
>   table=22(ls_in_network_function), priority=0    , match=(1), action=(next;)
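> 
> The redirection can be exercised with ovn-trace (a sketch based on the
> sample configuration above; the TCP port is arbitrary, and
> conntrack-dependent stages may need the --ct option to simulate CT
> state):
> 
>   ovn-trace ls1 'inport == "p1" &&
>       eth.src == 50:6b:8d:3e:ed:c4 && eth.dst == 50:6b:8d:3e:ed:c5 &&
>       ip4.src == 10.1.1.4 && ip4.dst == 10.1.1.5 && ip.ttl == 64 &&
>       tcp && tcp.dst == 80'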
> 
> 
> 4.1.2 Response processing
> -------------------------
> out_acl_eval: High priority rules that allow response and related packets to 
> go through have been enhanced to also copy CT label NFG bit into reg8[21].
> 
>   table=6(ls_out_acl_eval), priority=65532, match=(!ct.est && ct.rel && 
> !ct.new && !ct.inv && ct_mark.blocked == 0), action=(reg8[21] = 
> ct_label.network_function_group; reg8[16] = 1; ct_commit_nat;)
>   table=6(ls_out_acl_eval), priority=65532, match=(ct.est && !ct.rel && 
> !ct.new && !ct.inv && ct.rpl && ct_mark.blocked == 0), action=(reg8[21] = 
> ct_label.network_function_group; reg8[16] = 1; next;)
> 
> out_network_function: The priority 99 rule matches on the nfg_id in ct_label 
> and sets the outport to the NF “outport”. It also sets reg8[23]=1 and injects 
> the packet into the ingress pipeline (in_l2_lookup).
> Priority 100 rules forward packets destined to NF ports to the next table. 
> 
>   table=11 (ls_out_network_function), priority=100  , match=(outport == 
> "nfp1"), action=(next;)
>   table=11 (ls_out_network_function), priority=100  , match=(outport == 
> "nfp2"), action=(next;)
>   table=11(ls_out_network_function), priority=100  , match=(reg8[21] == 1 && 
> eth.mcast), action=(next;)
>   table=11 (ls_out_network_function), priority=99   , match=(reg8[21] == 1 && 
> reg8[22] == 0 && ct_label.network_function_group_id == 123), action=(outport 
> = "nfp2"; reg8[23] = 1; next(pipeline=ingress, table=29);)
>   table=11 (ls_out_network_function), priority=1    , match=(reg8[21] == 1), 
> action=(drop;)
>   table=11 (ls_out_network_function), priority=0    , match=(1), 
> action=(next;)
> 
> in_l2_lkup: if reg8[23] == 1 (packet has come back from egress), simply 
> forward such packets as outport is already set.
> 
>   table=29(ls_in_l2_lkup), priority=100  , match=(reg8[23] == 1), 
> action=(output;)
> 
> The above set of rules ensure that the response packet is sent to nfp2. When 
> the same packet comes out of nfp1, the ingress pipeline would set the outport 
> to p1 and it enters the egress pipeline.
> 
> out_pre_acl: If the packet is coming from the NF inport, skip the egress 
> pipeline up to and including the out_network_function stage, as the packet has 
> already gone through it and we don't want the same packet to be processed by 
> CT twice.
>   table=2 (ls_out_pre_acl     ), priority=110  , match=(inport == "nfp1"), 
> action=(next(pipeline=egress, table=12);)
> 
> 
> 4.2 to-lport ACL
> ----------------
>       -----                  --------                  -----
>      | VM1 |                |  NF VM |                | VM2 |
>       -----                  --------                  -----
>        / \                    |   / \                    |
>         |                     |    |                     |
>         |                    \ /   |                    \ /
>    -------------------------------------------------------------
>   |     p1                  nfp1   nfp2                  p2     |
>   |                                                             |
>   |                      Logical Switch                         |
>    -------------------------------------------------------------
> ovn-nbctl acl-add pg1 to-lport 100 'outport==@pg1 && ip4.src == $as2' 
> allow-related nfg1
> The diagram shows the request traffic path; the response follows the reverse 
> path.
> 
> Ingress pipeline sets the outport to p1 based on destination MAC lookup. The 
> packet enters the egress pipeline. There the to-lport ACL with NFG gets 
> evaluated and the NFG id gets committed to the CT label. Then the outport is 
> set to nfp2 and then the packet is injected back to ingress. When the same 
> packet comes out of nfp1, it gets forwarded to p1 the normal way.
> For the response packet from p1, the ingress pipeline gets the NFG id from CT 
> label and accordingly redirects it to nfp1. When it comes out of nfp2 it is 
> forwarded the normal way.
> 
> 4.2.1 Request processing
> ------------------------
> out_acl_eval: For to-lport ACLs with NFG, the existing rule's action has been 
> enhanced to set:
>  - reg8[21] = 1: to indicate that packet has matched a rule with NFG
>  - reg0[22..29] = <NFG-unique-id>
>  - reg8[22] = <direction> (1: request, 0: response)
> 
>   table=6 (ls_out_acl_eval    ), priority=1100 , match=(reg0[7] == 1 && 
> (outport==@pg1 && ip4.src == $as2)), action=(reg8[16] = 1; reg0[1] = 1; 
> reg8[21] = 1; reg8[22] = 1; reg0[22..29] = 123; next;)
>   table=6 (ls_out_acl_eval    ), priority=1100 , match=(reg0[8] == 1 && 
> (outport==@pg1 && ip4.src == $as2)), action=(reg8[16] = 1; reg0[1] = 1; 
> reg8[21] = 1; reg8[22] = 1; reg0[22..29] = 123; next;)
> 
> out_stateful: Priority 110: set NFG id in CT label if reg8[21] is set.
> 
>   table=10(ls_out_stateful    ), priority=110  , match=(reg0[1] == 1 && 
> reg0[13] == 0 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0; 
> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
> ct_label.network_function_group = 1; ct_label.network_function_group_id = 
> reg0[22..29]; }; next;)
>   table=10(ls_out_stateful    ), priority=110  , match=(reg0[1] == 1 && 
> reg0[13] == 1 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0; 
> ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20]; 
> ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9; 
> ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 1; 
> ct_label.network_function_group_id = reg0[22..29]; }; next;)
>   table=10(ls_out_stateful    ), priority=100  , match=(reg0[1] == 1 && 
> reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0; 
> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
> ct_label.network_function_group = 0; ct_label.network_function_group_id = 0; 
> }; next;)
>   table=10(ls_out_stateful    ), priority=100  , match=(reg0[1] == 1 && 
> reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0; 
> ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20]; 
> ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9; 
> ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 0; 
> ct_label.network_function_group_id = 0; }; next;)
>   table=10(ls_out_stateful    ), priority=0    , match=(1), action=(next;)
> 
> out_network_function: A new stage that has priority 99 rules to redirect 
> packets by setting the outport to the NF “outport” (or its child port) based 
> on the NFG id set by the prior ACL stage, and then injecting back to ingress. 
> Priority 100 rules ensure that when the packets are going to NF ports, they 
> are not redirected again.
> Priority 1 rule: if reg8[21] is set, but the NF port (or child port) is not 
> present on this LS, drop packets.
> 
>   table=11(ls_out_network_function), priority=100  , match=(outport == 
> "nfp1"), action=(next;)
>   table=11(ls_out_network_function), priority=100  , match=(outport == 
> "nfp2"), action=(next;)
>   table=11(ls_out_network_function), priority=100  , match=(reg8[21] == 1 && 
> eth.mcast), action=(next;)
>   table=11(ls_out_network_function), priority=99   , match=(reg8[21] == 1 && 
> reg8[22] == 1 && reg0[22..29] == 123), action=(outport = "nfp2"; reg8[23] = 
> 1; next(pipeline=ingress, table=29);)
>   table=11(ls_out_network_function), priority=1    , match=(reg8[21] == 1), 
> action=(drop;)
>   table=11(ls_out_network_function), priority=0    , match=(1), action=(next;)
> 
> 
> in_l2_lkup: As described earlier, the priority 100 rule will forward these 
> packets.
> 
> Then the same packet comes out from nfp1 and goes through the ingress 
> processing where the outport gets set to p1. The egress pipeline out_pre_acl 
> priority 110 rule described earlier, matches against inport as nfp1 and 
> directly jumps to the stage after out_network_function. Thus the packet is 
> not redirected again.
> 
> 4.2.2 Response processing
> -------------------------
> in_acl_eval: High priority rules that allow response and related packets to 
> go through have been enhanced to also copy CT label NFG bit into reg8[21].
> 
>   table=8(ls_in_acl_eval), priority=65532, match=(!ct.est && ct.rel && 
> !ct.new && !ct.inv && ct_mark.blocked == 0), action=(reg0[17] = 1; reg8[21] = 
> ct_label.network_function_group; reg8[16] = 1; ct_commit_nat;)
>   table=8 (ls_in_acl_eval), priority=65532, match=(ct.est && !ct.rel && 
> !ct.new && !ct.inv && ct.rpl && ct_mark.blocked == 0), action=(reg0[9] = 0; 
> reg0[10] = 0; reg0[17] = 1; reg8[21] = ct_label.network_function_group; 
> reg8[16] = 1; next;)
> 
> in_network_function: The priority 99 rule matches on the nfg_id in ct_label 
> and sets the outport to the NF “inport”.
> Priority 100 rules forward packets destined to NF ports to the next table. 
> 
>   table=22(ls_in_network_function), priority=99   , match=(reg8[21] == 1 && 
> reg8[22] == 0 && ct_label.network_function_group_id == 123), action=(outport 
> = "nfp1"; output;)
> 
> 
> 5. Cross-host Traffic for VLAN Network
> ======================================
> For overlay subnets, all cross-host traffic exchanges are tunneled. In the 
> case of VLAN subnets, there needs to be special handling to selectively 
> tunnel only the traffic to or from the NF ports.
> Take the example of a from-lport ACL. Packets from p1 to p2 get redirected to 
> nfp1 on host1. If this packet is simply sent out from host1, the physical 
> network will directly forward it to host2 where VM2 is. So, we need to tunnel 
> the redirected packets from host1 to host3. Now, once the packets come out of 
> nfp2, if host3 sends the packets out, the physical network would learn p1's 
> MAC coming from host3. So, these packets need to be tunneled back to host1. 
> From there the packet would be forwarded to VM2 via the physical network.
> 
>       -----                  -----                  --------
>      | VM2 |                | VM1 |                | NF VM  |
>       -----                  -----                  --------
>        / \                     |                    / \   |
>         | (7)                  |  (1)             (3)|    |(4)
>         |                     \ /                    |   \ /
>   --------------        --------------   (2)    ---------------
>  |      p2      |  (6) |      p1      |______\ |   nfp1  nfp2  |
>  |              |/____ |              |------/ |               |
>  |    host2     |\     |     host1    |/______ |     host3     |
>  |              |      |              |\------ |               |
>   --------------        --------------   (5)    --------------
> 
> The above figure shows the request packet path for a from-lport ACL. Response 
> would follow the same path in reverse direction.
> 
> To achieve this, the following would be done:
> 
> On host where the ACL port group members are present (host1)
> -------------------------------------------------------------
> REMOTE_OUTPUT (table 42):
> Currently, it tunnels traffic destined to all non-local overlay ports to 
> their associated hosts. The same rule is now also added for traffic to 
> non-local NF ports. Thus the packets from p1 get tunneled to host3.
> 
> On host with NF (host3) forward packet to nfp1
> ----------------------------------------------
> Upon reaching host3, the following rules come into play:
> PHY_TO_LOG (table 0):
> Priority 100: Existing rule - for each geneve tunnel interface on the 
> chassis, copies information from the tunnel header into the inport, outport 
> and metadata registers. Now the same rule also stores the tunnel interface id 
> in a register (reg5[16..31]).
> 
> CHECK_LOOPBACK (table 44)
> This table has a rule that clears all the registers. The change is to skip 
> the clearing of reg5[16..31].
> 
> Logical egress pipeline:
> 
> ls_out_stateful priority 120: If the outport is an NF port, copy reg5[16..31] 
> (set earlier in table 0) to ct_label.tun_if_id.
> 
>   table=10(ls_out_stateful    ), priority=120  , match=(outport == "nfp1" && 
> reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0; 
> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
> ct_label.tun_if_id = reg5[16..31]; }; next;)
>   table=10(ls_out_stateful    ), priority=120  , match=(outport == "nfp1" && 
> reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0; 
> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
> ct_mark.obs_stage = reg8[19..20]; ct_mark.obs_collector_id = reg8[8..15]; 
> ct_label.obs_point_id = reg9; ct_label.tun_if_id = reg5[16..31]; }; next;)
> 
> The above sequence of flows ensures that if a packet is received via tunnel 
> on host3, with outport as nfp1, the tunnel interface id is committed to the 
> CT entry in nfp1's zone.
> 
> On host with NF (host3) tunnel packets from nfp2 back to host1
> ---------------------------------------------------------------
> When the same packet comes out of nfp2 on host3:
> 
> LOCAL_OUTPUT (table 43)
> When the packet comes out of the other NF port (nfp2), the following two 
> rules send it back to the host that it originally came from:
> 
> Priority 110: For each NF port local to this host, the following rule 
> processes the packet through the CT of the linked port (for nfp2, it is nfp1):
>   match: inport==nfp2 && RECIRC_BIT==0
>   action: RECIRC_BIT = 1, ct(zone=nfp1’s zone, table=LOCAL), resubmit to 
> table 43
> 
> Priority 109: For each {tunnel_id, nf port} on this host, if the tun_if_id in 
> ct_label matches the tunnel_id, send the recirculated packet using tunnel_id:
>   match: inport==nfp1 && RECIRC_BIT==1 && ct_label.tun_if_id==<tun-id>
>   action: tunnel packet using tun-id
> 
> If p1 and nfp1 happen to be on the same host, the tun_if_id would not be set 
> and thus none of the priority 109 rules would match. The packet would be 
> forwarded the usual way, matching the existing priority 100 rules in 
> LOCAL_OUTPUT. 
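> 
> On the NF host, the resulting physical flows can be inspected with, e.g.
> (a sketch; table numbers per the layout described above):
> 
>   ovs-ofctl dump-flows br-int table=43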
> 
> Special handling is needed for the case where the NF responds back on nfp1 
> instead of forwarding the packet out of nfp2.
> For example, a SYN packet from p1 got redirected to nfp1. Then the NF, which 
> is a firewall VM, drops the SYN and sends a RST back on port nfp1. In this 
> case, looking up the linked port's (nfp2) CT zone will not match anything. 
> ct.inv is used to identify such scenarios, and nfp1's CT zone is used to send 
> the packet back. To achieve this, the following two rules are installed:
>     
> in_network_function:
> Priority 100 rule that allows packets incoming from NF type ports is enhanced 
> with an additional action to store the tun_if_id from ct_label into 
> reg5[16..31].
>   table=22(ls_in_network_function), priority=100  , match=(inport == "nfp1"), 
> action=(reg5[16..31] = ct_label.tun_if_id; next;)
> 
> LOCAL_OUTPUT (table 43)
> Priority 110 rule: for recirculated packets, if the CT (of the linked port) 
> is invalid, use the tunnel id from reg5[16..31] to tunnel the packet back to 
> host1 (as the CT zone info has been overwritten by the priority 110 rule in 
> table 43 above).
>       match: inport==nfp1 && RECIRC_BIT==1 && ct.inv && 
> MFF_LOG_TUN_OFPORT==<tun-id>
>       action: tunnel packet using tun-id
> 
> 
> 6. NF insertion across logical switches
> =======================================
> If the port-group where the ACL is being applied has members across multiple 
> logical switches, there needs to be a NF port pair on each of these switches.
> The NF VM will have only one inport and one outport. The CMS is expected to 
> create child ports linked to these ports on each logical switch where 
> port-group members are present.
> The network-function entity would be configured with the parent ports only. 
> When CMS creates the child ports, it does not need to change any of the NF, 
> NFG or ACL config tables.
> When northd configures the redirection rules for a specific LS, it will use 
> the parent or child port depending on what it finds on that LS.
>                                      --------
>                                     | NF VM  |
>                                      --------
>                                      |      |
>           -----                      |      |              -----
>          | VM1 |                    nfp1   nfp2           | VM2 |
>           -----    |     |         --------------          -----    |      |
>             |      |     |        |    SVC LS    |          |       |      |
>           p1|  nfp1_ch1  nfp2_ch1  --------------         p3|  nfp1_ch2  nfp2_ch2
>           --------------------                             --------------------
>          |         LS1        |                           |         LS2        |
>           --------------------                             --------------------
> 
> In this example, the CMS created the parent ports for the NF VM on LS named 
> SVC LS. The ports are nfp1 and nfp2. The CMS configures the NF using these 
> ports:
> ovn-nbctl nf-add nf1 nfp1 nfp2
> ovn-nbctl nfg-add nfg1 123 inline nf1
> ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2' 
> allow-related nfg1
> 
> The port group to which the ACL is applied is pg1 and pg1 has two ports: p1 
> on LS1 and p3 on LS2.
> The CMS needs to create child ports for the NF ports on LS1 and LS2. On LS1: 
> nfp1_ch1 and nfp2_ch1. On LS2: nfp1_ch2 and nfp2_ch2.
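> 
> Assuming the child ports are created via the standard OVN
> parent_name/tag mechanism (a sketch only; the tag values are
> hypothetical and this RFC does not prescribe the exact commands):
> 
>   ovn-nbctl lsp-add ls1 nfp1_ch1 nfp1 1
>   ovn-nbctl lsp-add ls1 nfp2_ch1 nfp2 1
>   ovn-nbctl lsp-add ls2 nfp1_ch2 nfp1 2
>   ovn-nbctl lsp-add ls2 nfp2_ch2 nfp2 2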
> 
> When northd creates rules on LS1, it would use nfp1_ch1 and nfp2_ch1.
> 
>   table=22(ls_in_network_function), priority=100  , match=(inport == 
> "nfp2_ch1"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
>   table=22(ls_in_network_function), priority=99   , match=(reg8[21] == 1 && 
> reg8[22] == 1 && reg0[22..29] == 123), action=(outport = "nfp1_ch1"; output;)
> 
> When northd creates rules on LS2, it would use nfp1_ch2 and nfp2_ch2.
> 
>   table=22(ls_in_network_function), priority=100  , match=(inport == 
> "nfp2_ch2"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
>   table=22(ls_in_network_function), priority=99   , match=(reg8[21] == 1 && 
> reg8[22] == 1 && reg0[22..29] == 123), action=(outport = "nfp1_ch2"; output;)
> 
> 
> 7. Health Monitoring
> ====================
> The LB health monitoring functionality has been extended to support NFs. 
> Network_Function_Group has a list of Network_Functions, each of which has a 
> reference to a Network_Function_Health_Check that holds the monitoring 
> config. There is a corresponding SB Service_Monitor record maintaining the 
> online/offline status. When the status changes, northd picks one of the 
> “online” NFs and sets it in the network_function_active field of the NFG. The 
> redirection rules in the LS use the ports from this NF.
> 
> ovn-controller performs the health monitoring by sending ICMP echo requests 
> with source IP and MAC from the NB global options “svc_monitor_ip4” and 
> “svc_monitor_mac”, and destination IP and MAC from the new NB global options 
> “svc_monitor_ip4_dst” and “svc_monitor_mac_dst”. The sequence number and id 
> are randomly generated and stored in the Service_Monitor record. The NF VM 
> forwards the same packet out of the other port. When it comes out, 
> ovn-controller matches the sequence number and id with the stored values and 
> marks the NF online if they match.
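> 
> A probe configuration could be attached as follows (a sketch; option
> names and the health_check column per the NB schema above):
> 
>   hc=$(ovn-nbctl create Network_Function_Health_Check name=hc1 \
>        options:interval=5 options:timeout=2 \
>        options:success_count=3 options:failure_count=3)
>   ovn-nbctl set Network_Function nf1 health_check=$hc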
> 
> V1:
>   - First patch.
> 
> V2:
>   - Rebased code.
>   - Added "mode" field in Network_function_group table, with only allowed
>     value as "inline". This is for future expansion to include "mirror" mode.
>   - Added a flow in the in_network_function and out_network_function table to
>     skip redirection of multicast traffic.
> 
> V3:
>  - Rebased code.
> 
> V4:
>  - Rebased code.
> 
> V5:
>  - Fixed issue of packet duplication for NF on overlay. Added a check
>    in physical/controller.c so that only for localnet case the new flows
>    in table 44 (that tunnel packets back to source host based on tunnel
>    i/f stored in CT) get installed.
>  - Fixed packet drop issue for NF on overlay. This happens when port1 is on 
>    host1, and port2 & NF are on host2; packets sent from port2 to port1
>    are redirected to the NF port as intended but do not reach host2.
>    This is because OVS drops packets that are sent to the same port on which
>    they were received (in this case the packet is received on host1 on
>    a tunnel interface and forwarded to the same tunnel). Updated
>    the table 43 rule for remote overlay ports to clear in_port to get
>    around this.
>  - Added unit tests in ovn.at to cover various multi-chassis scenarios
>    for both VLAN and overlay cases.
> 
> V6:
> - Addressed review comments from Numan Siddique.
> - Updated ovn-northd.8.xml and ovn-nbctl.8.xml.
> - Unit test added for health monitoring of network function
> - System test added for single chassis case. Multi-chassis system tests to be
>   added later; multi-chassis cases were already added in ovn.at.
> 
> V7:
> - Rebased
> - Fixed unit test failures resulting from new IC SB service monitor feature
> 
> V8:
> - Addressed review comments from Dumitru Ceara. Main changes are listed below.
> - NFG unique id is not generated internally, but is configured by CMS.
> - NFG register changed from reg5[0..7] to reg0[22..29]
> - IPv6 support for health probes
> - Update to NEWS and various xml files for proper documentation.
> 
> V9:
> - Addressed new review comments from Dumitru Ceara. Main changes are as below.
> - Changed isRoot from true to false for Network_Function_Health_Check.
> - Changed the behaviour of --may-exist from no-op to update.
> - In nbctl commands, replaced "network-function" with "nf" and
>   "network-function-group" with "nfg".
> - The change related to skipping the clearing of tunnel id part of reg5 is now
>   being done only for NF datapaths.
> - Fixed the intermittent test failures earlier being caused by patch5, by
>   sorting the svc_macs.
> - Fixed backward compatibility with respect to "type" field in 
> service_monitor.
> - New system test added for the case without health_check.
> - Replaced the scoring mechanism for NF election with a deterministic method.
> - Moved some test and xml changes to other patches to align with the changes.
> - Added a few TODO items. Split the NEWS change across multiple patches.
> 

Hi Sragdhara, Naveen, Karthik, Numan

Thank you for this new revision and for the reviews.  Also thanks for
bearing with us while trying to get this new feature in OVN!  I know
it's been a challenge.

Overall I didn't see any major issues in the series.  There's quite a
lot of code being added/changed so I'm quite sure we missed some
things.  However, we have quite a lot of time to address any of those
in the remainder of the 26.03 development cycle.

I did apply some small changes to the last 3 patches of the series,
mostly style issues and then pushed the series to main.

But I also have a few items I'd like to try to follow up on.  It would
be great if you could set some time aside to look into that.  Please
see below, listed for the corresponding patches.

> Sragdhara Datta Chaudhuri (5):
>   ovn-nb: Network Function insertion OVN-NB schema changes.
>   ovn-nbctl: Network Function insertion commands.
>   northd, tests: Network Function insertion logical flow programming.
>   controller, tests: Network Function insertion tunneling of cross-host
>     VLAN traffic.

This patch changes the PHY_TO_LOG() table to automatically load the
"tun intf id" for decapsulated packets into a "random" register,
reg5[16..31].  I know we also define a logical field for these 16 bits
of reg5:

#define MFF_LOG_TUN_OFPORT MFF_REG5   /* 16..31 of the 32 bits */

But that seems a bit error prone.  We also (for NF traffic) skip
clearing these bits when going from the logical ingress pipeline to
the logical egress one.

I think it might make sense to have a dedicated (set of) registers
that are preserved between logical pipelines.  The problem is we
currently are out of OVS registers that we can use:

/* Logical fields.
 *
 * These values are documented in ovn-architecture(7), please update the
 * documentation if you change any of them. */
#define MFF_LOG_DATAPATH MFF_METADATA /* Logical datapath (64 bits). */
#define MFF_LOG_FLAGS      MFF_REG10  /* One of MLF_* (32 bits). */
#define MFF_LOG_DNAT_ZONE  MFF_REG11  /* conntrack dnat zone for gateway router
                                       * (32 bits). */
#define MFF_LOG_SNAT_ZONE  MFF_REG12  /* conntrack snat zone for gateway router
                                       * (32 bits). */
#define MFF_LOG_CT_ZONE    MFF_REG13  /* Logical conntrack zone for lports
                                       * (0..15 of the 32 bits). */
#define MFF_LOG_ENCAP_ID   MFF_REG13  /* Encap ID for lports
                                       * (16..31 of the 32 bits). */
#define MFF_LOG_INPORT     MFF_REG14  /* Logical input port (32 bits). */
#define MFF_LOG_OUTPORT    MFF_REG15  /* Logical output port (32 bits). */

I had a quick glance at OVS' code and I _think_ there should be no
major issue if we add another xxreg (i.e., 2 extra xregs / 4 extra regs).

We're also running low on available bits in the logical flags register
(that's also preserved between ingress and egress) so having some extra
register storage space to work with would make our lives easier.  The
impact of adding new registers should be relatively minimal (I guess) as
it would only potentially increase the upcall handling time but likely
with a marginal amount.

I'm CC-ing Ilya to see if such a change would be acceptable in OVS.

>   northd, controller, tests: Network Function Health monitoring.

The new system test in this patch blindly kills any "random" tcpdump
instance that might be running on the host where the tests are being
executed.  We should probably improve this test to use
NETNS_START_TCPDUMP() or something similar.

Also, with this last patch, I saw the "622: Network function packet
flow - outbound" test fail in GitHub CI a couple of times.  I couldn't
reproduce the problem locally and after rebase it seems to pass. But
it would be great if you could monitor this test and fix it in the
future if it starts failing again.

Thanks again for all the hard work on this new feature!

Best regards,
Dumitru
