> On 8 Oct 2025, at 2:59 PM, Dumitru Ceara <[email protected]> wrote:
> 
> !-------------------------------------------------------------------|
> CAUTION: External Email
> 
> |-------------------------------------------------------------------!
> 
> On 10/7/25 8:36 PM, Sragdhara Datta Chaudhuri wrote:
>> Hi Dumitru,
>> 
> 
> Hi Sragdhara,
> 
>> Thanks so much for your detailed review over multiple iterations.
>> 
>> Numan, thank you as well for your valuable inputs.
>> 
>> Regarding the points you mentioned in this email:
>> 
>> *
>>   Agree that dedicated registers would be good. I picked reg5 as no
>>   other registers were completely free. The lack of free registers could
>>   hamper other features in the future as well.
>> *
>>   We’ll take a look at the system test tcpdump issue and fix it in a
>>   future patch.
>> *
>>   We noticed the test 622 (Network function packet flow
>>   outbound) failure but couldn’t reproduce it in-house even after
>>   running the test in a loop. We’ll look into it.
>> 
> 
> Thank you!  Let me know if you need help reproducing this in CI.

Hi Dumitru,

I tried reproducing the issue locally in CI with the latest main, but was not
able to, even after multiple runs.

I sent a patch to fix the tcpdump issue for the NF system test.

Thanks,
Naveen

> 
> Regards,
> Dumitru
> 
>> 
>> Thanks,
>> Sragdhara
>> 
>> *From: *Dumitru Ceara <[email protected]>
>> *Date: *Friday, October 3, 2025 at 2:01 AM
>> *To: *Sragdhara Datta Chaudhuri <[email protected]>, ovs-
>> [email protected] <[email protected]>
>> *Cc: *Numan Siddique <[email protected]>, Naveen Yerramneni
>> <[email protected]>, Karthik Chandrashekar
>> <[email protected]>, Ilya Maximets <[email protected]>
>> *Subject: *Re: [ovs-dev] [PATCH OVN v9 0/5] Network Function Insertion.
>> 
>> 
>> On 9/15/25 9:34 PM, Sragdhara Datta Chaudhuri wrote:
>>> RFC: NETWORK FUNCTION INSERTION IN OVN
>>> 
>>> 1. Introduction
>>> ================
>>> The objective is to insert a Network Function (NF) in the path of
>> outbound/inbound traffic from/to a port-group. The use case is to
>> integrate a 3rd party service in the path of traffic. An example of such
>> a service would be layer7 firewall. The NF VM will be like a bump in the
>> wire and should not modify the packet, i.e. the IP header, the MAC
>> addresses, VLAN tag, sequence numbers remain unchanged.
>>> 
>>> Here are some of the highlights:
>>> - A new entity network-function (NF) has been introduced. It contains
>> a pair of LSPs. The CMS would designate one as “inport” and the other as
>> “outport”.
>>> - For high-availability, a network function group (NFG) entity
>> consists of a group of NFs. Only one NF in an NFG has an active role
>> based on health monitoring.
>>> - ACL would accept NFG as a parameter and traffic matching the ACL
>> would be redirected to the associated active NF’s port. NFG is accepted
>> for stateful allow action only.
>>> - The ACL’s port-group is the point of reference when defining the
>> role of the NF ports. The “inport” is the port closer to the port-group
>> and “outport” is the one away from it. For from-lport ACLs, the request
>> packets would be redirected to the NF “inport” and for to-lport ACLs,
>> the request packets would be redirected to NF “outport”. When the same
>> packet comes out of the other NF port, it gets simply forwarded.
>>> - Statefulness will be maintained, i.e. the response traffic will also
>> go through the same pair of NF ports but in reverse order.
>>> - For the NF ports we need to disable port security check, fdb
>> learning and multicast/broadcast forwarding.
>>> - Health monitoring involves ovn-controller periodically injecting
>> ICMP probe packets into the NF inport and monitoring the same packet coming
>> out of the NF outport.
>>> - If the traffic redirection involves cross-host traffic (e.g. for a
>> from-lport ACL, if the source VM and NF VM are on different hosts),
>> packets would be tunneled to and from the NF VM's host.
>>> - If the port-group to which the ACL is being applied has members
>> spread across multiple LSs, CMS needs to create child ports for the NF
>> ports on each of these LSs. The redirection rules in each LS will use
>> the child ports on that LS.
>>> 
>>> 2. NB tables
>>> =============
>>> New NB tables
>>> -------------
>>> Network_Function: Each row contains {inport, outport, health_check}
>>> Network_Function_Group: Each row contains a list of Network_Function
>> entities. It also contains a unique id (between 1 and 255) and a
>> reference to the current active NF.
>>> Network_Function_Health_Check: Each row contains configuration for
>> probes in options field: {interval, timeout, success_count, failure_count}
>>> 
>>>         "Network_Function_Health_Check": {
>>>             "columns": {
>>>                 "name": {"type": "string"},
>>>                 "options": {
>>>                      "type": {"key": "string",
>>>                               "value": "string",
>>>                               "min": 0,
>>>                               "max": "unlimited"}},
>>>                 "external_ids": {
>>>                     "type": {"key": "string", "value": "string",
>>>                              "min": 0, "max": "unlimited"}}},
>>>             "isRoot": true},
>>>         "Network_Function": {
>>>             "columns": {
>>>                 "name": {"type": "string"},
>>>                 "outport": {"type": {"key": {"type": "uuid",
>>>                                              "refTable":
>> "Logical_Switch_Port",
>>>                                              "refType": "strong"},
>>>                                      "min": 1, "max": 1}},
>>>                 "inport": {"type": {"key": {"type": "uuid",
>>>                                             "refTable":
>> "Logical_Switch_Port",
>>>                                             "refType": "strong"},
>>>                                     "min": 1, "max": 1}},
>>>                 "health_check": {"type": {
>>>                     "key": {"type": "uuid",
>>>                             "refTable": "Network_Function_Health_Check",
>>>                             "refType": "strong"},
>>>                     "min": 0, "max": 1}},
>>>                 "external_ids": {
>>>                     "type": {"key": "string", "value": "string",
>>>                              "min": 0, "max": "unlimited"}}},
>>>             "isRoot": true},
>>>         "Network_Function_Group": {
>>>             "columns": {
>>>                 "name": {"type": "string"},
>>>                 "network_function": {"type":
>>>                                   {"key": {"type": "uuid",
>>>                                            "refTable": "Network_Function",
>>>                                            "refType": "strong"},
>>>                                            "min": 0, "max": "unlimited"}},
>>>                 "mode": {"type": {"key": {"type": "string",
>>>                                           "enum": ["set", ["inline"]]}}},
>>>                 "network_function_active": {"type":
>>>                                   {"key": {"type": "uuid",
>>>                                            "refTable": "Network_Function",
>>>                                            "refType": "strong"},
>>>                                            "min": 0, "max": 1}},
>>>                 "id": {
>>>                      "type": {"key": {"type": "integer",
>>>                                       "minInteger": 0,
>>>                                       "maxInteger": 255}}},
>>>                 "external_ids": {
>>>                     "type": {"key": "string", "value": "string",
>>>                              "min": 0, "max": "unlimited"}}},
>>>             "isRoot": true},
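The health_check probe options above (success_count, failure_count) determine
when an NF is considered up or down. A minimal sketch of how they could drive
that decision, assuming consecutive counting (the class and its semantics are
illustrative, not taken from the patch series):

```python
# Illustrative sketch only: models how success_count / failure_count from
# Network_Function_Health_Check could gate an NF's up/down status.
# Consecutive-counting semantics are an assumption, not the patch's code.
class ProbeState:
    def __init__(self, success_count: int, failure_count: int):
        self.success_count = success_count  # probes needed to go online
        self.failure_count = failure_count  # probes needed to go offline
        self.successes = 0
        self.failures = 0
        self.online = False

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result; return the resulting online status."""
        if probe_ok:
            self.successes += 1
            self.failures = 0
            if self.successes >= self.success_count:
                self.online = True
        else:
            self.failures += 1
            self.successes = 0
            if self.failures >= self.failure_count:
                self.online = False
        return self.online
```

An NFG would then pick network_function_active from the NFs whose probes
report online.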
>>> 
>>> 
>>> Modified NB table
>>> -----------------
>>> ACL: The ACL entity would have a new optional field that is a
>> reference to a Network_Function_Group entity. This field can be present
>> only for stateful allow ACLs.
>>> 
>>>         "ACL": {
>>>             "columns": {
>>>                 "network_function_group": {"type": {"key": {"type":
>> "uuid",
>>>                                            "refTable":
>> "Network_Function_Group",
>>>                                            "refType": "strong"},
>>>                                            "min": 0,
>>>                                            "max": 1}},
>>> 
>>> New options for Logical_Switch_Port
>>> -----------------------------------
>>> receive_multicast=<boolean>: Default true. If set to false, LS will
>> not forward broadcast/multicast traffic to this port. This is to prevent
>> looping of such packets.
>>> 
>>> lsp_learn_mac=<boolean>: Default true. If set to false, fdb learning
>> will be skipped for packets coming out of this port. Redirected packets
>> from the NF port would be carrying the originating VM’s MAC in source,
>> and so learning should not happen.
>>> 
>>> CMS needs to set both the above options to false for NF ports, in
>> addition to disabling port security.
>>> 
>>> nf-linked-port=<lsp-name>: Each NF port needs to have this set to the
>> other NF port of the pair.
>>> 
>>> is-nf=<boolean>: Each NF port needs to have this set to true.
>>> 
>>> New NB_global options
>>> ---------------------
>>> svc_monitor_mac_dst: destination MAC of probe packets (svc_monitor_mac
>> is already there and will be used as source MAC)
>>> svc_monitor_ip: source IP of probe packets
>>> svc_monitor_ip_dst: destination IP of probe packets
>>> 
>>> Sample configuration
>>> --------------------
>>> ovn-nbctl ls-add ls1
>>> ovn-nbctl lsp-add ls1 nfp1
>>> ovn-nbctl lsp-add ls1 nfp2
>>> ovn-nbctl set logical_switch_port nfp1 options:receive_multicast=false
>> options:lsp_learn_mac=false options:nf-linked-port=nfp2
>>> ovn-nbctl set logical_switch_port nfp2 options:receive_multicast=false
>> options:lsp_learn_mac=false options:nf-linked-port=nfp1
>>> ovn-nbctl nf-add nf1 nfp1 nfp2
>>> ovn-nbctl nfg-add nfg1 123 inline nf1
>>> ovn-nbctl lsp-add ls1 p1 -- lsp-set-addresses p1 "50:6b:8d:3e:ed:c4
>> 10.1.1.4"
>>> ovn-nbctl pg-add pg1 p1
>>> ovn-nbctl create Address_Set name=as1 addresses=10.1.1.4
>>> ovn-nbctl lsp-add ls1 p2 -- lsp-set-addresses p2 "50:6b:8d:3e:ed:c5
>> 10.1.1.5"
>>> ovn-nbctl create Address_Set name=as2 addresses=10.1.1.5
>>> ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2'
>> allow-related nfg1
>>> ovn-nbctl acl-add pg1 to-lport 100 'outport==@pg1 && ip4.src == $as2'
>> allow-related nfg1
>>> 
>>> 3. SB tables
>>> ============
>>> Service_Monitor:
>>> This is currently used by Load balancer. New fields are: “type” - to
>> indicate LB or NF, “mac” - the destination MAC address for monitor
>> packets, “logical_input_port” - the LSP to which the probe packet would
>> be sent. Also, “icmp” has been added as a protocol type, used only for NF.
>>> 
>>>          "Service_Monitor": {
>>>              "columns": {
>>>                "type": {"type": {"key": {
>>>                           "type": "string",
>>>                           "enum": ["set", ["load-balancer", "network-
>> function"]]}}},
>>>                "mac": {"type": "string"},
>>>                  "protocol": {
>>>                      "type": {"key": {"type": "string",
>>>                             "enum": ["set", ["tcp", "udp", "icmp"]]},
>>>                               "min": 0, "max": 1}},
>>>                "logical_input_port": {"type": "string"},
>>> 
>>> northd would create one Service_Monitor entity for each NF. The
>> logical_input_port and logical_port would be populated from the NF
>> inport and outport fields respectively. The probe packets would be
>> injected into the logical_input_port and would be monitored out of
>> logical_port.
>>> 
>>> 4. Logical Flows
>>> ================
>>> Logical Switch ingress pipeline:
>>> - in_network_function added after in_stateful.
>>> - Modifications to in_acl_eval, in_stateful and in_l2_lookup.
>>> Logical Switch egress pipeline:
>>> - out_network_function added after out_stateful.
>>> - Modifications to out_pre_acl, out_acl_eval and out_stateful.
>>> 
>>> 4.1 from-lport ACL
>>> ------------------
>>> The diagram shows the request path for packets from VM1 port p1, which
>> is a member of the pg to which ACL is applied. The response would follow
>> the reverse path, i.e. packet would be redirected to nfp2 and come out
>> of nfp1 and be forwarded to p1.
>>> Also, p2 does not need to be on the same LS. Only p1, nfp1 and nfp2
>> need to be on the same LS.
>>> 
>>>       -----                  -------                  -----
>>>      | VM1 |                | NF VM |                | VM2 |
>>>       -----                  -------                  -----
>>>         |                    /\    |                   / \
>>>         |                    |     |                    |
>>>        \ /                   |    \ /                   |
>>>    ------------------------------------------------------------
>>>   |     p1                 nfp1  nfp2                   p2     |
>>>   |                                                            |
>>>   |                      Logical Switch                        |
>>>    -------------------------------------------------------------
>>> pg1: [p1]         as2: [p2-ip]
>>> ovn-nbctl nf-add nf1 nfp1 nfp2
>>> ovn-nbctl nfg-add nfg1 123 inline nf1
>>> ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2'
>> allow-related nfg1
>>> 
>>> The request packets from p1 matching a from-lport ACL with NFG are
>> redirected to nfp1 and the NFG id is committed to the ct label in p1's
>> zone. When the same packet comes out of nfp2 it gets forwarded the
>> normal way.
>>> Response packets have destination as p1's MAC. Ingress processing sets
>> the outport to p1 and the CT lookup in egress pipeline (in p1's ct zone)
>> yields the NFG id and the packet is injected back to the ingress pipeline after
>> setting the outport to nfp2.
>>> 
>>> Below are the changes in detail.
>>> 
>>> 4.1.1 Request processing
>>> ------------------------
>>> 
>>> in_acl_eval: For from-lport ACLs with NFG, the existing rule's action
>> has been enhanced to set:
>>>  - reg8[21] = 1: to indicate that packet has matched a rule with NFG
>>>  - reg0[22..29] = <NFG-unique-id>
>>>  - reg8[22] = <direction> (1: request, 0: response)
>>> 
>>>   table=8 (ls_in_acl_eval), priority=1200 , match=(reg0[7] == 1 &&
>> (inport==@pg1 && ip4.dst == $as2)), action=(reg8[16] = 1; reg0[1] = 1;
>> reg8[21] = 1; reg8[22] = 1; reg0[22..29] = 123; next;)
>>>   table=8 (ls_in_acl_eval), priority=1200 , match=(reg0[8] == 1 &&
>> (inport==@pg1 && ip4.dst == $as2)), action=(reg8[16] = 1; reg8[21] = 1;
>> reg8[22] = 1; reg0[22..29] = 123; next;)
>>> 
>>> in_stateful: Priority 110: set NFG id in CT label if reg8[21] is set.
>>>  - bit 7 (ct_label.network_function_group): Set to 1 to indicate NF
>> insertion.
>>>  - bits 17 to 24 (ct_label.network_function_group_id): Stores the 8
>> bit NFG id
>>> 
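The ct_label bit layout above can be modeled with a small helper (illustrative
Python, not OVN code; bit positions are taken from the description):

```python
# ct_label bit layout from the text: bit 7 flags NF insertion,
# bits 17..24 hold the 8-bit NFG id.
NF_FLAG_BIT = 7
NFG_ID_SHIFT = 17
NFG_ID_MASK = 0xFF

def pack_nfg(ct_label: int, nfg_id: int) -> int:
    """Set the NF-insertion flag and store the NFG id in a ct_label value."""
    ct_label |= 1 << NF_FLAG_BIT
    ct_label &= ~(NFG_ID_MASK << NFG_ID_SHIFT)
    return ct_label | ((nfg_id & NFG_ID_MASK) << NFG_ID_SHIFT)

def unpack_nfg(ct_label: int):
    """Return (nf_flag, nfg_id) decoded from a ct_label value."""
    return ((ct_label >> NF_FLAG_BIT) & 1,
            (ct_label >> NFG_ID_SHIFT) & NFG_ID_MASK)
```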
>>>   table=21(ls_in_stateful     ), priority=110  , match=(reg0[1] == 1
>> && reg0[13] == 0 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked
>> = 0; ct_mark.allow_established = reg0[20]; ct_label.acl_id =
>> reg2[16..31]; ct_label.network_function_group = 1;
>> ct_label.network_function_group_id = reg0[22..29]; }; next;)
>>>   table=21(ls_in_stateful     ), priority=110  , match=(reg0[1] == 1
>> && reg0[13] == 1 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked
>> = 0; ct_mark.allow_established = reg0[20]; ct_mark.obs_stage =
>> reg8[19..20]; ct_mark.obs_collector_id = reg8[8..15];
>> ct_label.obs_point_id = reg9; ct_label.acl_id = reg2[16..31];
>> ct_label.network_function_group = 1; ct_label.network_function_group_id
>> = reg0[22..29]; }; next;)
>>>   table=21(ls_in_stateful     ), priority=100  , match=(reg0[1] == 1
>> && reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0;
>> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31];
>> ct_label.network_function_group = 0; ct_label.network_function_group_id
>> = 0; }; next;)
>>>   table=21(ls_in_stateful     ), priority=100  , match=(reg0[1] == 1
>> && reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0;
>> ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20];
>> ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9;
>> ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 0;
>> ct_label.network_function_group_id = 0; }; next;)
>>>   table=21(ls_in_stateful     ), priority=0    , match=(1), action=(next;)
>>> 
>>> 
>>> For non-NFG cases, the existing priority 100 rules will be hit. An
>> additional action has been added there to clear the NFG bits in the ct label.
>>> 
>>> in_network_function: A new stage with priority 99 rules to redirect
>> packets by setting outport to the NF “inport” (or its child port) based
>> on the NFG id set by the prior ACL stage.
>>> Priority 100 rules ensure that when the same packets come out of the
>> NF ports, they are not redirected again (the setting of reg5 here
>> relates to the cross-host packet tunneling and will be explained later).
>>> Priority 1 rule: if reg8[21] is set, but the NF port (or child port)
>> is not present on this LS, drop packets.
>>> 
>>>   table=22(ls_in_network_function), priority=100  , match=(inport ==
>> "nfp1"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
>>>   table=22(ls_in_network_function), priority=100  , match=(inport ==
>> "nfp2"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
>>>   table=22(ls_in_network_function), priority=100  , match=(reg8[21] ==
>> 1 && eth.mcast), action=(next;)
>>>   table=22(ls_in_network_function), priority=99   , match=(reg8[21] ==
>> 1 && reg8[22] == 1 && reg0[22..29] == 123), action=(outport = "nfp1";
>> output;)
>>>   table=22(ls_in_network_function), priority=1    , match=(reg8[21] ==
>> 1), action=(drop;)
>>>   table=22(ls_in_network_function), priority=0    , match=(1),
>> action=(next;)
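The precedence among these flows can be summarized as follows (an illustrative
Python model of the sample flows above, not OVN code; the reg5 bookkeeping and
eth.mcast rule are omitted for brevity):

```python
# Models the ls_in_network_function decision for the from-lport example:
#   priority 100: packets coming out of NF ports pass through untouched;
#   priority 99:  request packets matching an NFG id are redirected;
#   priority 1:   NFG-matched packets with no local NF port are dropped;
#   priority 0:   everything else continues down the pipeline.
def in_network_function(inport, nf_ports, nf_match, is_request,
                        nfg_id, redirects):
    if inport in nf_ports:
        return "next"
    if nf_match and is_request and nfg_id in redirects:
        return "output:" + redirects[nfg_id]
    if nf_match:
        return "drop"
    return "next"
```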
>>> 
>>> 
>>> 4.1.2 Response processing
>>> -------------------------
>>> out_acl_eval: High priority rules that allow response and related
>> packets to go through have been enhanced to also copy CT label NFG bit
>> into reg8[21].
>>> 
>>>   table=6(ls_out_acl_eval), priority=65532, match=(!ct.est && ct.rel
>> && !ct.new && !ct.inv && ct_mark.blocked == 0), action=(reg8[21] =
>> ct_label.network_function_group; reg8[16] = 1; ct_commit_nat;)
>>>   table=6(ls_out_acl_eval), priority=65532, match=(ct.est && !ct.rel
>> && !ct.new && !ct.inv && ct.rpl && ct_mark.blocked == 0),
>> action=(reg8[21] = ct_label.network_function_group; reg8[16] = 1; next;)
>>> 
>>> out_network_function: Priority 99 rule matches on the nfg_id in
>> ct_label and sets the outport to the NF “outport”. It also sets
>> reg8[23]=1 and injects the packet into the ingress pipeline (in_l2_lookup).
>>> Priority 100 rule forwards all packets to NF ports to the next table.
>>> 
>>>   table=11 (ls_out_network_function), priority=100  , match=(outport
>> == "nfp1"), action=(next;)
>>>   table=11 (ls_out_network_function), priority=100  , match=(outport
>> == "nfp2"), action=(next;)
>>>   table=11(ls_out_network_function), priority=100  , match=(reg8[21]
>> == 1 && eth.mcast), action=(next;)
>>>   table=11 (ls_out_network_function), priority=99   , match=(reg8[21]
>> == 1 && reg8[22] == 0 && ct_label.network_function_group_id == 123),
>> action=(outport = "nfp2"; reg8[23] = 1; next(pipeline=ingress, table=29);)
>>>   table=11 (ls_out_network_function), priority=1    , match=(reg8[21]
>> == 1), action=(drop;)
>>>   table=11 (ls_out_network_function), priority=0    , match=(1),
>> action=(next;)
>>> 
>>> in_l2_lkup: if reg8[23] == 1 (packet has come back from egress),
>> simply forward such packets as outport is already set.
>>> 
>>>   table=29(ls_in_l2_lkup), priority=100  , match=(reg8[23] == 1),
>> action=(output;)
>>> 
>>> The above set of rules ensures that the response packet is sent to
>> nfp2. When the same packet comes out of nfp1, the ingress pipeline would
>> set the outport to p1 and it enters the egress pipeline.
>>> 
>>> out_pre_acl: If the packet is coming from the NF inport, skip the
>> egress pipeline up to the out_nf stage, as the packet has already gone
>> through it and we don't want the same packet to be processed by CT twice.
>>>   table=2 (ls_out_pre_acl     ), priority=110  , match=(inport ==
>> "nfp1"), action=(next(pipeline=egress, table=12);)
>>> 
>>> 
>>> 4.2 to-lport ACL
>>> ----------------
>>>       -----                  --------                  -----
>>>      | VM1 |                |  NF VM |                | VM2 |
>>>       -----                  --------                  -----
>>>        / \                    |   / \                    |
>>>         |                     |    |                     |
>>>         |                    \ /   |                    \ /
>>>    -------------------------------------------------------------
>>>   |     p1                  nfp1   nfp2                  p2     |
>>>   |                                                             |
>>>   |                      Logical Switch                         |
>>>    -------------------------------------------------------------
>>> ovn-nbctl acl-add pg1 to-lport 100 'outport==@pg1 && ip4.src == $as2'
>> allow-related nfg1
>>> The diagram shows the request traffic path. The response follows the
>> reverse path.
>>> 
>>> Ingress pipeline sets the outport to p1 based on destination MAC
>> lookup. The packet enters the egress pipeline. There the to-lport ACL
>> with NFG gets evaluated and the NFG id gets committed to the CT label.
>> Then the outport is set to nfp2 and then the packet is injected back to
>> ingress. When the same packet comes out of nfp1, it gets forwarded to p1
>> the normal way.
>>> For the response packet from p1, the ingress pipeline gets the NFG id
>> from the CT label and accordingly redirects it to nfp1. When it comes out of
>> nfp2 it is forwarded the normal way.
>>> 
>>> 4.2.1 Request processing
>>> ------------------------
>>> out_acl_eval: For to-lport ACLs with NFG, the existing rule's action
>> has been enhanced to set:
>>>  - reg8[21] = 1: to indicate that packet has matched a rule with NFG
>>>  - reg0[22..29] = <NFG-unique-id>
>>>  - reg8[22] = <direction> (1: request, 0: response)
>>> 
>>>   table=6 (ls_out_acl_eval    ), priority=1100 , match=(reg0[7] == 1
>> && (outport==@pg1 && ip4.src == $as2)), action=(reg8[16] = 1; reg0[1] =
>> 1; reg8[21] = 1; reg8[22] = 1; reg0[22..29] = 123; next;)
>>>   table=6 (ls_out_acl_eval    ), priority=1100 , match=(reg0[8] == 1
>> && (outport==@pg1 && ip4.src == $as2)), action=(reg8[16] = 1; reg0[1] =
>> 1; reg8[21] = 1; reg8[22] = 1; reg0[22..29] = 123; next;)
>>> 
>>> 
>>> 
>>> out_stateful: Priority 110: set NFG id in CT label if reg8[21] is set.
>>> 
>>>   table=10(ls_out_stateful    ), priority=110  , match=(reg0[1] == 1
>> && reg0[13] == 0 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked
>> = 0; ct_mark.allow_established = reg0[20]; ct_label.acl_id =
>> reg2[16..31]; ct_label.network_function_group = 1;
>> ct_label.network_function_group_id = reg0[22..29]; }; next;)
>>>   table=10(ls_out_stateful    ), priority=110  , match=(reg0[1] == 1
>> && reg0[13] == 1 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked
>> = 0; ct_mark.allow_established = reg0[20]; ct_mark.obs_stage =
>> reg8[19..20]; ct_mark.obs_collector_id = reg8[8..15];
>> ct_label.obs_point_id = reg9; ct_label.acl_id = reg2[16..31];
>> ct_label.network_function_group = 1; ct_label.network_function_group_id
>> = reg0[22..29]; }; next;)
>>>   table=10(ls_out_stateful    ), priority=100  , match=(reg0[1] == 1
>> && reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0;
>> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31];
>> ct_label.network_function_group = 0; ct_label.network_function_group_id
>> = 0; }; next;)
>>>   table=10(ls_out_stateful    ), priority=100  , match=(reg0[1] == 1
>> && reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0;
>> ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20];
>> ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9;
>> ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 0;
>> ct_label.network_function_group_id = 0; }; next;)
>>>   table=10(ls_out_stateful    ), priority=0    , match=(1), action=(next;)
>>> 
>>> out_network_function: A new stage that has priority 99 rules to
>> redirect packets by setting outport to the NF “outport” (or its child
>> port) based on the NFG id set by the prior ACL stage, and then injecting
>> back to ingress. Priority 100 rules ensure that when the packets are
>> going to NF ports, they are not redirected again.
>>> Priority 1 rule: if reg8[21] is set, but the NF port (or child port)
>> is not present on this LS, drop packets.
>>> 
>>>   table=11(ls_out_network_function), priority=100  , match=(outport ==
>> "nfp1"), action=(next;)
>>>   table=11(ls_out_network_function), priority=100  , match=(outport ==
>> "nfp2"), action=(next;)
>>>   table=11(ls_out_network_function), priority=100  , match=(reg8[21]
>> == 1 && eth.mcast), action=(next;)
>>>   table=11(ls_out_network_function), priority=99   , match=(reg8[21]
>> == 1 && reg8[22] == 1 && reg0[22..29] == 123), action=(outport = "nfp2";
>> reg8[23] = 1; next(pipeline=ingress, table=29);)
>>>   table=11(ls_out_network_function), priority=1    , match=(reg8[21]
>> == 1), action=(drop;)
>>>   table=11(ls_out_network_function), priority=0    , match=(1),
>> action=(next;)
>>> 
>>> 
>>> in_l2_lkup: As described earlier, the priority 100 rule will forward
>> these packets.
>>> 
>>> Then the same packet comes out from nfp1 and goes through the ingress
>> processing where the outport gets set to p1. The egress pipeline
>> out_pre_acl priority 110 rule described earlier, matches against inport
>> as nfp1 and directly jumps to the stage after out_network_function. Thus
>> the packet is not redirected again.
>>> 
>>> 4.2.2 Response processing
>>> -------------------------
>>> in_acl_eval: High priority rules that allow response and related
>> packets to go through have been enhanced to also copy CT label NFG bit
>> into reg8[21].
>>> 
>>>   table=8(ls_in_acl_eval), priority=65532, match=(!ct.est && ct.rel
>> && !ct.new && !ct.inv && ct_mark.blocked == 0), action=(reg0[17] = 1;
>> reg8[21] = ct_label.network_function_group; reg8[16] = 1; ct_commit_nat;)
>>>   table=8 (ls_in_acl_eval), priority=65532, match=(ct.est && !ct.rel
>> && !ct.new && !ct.inv && ct.rpl && ct_mark.blocked == 0),
>> action=(reg0[9] = 0; reg0[10] = 0; reg0[17] = 1; reg8[21] =
>> ct_label.network_function_group; reg8[16] = 1; next;)
>>> 
>>> in_network_function: Priority 99 rule matches on the nfg_id in
>> ct_label and sets the outport to the NF “inport”.
>>> Priority 100 rule forwards all packets to NF ports to the next table.
>>>   table=22(ls_in_network_function), priority=99   , match=(reg8[21] ==
>> 1 && reg8[22] == 0 && ct_label.network_function_group_id == 123),
>> action=(outport = "nfp1"; output;)
>>> 
>>> 
>>> 5. Cross-host Traffic for VLAN Network
>>> ======================================
>>> For overlay subnets, all cross-host traffic exchanges are tunneled. In
>> the case of VLAN subnets, there needs to be special handling to
>> selectively tunnel only the traffic to or from the NF ports.
>>> Take the example of a from-lport ACL. Packets from p1 to p2 get
>> redirected to nfp1 on host1. If this packet is simply sent out from
>> host1, the physical network will directly forward it to host2 where VM2
>> is. So, we need to tunnel the redirected packets from host1 to host3.
>> Now, once the packets come out of nfp2, if host3 sends the packets out,
>> the physical network would learn p1's MAC coming from host3. So, these
>> packets need to be tunneled back to host1. From there the packet would
>> be forwarded to VM2 via the physical network.
>>> 
>>>       -----                  -----                  --------
>>>      | VM2 |                | VM1 |                | NF VM  |
>>>       -----                  -----                  --------
>>>        / \                     |                    / \   |
>>>         | (7)                  |  (1)             (3)|    |(4)
>>>         |                     \ /                    |   \ /
>>>   --------------        --------------   (2)    ---------------
>>>  |      p2      |  (6) |      p1      |______\ |   nfp1  nfp2  |
>>>  |              |/____ |              |------/ |               |
>>>  |    host2     |\     |     host1    |/______ |     host3     |
>>>  |              |      |              |\------ |               |
>>>   --------------        --------------   (5)    --------------
>>> 
>>> The above figure shows the request packet path for a from-lport ACL.
>> Response would follow the same path in reverse direction.
>>> 
>>> To achieve this, the following would be done:
>>> 
>>> On host where the ACL port group members are present (host1)
>>> ------------------------------------------------------------
>>> REMOTE_OUTPUT (table 42):
>>> Currently, it tunnels traffic destined to all non-local overlay ports
>> to their associated hosts. The same rule is now also added for traffic
>> to non-local NF ports. Thus the packets from p1 get tunneled to host 3.
>>> 
>>> On host with NF (host3) forward packet to nfp1
>>> -----------------------------------------------
>>> Upon reaching host3, the following rules come into play:
>>> PHY_TO_LOG (table 0):
>>> Priority 100: Existing rule - for each Geneve tunnel interface on the
>> chassis, it copies info from the tunnel header into the inport, outport and
>> metadata registers. Now the same rule also stores the tunnel interface id
>> in a register (reg5[16..31]).
>>> 
>>> CHECK_LOOPBACK (table 44)
>>> This table has a rule that clears all the registers. The change is to
>> skip the clearing of reg5[16..31].
>>> 
>>> Logical egress pipeline:
>>> 
>>> ls_out_stateful priority 120: If the outport is an NF port, copy
>> reg5[16..31] (which table 0 had set) to ct_label.tun_if_id.
>>> 
>>>   table=10(ls_out_stateful    ), priority=120  , match=(outport ==
>> "nfp1" && reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0;
>> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31];
>> ct_label.tun_if_id = reg5[16..31]; }; next;)
>>>   table=10(ls_out_stateful    ), priority=120  , match=(outport ==
>> "nfp1" && reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0;
>> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31];
>> ct_mark.obs_stage = reg8[19..20]; ct_mark.obs_collector_id =
>> reg8[8..15]; ct_label.obs_point_id = reg9; ct_label.tun_if_id =
>> reg5[16..31]; }; next;)
>>> 
>>> The above sequence of flows ensures that if a packet is received via
>> tunnel on host3, with outport as nfp1, the tunnel interface id is
>> committed to the ct entry in nfp1's zone.
>>> 
>>> On host with NF (host3) tunnel packets from nfp2 back to host1
>>> ---------------------------------------------------------------
>>> When the same packet comes out of nfp2 on host3:
>>> 
>>> LOCAL_OUTPUT (table 43)
>>> When the packet comes out of the other NF port (nfp2), the following two
>> rules send it back to the host that it originally came from:
>>> 
>>> Priority 110: For each NF port local to this host, the following rule
>> processes the packet through the CT of the linked port (for nfp2, it is
>> nfp1):
>>>   match: inport==nfp2 && RECIRC_BIT==0
>>>   action: RECIRC_BIT = 1, ct(zone=nfp1’s zone, table=LOCAL), resubmit
>> to table 43
>>> 
>>> Priority 109: For each {tunnel_id, nf port} on this host, if the
>> tun_if_id in ct_label matches the tunnel_id, send the recirculated
>> packet using tunnel_id:
>>>   match: inport==nfp1 && RECIRC_BIT==1 && ct_label.tun_if_id==<tun-id>
>>>   action: tunnel packet using tun-id
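The effect of the priority 109 rules together with the existing priority 100
fallback can be sketched as (illustrative Python, not ovn-controller code):

```python
# If the ct entry of the linked NF port carries a tunnel interface id that
# matches one of this host's tunnels, the packet is sent back through that
# tunnel (priority 109 rules); otherwise it falls through to normal local
# delivery (existing priority 100 rules).
def local_output(ct_tun_if_id, local_tunnel_ids):
    if ct_tun_if_id in local_tunnel_ids:
        return "tunnel:%d" % ct_tun_if_id
    return "local-delivery"
```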
>>> 
>>> If p1 and nfp1 happen to be on the same host, the tun_if_id would not
>> be set and thus none of the priority 109 rules would match. The packet
>> would be forwarded the usual way, matching the existing priority 100
>> rules in LOCAL_OUTPUT.
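
To make the fallthrough above concrete, here is a small Python sketch
(illustrative only, not OVS/OVN code; all names are invented): a packet that
was never committed with a tun_if_id falls past every priority-109 rule and
takes the existing priority-100 path.

```python
# Illustrative model of the LOCAL_OUTPUT rule precedence described above.
# A first pass (priority 110) recirculates through the linked port's CT
# zone; on the second pass the packet either matches a per-tunnel
# priority-109 rule or falls through to the normal priority-100 path.
def local_output(packet, local_tunnel_ids):
    if not packet["recirculated"]:
        # Priority 110: send through linked port's CT zone, resubmit.
        return "recirculate-through-linked-port-zone"
    tun = packet["ct_tun_if_id"]
    if tun in local_tunnel_ids:
        # Priority 109: tunnel back to the originating host.
        return f"tunnel-via-{tun}"
    # Priority 100: tun_if_id was never committed (same-host case).
    return "forward-normally"

assert local_output({"recirculated": True, "ct_tun_if_id": 7}, {7, 9}) == "tunnel-via-7"
assert local_output({"recirculated": True, "ct_tun_if_id": 0}, {7, 9}) == "forward-normally"
```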
>>> 
>>> Special handling of the case where the NF responds back on nfp1 instead
>> of forwarding the packet out of nfp2:
>>> For example, a SYN packet from p1 got redirected to nfp1. Then the NF,
>> which is a firewall VM, drops the SYN and sends a RST back on port nfp1.
>> In this case, looking up the linked port's (nfp2's) ct zone will not match
>> anything. The rule below uses ct.inv to identify such scenarios and
>> uses nfp1's CT zone to send the packet back. To achieve this, the
>> following 2 rules are installed:
>>> 
>>> in_network_function:
>>> The priority 100 rule that allows packets incoming from NF type ports is
>> enhanced with an additional action to store the tun_if_id from ct_label
>> into reg5[16..31].
>>>   table=22(ls_in_network_function), priority=100  , match=(inport ==
>> "nfp1"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
>>> 
>>> LOCAL_OUTPUT (table 43)
>>> Priority 110 rule: for recirculated packets, if ct (of the linked
>> port) is invalid, use the tun id from reg5[16..31] to tunnel the packet
>> back to host1 (as the CT zone info has been overwritten by the above
>> priority 110 rule in table 43).
>>>       match: inport==nfp1 && RECIRC_BIT==1 && ct.inv &&
>> MFF_LOG_TUN_OFPORT==<tun-id>
>>>       action: tunnel packet using tun-id
>>> 
>>> 
>>> 6. NF insertion across logical switches
>>> =======================================
>>> If the port-group where the ACL is being applied has members across
>> multiple logical switches, there needs to be a NF port pair on each of
>> these switches.
>>> The NF VM will have only one inport and one outport. The CMS is
>> expected to create child ports linked to these ports on each logical
>> switch where port-group members are present.
>>> The network-function entity would be configured with the parent ports
>> only. When CMS creates the child ports, it does not need to change any
>> of the NF, NFG or ACL config tables.
>>> When northd configures the redirection rules for a specific LS, it
>> will use the parent or child port depending on what it finds on that LS.
>>>                                      --------
>>>                                     | NF VM  |
>>>                                      --------
>>>                                      |      |
>>>           -----                      |      |              -----
>>>          | VM1 |                    nfp1   nfp2           | VM2 |
>>>           -----    |      |        --------------          -----    |      |
>>>             |      |      |       |    SVC LS    |           |      |      |
>>>           p1| nfp1_ch1 nfp2_ch1    --------------          p3| nfp1_ch2 nfp2_ch2
>>>           --------------------                             --------------------
>>>          |        LS1         |                           |        LS2        |
>>>           --------------------                             --------------------
>>> 
>>> In this example, the CMS created the parent ports for the NF VM on LS
>> named SVC LS. The ports are nfp1 and nfp2. The CMS configures the NF
>> using these ports:
>>> ovn-nbctl nf-add nf1 nfp1 nfp2
>>> ovn-nbctl nfg-add nfg1 123 inline nf1
>>> ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2'
>> allow-related nfg1
>>> 
>>> The port group to which the ACL is applied is pg1 and pg1 has two
>> ports: p1 on LS1 and p3 on LS2.
>>> The CMS needs to create child ports for the NF ports on LS1 and LS2.
>> On LS1: nfp1_ch1 and nfp2_ch1. On LS2: nfp1_ch2 and nfp2_ch2
>>> 
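
For readers following along, the child ports above could be created with
something like the following, using ovn-nbctl's `lsp-add SWITCH PORT PARENT
TAG` form for child ports. The tag values here are purely illustrative; the
CMS would pick whatever tags suit its deployment.

```shell
# Create the per-LS child ports for the NF parent ports nfp1/nfp2.
# Tag values (1, 2) are placeholders chosen for this example.
ovn-nbctl lsp-add LS1 nfp1_ch1 nfp1 1
ovn-nbctl lsp-add LS1 nfp2_ch1 nfp2 1
ovn-nbctl lsp-add LS2 nfp1_ch2 nfp1 2
ovn-nbctl lsp-add LS2 nfp2_ch2 nfp2 2
```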
>>> When northd creates rules on LS1, it would use nfp1_ch1 and nfp2_ch1.
>>> 
>>>   table=22(ls_in_network_function), priority=100  , match=(inport ==
>> "nfp2_ch1"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
>>>   table=22(ls_in_network_function), priority=99   , match=(reg8[21] ==
>> 1 && reg8[22] == 1 && reg0[22..29] == 123), action=(outport =
>> "nfp1_ch1"; output;)
>>> 
>>> When northd is creating rules on LS2, it would use nfp1_ch2 and nfp2_ch2.
>>>   table=22(ls_in_network_function), priority=100  , match=(inport ==
>> "nfp2_ch2"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
>>>   table=22(ls_in_network_function), priority=99   , match=(reg8[21] ==
>> 1 && reg8[22] == 1 && reg0[22..29] == 123), action=(outport =
>> "nfp1_ch2"; output;)
>>> 
>>> 
>>> 7. Health Monitoring
>>> ====================
>>> The LB health monitoring functionality has been extended to support
>> NFs. Network_Function_Group has a list of Network_Functions, each of
>> which has a reference to a Network_Function_Health_Check that holds the
>> monitoring config. There is a corresponding SB service_monitor record
>> maintaining the online/offline status. When the status changes, northd
>> picks one of the “online” NFs and sets it in the network_function_active
>> field of the NFG. The redirection rule in the LS uses the ports from this NF.
>>> 
>>> ovn-controller performs the health monitoring by sending an ICMP echo
>> request with source IP and MAC from the NB global options “svc_monitor_ip4”
>> and “svc_monitor_mac”, and destination IP and MAC from the new NB global
>> options “svc_monitor_ip4_dst” and “svc_monitor_mac_dst”. The sequence
>> number and id are randomly generated and stored in service_monitor. The NF
>> VM forwards the same packet out of the other port. When it comes out,
>> ovn-controller matches the sequence number and id against the stored
>> values and marks the NF online if they match.
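
The id/seq matching described above can be sketched in a few lines of Python
(illustrative only, not ovn-controller code; function names are invented):

```python
import random

# The probe sender stores a random ICMP id/seq pair; the NF is declared
# online only when the packet seen on the other port carries the same pair.
def new_probe():
    return {"id": random.randrange(1 << 16), "seq": random.randrange(1 << 16)}

def is_online(stored, received):
    return stored["id"] == received["id"] and stored["seq"] == received["seq"]

probe = new_probe()
assert is_online(probe, dict(probe))            # forwarded unchanged -> online
mangled = {"id": probe["id"], "seq": (probe["seq"] + 1) % (1 << 16)}
assert not is_online(probe, mangled)            # mismatch -> not online
```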
>>> 
>>> V1:
>>>   - First patch.
>>> 
>>> V2:
>>>   - Rebased code.
>>>   - Added "mode" field in Network_function_group table, with only allowed
>>>     value as "inline". This is for future expansion to include
>> "mirror" mode.
>>>   - Added a flow in the in_network_function and out_network_function
>> table to
>>>     skip redirection of multicast traffic.
>>> 
>>> V3:
>>>  - Rebased code.
>>> 
>>> V4:
>>>  - Rebased code.
>>> 
>>> V5:
>>>  - Fixed issue of packet duplication for NF on overlay. Added a check
>>>    in physical/controller.c so that only for localnet case the new flows
>>>    in table 44 (that tunnel packets back to source host based on tunnel
>>>    i/f stored in CT) get installed.
>>>  - Fixed packet drop issue for NF on overlay. This happens when port1
>> is on
>>>    host1, and port2 & NF are on host2; packets sent from port2 to port1
>>>    are redirected to NF port as intended but it is not reaching host2.
>>>    This is because OVS drops packets that are sent to same port on which
>>>    they were received (in this case packet received on host1 on
>>>    a tunnel interface and being forwarded to the same tunnel). Updated
>>>    the table 43 rule for remote overlay ports to clear in_port to get
>>>    around this.
>>>  - Added unit tests in ovn.at to cover various multi-chassis scenarios
>>>    for both VLAN and overlay cases.
>>> 
>>> V6:
>>> - Addressed review comments from Numan Siddique.
>>> - Updated ovn-northd.8.xml and ovn-nbctl.8.xml.
>>> - Unit test added for health monitoring of network function
>>> - System test added for single chassis case. Multi-chassis ST to be
>> added later.
>>>   Multi-chassis cases were already added in ovn.at.
>>> 
>>> V7:
>>> - Rebased
>>> - Fixed unit test failures resulting from new IC SB service monitor
>> feature
>>> 
>>> V8:
>>> - Addressed review comments from Dumitru Ceara. Main changes are
>> listed below.
>>> - NFG unique id is not generated internally, but is configured by CMS.
>>> - NFG register changed from reg5[0..7] to reg0[22..29]
>>> - IPv6 support for health probes
>>> - Update to NEWS and various xml files for proper documentation.
>>> 
>>> V9:
>>> - Addressed new review comments from Dumitru Ceara. Main changes are
>> as below.
>>> - Changed isRoot from true to false for Network_Function_Health_Check.
>>> - Changed the behaviour of --may-exist from no-op to update.
>>> - In nbctl commands, replaced "network-function" with "nf" and
>>>   "network-function-group" with "nfg".
>>> - The change related to skipping the clearing of the tunnel id part of
>> reg5 is now
>>>   being done only for NF datapaths.
>>> - Fixed the intermittent test failures earlier caused by patch 5, by
>>>   sorting the svc_macs.
>>> - Fixed backward compatibility with respect to "type" field in
>> service_monitor.
>>> - New system test added for the case without health_check.
>>> - Replaced the scoring mechanism for NF election with a deterministic
>> method.
>>> - Moved some test and xml changes to other patches to align with the
>> changes.
>>> - Added a few TODO items. Split the NEWS change across multiple patches.
>>> 
>> 
>> Hi Sragdhara, Naveen, Karthik, Numan
>> 
>> Thank you for this new revision and for the reviews.  Also thanks for
>> bearing with us while trying to get this new feature in OVN!  I know
>> it's been a challenge.
>> 
>> Overall I didn't see any major issues in the series.  There's quite a
>> lot of code being added/changed so I'm quite sure we missed some
>> things.  However, we have quite a lot of time to address any of those
>> in the remainder of the 26.03 development cycle.
>> 
>> I did apply some small changes to the last 3 patches of the series,
>> mostly style issues and then pushed the series to main.
>> 
>> But I also have a few items I'd like to try to follow up on.  It would
>> be great if you could set some time aside to look into that.  Please
>> see below, listed for the corresponding patches.
>> 
>>> Sragdhara Datta Chaudhuri (5):
>>>   ovn-nb: Network Function insertion OVN-NB schema changes.
>>>   ovn-nbctl: Network Function insertion commands.
>>>   northd, tests: Network Function insertion logical flow programming.
>>>   controller, tests: Network Function insertion tunneling of cross-host
>>>     VLAN traffic.
>> 
>> This patch changes the PHY_TO_LOG() table to automatically load the
>> "tun intf id" for decapsulated packets into a "random" register,
>> reg5[16..31].  I know we also define a logical field for these 16 bits
>> of reg5:
>> 
>> #define MFF_LOG_TUN_OFPORT MFF_REG5   /* 16..31 of the 32 bits */
>> 
>> But that seems a bit error prone.  We also (for NF traffic) skip
>> clearing these bits when going from the logical ingress pipeline to
>> the logical egress one.
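
(As an aside, the bit-slice convention at play here is easy to model; the
following is an illustrative Python sketch, not OVN code, with invented
helper names.)

```python
# How a 16-bit tunnel interface id occupies bits 16..31 of the 32-bit
# reg5, without disturbing whatever lives in bits 0..15.
def set_bits(reg, value, lo, hi):
    """Store value into reg at bit positions lo..hi (inclusive)."""
    width = hi - lo + 1
    mask = ((1 << width) - 1) << lo
    return (reg & ~mask) | ((value << lo) & mask)

def get_bits(reg, lo, hi):
    """Extract bits lo..hi (inclusive) from reg."""
    width = hi - lo + 1
    return (reg >> lo) & ((1 << width) - 1)

reg5 = 0
reg5 = set_bits(reg5, 0xBEEF, 0, 15)   # unrelated low half
reg5 = set_bits(reg5, 42, 16, 31)      # tun intf id, i.e. reg5[16..31]
assert get_bits(reg5, 16, 31) == 42
assert get_bits(reg5, 0, 15) == 0xBEEF # low half untouched
```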
>> 
>> I think it might make sense to have a dedicated (set of) registers
>> that are preserved between logical pipelines.  The problem is we
>> currently are out of OVS registers that we can use:
>> 
>> /* Logical fields.
>> *
>> * These values are documented in ovn-architecture(7), please update the
>> * documentation if you change any of them. */
>> #define MFF_LOG_DATAPATH MFF_METADATA /* Logical datapath (64 bits). */
>> #define MFF_LOG_FLAGS      MFF_REG10  /* One of MLF_* (32 bits). */
>> #define MFF_LOG_DNAT_ZONE  MFF_REG11  /* conntrack dnat zone for gateway
>> router
>>                                       * (32 bits). */
>> #define MFF_LOG_SNAT_ZONE  MFF_REG12  /* conntrack snat zone for gateway
>> router
>>                                       * (32 bits). */
>> #define MFF_LOG_CT_ZONE    MFF_REG13  /* Logical conntrack zone for lports
>>                                       * (0..15 of the 32 bits). */
>> #define MFF_LOG_ENCAP_ID   MFF_REG13  /* Encap ID for lports
>>                                       * (16..31 of the 32 bits). */
>> #define MFF_LOG_INPORT     MFF_REG14  /* Logical input port (32 bits). */
>> #define MFF_LOG_OUTPORT    MFF_REG15  /* Logical output port (32 bits). */
>> 
>> I had a quick glance at OVS' code and I _think_ there should be no
>> major issue if we add another xxreg (i.e., 2 extra xregs / 4 extra regs).
>> 
>> We're also running low on available bits in the logical flags register
>> (that's also preserved between ingress and egress) so having some extra
>> register storage space to work with would make our lives easier.  The
>> impact of adding new registers should be relatively minimal (I guess) as
>> it would only potentially increase the upcall handling time but likely
>> with a marginal amount.
>> 
>> I'm CC-ing Ilya to see if such a change would be acceptable in OVS.
>> 
>>>   northd, controller, tests: Network Function Health monitoring.
>> 
>> The new system test in this patch blindly kills any "random" tcpdump
>> instance that might be running on the host where the tests are being
>> executed.  We should probably improve this test to use
>> NETNS_START_TCPDUMP() or something similar.
>> 
>> Also, with this last patch, I saw the "622: Network function packet
>> flow - outbound" test fail in GitHub CI a couple of times.  I couldn't
>> reproduce the problem locally and after rebase it seems to pass. But
>> it would be great if you could monitor this test and fix it in the
>> future if it starts failing again.
>> 
>> Thanks again for all the hard work on this new feature!
>> 
>> Best regards,
>> Dumitru
>> 
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
