Hi Numan, Daniel, Lucas,

Thank you so much for the feedback and for providing your inputs.
I went through the bug that was referenced and Numan's inputs, and the
following is my take on it.
It's a lengthy email; if you just want to look at the suggested fixes, then
please scroll to the second section :).

Here is a summary of the input/feedback points that are most pertinent to
the discussion.

a. "Hard to colocate the gateway ports of External port and Router port"
[ANKUR]:  Looking at https://bugzilla.redhat.com/show_bug.cgi?id=1829762.
I do not see a reference to an attempt at using "HA Chassis group" (or maybe
I missed it).
OVN has a primitive which allows an HA chassis cluster to be shared across
multiple entities. I like this primitive, and I think it can easily be used here.
The whole purpose of having a separate table for HA chassis group is so
that it can be shared.
If, other than the openstack implementation, there are other hindrances in
adopting it, then we should try to address them in OVN.

b. " if you see the logical switches ls_vlan4 and ls_vlan5 in this example don't
provide any N/S connectivity and hence it may not be accurate
to consider them as having gateway ports."
[ANKUR]: It's just the terminology and its interpretation, i.e. what is our
interpretation of NS traffic and gateway ports.
From the router's perspective, all we are trying to say is that for a
distributed router port, if we receive traffic from a non-OVN endpoint
(i.e. the entry point to the chassis is the localnet port), then we need to have
a specific designated chassis which is the entry point for ARP and unicast
traffic.

Now, the existing gateway port primitive does exactly what I have described
above. However, if we want to use some other
term for discussion/documentation, I am totally fine with it.

c. "ideally there should be no external entity sharing the IP address from
these subnets and using the same VLAN. Although practically it's possible to do
so."
[ANKUR]: Even in an ideal world it's a valid requirement. Especially for
VLAN-backed subnets, it is a valid and practical use case where OVN does not own
the whole IP space of a CIDR. For example, there could be baremetal endpoints,
or there could be a transition phase of migrating virtual non-OVN endpoints
to OVN, or, to avoid the pain of migration, there could be both OVN and non-OVN
virtual endpoints in the same subnet. Similarly, there could be a
stretched subnet across sites (hybrid cloud use case), where both sites (and
both of them need not be OVN run)
manage non-overlapping IP pools for their respective endpoints.

d.  " If there is such a requirement, I think CMS should consider using 
'external' port for this"
[ANKUR]: The use of an external port is that we leverage the DHCP service
provided by OVN for non-OVN endpoints.
This is definitely good to have, but not mandatory (in fact it MUST not be 
A deployment could have its own external DHCP server even for OVN endpoints, or
an external DHCP server for non-OVN endpoints, or non-OVN endpoints with static
IPs.
None of these mandates that a non-OVN endpoint has to be marked as external.
In fact, for hybrid connectivity cases (like stretched subnets across
sites/AZs as mentioned in c. above),
mandating DHCP service for the whole CIDR by OVN would be undesirable.
A TOR gateway is also a non-OVN endpoint :).


There are multiple ways the scenario called out in this thread can be addressed.

a. Remove the "ls_in_external_port" stage in pipeline. For example, in the 
topology mentioned here:

I see the following flows in "ls_in_external_port":
  table=18(ls_in_external_port), priority=100  , match=(inport == 
"ls-underlay-physnet1" && eth.src == 50:6b:8d:cc:17:7d && 
!is_chassis_resident("lsp-external") && arp.tpa == && arp.op == 
1), action=(drop;)
  table=18(ls_in_external_port), priority=100  , match=(inport == 
"ls-underlay-physnet1" && eth.src == 50:6b:8d:cc:17:7d && 
!is_chassis_resident("lsp-external") && nd_ns && ip6.dst == 
{fe80::200:1ff:fe01:204, ff02::1:ff01:204} && nd.target == 
fe80::200:1ff:fe01:204), action=(drop;)
  table=18(ls_in_external_port), priority=0    , match=(1), action=(next;)

My opinion/understanding is that it is up to the router to decide where it wants
the router port ARP to be resolved. A logical switch port cannot/should not
decide where the attached router port IP is resolved.
If we do not have the above flows, then the significance of the HA_Chassis
group for an external port is to call out which chassis will behave like a DHCP
server for a given external endpoint.
For regular datapath operations, packet flow stays consistent with any
external/non-OVN endpoint.

If the stage was meant to do additional work, then at least the flows
at priority=100 should not be there.

P.N: I have not tried removing the above flows and testing end to end. It is
possible that some additional changes will be needed.
I will identify them once I give it a try.

b. If we still want to keep the pipeline stage and the priority=100 flows, then
change the is_chassis_resident check to the attached router port, rather than
the external logical switch port.

c. I am assuming that HA_Chassis_group was not used by openstack. If that is
correct, then openstack can/should use HA_Chassis_group and attach the same
group to both the external LSP and the Logical Router.


IMO, the patch discussed in this thread and b. and c. above are all workarounds.
a. above seems to be a clean solution rather than a workaround (I will give it a
try and validate that it works).
However, I will leave it open for further discussion.


I do not intend to block/delay the patch/fix.
However, I am strongly against a solution that breaks the fundamental
principle of having a consistent ARP cache.
I would definitely recommend that we avoid taking this path.

Please let me know your thoughts.
Please feel free to call out if I missed something; I will be happy to discuss 

Appreciate the discussions and inputs.


From: dev <ovs-dev-boun...@openvswitch.org> on behalf of Ankur Sharma 
Sent: Saturday, July 25, 2020 10:09 PM
To: Numan Siddique <num...@ovn.org>; Daniel Alvarez Sanchez 
<dalva...@redhat.com>; Lucas Alvares Gomes Martins <lmart...@redhat.com>
Cc: d...@openvswitch.org <d...@openvswitch.org>
Subject: Re: [ovs-dev] [PATCH ovn v2] Fix the routing for external logical 
ports of bridged logical switches. 
Hi Numan,

Sorry, was a little out of sync with mailing list this week.
I will get back to you sometime early next week.
But yes, using chassis mac for router port ip is not correct.

Let me check the scenario you have called out and will try to propose 

From: Numan Siddique <num...@ovn.org>
Sent: Wednesday, July 22, 2020 12:24 PM
To: Ankur Sharma <ankur.sha...@nutanix.com>; Daniel Alvarez Sanchez 
<dalva...@redhat.com>; Lucas Alvares Gomes Martins <lmart...@redhat.com>
Cc: d...@openvswitch.org <d...@openvswitch.org>
Subject: Re: [ovs-dev] [PATCH ovn v2] Fix the routing for external logical 
ports of bridged logical switches.

On Mon, Jul 13, 2020 at 11:56 AM Numan Siddique 
<num...@ovn.org<mailto:num...@ovn.org>> wrote:
+Daniel Alvarez Sanchez<mailto:dalva...@redhat.com>
+Lucas Alvares Gomes Martins<mailto:lmart...@redhat.com>

On Mon, Jul 13, 2020 at 11:29 AM Ankur Sharma 
<ankur.sha...@nutanix.com<mailto:ankur.sha...@nutanix.com>> wrote:
Hi Numan,

Thank you so much for the details.

Hi Ankur,

Thanks for the detailed email. Your analysis is correct. I have few comments.
Please see below.

Hi Ankur,

Did you get any chance to look into my comments ? Just checking.


Following is my analysis on the feature:
a. Port of type EXTERNAL means that we create a logical switch port in OVN 
without a VIF backing.
b. i.e. the physical port corresponding to the external port is NOT behind an
OVN-managed vswitch (for the SRIOV-specific case it is not behind any vswitch,
since the PNIC sends the packet directly to the guest driver).
    Just for the sake of further discussion we will refer the PHYSICAL PORT/VM 
corresponding to external port as SRIOV PORT/VM.
c. Now from OVN perspective, packets from SRIOV VM will enter the OVN flow 
pipeline via localnet port (on the Active HA Chassis).
d. For DHCP requests, the logical switch pipeline responds.
e. Now, we were trying to get the routing working.

Based on the understanding mentioned above, i tried following scenario and 
observed following:
a. SRIOV VM could talk to endpoints on other LS attached to LR via Active 
gateway/HA chassis.

router 2bd894b1-81a0-4095-9c58-0472aa5de19d (router) port router-to-gvlan1 mac: 
"00:00:01:01:02:03" networks: [" 
 port router-to-underlay mac: "00:00:01:01:02:...

a. Since packets are coming through localnet port, hence from Router 
perspective, it is an external endpoint, i.e NS.
b. Now for such cases, Router is designed to respond to ARP requests ONLY on 
gateway chassis and since we have centralized a chassis for NS now, hence from 
the Gateway chassis there is no MAC replacement.
c. Based on a. and b. above, i attached the same gateway chassis to LRP 
(Logical Router Port) connecting to SRIOV PORT's LS.
d. And routing across LR connected Switches worked fine.
e. Mac table on TOR and ARP table on SRIOV VM was also fine, i.e ARP cache had 
<router_port_ip, router_port_mac> and Mac table had an entry for router port 

a. For Routing, traffic from localnet ports has to enter via gateway node.
b. Hence, the Router port which connects to SR IOV VM Logical Switch has to be 
on same gateway node as corresponding external port (Which can be achieved 
easily by attaching same HA chassis group to both).

If you see the BZ here - 
  Openstack networking ovn folks want this to be decoupled. According to them, 
it will be hard to maintain the logic to collocate the
ha chassis group of external ports with the gateway chassis ports. I have CC'd 
them to get more comments from them.

There is another problem:
Let's say we have 3 logical switches - ls_vlan4 ( 
 , ls_vlan5 ( 
 and ls_public ( 
 connected to a logical router lr0. And all are VLAN bridge networks
but only ls_public provides N/S traffic. ls_vlan4 and ls_vlan5 are tenant 
bridged networks. I think this is a very common use case.

In this scenario, you will have one l3gateway port - lr0-public  connecting to 
logical switch ls_public. So there will be a set of gateway_chassis created for 

To solve this issue, we can also make the router ports - lr0-vlan4 and 
lr0-vlan5 (connecting to logical switches ls_vlan4 and ls_vlan5) as gateway 
ports by (once we enhance OVN
to support multiple gateway ports), but if you see the logical switches ls_vlan4 
and ls_vlan5 in this example don't provide any N/S connectivity and hence it 
may not be accurate
to consider them as having gateway ports.

And the subnets (here 
 of ls_vlan4 and ls_vlan5 are internal to OVN and ideally there should be no 
external entity sharing the IP address from these subnets and using the same
VLAN. Although practically it's possible to do so. If there is such a 
requirement, I think CMS should consider using 'external' port for this. I had 
noticed the issue you mentioned below in (c) long time back,
I didn't think to bother much because of the reason I mentioned now. (i.e these 
subnets are internal to OVN)

That's why I think it may be OK to take the approach of this patch. i.e replace 
the chassis mac with the actual router mac in the ARP replies for the router 

In the case of ls_public (which provides N/S connectivity) I would expect an 
external router to handle routing for this. Please correct me if I'm wrong.

Let me know your thoughts.


a. Current state is slightly restrictive, because we support only ONE l3gateway 
port per router. Which means that SRIOV Logical Switch has to be connected to 
physical network gateway as well.
i.e. SRIOV VMs have to be on the same logical switch which has the gateway for all 
the external traffic.

b. A more generic and complete implementation would be to enhance  OVN to 
support multiple gateway ports in a distributed router.

c. I did observe 1 bug, where we are NOT blocking ARP requests for the router
port via the localnet port in the absence of a gateway port configuration. The
absence of this guard leads to multiple ARP responses and duplicate ICMP reply
packets. I will submit a fix for this.

Please let me know your thoughts on the same and please feel free to call out, 
if i missed something.



From: Numan Siddique <num...@ovn.org<mailto:num...@ovn.org>>
Sent: Friday, July 10, 2020 6:18 AM
To: Ankur Sharma <ankur.sha...@nutanix.com<mailto:ankur.sha...@nutanix.com>>
Cc: d...@openvswitch.org<mailto:d...@openvswitch.org> 
Subject: Re: [ovs-dev] [PATCH ovn v2] Fix the routing for external logical 
ports of bridged logical switches.

On Fri, Jul 10, 2020 at 4:41 PM Numan Siddique 

On Fri, Jul 10, 2020 at 12:45 AM Ankur Sharma 
Hi Numan, Daniel,

I have not looked at the patch yet. But replacing arp.sha with the chassis mac
is not the correct approach from a networking perspective.
The chassis mac is NOT meant to replace the IP-MAC binding of the router port;
it is ONLY meant to ensure that for EW traffic a distributed router port mac
does not show up on multiple TOR ports.
Both for NS and EW, ARP resolution for the router port IP should be answered
with the router port mac ONLY.

I am trying to understand the use case and we can discuss an alternative in 
this thread.
Can you share the repro steps, i can try the same and will try to come up with 
an alternative.

Hi Ankur,

In this particular case, the originator of the traffic is from a logical port 
of type 'external'.

One example of using external ports is for SRIOV VMs. The traffic from these 
VMs is not seen
by the local ovn-controller. And we want to provide E-W routing and other OVN 
services like DHCP, DNS etc
to these VMs.

So one of the controller nodes (which can receive the traffic sent by these 
SRIOV VMs) binds these external ports
and it responds to the ARP requests and does the routing for it.

To reproduce the issue, can you please use the ovn-fake-multinode setup from here 
? - 

The steps are:
1. Build OVN containers.
    ./ovn_cluster.sh build

Please note, before the 'start', you need to start openvswitch on the host.


2. ./ovn_cluster.sh start

3. sudo ip netns exec sw0-ext1 ping -c3
PING ( 56(84) bytes of data.
64 bytes from 
 icmp_seq=1 ttl=63 time=0.074 ms
64 bytes from 
 icmp_seq=1 ttl=63 time=0.086 ms (DUP!)
64 bytes from 
 icmp_seq=1 ttl=63 time=0.089 ms (DUP!)
64 bytes from 
 icmp_seq=2 ttl=63 time=0.105 ms
64 bytes from 
 icmp_seq=2 ttl=63 time=0.120 ms (DUP!)
64 bytes from 
 icmp_seq=2 ttl=63 time=0.124 ms (DUP!)
64 bytes from 
 icmp_seq=3 ttl=63 time=0.145 ms

--- ping statistics ---
3 packets transmitted, 3 received, +4 duplicates, 0% packet loss, time 2036ms
rtt min/avg/max/mdev = 0.074/0.106/0.145/0.023 ms

You will see a few DUP packets.

$sudo ip netns exec sw0-ext1 ping -c3
PING ( 56(84) bytes of data.
64 bytes from 
 icmp_seq=1 ttl=254 time=0.298 ms
64 bytes from 
 icmp_seq=1 ttl=254 time=0.358 ms (DUP!)
64 bytes from 
 icmp_seq=1 ttl=254 time=0.384 ms (DUP!)
64 bytes from 
 icmp_seq=2 ttl=254 time=0.598 ms
64 bytes from 
 icmp_seq=2 ttl=254 time=0.594 ms (DUP!)
64 bytes from 
 icmp_seq=2 ttl=254 time=0.656 ms (DUP!)
64 bytes from 
 icmp_seq=3 ttl=254 time=0.715 ms

--- ping statistics ---
3 packets transmitted, 3 received, +4 duplicates, 0% packet loss, time 2088ms
rtt min/avg/max/mdev = 0.298/0.514/0.715/0.152 ms

In the setup, sw0-ext1 represents an external logical switch port. If you see 
the script here [1],
sw0-ext1 is claimed by ovn-chassis-1 node.

And when sw0-ext1 sends ARP request to, the arp request is handled by 
and the reply has  - arp.sha = router mac  and eth.src = chassis mac of 

And hence sw0-ext1 sends ping packets with the destination mac of router port  
IP -
And all the 3 nodes reply - ovn-chassis-1, ovn-chassis-2 and ovn-gw-1.

I'm not sure if you have played with ovn-fake-multinode before. If you run 
"docker ps", you will see a docker
container representing each chassis.

Please do "docker exec -it ovn-central bash" and run a few ovn-nbctl/ovn-sbctl 
commands to know more.

You can also see the script in [1] and reproduce the issue in your setup.

I didn't find any other way to solve this issue. Also in normal situations 
where external ports are not used,
any arp request to the router IP from bridge logical switch ports don't leave 
the chassis since the local
ovn-controller itself replies. This is for tenant bridged VLAN logical 
switches. I guess for provider VLAN networks
(which provide the N/S traffic, I guess the arp request for the router port can 
come from the physical network).

[1] - 


Sent: Thursday, July 9, 2020 2:11 AM
Cc: Numan Siddique 
 Daniel Alvarez 
 Ankur Sharma 
Subject: [PATCH ovn v2] Fix the routing for external logical ports of bridged 
logical switches.

From: Numan Siddique 

Routing for external logical ports is broken if these ports belonged
to bridged logical switches (with localnet port) and 'ovn-chassis-mac-mappings'
is configured. External logical ports are those which are external to OVN,
but there is a logical port for it and it is claimed by one of the HA chassis.
The claimed chassis provides routing and other native OVN services like dhcp and 

When the external port sends ARP request for the router IP, the claimed chassis
replies for the ARP request, but the arp.sha is set to the actual router mac
instead of the chassis mac. This causes the traffic from external port
VM/container to be handled incorrectly. A ping to the router ip is replied to by
all the chassis which can see this packet instead of just the claimed HA chassis.

To fix this, this patch does 2 things.

1. In the table - OFTABLE_LOG_TO_PHY (65), it adds a 160 priority flow to
   modify the ARP packets arp.sha to store the chassis mac.

2. And when the packet destined to the chassis mac is received, it replaces the
   chassis mac with the actual router mac in table 0.

 Daniel Alvarez 
CC: Ankur Sharma 
Signed-off-by: Numan Siddique 

v1 -> v2
  * Rebased.

 controller/chassis.c  |  48 ++++++++------
 controller/chassis.h  |   2 +
 controller/physical.c | 145 +++++++++++++++++++++++++++++++++++++++---
          | 131 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 299 insertions(+), 27 deletions(-)

diff --git a/controller/chassis.c b/controller/chassis.c
index eec270ea39..25146d75f2 100644
--- a/controller/chassis.c
+++ b/controller/chassis.c
@@ -645,10 +645,11 @@ chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,

-chassis_get_mac(const struct sbrec_chassis *chassis_rec,
-                const char *bridge_mapping,
-                struct eth_addr *chassis_mac)
+chassis_get_mac_mappings(const struct sbrec_chassis *chassis_rec,
+                         struct smap *chassis_mappings)
+    smap_init(chassis_mappings);
     const char *tokens
         = get_chassis_mac_mappings(&chassis_rec->other_config);
     if (!tokens[0]) {
@@ -656,7 +657,6 @@ chassis_get_mac(const struct sbrec_chassis *chassis_rec,

     char *save_ptr = NULL;
-    bool ret = false;
     char *tokstr = xstrdup(tokens);

     /* Format for a chassis mac configuration is:
@@ -669,24 +669,36 @@ chassis_get_mac(const struct sbrec_chassis *chassis_rec,
         char *chassis_mac_bridge = strtok_r(token, ":", &save_ptr2);
         char *chassis_mac_str = strtok_r(NULL, "", &save_ptr2);

-        if (!strcmp(chassis_mac_bridge, bridge_mapping)) {
-            struct eth_addr temp_mac;
+        smap_replace(chassis_mappings, chassis_mac_bridge, chassis_mac_str);
+    }

-            /* Return the first chassis mac. */
-            char *err_str = str_to_mac(chassis_mac_str, &temp_mac);
-            if (err_str) {
-                free(err_str);
-                continue;
-            }
+    free(tokstr);
+    return true;

-            ret = true;
-            *chassis_mac = temp_mac;
-            break;
-        }
+chassis_get_mac(const struct sbrec_chassis *chassis_rec,
+                const char *bridge_mapping,
+                struct eth_addr *chassis_mac)
+    struct smap chassis_mappings;
+    if (!chassis_get_mac_mappings(chassis_rec, &chassis_mappings)) {
+        return false;

-    free(tokstr);
-    return ret;
+    const char *chassis_mac_str = smap_get_def(&chassis_mappings,
+                                               bridge_mapping, "");
+    struct eth_addr temp_mac;
+    char *err_str = str_to_mac(chassis_mac_str, &temp_mac);
+    if (err_str) {
+        free(err_str);
+        return false;
+    }
+    *chassis_mac = temp_mac;
+    return true;

 /* Returns true if the database is all cleaned up, false if more work is
diff --git a/controller/chassis.h b/controller/chassis.h
index 178d2957e8..dae761312d 100644
--- a/controller/chassis.h
+++ b/controller/chassis.h
@@ -42,6 +42,8 @@ bool chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
 bool chassis_get_mac(const struct sbrec_chassis *chassis,
                      const char *bridge_mapping,
                      struct eth_addr *chassis_mac);
+bool chassis_get_mac_mappings(const struct sbrec_chassis *,
+                              struct smap *chassis_mappings);
 const char *chassis_get_id(void);
 const char * get_chassis_mac_mappings(const struct smap *ext_ids);

diff --git a/controller/physical.c b/controller/physical.c
index 6d7d8e93bc..b43a157b94 100644
--- a/controller/physical.c
+++ b/controller/physical.c
@@ -62,7 +62,8 @@ load_logical_ingress_metadata(const struct sbrec_port_binding 
 /* UUID to identify OF flows not associated with ovsdb rows. */
 static struct uuid *hc_uuid = NULL;


 physical_register_ovs_idl(struct ovsdb_idl *ovs_idl)
@@ -148,6 +149,18 @@ put_move(enum mf_field_id src, int src_ofs,
     move->dst.n_bits = n_bits;

+static void
+put_value(const uint8_t *data, size_t len,
+          enum mf_field_id dst, int ofs, int n_bits,
+          struct ofpbuf *ofpacts)
+    struct ofpact_set_field *sf = ofpact_put_set_field(ofpacts,
+                                                       mf_from_id(dst), NULL,
+                                                       NULL);
+    bitwise_copy(data, len, 0, sf->value, sf->field->n_bytes, ofs, n_bits);
+    bitwise_one(ofpact_set_field_mask(sf), sf->field->n_bytes, ofs, n_bits);
 static void
 put_resubmit(uint8_t table_id, struct ofpbuf *ofpacts)
@@ -494,11 +507,10 @@ put_chassis_mac_conj_id_flow(const struct 
sbrec_chassis_table *chassis_table,

         match_set_dl_src(&match, chassis_mac);

         conj = ofpact_put_CONJUNCTION(ofpacts_p);
         conj->n_clauses = 2;
         conj->clause = 0;
         ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, 180,
@@ -507,6 +519,51 @@ put_chassis_mac_conj_id_flow(const struct 
sbrec_chassis_table *chassis_table,

+    /* We need to replace the packet destined to the chassis mac (eth.dst)
+     * with the router mac. This is required to support external ports.
+     * These ports don't see the router mac at all since we send the
+     * chassis MAC in the ARP reply for any ARP requests to the router IPs.
+     * Without these flows, the packets will not enter the router pipeline
+     * if they need to be routed.
+     * Please see put_replace_chassis_mac_flows() for the 2nd clause of
+     * */
+    struct smap chassis_mac_mappings = SMAP_INITIALIZER(&chassis_mac_mappings);
+    if (chassis_get_mac_mappings(chassis, &chassis_mac_mappings)) {
+        struct smap_node *node;
+        struct sset macs = SSET_INITIALIZER(&macs);
+        SMAP_FOR_EACH (node, &chassis_mac_mappings) {
+            struct eth_addr chassis_mac;
+            char *err_str = str_to_mac(node->value, &chassis_mac);
+            if (err_str) {
+                free(err_str);
+                continue;
+            }
+            if (!sset_add(&macs, node->value)) {
+                /* The OF flow for the mac is already added. */
+                continue;
+            }
+            ofpbuf_clear(ofpacts_p);
+            match_init_catchall(&match);
+            match_set_dl_dst(&match, chassis_mac);
+            struct ofpact_conjunction *conj;
+            conj = ofpact_put_CONJUNCTION(ofpacts_p);
+            conj->id = CHASSIS_MAC_TO_ROUTER_DST_MAC_CONJID;
+            conj->n_clauses = 2;
+            conj->clause = 0;
+            ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, 180,
+                            0, &match, ofpacts_p, hc_uuid);
+        }
+        sset_destroy(&macs);
+    }
+    smap_destroy(&chassis_mac_mappings);

 static void
@@ -555,7 +612,7 @@ put_replace_chassis_mac_flows(const struct simap *ct_zones,

         /* Match on ingress port, vlan_id and conjunction id */
         match_set_in_port(&match, ofport);
-        match_set_conj_id(&match, CHASSIS_MAC_TO_ROUTER_MAC_CONJID);
+        match_set_conj_id(&match, CHASSIS_MAC_TO_ROUTER_SRC_MAC_CONJID);

         if (tag) {
             match_set_dl_vlan(&match, htons(tag), 0);
@@ -572,6 +629,37 @@ put_replace_chassis_mac_flows(const struct simap *ct_zones,
         replace_mac = ofpact_put_SET_ETH_SRC(ofpacts_p);
         replace_mac->mac = router_port_mac;

+        /* Resubmit to first logical ingress pipeline table. */
+        put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
+        ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, 180,
+                        rport_binding->header_.uuid.parts[0],
+                        &match, ofpacts_p, hc_uuid);
+        ofpbuf_clear(ofpacts_p);
+        match_init_catchall(&match);
+        /* Add flow, which will match on conjunction id and will
+         * replace destination mac with router port mac */
+        /* Match on ingress port, vlan_id and conjunction id */
+        match_set_in_port(&match, ofport);
+        match_set_conj_id(&match, CHASSIS_MAC_TO_ROUTER_DST_MAC_CONJID);
+        if (tag) {
+            match_set_dl_vlan(&match, htons(tag), 0);
+        } else {
+            match_set_dl_tci_masked(&match, 0, htons(VLAN_CFI));
+        }
+        /* Actions */
+        if (tag) {
+            ofpact_put_STRIP_VLAN(ofpacts_p);
+        }
+        load_logical_ingress_metadata(localnet_port, &zone_ids, ofpacts_p);
+        replace_mac = ofpact_put_SET_ETH_DST(ofpacts_p);
+        replace_mac->mac = router_port_mac;
         /* Resubmit to first logical ingress pipeline table. */
         put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
         ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, 180,
@@ -579,7 +667,7 @@ put_replace_chassis_mac_flows(const struct simap *ct_zones,
                         &match, ofpacts_p, hc_uuid);

         /* Provide second search criteria, i.e localnet port's
-         * vlan ID for conjunction flow */
+         * vlan ID for conjunction flows. */
         struct ofpact_conjunction *conj;
@@ -591,12 +679,19 @@ put_replace_chassis_mac_flows(const struct simap 

         conj = ofpact_put_CONJUNCTION(ofpacts_p);
+        conj->n_clauses = 2;
+        conj->clause = 1;
+        conj = ofpact_put_CONJUNCTION(ofpacts_p);
         conj->n_clauses = 2;
         conj->clause = 1;
         ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, 180,
                         &match, ofpacts_p, hc_uuid);

@@ -665,9 +760,6 @@ put_replace_router_port_mac_flows(struct ovsdb_idl_index
          * a. Flow replaces ingress router port mac with a chassis mac.
          * b. Flow appends the vlan id localnet port is configured with.
-        match_init_catchall(&match);
-        ofpbuf_clear(ofpacts_p);
         ovs_assert(rport_binding->n_mac == 1);
         char *err_str = str_to_mac(rport_binding->mac[0], &router_port_mac);
         if (err_str) {
@@ -679,6 +771,9 @@ put_replace_router_port_mac_flows(struct ovsdb_idl_index

         /* Replace Router mac flow */
+        match_init_catchall(&match);
+        ofpbuf_clear(ofpacts_p);
         match_set_metadata(&match, htonll(dp_key));
         match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, port_key);
         match_set_dl_src(&match, router_port_mac);
@@ -698,6 +793,38 @@ put_replace_router_port_mac_flows(struct ovsdb_idl_index
         ofctrl_add_flow(flow_table, OFTABLE_LOG_TO_PHY, 150,
                         &match, ofpacts_p, &localnet_port->header_.uuid);
+        /* Replace Router mac in the ARP packets (arp.sha) to the chassis MAC.
+         * This is very important and required for external logical ports and
+         * when these ports send ARP for their router IPs, the chassis mac
+         * should be sent which has claimed these external ports. */
+        match_init_catchall(&match);
+        ofpbuf_clear(ofpacts_p);
+        match_set_metadata(&match, htonll(dp_key));
+        match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, port_key);
+        match_set_dl_src(&match, router_port_mac);
+        match_set_dl_type(&match, htons(ETH_TYPE_ARP));
+        match_set_arp_sha(&match, router_port_mac);
+        replace_mac = ofpact_put_SET_ETH_SRC(ofpacts_p);
+        replace_mac->mac = chassis_mac;
+        if (tag) {
+            struct ofpact_vlan_vid *vlan_vid;
+            vlan_vid = ofpact_put_SET_VLAN_VID(ofpacts_p);
+            vlan_vid->vlan_vid = tag;
+            vlan_vid->push_vlan_if_needed = true;
+        }
+        put_value(chassis_mac.ea, sizeof chassis_mac.ea, MFF_ARP_SHA,
+                  0, 48, ofpacts_p);
+        ofpact_put_OUTPUT(ofpacts_p)->port = ofport;
+        ofctrl_add_flow(flow_table, OFTABLE_LOG_TO_PHY, 160,
+                        localnet_port->header_.uuid.parts[0],
+                        &match, ofpacts_p, &localnet_port->header_.uuid);

diff --git a/tests/ovn.at 
 22N08Wzzaix2xc&e=> [ovn.at 
index 24d93bc245..f033401410 100644
--- a/tests/ovn.at 
+++ b/tests/ovn.at 
@@ -14748,6 +14748,137 @@ AT_CHECK([cat ext1_v6.packets | cut -c -120], [0], 
 cat ext1_v6.expected | cut -c 125- > expout
 AT_CHECK([cat ext1_v6.packets | cut -c 125-], [0], [expout])

+# Configure ovn-chassis-mac-mappings on all the hypervisors.
+as hv1
+ovs-vsctl set open . 
+as hv2
+ovs-vsctl set open . 
+as hv3
+ovs-vsctl set open . 
+OVS_WAIT_UNTIL([test 6 = $(as hv1 ovs-ofctl dump-flows br-int table=0 | grep 
conj -c)])
+OVS_WAIT_UNTIL([test 6 = $(as hv2 ovs-ofctl dump-flows br-int table=0 | grep 
conj -c)])
+OVS_WAIT_UNTIL([test 6 = $(as hv3 ovs-ofctl dump-flows br-int table=0 | grep 
conj -c)])
+OVS_WAIT_UNTIL([test 1 = $(as hv1 ovs-ofctl dump-flows br-int table=0 | \
+grep conj | grep "dl_dst=1e:02:ad:aa:bb:01" -c)])
+OVS_WAIT_UNTIL([test 1 = $(as hv2 ovs-ofctl dump-flows br-int table=0 | \
+grep conj | grep "dl_dst=1e:02:ad:aa:bb:02" -c)])
+OVS_WAIT_UNTIL([test 1 = $(as hv3 ovs-ofctl dump-flows br-int table=0 | \
+grep conj | grep "dl_dst=1e:02:ad:aa:bb:03" -c)])
+OVS_WAIT_UNTIL([test 1 = $(as hv1 ovs-ofctl dump-flows br-int table=65,arp | \
+grep "load:0x1e02adaabb01->NXM_NX_ARP_SHA" -c)])
+OVS_WAIT_UNTIL([test 0 = $(as hv2 ovs-ofctl dump-flows br-int table=65,arp | \
+grep "load:0x1e02adaabb01->NXM_NX_ARP_SHA" -c)])
+OVS_WAIT_UNTIL([test 1 = $(as hv2 ovs-ofctl dump-flows br-int table=65,arp | \
+grep "load:0x1e02adaabb02->NXM_NX_ARP_SHA" -c)])
+OVS_WAIT_UNTIL([test 1 = $(as hv3 ovs-ofctl dump-flows br-int table=65,arp | \
+grep "load:0x1e02adaabb03->NXM_NX_ARP_SHA" -c)])
+as hv1
+reset_pcap_file hv1-ext1 hv1/ext1
+send_arp_request() {
+    local inport=$1 eth_src=$2 eth_dst=$3 spa=$4 tpa=$5
+    local reply_src_mac=$6 reply_dst_mac=$7
+    local reply_sha=$8 reply_tha=$9
+    local eth_type=0806
+    local eth=${eth_dst}${eth_src}${eth_type}
+    local arp=0001080006040001${eth_src}${spa}${eth_dst}${tpa}
+    local request=${eth}${arp}
+    as hv1 ovs-appctl netdev-dummy/receive hv${inport}-ext${inport} $request
+    local reply=${reply_dst_mac}${reply_src_mac}${eth_type}
+    reply=${reply}0001080006040002${reply_sha}${tpa}${reply_tha}${spa}
+    echo $reply > hv1-ext${inport}.expected
+# Send ARP request to router ip -
+send_arp_request 1 ${src_mac} ${dst_mac} $(ip_to_hex 10 0 0 6) $(ip_to_hex 10 
0 0 1) \
+${reply_src_mac} ${repl_dst_mac} ${reply_src_mac} ${repl_dst_mac}
+OVS_WAIT_UNTIL([test 1 = $(as hv3 ovs-ofctl dump-flows br-int table=65,arp | \
+grep "load:0x1e02adaabb03->NXM_NX_ARP_SHA" | grep "n_packets=1" -c)])
+OVN_CHECK_PACKETS([hv1/ext1-tx.pcap], [hv1-ext1.expected])
+as hv1
+reset_pcap_file hv1-ext1 hv1/ext1
+# Send unicast ARP request destined to the chassis mac of hv3.
+send_arp_request 1 ${src_mac} ${dst_mac} $(ip_to_hex 10 0 0 6) $(ip_to_hex 10 
0 0 1) \
+${reply_src_mac} ${repl_dst_mac} ${reply_src_mac} ${repl_dst_mac}
+OVS_WAIT_UNTIL([test 1 = $(as hv3 ovs-ofctl dump-flows br-int table=65,arp | \
+grep "load:0x1e02adaabb03->NXM_NX_ARP_SHA" | grep "n_packets=2" -c)])
+OVN_CHECK_PACKETS([hv1/ext1-tx.pcap], [hv1-ext1.expected])
+# Make hv2 active.
+ovn-nbctl ha-chassis-group-add-chassis hagrp1 hv2 60
+    [chassis=`ovn-sbctl --bare --columns chassis find port_binding \
+    test "$chassis" = "$hv2_uuid"])
+reset_pcap_file hv1-ext1 hv1/ext1
+# Send ARP request to router ip - Should be replied by hv2.
+send_arp_request 1 ${src_mac} ${dst_mac} $(ip_to_hex 10 0 0 6) $(ip_to_hex 10 
0 0 1) \
+${reply_src_mac} ${repl_dst_mac} ${reply_src_mac} ${repl_dst_mac}
+OVS_WAIT_UNTIL([test 1 = $(as hv2 ovs-ofctl dump-flows br-int table=65,arp | \
+grep "load:0x1e02adaabb02->NXM_NX_ARP_SHA" | grep "n_packets=1" -c)])
+OVN_CHECK_PACKETS([hv1/ext1-tx.pcap], [hv1-ext1.expected])
+as hv1
+reset_pcap_file hv1-ext1 hv1/ext1
+# Send unicast ARP request destined to the chassis mac of hv2.
+send_arp_request 1 ${src_mac} ${dst_mac} $(ip_to_hex 10 0 0 6) $(ip_to_hex 10 
0 0 1) \
+${reply_src_mac} ${repl_dst_mac} ${reply_src_mac} ${repl_dst_mac}
+OVS_WAIT_UNTIL([test 1 = $(as hv2 ovs-ofctl dump-flows br-int table=65,arp | \
+grep "load:0x1e02adaabb02->NXM_NX_ARP_SHA" | grep "n_packets=2" -c)])
+OVN_CHECK_PACKETS([hv1/ext1-tx.pcap], [hv1-ext1.expected])
+ovn-nbctl ha-chassis-group-add-chassis hagrp1 hv3 70
+ovn-nbctl ha-chassis-group-add-chassis hagrp1 hv2 10
+    [chassis=`ovn-sbctl --bare --columns chassis find port_binding \
+    test "$chassis" = "$hv3_uuid"])
 # disconnect hv3 from the network, hv1 should take over
 as hv3

dev mailing list
