On 30.11.20 07:07, Numan Siddique wrote:
On Mon, Nov 30, 2020 at 7:37 AM Han Zhou <hz...@ovn.org> wrote:
On Sat, Nov 28, 2020 at 12:31 PM Tony Liu <tonyliu0...@hotmail.com> wrote:
Hi Renat,

Hi folks,

What are these "logical datapath patches that Ilya Maximets submitted"?
Could you share some links?

There were a couple of discussions about a similar issue.
[1] raised the issue and resulted in a new option,
always_learn_from_arp_request, being added [2].
[3] resulted in a patch to the OVN ML2 driver [4] that sets the option
added in [2].

It seems that it helps to optimize the Logical_Flow table.
I am not sure if it helps with the MAC_Binding table as well.

Is it the same issue we are trying to address here, by either
Numan's local cache or the solution proposed by Dumitru?

[1]
https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/049994.html
[2]
https://github.com/ovn-org/ovn/commit/61ccc6b5fc7c49b512e26347cfa12b86f0ec2fd9#diff-05b24a3133733fb7b0f979698083b8128e8f1f18c3c2bd09002ae788d34a32f5
[3] http://osdir.com/openstack-discuss/msg16002.html
[4] https://review.opendev.org/c/openstack/neutron/+/752678


Thanks!
Tony
Thanks Tony for pointing to the old discussion [1]. I thought setting the
option always_learn_from_arp_request to "false" on the logical routers
should have solved this scale problem in the MAC_Binding table in this
scenario.

However, it seems that commit a2b88dc513 ("pinctrl: Directly update
MAC_Bindings created by self originated GARPs.") has overridden the
option. (I haven't tested, but maybe @Dumitru Ceara <dce...@redhat.com> can
confirm.)

Similarly, for the Logical_Flow explosion, it should have been solved by
setting the option dynamic_neigh_routers to "true".

I think these two options are exactly for the scenario Renat is
reporting. @Renat, could you try setting these options as suggested above,
using an OVN version from before commit a2b88dc513, to see if it solves
your problem?
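
For reference, a minimal sketch of how the two options could be set with
ovn-nbctl (both are documented Logical_Router options; "lr0" is a
placeholder, repeat for each logical router):

  # Don't learn MAC bindings from ARP requests not addressed to the LR.
  ovn-nbctl set Logical_Router lr0 options:always_learn_from_arp_request=false
  # Install LR neighbor resolution flows on demand instead of
  # pre-populating them for all router pairs.
  ovn-nbctl set Logical_Router lr0 options:dynamic_neigh_routers=true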

When you test it out with the suggested commit, please delete the
mac_binding entries manually, as neither ovn-northd nor ovn-controller
deletes any entries from the MAC_Binding table.
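
One hedged way to flush the table by hand is ovn-sbctl's generic database
commands (fine for a lab; entries will be re-learned as needed):

  # Drop every row from the SB MAC_Binding table in one transaction.
  ovn-sbctl --all destroy MAC_Binding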

We tested with dynamic_neigh_routers set to true, and we saw a very
positive change: the size of the Logical_Flow table decreased from 600k
entries to 100k. This is a huge difference, thanks for pointing this
out!

It did not affect the MAC_Binding table, because of commit a2b88dc513
("pinctrl: Directly update MAC_Bindings created by self originated
GARPs."), but that was expected. Just for test purposes we commented out
some code as follows:

diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index 291202c24..76047939c 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -4115,10 +4115,10 @@ send_garp_rarp_update(struct ovsdb_idl_txn *ovnsb_idl_txn,
                                   laddrs->ipv4_addrs[i].addr,
                                   binding_rec->datapath->tunnel_key,
                                   binding_rec->tunnel_key);
-                    send_garp_locally(ovnsb_idl_txn,
-                                      sbrec_mac_binding_by_lport_ip,
-                                      local_datapaths, binding_rec, laddrs->ea,
-                                      laddrs->ipv4_addrs[i].addr);
+                    //send_garp_locally(ovnsb_idl_txn,
+                    //                  sbrec_mac_binding_by_lport_ip,
+                    //                  local_datapaths, binding_rec, laddrs->ea,
+                    //                  laddrs->ipv4_addrs[i].addr);

                 }
                 free(name);

Together with dynamic_neigh_routers we achieved quite a stable setup,
with a 62 MiB SB database, which is a huge step forward from 1.9 GiB.
The MAC_Binding table stays around 2000 entries, compared to almost a
million before.

Would it make sense to make the behaviour introduced in a2b88dc513
toggleable via a command-line option, until there is a better
solution?

Thanks,
Renat.

Regarding the proposals in this thread:
- Move MAC_Binding to LS (by Dumitru)
     This sounds good to me, although I am not sure about all the
implications yet; I wonder why it was associated with the LRP in the first
place.

- Remove MAC_Binding from SB (by Numan)
     I am a little concerned about this. The MAC_Binding table in the SB is
required for a distributed LR to do dynamic ARP resolution. Consider a
general use case: A - LS1 - LR1 - LS2 - B. A is on HV1 and B is on HV2. Now
A sends a packet to B's IP. Assume B's IP is unknown to OVN. The packet is
routed by LR1, and on the LRP facing LS2 an ARP request is sent out over
the LS2 logical network. The above steps happen on HV1. Now the ARP request
reaches HV2 and is received by B, so B sends an ARP response. With the
current implementation, HV2's OVS flows learn the MAC-IP binding from the
ARP response and update the SB DB, and HV1 gets the SB update and installs
the MAC binding flow as a result of ARP resolution. The next time A sends a
packet to B, HV1 resolves the ARP directly from the local MAC binding flows
and sends the IP packet to HV2.

The SB MAC_Binding table thus works as a distributed ARP/neighbor cache. It
is a mechanism to sync the ARP cache from the place where it is learned to
the place where the resolution was initiated, and all HVs benefit from this
without needing to send ARPs themselves for the same LRP. In other words,
the LRP is distributed, so ARP resolution happens in a distributed fashion.
Without this, each HV would initiate ARP requests on behalf of the same
LRP, which would greatly increase ARP traffic unnecessarily - even beyond
the traditional network, where one physical router only needs to do one ARP
resolution per neighbor and maintain one copy of the ARP cache. And I am
not sure whether there are other side effects when an endpoint sees
unexpectedly frequent ARP requests from the same LRP - would there be any
rate limit that discards repeated ARP requests from the same source? Numan,
maybe you have already considered these. Would you share your thoughts?
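
To make the example concrete, here is a minimal sketch of the
A - LS1 - LR1 - LS2 - B topology, with illustrative names and addresses;
B's address is left "unknown" so LR1 has to ARP for it:

  ovn-nbctl ls-add LS1 -- ls-add LS2 -- lr-add LR1
  ovn-nbctl lrp-add LR1 lrp1 00:00:00:00:01:01 10.0.1.1/24
  ovn-nbctl lrp-add LR1 lrp2 00:00:00:00:02:01 10.0.2.1/24
  ovn-nbctl lsp-add LS1 ls1-lr1 -- lsp-set-type ls1-lr1 router \
      -- lsp-set-addresses ls1-lr1 router \
      -- lsp-set-options ls1-lr1 router-port=lrp1
  ovn-nbctl lsp-add LS2 ls2-lr1 -- lsp-set-type ls2-lr1 router \
      -- lsp-set-addresses ls2-lr1 router \
      -- lsp-set-options ls2-lr1 router-port=lrp2
  ovn-nbctl lsp-add LS1 A -- lsp-set-addresses A "00:00:00:00:01:02 10.0.1.2"
  ovn-nbctl lsp-add LS2 B -- lsp-set-addresses B unknown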
Thanks for the comments and for highlighting this use case, which I
missed completely.

I was thinking more along the lines of the N-S use case with a distributed
gateway router port, and I completely missed the E-W scenario with an
unknown address. If we don't consider the unknown address scenario, I think
moving away from the MAC_Binding southbound DB table would be beneficial in
the long run, for a few reasons:
    1. Better scale.
    2. Addressing stale mac_binding entries (which the CMS presently has
to handle).

For the N-S traffic scenario, the ovn-controller claiming the gateway
router port will take care of generating the ARP. For the floating IP DVR
scenario, each compute node will have to generate the ARP request to learn
a remote MAC. I think this should be fine, as it is just a one-time thing.

Regarding the unknown address scenario, right now ovn-controller floods
the packet to all the unknown logical ports of a switch if OVN doesn't
know the MAC; all these unknown logical ports belong to a multicast group.

I think we should solve this case. In the case of OpenStack, when port
security is disabled for a Neutron port, the logical port will have an
unknown address configured. There are a few related Bugzilla/Launchpad
bugs [1].

I think we should fix this behavior in OVN: OVN should do MAC learning on
the switch for the unknown ports. If we do that, I think the scenario you
mentioned will be addressed.

Maybe we can extend Dumitru's suggestion and have just one approach that
does the MAC learning on the switch (keeping the SB MAC_Binding table):
     -  for unknown logical ports
     -  for unknown MACs for N-S routing.

Any thoughts?

FYI - I have a PoC/RFC patch in progress which adds the mac binding
cache support -
https://github.com/numansiddique/ovn/commit/22082d04ca789155ea2edd3c1706bde509ae44da

[1] - https://review.opendev.org/c/openstack/neutron/+/763567/
      https://bugzilla.redhat.com/show_bug.cgi?id=1888441
      https://bugs.launchpad.net/neutron/+bug/1904412
      https://bugzilla.redhat.com/show_bug.cgi?id=1672625

Thanks
Numan

Thanks,
Han

-----Original Message-----
From: dev <ovs-dev-boun...@openvswitch.org> On Behalf Of Numan Siddique
Sent: Thursday, November 26, 2020 11:36 AM
To: Daniel Alvarez Sanchez <dalva...@redhat.com>
Cc: ovs-dev <ovs-dev@openvswitch.org>
Subject: Re: [ovs-dev] Scaling of Logical_Flows and MAC_Binding tables

On Thu, Nov 26, 2020 at 4:32 PM Numan Siddique <num...@ovn.org> wrote:
On Thu, Nov 26, 2020 at 4:11 PM Daniel Alvarez Sanchez <dalva...@redhat.com> wrote:
On Wed, Nov 25, 2020 at 7:59 PM Dumitru Ceara <dce...@redhat.com> wrote:
On 11/25/20 7:06 PM, Numan Siddique wrote:
On Wed, Nov 25, 2020 at 10:24 PM Renat Nurgaliyev <imple...@gmail.com> wrote:


On 25.11.20 16:14, Dumitru Ceara wrote:
On 11/25/20 3:30 PM, Renat Nurgaliyev wrote:
Hello folks,

Hi Renat,

we run a lab where we try to evaluate the scalability potential of OVN
with OpenStack as the CMS.
The current lab setup is the following:

500 networks
500 routers
1500 VM ports (3 per network/router)
1500 Floating IPs (one per VM port)

There is an external network, which is bridged to br-provider on the
gateway nodes. There are 2000 ports connected to this external network
(1500 Floating IPs + 500 SNAT router ports). So the setup is not very big,
we'd say, but after applying this configuration via the ML2/OVN plugin,
northd kicks in and does its job, and once it's done, the Logical_Flow
table gets 645877 entries, which is way too much. But ok, we move on and
start one controller on the gateway chassis, and here things get really
messy. The MAC_Binding table grows from 0 to 999088 entries in one go,
and once it's done, the sizes of the biggest SB tables look like this:

999088 MAC_Binding
645877 Logical_Flow
4726 Port_Binding
1117 Multicast_Group
1068 Datapath_Binding
1046 Port_Group
551 IP_Multicast
519 DNS
517 HA_Chassis_Group
517 HA_Chassis
...

The MAC_Binding table gets huge: basically it now has an entry for every
port connected to the external network times the number of datapaths,
which makes roughly one million entries. This table by itself increases
the size of the SB by 200 megabytes. The Logical_Flow table also gets
very heavy; we have already played a bit with the logical datapath
patches that Ilya Maximets submitted, and it looks much better, but the
size of the MAC_Binding table still feels inadequate.
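
For reference, one hedged way to reproduce such per-table row counts from
the SB DB, using ovn-sbctl's generic list command:

  # Each row starts with a _uuid line, so this counts rows in a table.
  ovn-sbctl list MAC_Binding | grep -c '^_uuid'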

We would like to start working at least on MAC_Binding table
optimisation, but it is a bit difficult to start from scratch. Can
someone help us with ideas on how this could be optimised?

Maybe it would also make sense to group entries in the MAC_Binding table
in the same way as is proposed for logical flows in Ilya's patch?

Maybe it would work, but I'm not really sure how, right now.
However, what if we change the way MAC_Bindings are created?

Right now a MAC_Binding is created for each logical router port, but in
your case there are a lot of logical router ports connected to the single
provider logical switch and they all learn the same ARPs. What if we
instead store MAC_Bindings per logical switch? Basically sharing all
these MAC_Bindings between all router ports connected to the same LS.

Do you see any problem with this approach?
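
To make the current keying concrete, a hedged illustration of inspecting
a binding with ovn-sbctl's generic list command (the column set is from
ovn-sb(5); the values below are made up):

  $ ovn-sbctl --columns=logical_port,ip,mac,datapath list MAC_Binding
  logical_port : "lrp-provnet-router1"
  ip           : "172.24.4.10"
  mac          : "fa:16:3e:01:02:03"
  datapath     : <Datapath_Binding UUID of the learning router>

Sharing per logical switch would mean keying such rows on the LS datapath
instead of the individual router port.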

Thanks,
Dumitru


I believe that this approach is the way to go; at least nothing comes to
my mind that could go wrong here. We will try to make a patch for that.
However, if someone who is familiar with the code knows how to do it
quickly, that would also be very nice.
This approach should work.

I've another idea (I won't call it a solution yet). What if we drop the
usage of the MAC_Binding table altogether?
This would be great!

- When ovn-controller learns a MAC binding, it will not create a row in
the SB MAC_Binding table.
- Instead, it will maintain the learnt MAC binding in its memory.
- ovn-controller will still program table 66 with the flow to set
eth.dst (for the get_arp() action).

This has a couple of advantages:
   - Right now we never flush old/stale mac_binding entries. Suppose the
MAC of an external IP has changed, but OVN has an entry for that IP with
the old MAC in the mac_binding table: we will keep using the old MAC,
causing the packet to be sent to the wrong destination and possibly
lost. We would get rid of this problem.
   - We will also save SB DB space.

There are a few disadvantages:
   - Other ovn-controllers will not add the flows in table 66. I guess
this should be fine, as each ovn-controller can generate the ARP request
and learn the MAC itself.
   - When ovn-controller restarts, we lose the learnt MACs and would
need to learn them again.

Any thoughts on this?
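
To sketch the idea (this is not the actual PoC; names and include paths
are hypothetical, loosely following the OVS tree), the local cache could
be as simple as an hmap keyed on (datapath tunnel key, IP):

  /* Hypothetical in-memory replacement for SB MAC_Binding rows. */
  #include "openvswitch/hmap.h"
  #include "lib/packets.h"        /* struct eth_addr, struct in6_addr. */

  struct local_mac_binding {
      struct hmap_node hmap_node; /* Hashed on (dp_key, ip). */
      uint32_t dp_key;            /* Tunnel key of the logical datapath. */
      struct in6_addr ip;         /* Learned IP (IPv4-mapped for v4). */
      struct eth_addr mac;        /* Learned MAC. */
      long long int last_used;    /* For eventual staleness eviction. */
  };

  static struct hmap local_mac_bindings =
      HMAP_INITIALIZER(&local_mac_bindings);

On learning, ovn-controller would insert or refresh an entry here and
install the table 66 flow; on restart the cache starts empty and the
MACs are simply re-learned.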
It'd be great to have some sort of local ARP cache, but I'm concerned
about the performance implications.

- How are you going to determine when an entry is stale?
If you slow-path the packets to reset the timeout every time a packet
with a matching source MAC is received, it doesn't look good. Maybe you
have something else in mind.
Right now we don't expire any mac_binding entries. If I understand you
correctly, your concern is the scenario where a floating IP is updated
with a different MAC - how would the local cache be updated?

Right now networking-ovn (in the case of OpenStack) updates the
mac_binding entry in the southbound DB for such cases, right?

FYI - I have started working on this approach as a PoC, i.e. using a
local mac_binding cache instead of the SB mac_binding table.

I will update this thread on the progress.

Thanks
Numan

Thanks
Numan


There's another scenario that we need to take care of, which doesn't seem
too obvious to address without MAC_Bindings.

GARPs were being injected in the L2 broadcast domain of a LS for NAT
addresses in case FIPs are reused by the CMS, introduced by:

https://github.com/ovn-org/ovn/commit/069a32cbf443c937feff44078e8828d7a2702da8

Dumitru and I have been discussing the possibility of reverting this
patch and relying on CMSs to maintain the MAC_Binding entries associated
with the FIPs [0].

I'm against reverting this patch in OVN [1] for multiple reasons, the
most important one being that if we rely on workarounds on the CMS side,
we'll be creating a control-plane dependency for something that is purely
dataplane (i.e. if the Neutron server is down - outage, upgrades, etc. -
traffic is going to be disrupted). On the other hand, one could argue
that the same dependency now exists on ovn-controller being up and
running, but I believe that this is better than a) relying on workarounds
in CMSs and b) relying on CMS availability.

In the short term, I think that moving the MAC_Binding entries to the LS
instead of the LRP, as was suggested upthread, would be a good idea, and
in the long haul the *local* ARP cache seems to be the right solution.
Brainstorming with Dumitru, he suggested inspecting the flows regularly
to see whether the packet count on the flows that check if src_mac == X
has stopped increasing for a while, and then removing the ARP responder
flows locally.
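
A rough sketch of that idea (table 66 follows this thread; the exact
table numbers and match fields may differ between OVN versions):

  # Sample the MAC binding flows periodically and diff n_packets between
  # samples; flows whose counters stop moving for some interval are
  # candidates for local removal.
  ovs-ofctl dump-flows br-int table=66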

[0]
https://github.com/openstack/networking-ovn/commit/5181f1106ff839d08152623c25c9a5f6797aa2d7
[1]
https://github.com/ovn-org/ovn/commit/069a32cbf443c937feff44078e8828d7a2702da8

Recently, due to the dataplane scaling issue (the 4K resubmit limit being
hit), we don't flood these packets on non-router ports and instead create
the MAC_Bindings directly from ovn-controller:

https://github.com/ovn-org/ovn/commit/a2b88dc5136507e727e4bcdc4bf6fde559f519a9
Without the MAC_Binding table we'd need to find a way to update or
flush
stale bindings when an IP is used for a VIF or FIP.

Thanks,
Dumitru
