Hi Han,
Just some updates here.
I tried 4K networks on a single router. Configuration completed without any issues. I checked both nb-db and sb-db, and they both look good. The only thing is that the router configuration is huge (in the Neutron DB, in nb-db, and in the flow table in sb-db), because it contains all 4K ports. The pipeline of the router datapath in sb-db is also quite big.
I see that the ovn-northd master and the sb-db leader are busy, each taking 90+% CPU. There are only 3 compute nodes and 2 gateway nodes. Does the "ovn-monitor-all" setting matter in this case? Any idea what they are busy with, given that there are no configuration updates from OpenStack? The nb-db is not busy, though.
Probably because sb-db is busy, ovn-controller can't stay connected to it consistently. It keeps getting disconnected and reconnecting. Restarting ovn-controller seems to help. I am able to launch a few VMs on different networks, and they are connected via the router.
Now I have a problem with external access. The router is set as the gateway to a provider/underlay network, on an interface on the gateway node, and it is allocated an underlay address from that provider network. My understanding is that br-ex on the gateway node holding the active router will broadcast ARP to announce the router's underlay address in case of failover, and will also respond to ARP requests for that address. But when I run tcpdump on that underlay interface on the gateway node, I see ARP requests coming in and no ARP replies going out. I checked the flow table in sb-db and it seems OK. I also checked the flows on br-ex with "ovs-ofctl dump-flows br-ex", and I don't see anything about ARP there.
How should I look into it?
Again, the goal is to support 4K networks with external access (security groups disabled). The options are 4K routers (one per network), 50 routers (one per 80 networks), or 1 router (for all 4K networks). All networks are isolated by ACLs on the logical router. Which option should work better?
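A rough way to compare the three options is to count router ports and provider addresses per layout. The sketch below is a hypothetical model, not derived from Neutron or OVN code: it assumes each network attaches to exactly one router through one router port, and each router has one gateway port that consumes one underlay address; service/DHCP ports are ignored.

```python
from dataclasses import dataclass

TOTAL_NETWORKS = 4000


@dataclass(frozen=True)
class Layout:
    """Hypothetical model of one router layout for TOTAL_NETWORKS networks."""

    routers: int

    def __post_init__(self) -> None:
        # The model assumes networks are split evenly across routers.
        if TOTAL_NETWORKS % self.routers:
            raise ValueError("networks must divide evenly across routers")

    @property
    def networks_per_router(self) -> int:
        return TOTAL_NETWORKS // self.routers

    @property
    def ports_per_router(self) -> int:
        # One internal port per attached network plus one gateway port.
        return self.networks_per_router + 1

    @property
    def total_router_ports(self) -> int:
        return self.routers * self.ports_per_router

    @property
    def provider_addresses(self) -> int:
        # Assumed: one underlay address per router gateway port.
        return self.routers


for layout in (Layout(4000), Layout(50), Layout(1)):
    print(f"{layout.routers:>4} routers: "
          f"{layout.ports_per_router:>4} ports/router, "
          f"{layout.total_router_ports} router ports total, "
          f"{layout.provider_addresses} provider addresses")
```

Under these assumptions the single-router layout puts 4001 ports into one datapath (matching the large router pipeline seen in sb-db), the 50-router layout has 81 ports per router, and the 4K-router layout needs 4000 provider addresses.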
Any comment is appreciated.
Thanks!
Tony
From: discuss on behalf of Tony Liu
Sent: July 21, 2020 09:09 PM
To: Daniel Alvarez
Cc: ovs-discuss@openvswitch.org
Subject: Re: [ovs-discuss] OVN scale
[root@ovn-db-2 ~]# ovn-nbctl list nb_global
_uuid        : b7b3aa05-f7ed-4dbc-979f-10445ac325b8
connections  : []
external_ids : {"neutron:liveness_check_at"="2020-07-22 04:03:17.726917+00:00"}
hv_cfg       : 312
ipsec        : false
name         : ""
nb_cfg       : 2636
options      : {mac_prefix="ca:e8:07", svc_monitor_mac="4e:d0:3a:80:d4:b7"}
sb_cfg       : 2005
ssl          : []
[root@ovn-db-2 ~]# ovn-sbctl list sb_global
_uuid        : 3720bc1d-b0da-47ce-85ca-96fa8d398489
connections  : []
external_ids : {}
ipsec        : false
nb_cfg       : 312
options      : {mac_prefix="ca:e8:07", svc_monitor_mac="4e:d0:3a:80:d4:b7"}
ssl          : []
The NBDB and SBDB are definitely out of sync: nb_cfg is 2636 in NB_Global, but sb_cfg is only 2005. Is there any way to force ovn-northd to sync them?
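The lag can be read directly from the NB_Global counters above. A minimal sketch, assuming the documented meaning of these columns (sb_cfg is set by ovn-northd once the SB reflects a given nb_cfg; hv_cfg is the lowest nb_cfg acknowledged by the chassis). The function names are hypothetical, and the parser only handles the one-value-per-line layout shown here. The result is a difference in nb_cfg generations, not a count of missing objects.

```python
from __future__ import annotations

# NB_Global record as printed by `ovn-nbctl list nb_global` in this thread.
NB_GLOBAL_TEXT = """
_uuid        : b7b3aa05-f7ed-4dbc-979f-10445ac325b8
connections  : []
external_ids : {"neutron:liveness_check_at"="2020-07-22 04:03:17.726917+00:00"}
hv_cfg       : 312
ipsec        : false
name         : ""
nb_cfg       : 2636
options      : {mac_prefix="ca:e8:07", svc_monitor_mac="4e:d0:3a:80:d4:b7"}
sb_cfg       : 2005
ssl          : []
"""


def parse_ovsdb_record(text: str) -> dict[str, str]:
    """Parse one record of `ovn-nbctl list` / `ovn-sbctl list` output.

    Each line is "column : value". Only the first colon separates the two,
    because values such as timestamps and MACs contain colons themselves.
    """
    record: dict[str, str] = {}
    for line in text.strip().splitlines():
        column, sep, value = line.partition(":")
        if sep:
            record[column.strip()] = value.strip()
    return record


def sync_lag(nb_global: dict[str, str]) -> dict[str, int]:
    """How many nb_cfg generations the SB and the chassis are behind."""
    nb_cfg = int(nb_global["nb_cfg"])
    return {
        # sb_cfg: last nb_cfg that ovn-northd finished applying to the SB.
        "sb_lag": nb_cfg - int(nb_global["sb_cfg"]),
        # hv_cfg: lowest nb_cfg reported back by any chassis.
        "hv_lag": nb_cfg - int(nb_global["hv_cfg"]),
    }


lag = sync_lag(parse_ovsdb_record(NB_GLOBAL_TEXT))
print(lag)  # {'sb_lag': 631, 'hv_lag': 2324}
```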
Thanks!
Tony
From: Tony Liu
Sent: July 21, 2020 08:39 PM
To: Daniel Alvarez
Cc: Cory Hawkless; ovs-discuss@openvswitch.org; Dumitru Ceara
Subject: Re: [ovs-discuss] OVN scale
When a network (and subnet) is created in OpenStack, a GW port and a service port (for DHCP and metadata) are also created. They are created in Neutron and in ovn-nb-db by the ML2 driver. ovn-northd then translates those updates from NBDB to SBDB. My question is: with 20.03, is this translation incremental?
After creating 4000 networks successfully in OpenStack, I see 4000 logical switches and 8000 LS ports in NBDB. But in SBDB there are only 1567 port bindings; the break happened when translating the 1568th port. If ovn-northd recompiles the whole DB on every update, this can be explained: the DB is too big for ovn-northd to compile in time, so all subsequent updates are lost. Does that make sense?
I recall that DB updates are coordinated by some "version": when a change happens in NBDB, the version is bumped, and ovn-northd updates SBDB and bumps its version as well, so they match. So if the NBDB version is bumped more than once while ovn-northd is updating SBDB, is that still going to work? If yes, then it's just a matter of time: no matter how fast updates happen in NBDB, ovn-northd will catch up eventually. Am I right about that?
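The "catch up eventually" reasoning can be checked with a toy model. It assumes ovn-northd rebuilds the southbound contents from the latest northbound snapshot on each pass (rather than replaying each change) and then reports the nb_cfg that snapshot was taken at. All classes and functions below are hypothetical and are not OVN code.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Northbound:
    nb_cfg: int = 0
    switches: set[str] = field(default_factory=set)

    def add_switch(self, name: str) -> None:
        self.switches.add(name)
        self.nb_cfg += 1  # in this model, every change bumps the sequence number


@dataclass
class Southbound:
    sb_cfg: int = 0
    datapaths: set[str] = field(default_factory=set)


def northd_pass(nb: Northbound, sb: Southbound,
                during: Callable[[], None] = lambda: None) -> None:
    # Snapshot the NB contents together with the nb_cfg they correspond to.
    seen_cfg, snapshot = nb.nb_cfg, set(nb.switches)
    during()  # NB changes that land while this (slow) recompute is running
    # Full recompute from the snapshot, not a per-change delta; report only
    # the nb_cfg the snapshot was taken at.
    sb.datapaths = snapshot
    sb.sb_cfg = seen_cfg


nb, sb = Northbound(), Southbound()
for i in range(3):
    nb.add_switch(f"ls{i}")


def two_more_networks() -> None:
    # Two more networks arrive while northd is busy with the first pass.
    nb.add_switch("ls3")
    nb.add_switch("ls4")


northd_pass(nb, sb, during=two_more_networks)
print(nb.nb_cfg, sb.sb_cfg, len(sb.datapaths))  # 5 3 3

# The next pass starts from the latest snapshot and covers both bumps at once.
northd_pass(nb, sb)
print(nb.nb_cfg, sb.sb_cfg, len(sb.datapaths))  # 5 5 5
```

In this model no update is lost, but sb_cfg only reaches nb_cfg after a pass completes with no new changes arriving during it; if NB changes keep arriving faster than a full recompute takes, the SB stays behind.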
Any comment is welcome.
Thanks!
Tony
From: Tony Liu
Sent: July 21, 2020 10:22 AM
To: Daniel Alvarez
Cc: Cory Hawkless; ovs-discuss@openvswitch.org; Dumitru Ceara
Subject: Re: [ovs-discuss] OVN scale
Hi Daniel, all
4000 networks and 50 routers, 200 networks on each router, they are all created.
CPU usage of the Neutron server, ovn-nb-db, ovn-northd, ovn-sb-db, ovn-controller, and ovs-vswitchd is OK: not consistently at 100%, but there are still some spikes to it.
Now, when creating a VM, I got that "waiting for vif-plugged