Re: [ovs-discuss] OVN Scale with RAFT: how to make ovn-northd more reliable when RAFT leader unstable

2020-08-28 Thread Dumitru Ceara
On 8/28/20 1:01 AM, Winson Wang wrote:
> Hi Dumitru,
> 
> Have you tried the OVS "learn" action to see if it addresses this scale issue?
> 
> 
> Regards,
> Winson
> 
>  

Hi Winson,

Sorry, didn't get a chance to look at this yet. It's still on my todo-list.

Regards,
Dumitru

> 
> On Fri, Jul 17, 2020 at 8:53 AM Winson Wang wrote:
> 
> 
> 
> On Fri, Jul 17, 2020 at 12:54 AM Dumitru Ceara wrote:
> 
> On 7/17/20 2:58 AM, Winson Wang wrote:
> > Hi Dumitru,
> >
> > most of the flows are in table 19.
> 
> This is the ls_in_pre_hairpin table where we add flows for each
> backend
> of the load balancers.
> 
> >
> > -rw-r--r-- 1 root root 142M Jul 16 17:07 br-int.txt (all flows
> dump file)
> > -rw-r--r-- 1 root root 102M Jul 16 17:43 table-19.txt
> > -rw-r--r-- 1 root root 7.8M Jul 16 17:43 table-11.txt
> > -rw-r--r-- 1 root root 3.7M Jul 16 17:43 table-21.txt
> >
> > # cat table-19.txt | wc -l
> > 408458
> > # cat table-19.txt | grep "=9153" | wc -l
> > 124744
> > # cat table-19.txt | grep "=53" | wc -l
> > 249488
> > Coredns pod has svc with port number 53 and 9153.
> >
> 
> How many backends do you have for these VIPs (with port number
> 53 and
> 9153) in your load_balancer config?
> 
> The backend count is 63: 63 CoreDNS pods are running, exposed behind
> Cluster IP 10.96.0.10 on tcp/53, udp/53, and tcp/9153.
> lb-list  |grep 10.96.0.10
> 3b8a468a-44d2-4a34-94ca-626dac936cde    udp
>    10.96.0.10:53    192.168.104.3:53,192.168.105.3:53,192.168.106.3:53,
> 192.168.107.3:53,192.168.108.3:53,192.168.109.3:53,192.168.110.3:53,
> 192.168.111.3:53,192.168.112.3:53,192.168.113.3:53,192.168.114.3:53,
> 192.168.115.3:53,192.168.116.3:53,192.168.118.4:53,192.168.119.3:53,
> 192.168.120.4:53,192.168.121.3:53,192.168.122.3:53,192.168.123.3:53,
> 192.168.130.3:53,192.168.131.3:53,192.168.136.3:53,192.168.142.3:53,
> 192.168.4.3:53,192.168.45.3:53,192.168.46.3:53,192.168.47.3:53,
> 192.168.48.3:53,192.168.49.3:53,192.168.50.3:53,192.168.51.3:53,
> 192.168.52.3:53,192.168.53.3:53,192.168.54.3:53,192.168.55.3:53,
> 192.168.56.3:53,192.168.57.3:53,192.168.58.3:53,192.168.59.4:53,
> 192.168.60.4:53,192.168.61.3:53,192.168.62.3:53,192.168.63.3:53,
> 192.168.64.3:53,192.168.65.3:53,192.168.66.3:53,192.168.67.3:53,
> 192.168.68.4:53,192.168.69.3:53,192.168.70.3:53,192.168.71.3:53,
> 192.168.72.3:53,192.168.73.3:53,192.168.74.4:53,192.168.75.4:53,
> 192.168.76.3:53,192.168.77.3:53,192.168.78.3:53,192.168.79.3:53,
> 192.168.80.4:53,192.168.81.3:53,192.168.82.4:53,192.168.83.4:53
>                                          tcp
>    10.96.0.10:53    192.168.104.3:53,192.168.105.3:53,192.168.106.3:53
> 
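[Editor's note: per-table OpenFlow dump files like the ones quoted above
(br-int.txt, table-19.txt) can be produced with something like the following
sketch, assuming the integration bridge is br-int:]

# dump all flows on the integration bridge
ovs-ofctl dump-flows br-int > br-int.txt
# dump a single OpenFlow table and count its flows
ovs-ofctl dump-flows br-int table=19 > table-19.txt
wc -l < table-19.txt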

Re: [ovs-discuss] OVN Scale with RAFT: how to make ovn-northd more reliable when RAFT leader unstable

2020-08-27 Thread Winson Wang
Hi Dumitru,

Have you tried the OVS "learn" action to see if it addresses this scale issue?
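[Editor's note: for readers unfamiliar with it, the "learn" action installs
new flows at packet-processing time. The sketch below is a generic
MAC-learning style illustration of the primitive, adapted from the OVS
documentation of the learn action; it is not the hairpin/load-balancer use
case discussed in this thread, and br0 is a placeholder bridge name:]

# packets hitting table 0 teach table 10 where their source MAC was seen,
# then continue processing in table 10
ovs-ofctl add-flow br0 "table=0, priority=1, actions=learn(table=10, hard_timeout=300, NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[], load:NXM_OF_IN_PORT[]->NXM_NX_REG0[0..15]), resubmit(,10)"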


Regards,
Winson



On Fri, Jul 17, 2020 at 8:53 AM Winson Wang  wrote:

>
>
> On Fri, Jul 17, 2020 at 12:54 AM Dumitru Ceara  wrote:
>
>> On 7/17/20 2:58 AM, Winson Wang wrote:
>> > Hi Dumitru,
>> >
>> > most of the flows are in table 19.
>>
>> This is the ls_in_pre_hairpin table where we add flows for each backend
>> of the load balancers.
>>
>> >
>> > -rw-r--r-- 1 root root 142M Jul 16 17:07 br-int.txt (all flows dump
>> file)
>> > -rw-r--r-- 1 root root 102M Jul 16 17:43 table-19.txt
>> > -rw-r--r-- 1 root root 7.8M Jul 16 17:43 table-11.txt
>> > -rw-r--r-- 1 root root 3.7M Jul 16 17:43 table-21.txt
>> >
>> > # cat table-19.txt | wc -l
>> > 408458
>> > # cat table-19.txt | grep "=9153" | wc -l
>> > 124744
>> > # cat table-19.txt | grep "=53" | wc -l
>> > 249488
>> > Coredns pod has svc with port number 53 and 9153.
>> >
>>
>> How many backends do you have for these VIPs (with port number 53 and
>> 9153) in your load_balancer config?
>>
> The backend count is 63: 63 CoreDNS pods are running, exposed behind Cluster
> IP 10.96.0.10 on tcp/53, udp/53, and tcp/9153.
> lb-list  |grep 10.96.0.10
> 3b8a468a-44d2-4a34-94ca-626dac936cdeudp
> 10.96.0.10:53192.168.104.3:53,192.168.105.3:53,192.168.106.3:53,
> 192.168.107.3:53,192.168.108.3:53,192.168.109.3:53,192.168.110.3:53,
> 192.168.111.3:53,192.168.112.3:53,192.168.113.3:53,192.168.114.3:53,
> 192.168.115.3:53,192.168.116.3:53,192.168.118.4:53,192.168.119.3:53,
> 192.168.120.4:53,192.168.121.3:53,192.168.122.3:53,192.168.123.3:53,
> 192.168.130.3:53,192.168.131.3:53,192.168.136.3:53,192.168.142.3:53,
> 192.168.4.3:53,192.168.45.3:53,192.168.46.3:53,192.168.47.3:53,
> 192.168.48.3:53,192.168.49.3:53,192.168.50.3:53,192.168.51.3:53,
> 192.168.52.3:53,192.168.53.3:53,192.168.54.3:53,192.168.55.3:53,
> 192.168.56.3:53,192.168.57.3:53,192.168.58.3:53,192.168.59.4:53,
> 192.168.60.4:53,192.168.61.3:53,192.168.62.3:53,192.168.63.3:53,
> 192.168.64.3:53,192.168.65.3:53,192.168.66.3:53,192.168.67.3:53,
> 192.168.68.4:53,192.168.69.3:53,192.168.70.3:53,192.168.71.3:53,
> 192.168.72.3:53,192.168.73.3:53,192.168.74.4:53,192.168.75.4:53,
> 192.168.76.3:53,192.168.77.3:53,192.168.78.3:53,192.168.79.3:53,
> 192.168.80.4:53,192.168.81.3:53,192.168.82.4:53,192.168.83.4:53
> tcp
> 10.96.0.10:53192.168.104.3:53,192.168.105.3:53,192.168.106.3:53,
> 192.168.107.3:53,192.168.108.3:53,192.168.109.3:53,192.168.110.3:53,
> 192.168.111.3:53,192.168.112.3:53,192.168.113.3:53,192.168.114.3:53,
> 192.168.115.3:53,192.168.116.3:53,192.168.118.4:53,192.168.119.3:53,
> 192.168.120.4:53,192.168.121.3:53,192.168.122.3:53,192.168.123.3:53,
> 192.168.130.3:53,192.168.131.3:53,192.168.136.3:53,192.168.142.3:53,
> 192.168.4.3:53,192.168.45.3:53,192.168.46.3:53,192.168.47.3:53,
> 192.168.48.3:53,192.168.49.3:53,192.168.50.3:53,192.168.51.3:53,
> 192.168.52.3:53,192.168.53.3:53,192.168.54.3:53,192.168.55.3:53,
> 192.168.56.3:53,192.168.57.3:53,192.168.58.3:53,192.168.59.4:53,
> 192.168.60.4:53,192.168.61.3:53,192.168.62.3:53,192.168.63.3:53,
> 192.168.64.3:53,192.168.65.3:53,192.168.66.3:53,192.168.67.3:53,
> 192.168.68.4:53,192.168.69.3:53,192.168.70.3:53,192.168.71.3:53,
> 192.168.72.3:53,192.168.73.3:53,192.168.74.4:53,192.168.75.4:53,
> 192.168.76.3:53,192.168.77.3:53,192.168.78.3:53,192.168.79.3:53,
> 192.168.80.4:53,192.168.81.3:53,192.168.82.4:53,192.168.83.4:53
> tcp
> 10.96.0.10:9153  192.168.104.3:9153,192.168.105.3:9153,
> 192.168.106.3:9153,192.168.107.3:9153,192.168.108.3:9153,
> 192.168.109.3:9153,192.168.110.3:9153,192.168.111.3:9153,
> 192.168.112.3:9153,192.168.113.3:9153,192.168.114.3:9153,
> 192.168.115.3:9153,192.168.116.3:9153,192.168.118.4:9153,
> 192.168.119.3:9153,192.168.120.4:9153,192.168.121.3:9153,
> 192.168.122.3:9153,192.168.123.3:9153,192.168.130.3:9153,
> 192.168.131.3:9153,192.168.136.3:9153,192.168.142.3:9153,192.168.4.3:9153,
> 192.168.45.3:9153,192.168.46.3:9153,192.168.47.3:9153,192.168.48.3:9153,
> 192.168.49.3:9153,192.168.50.3:9153,192.168.51.3:9153,192.168.52.3:9153,
> 192.168.53.3:9153,192.168.54.3:9153,192.168.55.3:9153,192.168.56.3:9153,
> 192.168.57.3:9153,192.168.58.3:9153,192.168.59.4:9153,192.168.60.4:9153,
> 192.168.61.3:9153,192.168.62.3:9153,192.168.63.3:9153,192.168.64.3:9153,
> 192.168.65.3:9153,192.168.66.3:9153,192.168.67.3:9153,192.168.68.4:9153,
> 192.168.69.3:9153,192.168.70.3:9153,192.168.71.3:9153,192.168.72.3:9153,
> 192.168.73.3:9153,192.168.74.4:9153,192.168.75.4:9153,192.168.76.3:9153,
> 192.168.77.3:9153,192.168.78.3:9153,192.168.79.3:9153,192.168.80.4:9153,
> 192.168.81.3:9153,192.168.82.4:9153,192.168.83.4:9153
>
>>
>> Thanks,
>> Dumitru
>>
>> > Please let me know if you need more information.
>> >
>> >
>> > Regards,
>> > Winson
>> >
>> >
>> > On Thu, Jul 16, 2020 at 11:23 AM Dumitru Ceara 

Re: [ovs-discuss] OVN Scale with RAFT: how to make raft cluster clients to balanced state again

2020-08-05 Thread Girish Moodalbail
On Wed, Aug 5, 2020 at 5:23 PM Han Zhou  wrote:

>
>
> On Wed, Aug 5, 2020 at 4:35 PM Girish Moodalbail 
> wrote:
>
>>
>>
>> On Wed, Aug 5, 2020 at 3:05 PM Han Zhou  wrote:
>>
>>>
>>>
>>> On Wed, Aug 5, 2020 at 12:51 PM Winson Wang 
>>> wrote:
>>>
 Hello OVN Experts:

 With large scale ovn-k8s cluster,  there are several conditions that
 would make ovn-controller clients connect SB central from a balanced state
 to an unbalanced state.
 Is there an ongoing project to address this problem?
 If not,  I have one proposal not sure if it is doable.
 Please share your thoughts.

 The issue:

 OVN SB RAFT 3 node cluster,  at first all the ovn-controller clients
 will connect all the 3 nodes in a balanced state.

 The following conditions will make the connections become unbalanced.

-

One RAFT node restart,  all the ovn-controller clients to reconnect
to the two remaining cluster nodes.


-

Ovn-k8s,  after SB raft pods rolling upgrade, the last raft pod has
no client connections.


 RAFT clients in an unbalanced state would trigger more stress to the
 raft cluster,  which makes the raft unstable under stress compared to a
 balanced state.
 The proposal solution:

 Ovn-controller adds next unix commands “reconnect” with argument of
 preferred SB node IP.

 When unbalanced state happens,  the UNIX command can trigger
 ovn-controller reconnect

 To new SB raft node with fast sync which doesn’t trigger the whole DB
 downloading process.


>>> Thanks Winson. The proposal sounds good to me. Will you implement it?
>>>
>>
>> Han/Winson,
>>
>> The fast re-sync is for ovsdb-server restart and it will not apply for
>> ovn-controller restart, right?
>>
>>
> Right, but the proposal is to provide a command just to reconnect, without
> restarting. In that case fast-resync should work.
>
>
>> If the ovsdb-client (ovn-controller) restarts, then it would have lost
>> all its state and when it starts again it will still need to download
>> logical_flows, port_bindings , and other tables it cares about. So, fast
>> re-sync may not apply to this case.
>>
>> Also, the ovn-controller should stash the IP address of the SB server to
>> which it is connected to in Open_vSwitch table's external_id column. It
>> updates this field whenever it re-connects to a different SB server
>> (because that ovsdb-server instance failed or restarted). When
>> ovn-controller itself restarts it could check for the value in this field
>> and try to connect to it first and on failure fallback to connect to
>> default connection approach.
>>
>
> The imbalance is usually caused by failover on server side. When one
> server is down, all clients are expected to connect to the rest of the
> servers, and when the server is back, there is no motivation for the
> clients to reconnect again (unless you purposely restart the clients, which
> would bring 1/3 of the restarted clients back to the old server). So I
> don't understand how "stash the IP address" would work in this scenario.
>
> The proposal above by Winson is to purposely trigger a reconnection
> towards the desired server without restarting the clients, which I think
> solves this problem directly.
>

Right. This is what we discussed internally; however, when I read this email
on the list I confused it with the other thread (rolling update of
ovn-controller in a K8s cluster, which involves restarting ovn-controller).
Sorry for the noise.

Regards,
~Girish
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN Scale with RAFT: how to make raft cluster clients to balanced state again

2020-08-05 Thread Winson Wang
Hi Han,


On Wed, Aug 5, 2020 at 3:05 PM Han Zhou  wrote:

>
>
> On Wed, Aug 5, 2020 at 12:51 PM Winson Wang 
> wrote:
>
>> Hello OVN Experts:
>>
>> With large scale ovn-k8s cluster,  there are several conditions that
>> would make ovn-controller clients connect SB central from a balanced state
>> to an unbalanced state.
>> Is there an ongoing project to address this problem?
>> If not,  I have one proposal not sure if it is doable.
>> Please share your thoughts.
>>
>> The issue:
>>
>> OVN SB RAFT 3 node cluster,  at first all the ovn-controller clients will
>> connect all the 3 nodes in a balanced state.
>>
>> The following conditions will make the connections become unbalanced.
>>
>>-
>>
>>One RAFT node restart,  all the ovn-controller clients to reconnect
>>to the two remaining cluster nodes.
>>
>>
>>-
>>
>>Ovn-k8s,  after SB raft pods rolling upgrade, the last raft pod has
>>no client connections.
>>
>>
>> RAFT clients in an unbalanced state would trigger more stress to the raft
>> cluster,  which makes the raft unstable under stress compared to a balanced
>> state.
>> The proposal solution:
>>
>> Ovn-controller adds next unix commands “reconnect” with argument of
>> preferred SB node IP.
>>
>> When unbalanced state happens,  the UNIX command can trigger
>> ovn-controller reconnect
>>
>> To new SB raft node with fast sync which doesn’t trigger the whole DB
>> downloading process.
>>
>>
> Thanks Winson. The proposal sounds good to me. Will you implement it?
>

Thanks for reviewing my proposed solution.
I am hoping someone from the OVN team who is more familiar with the
ovn-controller code can deliver the feature, if possible :).

Regards,
Winson


> Han
>
>
>
>>
>> --
>> Winson
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "ovn-kubernetes" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to ovn-kubernetes+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/ovn-kubernetes/CAMu6iS--iOW0LxxtkOhJpRT49E-9bJVy0iXraC1LMDUWeu6kLA%40mail.gmail.com
>> 
>> .
>>
>

-- 
Winson
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN Scale with RAFT: how to make raft cluster clients to balanced state again

2020-08-05 Thread Han Zhou
On Wed, Aug 5, 2020 at 3:59 PM Tony Liu  wrote:

> Sorry for hijacking this thread, I'd like to get some clarifications.
>
> How is the initial balanced state established, say 100 ovn-controllers
> connecting to 3 ovn-sb-db?
>
The ovn-controller by default randomly connects to one of the servers specified
in the connection method, e.g. tcp:<ip1>:6642,tcp:<ip2>:6642,tcp:<ip3>:6642.
(Please see ovsdb(7) for details on "Connection Method".)

So initially it is balanced.
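[Editor's note: for example, a chassis is typically pointed at all cluster
members through ovn-remote; the addresses below are placeholders:]

# list every SB cluster member so ovn-controller can pick one at random
ovs-vsctl set open . external_ids:ovn-remote="tcp:10.0.0.1:6642,tcp:10.0.0.2:6642,tcp:10.0.0.3:6642"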


> The ovn-controller doesn't have to connect to the leader of ovn-sb-db,
> does it? In case it connects to the follower, the write request still
> needs to be forwarded to the leader, right?
>
> These logs keep showing up.
> 
> 2020-08-05T22:48:33.141Z|103607|reconnect|INFO|tcp:10.6.20.84:6642:
> connecting...
> 2020-08-05T22:48:33.151Z|103608|reconnect|INFO|tcp:127.0.0.1:6640:
> connected
> 2020-08-05T22:48:33.151Z|103609|reconnect|INFO|tcp:10.6.20.84:6642:
> connected
> 2020-08-05T22:48:33.159Z|103610|main|INFO|OVNSB commit failed, force
> recompute next time.
> 2020-08-05T22:48:33.161Z|103611|ovsdb_idl|INFO|tcp:10.6.20.84:6642:
> clustered database server is disconnected from cluster; trying another
> server
> 2020-08-05T22:48:33.161Z|103612|reconnect|INFO|tcp:10.6.20.84:6642:
> connection attempt timed out
> 2020-08-05T22:48:33.161Z|103613|reconnect|INFO|tcp:10.6.20.84:6642:
> waiting 2 seconds before reconnect
> 
> What's that "clustered database server is disconnected from cluster" mean?
>
> It means the server is part of a cluster, but it is disconnected from the
cluster, e.g. due to network partitioning, or overloaded and lost
heartbeat, or the cluster lost quorum and there is no leader elected.
If you use a clustered DB, it's better to set the connection method to all
servers (or use an LB VIP that points to all servers) instead of
specifying only a single server, which doesn't provide the desired HA.


>
> Thanks!
>
> Tony
>
>
> > -Original Message-
> > From: discuss  On Behalf Of Han
> > Zhou
> > Sent: Wednesday, August 5, 2020 3:05 PM
> > To: Winson Wang 
> > Cc: winson wang ; ovn-kuberne...@googlegroups.com;
> > ovs-discuss@openvswitch.org
> > Subject: Re: [ovs-discuss] OVN Scale with RAFT: how to make raft cluster
> > clients to balanced state again
> >
> >
> >
> > On Wed, Aug 5, 2020 at 12:51 PM Winson Wang wrote:
> >
> >
> >   Hello OVN Experts:
> >
> >   With large scale ovn-k8s cluster,  there are several conditions
> > that would make ovn-controller clients connect SB central from a
> > balanced state to an unbalanced state.
> >
> >   Is there an ongoing project to address this problem?
> >   If not,  I have one proposal not sure if it is doable.
> >   Please share your thoughts.
> >
> >   The issue:
> >
> >   OVN SB RAFT 3 node cluster,  at first all the ovn-controller
> > clients will connect all the 3 nodes in a balanced state.
> >
> >   The following conditions will make the connections become
> > unbalanced.
> >
> >   *   One RAFT node restart,  all the ovn-controller clients to
> > reconnect to the two remaining cluster nodes.
> >
> >   *   Ovn-k8s,  after SB raft pods rolling upgrade, the last raft
> > pod has no client connections.
> >
> >
> >   RAFT clients in an unbalanced state would trigger more stress to
> > the raft cluster,  which makes the raft unstable under stress compared
> > to a balanced state.
> >
> >
> >   The proposal solution:
> >
> >
> >
> >   Ovn-controller adds next unix commands “reconnect” with argument of
> > preferred SB node IP.
> >
> >   When unbalanced state happens,  the UNIX command can trigger ovn-
> > controller reconnect
> >
> >   To new SB raft node with fast sync which doesn’t trigger the whole
> > DB downloading process.
> >
> >
> >
> > Thanks Winson. The proposal sounds good to me. Will you implement it?
> >
> > Han
> >
> >
> >
> >
> >
> >   --
> >
> >   Winson
> >
> >
> >
> >   --
> >   You received this message because you are subscribed to the Google
> > Groups "ovn-kubernetes" group.
> >   To unsubscribe from this group and stop receiving emails from it,
> > send an email to ovn-kubernetes+unsubscr...@googlegroups.com .
> >   To view this discussion on the web visit
> > https://groups.google.com/d/msgid/ovn-kubernetes/CAMu6iS--iOW0LxxtkOhJpRT49E-9bJVy0iXraC1LMDUWeu6kLA%40mail.gmail.com .
> >
>
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN Scale with RAFT: how to make raft cluster clients to balanced state again

2020-08-05 Thread Han Zhou
On Wed, Aug 5, 2020 at 4:35 PM Girish Moodalbail 
wrote:

>
>
> On Wed, Aug 5, 2020 at 3:05 PM Han Zhou  wrote:
>
>>
>>
>> On Wed, Aug 5, 2020 at 12:51 PM Winson Wang 
>> wrote:
>>
>>> Hello OVN Experts:
>>>
>>> With large scale ovn-k8s cluster,  there are several conditions that
>>> would make ovn-controller clients connect SB central from a balanced state
>>> to an unbalanced state.
>>> Is there an ongoing project to address this problem?
>>> If not,  I have one proposal not sure if it is doable.
>>> Please share your thoughts.
>>>
>>> The issue:
>>>
>>> OVN SB RAFT 3 node cluster,  at first all the ovn-controller clients
>>> will connect all the 3 nodes in a balanced state.
>>>
>>> The following conditions will make the connections become unbalanced.
>>>
>>>-
>>>
>>>One RAFT node restart,  all the ovn-controller clients to reconnect
>>>to the two remaining cluster nodes.
>>>
>>>
>>>-
>>>
>>>Ovn-k8s,  after SB raft pods rolling upgrade, the last raft pod has
>>>no client connections.
>>>
>>>
>>> RAFT clients in an unbalanced state would trigger more stress to the
>>> raft cluster,  which makes the raft unstable under stress compared to a
>>> balanced state.
>>> The proposal solution:
>>>
>>> Ovn-controller adds next unix commands “reconnect” with argument of
>>> preferred SB node IP.
>>>
>>> When unbalanced state happens,  the UNIX command can trigger
>>> ovn-controller reconnect
>>>
>>> To new SB raft node with fast sync which doesn’t trigger the whole DB
>>> downloading process.
>>>
>>>
>> Thanks Winson. The proposal sounds good to me. Will you implement it?
>>
>
> Han/Winson,
>
> The fast re-sync is for ovsdb-server restart and it will not apply for
> ovn-controller restart, right?
>
>
Right, but the proposal is to provide a command just to reconnect, without
restarting. In that case fast-resync should work.


> If the ovsdb-client (ovn-controller) restarts, then it would have lost all
> its state and when it starts again it will still need to download
> logical_flows, port_bindings , and other tables it cares about. So, fast
> re-sync may not apply to this case.
>
> Also, the ovn-controller should stash the IP address of the SB server to
> which it is connected to in Open_vSwitch table's external_id column. It
> updates this field whenever it re-connects to a different SB server
> (because that ovsdb-server instance failed or restarted). When
> ovn-controller itself restarts it could check for the value in this field
> and try to connect to it first and on failure fallback to connect to
> default connection approach.
>

The imbalance is usually caused by failover on server side. When one server
is down, all clients are expected to connect to the rest of the servers,
and when the server is back, there is no motivation for the clients to
reconnect again (unless you purposely restart the clients, which would
bring 1/3 of the restarted clients back to the old server). So I don't
understand how "stash the IP address" would work in this scenario.

The proposal above by Winson is to purposely trigger a reconnection towards
the desired server without restarting the clients, which I think solves
this problem directly.

Thanks,
Han


>
> Regards,
> ~Girish
>
>
>
>
>>
>> Han
>>
>>
>>
>>>
>>> --
>>> Winson
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "ovn-kubernetes" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to ovn-kubernetes+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/ovn-kubernetes/CAMu6iS--iOW0LxxtkOhJpRT49E-9bJVy0iXraC1LMDUWeu6kLA%40mail.gmail.com
>>> 
>>> .
>>>
>> ___
>> discuss mailing list
>> disc...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>>
> --
> You received this message because you are subscribed to the Google Groups
> "ovn-kubernetes" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to ovn-kubernetes+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/ovn-kubernetes/CAAF2STTrZb%2BNo8%2B3%3DOJcMqd6T_1sS5bm-xnF6v_P4%2B2uqKtZAQ%40mail.gmail.com
> 
> .
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN Scale with RAFT: how to make raft cluster clients to balanced state again

2020-08-05 Thread Girish Moodalbail
On Wed, Aug 5, 2020 at 3:05 PM Han Zhou  wrote:

>
>
> On Wed, Aug 5, 2020 at 12:51 PM Winson Wang 
> wrote:
>
>> Hello OVN Experts:
>>
>> With large scale ovn-k8s cluster,  there are several conditions that
>> would make ovn-controller clients connect SB central from a balanced state
>> to an unbalanced state.
>> Is there an ongoing project to address this problem?
>> If not,  I have one proposal not sure if it is doable.
>> Please share your thoughts.
>>
>> The issue:
>>
>> OVN SB RAFT 3 node cluster,  at first all the ovn-controller clients will
>> connect all the 3 nodes in a balanced state.
>>
>> The following conditions will make the connections become unbalanced.
>>
>>-
>>
>>One RAFT node restart,  all the ovn-controller clients to reconnect
>>to the two remaining cluster nodes.
>>
>>
>>-
>>
>>Ovn-k8s,  after SB raft pods rolling upgrade, the last raft pod has
>>no client connections.
>>
>>
>> RAFT clients in an unbalanced state would trigger more stress to the raft
>> cluster,  which makes the raft unstable under stress compared to a balanced
>> state.
>> The proposal solution:
>>
>> Ovn-controller adds next unix commands “reconnect” with argument of
>> preferred SB node IP.
>>
>> When unbalanced state happens,  the UNIX command can trigger
>> ovn-controller reconnect
>>
>> To new SB raft node with fast sync which doesn’t trigger the whole DB
>> downloading process.
>>
>>
> Thanks Winson. The proposal sounds good to me. Will you implement it?
>

Han/Winson,

The fast re-sync is for ovsdb-server restart and it will not apply for
ovn-controller restart, right?

If the ovsdb client (ovn-controller) restarts, then it has lost all of its
state, and when it starts again it will still need to download logical_flows,
port_bindings, and the other tables it cares about. So fast re-sync may not
apply to this case.

Also, ovn-controller could stash the IP address of the SB server to which it
is connected in the Open_vSwitch table's external_ids column. It would update
this field whenever it reconnects to a different SB server (because that
ovsdb-server instance failed or restarted). When ovn-controller itself
restarts, it could check the value of this field, try to connect to that
server first, and on failure fall back to the default connection approach.
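[Editor's note: a minimal sketch of that stashing idea; the external_ids key
name below is made up for illustration and is not something OVN defines:]

# hypothetical key recording the SB server this chassis last connected to
ovs-vsctl set open . external_ids:ovn-sb-last-connected="tcp:10.0.0.2:6642"
# read it back, e.g. after a restart
ovs-vsctl get open . external_ids:ovn-sb-last-connected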

Regards,
~Girish




>
> Han
>
>
>
>>
>> --
>> Winson
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "ovn-kubernetes" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to ovn-kubernetes+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/ovn-kubernetes/CAMu6iS--iOW0LxxtkOhJpRT49E-9bJVy0iXraC1LMDUWeu6kLA%40mail.gmail.com
>> 
>> .
>>
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN Scale with RAFT: how to make raft cluster clients to balanced state again

2020-08-05 Thread Tony Liu
Sorry for hijacking this thread, I'd like to get some clarifications.

How is the initial balanced state established, say 100 ovn-controllers
connecting to 3 ovn-sb-db?

The ovn-controller doesn't have to connect to the leader of ovn-sb-db,
does it? In case it connects to the follower, the write request still
needs to be forwarded to the leader, right?

These logs keep showing up.

2020-08-05T22:48:33.141Z|103607|reconnect|INFO|tcp:10.6.20.84:6642: 
connecting...
2020-08-05T22:48:33.151Z|103608|reconnect|INFO|tcp:127.0.0.1:6640: connected
2020-08-05T22:48:33.151Z|103609|reconnect|INFO|tcp:10.6.20.84:6642: connected
2020-08-05T22:48:33.159Z|103610|main|INFO|OVNSB commit failed, force recompute 
next time.
2020-08-05T22:48:33.161Z|103611|ovsdb_idl|INFO|tcp:10.6.20.84:6642: clustered 
database server is disconnected from cluster; trying another server
2020-08-05T22:48:33.161Z|103612|reconnect|INFO|tcp:10.6.20.84:6642: connection 
attempt timed out
2020-08-05T22:48:33.161Z|103613|reconnect|INFO|tcp:10.6.20.84:6642: waiting 2 
seconds before reconnect

What's that "clustered database server is disconnected from cluster" mean?


Thanks!

Tony


> -Original Message-
> From: discuss  On Behalf Of Han
> Zhou
> Sent: Wednesday, August 5, 2020 3:05 PM
> To: Winson Wang 
> Cc: winson wang ; ovn-kuberne...@googlegroups.com;
> ovs-discuss@openvswitch.org
> Subject: Re: [ovs-discuss] OVN Scale with RAFT: how to make raft cluster
> clients to balanced state again
> 
> 
> 
> On Wed, Aug 5, 2020 at 12:51 PM Winson Wang wrote:
> 
> 
>   Hello OVN Experts:
> 
>   With large scale ovn-k8s cluster,  there are several conditions
> that would make ovn-controller clients connect SB central from a
> balanced state to an unbalanced state.
> 
>   Is there an ongoing project to address this problem?
>   If not,  I have one proposal not sure if it is doable.
>   Please share your thoughts.
> 
>   The issue:
> 
>   OVN SB RAFT 3 node cluster,  at first all the ovn-controller
> clients will connect all the 3 nodes in a balanced state.
> 
>   The following conditions will make the connections become
> unbalanced.
> 
>   *   One RAFT node restart,  all the ovn-controller clients to
> reconnect to the two remaining cluster nodes.
> 
>   *   Ovn-k8s,  after SB raft pods rolling upgrade, the last raft
> pod has no client connections.
> 
> 
>   RAFT clients in an unbalanced state would trigger more stress to
> the raft cluster,  which makes the raft unstable under stress compared
> to a balanced state.
> 
> 
>   The proposal solution:
> 
> 
> 
>   Ovn-controller adds next unix commands “reconnect” with argument of
> preferred SB node IP.
> 
>   When unbalanced state happens,  the UNIX command can trigger ovn-
> controller reconnect
> 
>   To new SB raft node with fast sync which doesn’t trigger the whole
> DB downloading process.
> 
> 
> 
> Thanks Winson. The proposal sounds good to me. Will you implement it?
> 
> Han
> 
> 
> 
> 
> 
>   --
> 
>   Winson
> 
> 
> 
>   --
>   You received this message because you are subscribed to the Google
> Groups "ovn-kubernetes" group.
>   To unsubscribe from this group and stop receiving emails from it,
> send an email to ovn-kubernetes+unsubscr...@googlegroups.com .
>   To view this discussion on the web visit
> https://groups.google.com/d/msgid/ovn-kubernetes/CAMu6iS--iOW0LxxtkOhJpRT49E-9bJVy0iXraC1LMDUWeu6kLA%40mail.gmail.com .
> 

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN Scale with RAFT: how to make raft cluster clients to balanced state again

2020-08-05 Thread Han Zhou
On Wed, Aug 5, 2020 at 12:51 PM Winson Wang  wrote:

> Hello OVN Experts:
>
> With large scale ovn-k8s cluster,  there are several conditions that would
> make ovn-controller clients connect SB central from a balanced state to
> an unbalanced state.
> Is there an ongoing project to address this problem?
> If not,  I have one proposal not sure if it is doable.
> Please share your thoughts.
>
> The issue:
>
> OVN SB RAFT 3 node cluster,  at first all the ovn-controller clients will
> connect all the 3 nodes in a balanced state.
>
> The following conditions will make the connections become unbalanced.
>
>-
>
>One RAFT node restart,  all the ovn-controller clients to reconnect to
>the two remaining cluster nodes.
>
>
>-
>
>Ovn-k8s,  after SB raft pods rolling upgrade, the last raft pod has no
>client connections.
>
>
> RAFT clients in an unbalanced state would trigger more stress to the raft
> cluster,  which makes the raft unstable under stress compared to a balanced
> state.
> The proposal solution:
>
> Ovn-controller adds next unix commands “reconnect” with argument of
> preferred SB node IP.
>
> When unbalanced state happens,  the UNIX command can trigger
> ovn-controller reconnect
>
> To new SB raft node with fast sync which doesn’t trigger the whole DB
> downloading process.
>
>
Thanks Winson. The proposal sounds good to me. Will you implement it?

Han



>
> --
> Winson
>
> --
> You received this message because you are subscribed to the Google Groups
> "ovn-kubernetes" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to ovn-kubernetes+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/ovn-kubernetes/CAMu6iS--iOW0LxxtkOhJpRT49E-9bJVy0iXraC1LMDUWeu6kLA%40mail.gmail.com
> 
> .
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] OVN Scale with RAFT: how to make raft cluster clients to balanced state again

2020-08-05 Thread Winson Wang
Hello OVN Experts:

In a large-scale ovn-k8s cluster, several conditions can move the
ovn-controller clients' connections to the SB central servers from a balanced
state to an unbalanced state.
Is there an ongoing project to address this problem?
If not, I have one proposal, though I am not sure whether it is doable.
Please share your thoughts.

The issue:

With a 3-node OVN SB RAFT cluster, all the ovn-controller clients initially
connect across the 3 nodes in a balanced state.

The following conditions will make the connections become unbalanced.

   - One RAFT node restarts, and all of its ovn-controller clients reconnect
     to the two remaining cluster nodes.

   - In ovn-k8s, after a rolling upgrade of the SB RAFT pods, the last
     upgraded pod has no client connections.

RAFT clients in an unbalanced state put more stress on the RAFT cluster,
which makes it less stable under load than a balanced state.

The proposed solution:

Add a new unix command, "reconnect", to ovn-controller that takes the
preferred SB node IP as an argument.

When an unbalanced state occurs, this command can trigger ovn-controller to
reconnect to a new SB RAFT node using fast resync, which does not trigger a
full DB download.
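[Editor's note: if such a command existed, an invocation might look like the
following; the sub-command name and argument are hypothetical and do not
exist in ovn-controller today:]

# hypothetical: ask a running ovn-controller to reconnect to a preferred
# SB cluster member without restarting the process
ovn-appctl -t ovn-controller sb-reconnect tcp:10.0.0.2:6642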


-- 
Winson
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN scale

2020-08-04 Thread Tony Liu
Hi,

Continue this thread with some updates.

I finally got 4096 networks and 256 routers created, with 16 networks
connected to each router. All routers have an external gateway set.

On underlay, those 256 gateway addresses on the provider network are
reachable. Ping is steady.

I launched 10 VMs on one compute node. One of them failed because network
allocation failed. Didn't look into it.

When pinging from the underlay to a VM, it's bumpy: there is a 1s or 2s delay
about every 10 pings.

Can't launch any more VMs. It always fails.

One of the Neutron nodes is very busy. From the logging at INFO level,
it just keeps connecting to OVN.

The active ovn-northd is busy, but all ovn-nb-db and ovn-sb-db are not.

On the compute node, ovn-controller is very busy. It keeps saying
"commit failed".

2020-08-05T02:44:23.927Z|04125|reconnect|INFO|tcp:10.6.20.84:6642: connected
2020-08-05T02:44:23.936Z|04126|main|INFO|OVNSB commit failed, force recompute 
next time.
2020-08-05T02:44:23.938Z|04127|ovsdb_idl|INFO|tcp:10.6.20.84:6642: clustered 
database server is disconnected from cluster; trying another server
2020-08-05T02:44:23.939Z|04128|reconnect|INFO|tcp:10.6.20.84:6642: connection 
attempt timed out
2020-08-05T02:44:23.939Z|04129|reconnect|INFO|tcp:10.6.20.84:6642: waiting 2 
seconds before reconnect


The connection to the local OVSDB keeps being dropped because there is no
probe response. The probe interval is already set to 30s.

2020-08-05T02:47:15.437Z|04351|poll_loop|INFO|wakeup due to [POLLIN] on fd 20 
(10.6.20.22:42362<->10.6.20.86:6642) at lib/stream-fd.c:157 (100% CPU usage)
2020-08-05T02:47:15.438Z|04352|reconnect|WARN|tcp:127.0.0.1:6640: connection 
dropped (Broken pipe)
2020-08-05T02:47:15.438Z|04353|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt:
 connecting...
2020-08-05T02:47:15.449Z|04354|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt:
 connected


There is also an error about the localnet port.

2020-08-05T02:47:15.403Z|04345|patch|ERR|bridge not found for localnet port 
'provnet-006baf64-409d-434d-b95b-017a77969b55' with network name 'physnet1'
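[Editor's note: that error usually means the chassis has no bridge mapping
for 'physnet1'. A sketch of the usual fix, assuming the provider bridge on
this chassis is br-ex:]

# map the physical network name used by the localnet port to a local bridge
ovs-vsctl set open . external_ids:ovn-bridge-mappings=physnet1:br-ex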


First of all, this kind of scale should work fine, right?

Any advice on how to look into it?


Thanks!

Tony

> -Original Message-
> From: dev  On Behalf Of Tony Liu
> Sent: Monday, July 27, 2020 10:16 AM
> To: Han Zhou 
> Cc: ovs-...@openvswitch.org; ovs-discuss@openvswitch.org
> Subject: Re: [ovs-dev] [ovs-discuss] OVN scale
> 
> Hi Han,
> 
> Just some updates here.
> 
> I tried with 4K networks on single router. Configuration was done
> without any issues. I checked both nb-db and sb-db, they all look good.
> It's just that router configuration is huge (in Neutron DB, nb-db and
> flow table in sb-db), because it contains all 4K ports. Also, the
> pipeline of router datapath in sb-db is quite big.
> 
> I see ovn-northd master and sb-db leader are busy, taking 90+% CPU.
> There are only 3 compute nodes and 2 gateway nodes. Does that monitor
> setting "ovn-monitor-all" matters in such case? Any idea what they are
> busy with, without any configuration updates from OpenStack? The nb-db
> is not busy though.
> 
> Probably because nb-db is busy, ovn-controller can't connect to it
> consistently. It keeps being disconnected and reconnecting. Restarting
> ovn-controller seems help. I am able to launch a few VMs on different
> networks and they are connected via the router.
> 
> Now, I have problem on external access. The router is set as gateway to
> a provider/underlay network on an interface on the gateway node. The
> router is allocated an underlay address from that provider network. My
> understanding is that, the br-ex on gateway node holding the active
> router will broadcast ARP to announce that router underlay address in
> case of failover. Also, it will respond ARP request for that router
> underlay address. But when I run tcpdump on that underlay interface on
> gateway node, I see ARP request coming in, but no ARP response going out.
> I checked the flow table in sb-db, it seems ok. I also checked flow on
> br-ex by "ovs-ofctl dump-flows br-ex", I don't see anything about ARP
> there.
> How should I look into it?
> 
> Again, the case is to support 4K networks with external access (security
> group is disabled), 4K routers (one for each network), 50 routers (one
> for 80 networks), 1 router (for all 4K networks)...
> All networks are isolated by ACL on the logical router. Which option
> should work better?
> Any comment is appreciated.
> 
> 
> Thanks!
> 
> Tony
> 
> 
> 
> From: discuss  on behalf of Tony
> Liu 
> Sent: July 21, 2020 09:09 PM
> To: Daniel Alvarez 
> Cc: ovs-discuss@openvswitch.org 
> Subject: Re: [ovs-discuss] OVN scale
> 
> [root@

Re: [ovs-discuss] OVN scale

2020-07-28 Thread Han Zhou
On Mon, Jul 27, 2020 at 10:16 AM Tony Liu  wrote:

> Hi Han,
>
> Just some updates here.
>
> I tried with 4K networks on single router. Configuration was done without
> any issues. I checked both
> nb-db and sb-db, they all look good. It's just that router configuration
> is huge (in Neutron DB, nb-db
> and flow table in sb-db), because it contains all 4K ports. Also, the
> pipeline of router datapath in sb-db
> is quite big.
>
> I see ovn-northd master and sb-db leader are busy, taking 90+% CPU. There
> are only 3 compute nodes
> and 2 gateway nodes. Does that monitor setting "ovn-monitor-all" matters
> in such case? Any idea what
> they are busy with, without any configuration updates from OpenStack? The
> nb-db is not busy though.
>

Did you create logical switch ports in your test? Did you do port-binding
on compute nodes? If yes, then "ovn-monitor-all" would matter, since all
networks are connected to the same router. With "ovn-monitor-all" = true,
it would avoid the huge monitor condition change messages.
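[Editor's note: for reference, that per-chassis knob is set like this; it
defaults to false:]

# make ovn-controller on this chassis monitor all SB records instead of
# using conditional monitoring
ovs-vsctl set open . external_ids:ovn-monitor-all=true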

Normally, if there is no NB-DB change, all components should be idle.


> Probably because nb-db is busy, ovn-controller can't connect to it
> consistently. It keeps being
> disconnected and reconnecting. Restarting ovn-controller seems help. I am
> able to launch a few VMs
> on different networks and they are connected via the router.
>
If you are seeing ovn-controller disconnected from sb-db due to probe
timeouts, you can disable or adjust the probe interval. See this slide:
https://www.slideshare.net/hanzhou1978/large-scale-overlay-networks-with-ovn-problems-and-solutions/16



> Now, I have problem on external access. The router is set as gateway to a
> provider/underlay network
> on an interface on the gateway node. The router is allocated an underlay
> address from that provider
> network. My understanding is that, the br-ex on gateway node holding the
> active router will broadcast
> ARP to announce that router underlay address in case of failover. Also, it
> will respond ARP request for
> that router underlay address. But when I run tcpdump on that underlay
> interface on gateway node,
> I see ARP request coming in, but no ARP response going out. I checked the
> flow table in sb-db, it seems
> ok. I also checked flow on br-ex by "ovs-ofctl dump-flows br-ex", I don't
> see anything about ARP there.
> How should I look into it?
>

"br-ex" is not managed by OVN, so you won't see any flows there. Did you
use OpenStack commands to setup the gateway? Did you see port-binding of
the gateway port in SB DB?
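[Editor's note: one way to check that from the SB DB might be something like
the following, assuming the gateway port shows up as a chassisredirect port:]

# show the distributed gateway port binding and which chassis (if any) claimed it
ovn-sbctl find Port_Binding type=chassisredirect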


> Again, the case is to support 4K networks with external access (security
> group is disabled),
> 4K routers (one for each network), 50 routers (one for 80 networks), 1
> router (for all 4K networks)...
> All networks are isolated by ACL on the logical router. Which option
> should work better?
> Any comment is appreciated.
>
>
If the 4K networks don't need to communicate with each other, then what
would scale the best (in theory) is 4K routers (one for each network) with
ovn-monitor-all=false. This way, each HV only needs to process a small
proportion of the data. (The monitor condition change message should also
be small because each HV only monitors the networks that have related VMs on
the HV.)

Thanks,
Han


> Thanks!
>
> Tony
>
>
> --
> *From:* discuss  on behalf of Tony
> Liu 
> *Sent:* July 21, 2020 09:09 PM
> *To:* Daniel Alvarez 
> *Cc:* ovs-discuss@openvswitch.org 
> *Subject:* Re: [ovs-discuss] OVN scale
>
> [root@ovn-db-2 ~]# ovn-nbctl list nb_global
> _uuid   : b7b3aa05-f7ed-4dbc-979f-10445ac325b8
> connections : []
> external_ids: {"neutron:liveness_check_at"="2020-07-22
> 04:03:17.726917+00:00"}
> hv_cfg  : 312
> ipsec   : false
> name: ""
> nb_cfg  : 2636
> options : {mac_prefix="ca:e8:07",
> svc_monitor_mac="4e:d0:3a:80:d4:b7"}
> sb_cfg  : 2005
> ssl : []
>
> [root@ovn-db-2 ~]# ovn-sbctl list sb_global
> _uuid   : 3720bc1d-b0da-47ce-85ca-96fa8d398489
> connections : []
> external_ids: {}
> ipsec   : false
> nb_cfg  : 312
> options : {mac_prefix="ca:e8:07",
> svc_monitor_mac="4e:d0:3a:80:d4:b7"}
> ssl             : []
>
> The NBDB and SBDB is definitely out of sync. Is there any way to force
> ovn-northd sync them?
>
> Thanks!
>
> Tony
>
> --
> *From:* Tony Liu 
> *Sent:* July 21, 2020 08:39 PM
> *To:* Daniel Alvarez 
> *Cc:*

Re: [ovs-discuss] OVN scale

2020-07-27 Thread Tony Liu
Hi Han,

Just some updates here.

I tried with 4K networks on single router. Configuration was done without any 
issues. I checked both
nb-db and sb-db, they all look good. It's just that router configuration is 
huge (in Neutron DB, nb-db
and flow table in sb-db), because it contains all 4K ports. Also, the pipeline 
of router datapath in sb-db
is quite big.

I see ovn-northd master and sb-db leader are busy, taking 90+% CPU. There are 
only 3 compute nodes
and 2 gateway nodes. Does that monitor setting "ovn-monitor-all" matter in
such a case? Any idea what they are
they are busy with, without any configuration updates from OpenStack? The nb-db 
is not busy though.

Probably because nb-db is busy, ovn-controller can't connect to it 
consistently. It keeps being
disconnected and reconnecting. Restarting ovn-controller seems help. I am able 
to launch a few VMs
on different networks and they are connected via the router.

Now, I have problem on external access. The router is set as gateway to a 
provider/underlay network
on an interface on the gateway node. The router is allocated an underlay 
address from that provider
network. My understanding is that, the br-ex on gateway node holding the active 
router will broadcast
ARP to announce that router underlay address in case of failover. Also, it will 
respond ARP request for
that router underlay address. But when I run tcpdump on that underlay interface 
on gateway node,
I see ARP request coming in, but no ARP response going out. I checked the flow 
table in sb-db, it seems
ok. I also checked flow on br-ex by "ovs-ofctl dump-flows br-ex", I don't see 
anything about ARP there.
How should I look into it?

Again, the case is to support 4K networks with external access (security group 
is disabled),
4K routers (one for each network), 50 routers (one for 80 networks), 1 router 
(for all 4K networks)...
All networks are isolated by ACL on the logical router. Which option should 
work better?
Any comment is appreciated.


Thanks!

Tony



From: discuss  on behalf of Tony Liu 

Sent: July 21, 2020 09:09 PM
To: Daniel Alvarez 
Cc: ovs-discuss@openvswitch.org 
Subject: Re: [ovs-discuss] OVN scale

[root@ovn-db-2 ~]# ovn-nbctl list nb_global
_uuid   : b7b3aa05-f7ed-4dbc-979f-10445ac325b8
connections : []
external_ids: {"neutron:liveness_check_at"="2020-07-22 
04:03:17.726917+00:00"}
hv_cfg  : 312
ipsec   : false
name: ""
nb_cfg  : 2636
options : {mac_prefix="ca:e8:07", 
svc_monitor_mac="4e:d0:3a:80:d4:b7"}
sb_cfg  : 2005
ssl : []

[root@ovn-db-2 ~]# ovn-sbctl list sb_global
_uuid   : 3720bc1d-b0da-47ce-85ca-96fa8d398489
connections : []
external_ids: {}
ipsec   : false
nb_cfg  : 312
options : {mac_prefix="ca:e8:07", 
svc_monitor_mac="4e:d0:3a:80:d4:b7"}
ssl : []

The NBDB and SBDB is definitely out of sync. Is there any way to force 
ovn-northd sync them?

Thanks!

Tony


From: Tony Liu 
Sent: July 21, 2020 08:39 PM
To: Daniel Alvarez 
Cc: Cory Hawkless ; ovs-discuss@openvswitch.org 
; Dumitru Ceara 
Subject: Re: [ovs-discuss] OVN scale

When creating a network (and subnet) on OpenStack, a GW port and a service
port (for DHCP and metadata) are also created. They are created in Neutron
and ovn-nb-db by the ML2 driver. Then ovn-northd translates such updates from
NBDB to SBDB. My question here is: with 20.03, is this translation
incremental?

After creating 4000 networks successfully on OpenStack, I see 4000 logical
switches and 8000 LS ports in NBDB. But in SBDB, there are only 1567 port
bindings. The break happened when translating the 1568th port. If ovn-northd
recompiles the whole DB for every update, this problem can be explained: the
DB is too big for ovn-northd to compile in time, so all the following updates
are lost. Does that make sense?

I recall that DB updates are coordinated by some "version": when changes
happen in NBDB, the version bumps up, and ovn-northd updates SBDB and bumps
up its version as well, so they match. So, if the NBDB version bumps up more
than once while ovn-northd is updating SBDB, is that still going to work? If
yes, then it's just a matter of time: no matter how fast updates happen in
NBDB, ovn-northd will catch up with them eventually. Am I right about that?

Any comment is welcome.


Thanks!

Tony



From: Tony Liu 
Sent: July 21, 2020 10:22 AM
To: Daniel Alvarez 
Cc: Cory Hawkless ; ovs-discuss@openvswitch.org 
; Dumitru Ceara 
Subject: Re: [ovs-discuss] OVN scale

Hi Daniel, all

4000 networks and 50 routers, 200 networks on each router, they are all created.
CPU usage of Neutron server, ovn-nb-db, ovn-northd, ovn-sb-db, ovn-controller 
and ovs-vswitchd is OK,
n

Re: [ovs-discuss] OVN scale

2020-07-23 Thread Michał Nasiadka
Hi Tony,

I’m the core reviewer/developer behind OVN implementation in Kolla-Ansible.
Just to make some clarifications to the thread - yes Kolla-Ansible deploys a 
raft cluster.

I would be happy to see some results/reports from that OVN scale test - and if 
you would see any improvements/bugs we could resolve - please just let me know.

Best regards,

Michal

> On 21 Jul 2020, at 19:22, Tony Liu  wrote:
> 
> Hi Daniel, all
> 
> 4000 networks and 50 routers, 200 networks on each router, they are all 
> created.
> CPU usage of Neutron server, ovn-nb-db, ovn-northd, ovn-sb-db, ovn-controller 
> and ovs-vswitchd is OK,
> not consistently 100%, but still some spikes to it.
> 
> Now, when create VM, I got that "waiting for vif-plugged-in timeout". This 
> brings out another question,
> it used to be neutron-agent notifying Neutron server port status change, with 
> OVN, who does it?
> How should I look into this?
> 
> Please see my other comments Inline...
> 
> 
> Thanks!
> 
> Tony
> From: Daniel Alvarez 
> Sent: July 21, 2020 12:06 AM
> To: Tony Liu 
> Cc: Cory Hawkless ; ovs-discuss@openvswitch.org ; Dumitru Ceara 
> Subject: Re: [ovs-discuss] OVN scale
>  
> Hi Tony, all
> 
> 
> 
>> On 21 Jul 2020, at 07:53, Tony Liu wrote:
>> 
>> 
>> Hi Cory,
>> 
>> With 4000 networks all connecting to one router with external GW, all 
>> networks and router
>> are created and connected. I launched a few VMs on some networks, they are 
>> connected and
>> all have external connectivity. When running ping on VM, there is a slow 
>> ping (a few seconds)
>> out of 10+ normal pings (< 1ms). When checking CPU usage, I see Neutron 
>> server, OVN DB,
>> OVN controller and ovs-switchd all take almost 100% CPU. It's been like that 
>> for hours already.
>> Since they are all created and some of them work fine (didn't validate all 
>> networks), not sure
>> what those services are busy with. Checked log, the ovn-controller keep 
>> switching between
>> ovn-sb-db, because of heartbeat timeout.
>> 
> 
> How are you deploying OpenStack and in particular the OVN dbs? Is it RAFT 
> cluster?
> 
> > Kolla Ansible. I see cluster-local-address and remote address (to the first 
> > node)
> > is specified for all 3 nodes. I assume clustering is enabled.
> > Is there different type of cluster?
> 
> What’s your current value for ovn-remote-probe-interval? If it’s too low, 
> this can be triggering reconnections all the time and creating a snowball 
> effect.
> 
> > external_ids: {ovn-encap-ip="10.6.30.22", ovn-encap-type=geneve, 
> > ovn-remote="tcp:10.6.20.84:6642,tcp:10.6.20.85:6642,tcp:10.6.20.86:6642", 
> > ovn-remote-probe-interval="6", system-id="compute-3"}
> 
> You can bump the probe interval timeout like this:
> 
> ovs-vsctl set open . external_ids:ovn-remote-probe-interval=
> 
> 
>> I'd like know if that's expected, or something I can tune to fix the 
>> problem. If that's expected,
>> I can't think of anything other than building multiple clusters to support 
>> that kind of scale.
>> 
>> I am running test with 4000 networks with 50 routers, 80 networks on each 
>> router. Wondering
>> if that's going to help.
> 
> Reducing the number of routers should help. Also there are some improvements 
> in 20.06 release when it comes to the number of logical flows by a series of 
> patches from Han. I will post the links later, sorry.
> 
> Also there is a big improvement around large Port Groups as they are now 
> split by data path reducing dramatically the calculations in ovn-controller. 
> Specially in scenarios with a large number of networks like yours.
> However you seem to have no security groups and hence no Port Groups in the 
> NB database. Is this correct?
> 
> > Yes. For now, I want to avoid scale impact from SG, so I disable it.
> 
> Is there any chance you can re run the initial scenario but with 20.06?
> 
> > Is there container for 20.06? Or where I can get the packages of 20.06?
> >I should be able to upgrade 20.03 to 20.06 by upgrading packages.
>> 
>> The goal is to have thousands networks connecting to external. I'd like to 
>> know what's the
>> expected scale supported by current OVN.
> 
> +Dumitru as we know that there is a limit o

Re: [ovs-discuss] OVN scale

2020-07-21 Thread Tony Liu
[root@ovn-db-2 ~]# ovn-nbctl list nb_global
_uuid   : b7b3aa05-f7ed-4dbc-979f-10445ac325b8
connections : []
external_ids: {"neutron:liveness_check_at"="2020-07-22 
04:03:17.726917+00:00"}
hv_cfg  : 312
ipsec   : false
name: ""
nb_cfg  : 2636
options : {mac_prefix="ca:e8:07", 
svc_monitor_mac="4e:d0:3a:80:d4:b7"}
sb_cfg  : 2005
ssl : []

[root@ovn-db-2 ~]# ovn-sbctl list sb_global
_uuid   : 3720bc1d-b0da-47ce-85ca-96fa8d398489
connections : []
external_ids: {}
ipsec   : false
nb_cfg  : 312
options : {mac_prefix="ca:e8:07", 
svc_monitor_mac="4e:d0:3a:80:d4:b7"}
ssl : []

The NBDB and SBDB is definitely out of sync. Is there any way to force 
ovn-northd sync them?
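[Editor's note: one thing that may help to check whether ovn-northd has
caught up, as I understand it, is the command below; it waits for the SB DB
to reflect the NB DB rather than forcing a recompute, so treat it as a hint:]

# bump nb_cfg and wait until ovn-northd has propagated it to the SB DB
ovn-nbctl --wait=sb sync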

Thanks!

Tony


From: Tony Liu 
Sent: July 21, 2020 08:39 PM
To: Daniel Alvarez 
Cc: Cory Hawkless ; ovs-discuss@openvswitch.org 
; Dumitru Ceara 
Subject: Re: [ovs-discuss] OVN scale

When creating a network (and subnet) on OpenStack, a GW port and a service
port (for DHCP and metadata) are also created. They are created in Neutron
and ovn-nb-db by the ML2 driver. Then ovn-northd translates such updates from
NBDB to SBDB. My question here is: with 20.03, is this translation
incremental?

After creating 4000 networks successfully on OpenStack, I see 4000 logical
switches and 8000 LS ports in NBDB. But in SBDB, there are only 1567 port
bindings. The break happened when translating the 1568th port. If ovn-northd
recompiles the whole DB for every update, this problem can be explained: the
DB is too big for ovn-northd to compile in time, so all the following updates
are lost. Does that make sense?

I recall that DB updates are coordinated by some "version": when changes
happen in NBDB, the version bumps up, and ovn-northd updates SBDB and bumps
up its version as well, so they match. So, if the NBDB version bumps up more
than once while ovn-northd is updating SBDB, is that still going to work? If
yes, then it's just a matter of time: no matter how fast updates happen in
NBDB, ovn-northd will catch up with them eventually. Am I right about that?

Any comment is welcome.


Thanks!

Tony



From: Tony Liu 
Sent: July 21, 2020 10:22 AM
To: Daniel Alvarez 
Cc: Cory Hawkless ; ovs-discuss@openvswitch.org 
; Dumitru Ceara 
Subject: Re: [ovs-discuss] OVN scale

Hi Daniel, all

4000 networks and 50 routers, 200 networks on each router, have all been
created. CPU usage of the Neutron server, ovn-nb-db, ovn-northd, ovn-sb-db,
ovn-controller and ovs-vswitchd is OK: not consistently 100%, but there are
still some spikes.

Now, when creating a VM, I get "waiting for vif-plugged-in timeout". This
brings up another question: it used to be the neutron agent notifying the
Neutron server of port status changes; with OVN, who does that?
How should I look into this?

Please see my other comments Inline...


Thanks!

Tony

From: Daniel Alvarez 
Sent: July 21, 2020 12:06 AM
To: Tony Liu 
Cc: Cory Hawkless ; ovs-discuss@openvswitch.org 
; Dumitru Ceara 
Subject: Re: [ovs-discuss] OVN scale

Hi Tony, all



On 21 Jul 2020, at 07:53, Tony Liu  wrote:


Hi Cory,

With 4000 networks all connecting to one router with an external GW, all
networks and the router are created and connected. I launched a few VMs on
some networks; they are connected and all have external connectivity. When
running ping on a VM, there is a slow ping (a few seconds) out of every 10+
normal pings (< 1ms). When checking CPU usage, I see the Neutron server, OVN
DB, OVN controller and ovs-vswitchd all take almost 100% CPU. It's been like
that for hours already. Since they are all created and some of them work fine
(I didn't validate all networks), I am not sure what those services are busy
with. Checking the logs, ovn-controller keeps switching between ovn-sb-db
servers because of heartbeat timeouts.


How are you deploying OpenStack and in particular the OVN dbs? Is it RAFT 
cluster?

> Kolla Ansible. I see cluster-local-address and remote address (to the first 
> node)
> is specified for all 3 nodes. I assume clustering is enabled.
> Is there different type of cluster?

What’s your current value for ovn-remote-probe-interval? If it’s too low, this 
can be triggering reconnections all the time and creating a snowball effect.

> external_ids: {ovn-encap-ip="10.6.30.22", ovn-encap-type=geneve, 
> ovn-remote="tcp:10.6.20.84:6642,tcp:10.6.20.85:6642,tcp:10.6.20.86:6642", 
> ovn-remote-probe-interval="6", system-id="compute-3"}

You can bump the probe interval timeout like this:

ovs-vsctl set open . external_ids:ovn-remote-probe-interval=
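[Editor's note: the value is in milliseconds, and 0 disables probing; for
example, a 60-second interval would be:]

# probe interval for the ovn-controller -> SB connection, in milliseconds
ovs-vsctl set open . external_ids:ovn-remote-probe-interval=60000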


I'd like to know if that's expected, or whether there is something I can tune to fix the problem. 

Re: [ovs-discuss] OVN scale

2020-07-21 Thread Tony Liu
When creating a network (and subnet) on OpenStack, a GW port and a service port
(for DHCP and metadata) are also created. They are created in Neutron and in
ovn-nb-db by the ML2 driver. Then ovn-northd translates such updates from NBDB
to SBDB. My question here is: with 20.03, is this translation incremental?

After creating 4000 networks successfully on OpenStack, I see 4000 logical
switches and 8000 LS ports in NBDB. But in SBDB, there are only 1567 port
bindings. The break happened when translating the 1568th port. If ovn-northd
recompiles the whole DB for every update, this problem can be explained: the DB
is too big for ovn-northd to compile in time, so all the following updates are
lost. Does that make sense?

I recall that DB updates are coordinated by some "version": when changes happen
in NBDB, the version bumps up, and ovn-northd updates SBDB and bumps up its
version as well, so they match. So, if the NBDB version bumps up more than once
while ovn-northd is updating SBDB, is that still going to work? If yes, then it
is just a matter of time: no matter how fast updates happen in NBDB, ovn-northd
will catch up eventually. Am I right about that?

Any comment is welcome.


Thanks!

Tony



From: Tony Liu 
Sent: July 21, 2020 10:22 AM
To: Daniel Alvarez 
Cc: Cory Hawkless ; ovs-discuss@openvswitch.org 
; Dumitru Ceara 
Subject: Re: [ovs-discuss] OVN scale

Hi Daniel, all

4000 networks and 50 routers, 200 networks on each router, are all created.
CPU usage of the Neutron server, ovn-nb-db, ovn-northd, ovn-sb-db,
ovn-controller and ovs-vswitchd is OK: not consistently 100%, but there are
still some spikes.

Now, when creating a VM, I get "waiting for vif-plugged-in timeout". This brings
up another question: it used to be the neutron-agent notifying the Neutron
server of port status changes; with OVN, who does it? How should I look into this?

Please see my other comments Inline...


Thanks!

Tony

From: Daniel Alvarez 
Sent: July 21, 2020 12:06 AM
To: Tony Liu 
Cc: Cory Hawkless ; ovs-discuss@openvswitch.org 
; Dumitru Ceara 
Subject: Re: [ovs-discuss] OVN scale

Hi Tony, all



On 21 Jul 2020, at 07:53, Tony Liu  wrote:


Hi Cory,

With 4000 networks all connecting to one router with external GW, all networks 
and router
are created and connected. I launched a few VMs on some networks, they are 
connected and
all have external connectivity. When running ping on VM, there is a slow ping 
(a few seconds)
out of 10+ normal pings (< 1ms). When checking CPU usage, I see Neutron server, 
OVN DB,
OVN controller and ovs-switchd all take almost 100% CPU. It's been like that 
for hours already.
Since they are all created and some of them work fine (didn't validate all 
networks), not sure
what those services are busy with. Checked log, the ovn-controller keep 
switching between
ovn-sb-db, because of heartbeat timeout.


How are you deploying OpenStack and in particular the OVN dbs? Is it RAFT 
cluster?

> Kolla Ansible. I see that cluster-local-address and the remote address (to the
> first node) are specified for all 3 nodes. I assume clustering is enabled.
> Is there a different type of cluster?

What’s your current value for ovn-remote-probe-interval? If it’s too low, this 
can be triggering reconnections all the time and creating a snowball effect.

> external_ids: {ovn-encap-ip="10.6.30.22", ovn-encap-type=geneve, 
> ovn-remote="tcp:10.6.20.84:6642,tcp:10.6.20.85:6642,tcp:10.6.20.86:6642", 
> ovn-remote-probe-interval="6", system-id="compute-3"}

You can bump the probe interval timeout like this:

ovs-vsctl set open . external_ids:ovn-remote-probe-interval=


I'd like to know if that's expected, or if there is something I can tune to fix
the problem. If that's expected, I can't think of anything other than building
multiple clusters to support that kind of scale.

I am running a test with 4000 networks and 50 routers, 80 networks on each
router. Wondering if that's going to help.

Reducing the number of routers should help. Also there are some improvements in 
20.06 release when it comes to the number of logical flows by a series of 
patches from Han. I will post the links later, sorry.

Also there is a big improvement around large Port Groups, as they are now split
per datapath, dramatically reducing the calculations in ovn-controller,
especially in scenarios with a large number of networks like yours.
However you seem to have no security groups and hence no Port Groups in the NB 
database. Is this correct?

> Yes. For now, I want to avoid scale impact from SG, so I disable it.

Is there any chance you can re run the initial scenario but with 20.06?

> Is there a container for 20.06? Or where can I get the packages for 20.06?
> I should be able to upgrade from 20.03 to 20.06 by upgrading packages.

The goal is to have thousands of networks connecting to external. I'd like to know 
what's th

Re: [ovs-discuss] OVN scale

2020-07-21 Thread Tony Liu
Hi Daniel, all

4000 networks and 50 routers, 200 networks on each router, are all created.
CPU usage of the Neutron server, ovn-nb-db, ovn-northd, ovn-sb-db,
ovn-controller and ovs-vswitchd is OK: not consistently 100%, but there are
still some spikes.

Now, when creating a VM, I get "waiting for vif-plugged-in timeout". This brings
up another question: it used to be the neutron-agent notifying the Neutron
server of port status changes; with OVN, who does it? How should I look into this?
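
One way to narrow this down from the OVN side (a sketch; the port UUID below is
just a placeholder, and this is my understanding of the flow rather than a
definitive description): ovn-controller on the compute node claims the port by
filling in the chassis column of the SB Port_Binding when the VIF shows up on
br-int, ovn-northd then sets the logical switch port's "up" flag in the NB, and
the Neutron OVN driver reacts to that to send the vif-plugged notification to
Nova. So a first check is whether the binding ever happened:

ovn-sbctl --columns=logical_port,chassis find Port_Binding logical_port=<neutron-port-uuid>
ovn-nbctl --columns=name,up list Logical_Switch_Port <neutron-port-uuid>

If chassis is empty, the problem is on the ovn-controller/OVS side; if it is set
but Nova still times out, the notification path in the ML2 driver is worth a look.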

Please see my other comments Inline...


Thanks!

Tony

From: Daniel Alvarez 
Sent: July 21, 2020 12:06 AM
To: Tony Liu 
Cc: Cory Hawkless ; ovs-discuss@openvswitch.org 
; Dumitru Ceara 
Subject: Re: [ovs-discuss] OVN scale

Hi Tony, all



On 21 Jul 2020, at 07:53, Tony Liu  wrote:


Hi Cory,

With 4000 networks all connecting to one router with external GW, all networks 
and router
are created and connected. I launched a few VMs on some networks, they are 
connected and
all have external connectivity. When running ping on VM, there is a slow ping 
(a few seconds)
out of 10+ normal pings (< 1ms). When checking CPU usage, I see Neutron server, 
OVN DB,
OVN controller and ovs-switchd all take almost 100% CPU. It's been like that 
for hours already.
Since they are all created and some of them work fine (didn't validate all 
networks), not sure
what those services are busy with. Checked log, the ovn-controller keep 
switching between
ovn-sb-db, because of heartbeat timeout.


How are you deploying OpenStack and in particular the OVN dbs? Is it RAFT 
cluster?

> Kolla Ansible. I see that cluster-local-address and the remote address (to the
> first node) are specified for all 3 nodes. I assume clustering is enabled.
> Is there a different type of cluster?

What’s your current value for ovn-remote-probe-interval? If it’s too low, this 
can be triggering reconnections all the time and creating a snowball effect.

> external_ids: {ovn-encap-ip="10.6.30.22", ovn-encap-type=geneve, 
> ovn-remote="tcp:10.6.20.84:6642,tcp:10.6.20.85:6642,tcp:10.6.20.86:6642", 
> ovn-remote-probe-interval="6", system-id="compute-3"}

You can bump the probe interval timeout like this:

ovs-vsctl set open . external_ids:ovn-remote-probe-interval=


I'd like to know if that's expected, or if there is something I can tune to fix
the problem. If that's expected, I can't think of anything other than building
multiple clusters to support that kind of scale.

I am running a test with 4000 networks and 50 routers, 80 networks on each
router. Wondering if that's going to help.

Reducing the number of routers should help. Also there are some improvements in 
20.06 release when it comes to the number of logical flows by a series of 
patches from Han. I will post the links later, sorry.

Also there is a big improvement around large Port Groups, as they are now split
per datapath, dramatically reducing the calculations in ovn-controller,
especially in scenarios with a large number of networks like yours.
However you seem to have no security groups and hence no Port Groups in the NB 
database. Is this correct?

> Yes. For now, I want to avoid scale impact from SG, so I disable it.

Is there any chance you can re run the initial scenario but with 20.06?

> Is there a container for 20.06? Or where can I get the packages for 20.06?
> I should be able to upgrade from 20.03 to 20.06 by upgrading packages.

The goal is to have thousands of networks connecting to external. I'd like to
know what's the expected scale supported by the current OVN.

+Dumitru as we know that there is a limit of 3000 on the number of
resubmissions. So having 3K routers connected to the public logical switch may
hit this limitation. Please @Dumitru correct me if I'm wrong.

Any comment is welcome.


Thanks!

Tony


From: Cory Hawkless 
Sent: July 20, 2020 10:04 PM
To: Tony Liu ; ovs-discuss@openvswitch.org 

Subject: RE: OVN scale


I would expect to see 100% cpu utilisation on anything involved in the process 
of creating 4000 networks and routers but the question is for how long do you 
see high utilisation? Does it last for seconds, minutes, hours?

Do the resources actually get created after some period of time or is the 
process failing?



From: discuss [mailto:ovs-discuss-boun...@openvswitch.org] On Behalf Of Tony Liu
Sent: Tuesday, 21 July 2020 1:53 PM
To: ovs-discuss@openvswitch.org
Subject: [ovs-discuss] OVN scale



Hi folks,



This is my first email here. Please let me know if there is any rule

or convention I need to follow. Don't want to break it.



I started with OpenStack Ussuri and OVN 20.03.0 recently and currently

running some scaling test. Searched around for scaling info and noticed

some improvements already presented, which is pretty cool.

Wondering whether the "incremental" processing by DDlog has been implemented yet?



With a 3-node OVN DB cluster and 3 compu

Re: [ovs-discuss] OVN scale

2020-07-21 Thread Daniel Alvarez
Hi Tony, all



> On 21 Jul 2020, at 07:53, Tony Liu  wrote:
> 
> 
> Hi Cory,
> 
> With 4000 networks all connecting to one router with external GW, all 
> networks and router
> are created and connected. I launched a few VMs on some networks, they are 
> connected and
> all have external connectivity. When running ping on VM, there is a slow ping 
> (a few seconds)
> out of 10+ normal pings (< 1ms). When checking CPU usage, I see Neutron 
> server, OVN DB,
> OVN controller and ovs-switchd all take almost 100% CPU. It's been like that 
> for hours already.
> Since they are all created and some of them work fine (didn't validate all 
> networks), not sure
> what those services are busy with. Checked log, the ovn-controller keep 
> switching between
> ovn-sb-db, because of heartbeat timeout.
> 

How are you deploying OpenStack and in particular the OVN dbs? Is it RAFT 
cluster?

What’s your current value for ovn-remote-probe-interval? If it’s too low, this 
can be triggering reconnections all the time and creating a snowball effect.

You can bump the probe interval timeout like this:

ovs-vsctl set open . external_ids:ovn-remote-probe-interval=
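
For example (an illustrative value only; as far as I know the setting is in
milliseconds, so this would be a 60 second probe interval, and 0 disables the
inactivity probe entirely):

ovs-vsctl set open . external_ids:ovn-remote-probe-interval=60000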


> I'd like to know if that's expected, or if there is something I can tune to fix
> the problem. If that's expected, I can't think of anything other than building
> multiple clusters to support that kind of scale.
> 
> I am running a test with 4000 networks and 50 routers, 80 networks on each
> router. Wondering if that's going to help.

Reducing the number of routers should help. Also there are some improvements in 
20.06 release when it comes to the number of logical flows by a series of 
patches from Han. I will post the links later, sorry.

Also there is a big improvement around large Port Groups, as they are now split
per datapath, dramatically reducing the calculations in ovn-controller,
especially in scenarios with a large number of networks like yours.
However you seem to have no security groups and hence no Port Groups in the NB 
database. Is this correct?

Is there any chance you can re run the initial scenario but with 20.06?

> 
> The goal is to have thousands networks connecting to external. I'd like to 
> know what's the
> expected scale supported by current OVN.

+Dumitru as we know that there is a limit of 3000 on the number of
resubmissions. So having 3K routers connected to the public logical switch may
hit this limitation. Please @Dumitru correct me if I'm wrong.
> 
> Any comment is welcome.
> 
> 
> Thanks!
> 
> Tony
> 
> From: Cory Hawkless 
> Sent: July 20, 2020 10:04 PM
> To: Tony Liu ; ovs-discuss@openvswitch.org 
> 
> Subject: RE: OVN scale
>  
> I would expect to see 100% cpu utilisation on anything involved in the 
> process of creating 4000 networks and routers but the question is for how 
> long do you see high utilisation? Does it last for seconds, minutes, hours?
> Do the resources actually get created after some period of time or is the 
> process failing?
>  
> From: discuss [mailto:ovs-discuss-boun...@openvswitch.org] On Behalf Of Tony 
> Liu
> Sent: Tuesday, 21 July 2020 1:53 PM
> To: ovs-discuss@openvswitch.org
> Subject: [ovs-discuss] OVN scale
>  
> Hi folks,
>  
> This is my first email here. Please let me know if there is any rule
> or convention I need to follow. Don't want to break it.
>  
> I started with OpenStack Ussuri and OVN 20.03.0 recently and currently
> running some scaling test. Searched around for scaling info and noticed
> some improvements already presented, which is pretty cool.
> Wondering whether the "incremental" processing by DDlog has been implemented yet?
>  
> With a 3-node OVN DB cluster and 3 compute nodes (with OVN controller),
> I created 4000 networks from OpenStack, 4000 logical routers with
> external GW, add one network to each LR. Port security is disabled on
> all networks. Then I see ovn-northd, ovn-controller and ovs-switchd all
> take almost 100% CPU. Is this expected?
>  
> I revised solution and running test to have 4000 networks, 20 LRs and
> 200 networks on each LR. Will see if this makes any difference.
>  
> Is there any scaling and performance report with the latest OVN release
> as my reference?
>  
>  
> Thanks!
>  
> Tony
>  
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN scale

2020-07-20 Thread Tony Liu
Hi Cory,

With 4000 networks all connecting to one router with external GW, all networks 
and router
are created and connected. I launched a few VMs on some networks, they are 
connected and
all have external connectivity. When running ping on VM, there is a slow ping 
(a few seconds)
out of 10+ normal pings (< 1ms). When checking CPU usage, I see Neutron server, 
OVN DB,
OVN controller and ovs-switchd all take almost 100% CPU. It's been like that 
for hours already.
Since they are all created and some of them work fine (I didn't validate all
networks), I am not sure what those services are busy with. Checking the logs,
ovn-controller keeps switching between ovn-sb-db servers because of heartbeat
timeouts.

I'd like to know if that's expected, or if there is something I can tune to fix
the problem. If that's expected, I can't think of anything other than building
multiple clusters to support that kind of scale.

I am running a test with 4000 networks and 50 routers, 80 networks on each
router. Wondering if that's going to help.

The goal is to have thousands of networks connecting to external. I'd like to
know what's the expected scale supported by the current OVN.

Any comment is welcome.


Thanks!

Tony


From: Cory Hawkless 
Sent: July 20, 2020 10:04 PM
To: Tony Liu ; ovs-discuss@openvswitch.org 

Subject: RE: OVN scale


I would expect to see 100% cpu utilisation on anything involved in the process 
of creating 4000 networks and routers but the question is for how long do you 
see high utilisation? Does it last for seconds, minutes, hours?

Do the resources actually get created after some period of time or is the 
process failing?



From: discuss [mailto:ovs-discuss-boun...@openvswitch.org] On Behalf Of Tony Liu
Sent: Tuesday, 21 July 2020 1:53 PM
To: ovs-discuss@openvswitch.org
Subject: [ovs-discuss] OVN scale



Hi folks,



This is my first email here. Please let me know if there is any rule

or convention I need to follow. Don't want to break it.



I started with OpenStack Ussuri and OVN 20.03.0 recently and currently

running some scaling test. Searched around for scaling info and noticed

some improvements already presented, which is pretty cool.

Wondering whether the "incremental" processing by DDlog has been implemented yet?



With a 3-node OVN DB cluster and 3 compute nodes (with OVN controller),

I created 4000 networks from OpenStack, 4000 logical routers with

external GW, add one network to each LR. Port security is disabled on

all networks. Then I see ovn-northd, ovn-controller and ovs-switchd all

take almost 100% CPU. Is this expected?



I revised solution and running test to have 4000 networks, 20 LRs and

200 networks on each LR. Will see if this makes any difference.



Is there any scaling and performance report with the latest OVN release

as my reference?





Thanks!



Tony


___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN scale

2020-07-20 Thread Cory Hawkless
I would expect to see 100% cpu utilisation on anything involved in the process 
of creating 4000 networks and routers but the question is for how long do you 
see high utilisation? Does it last for seconds, minutes, hours?
Do the resources actually get created after some period of time or is the 
process failing?

From: discuss [mailto:ovs-discuss-boun...@openvswitch.org] On Behalf Of Tony Liu
Sent: Tuesday, 21 July 2020 1:53 PM
To: ovs-discuss@openvswitch.org
Subject: [ovs-discuss] OVN scale

Hi folks,

This is my first email here. Please let me know if there is any rule
or convention I need to follow. Don't want to break it.

I started with OpenStack Ussuri and OVN 20.03.0 recently and currently
running some scaling test. Searched around for scaling info and noticed
some improvements already presented, which is pretty cool.
Wondering whether the "incremental" processing by DDlog has been implemented yet?

With a 3-node OVN DB cluster and 3 compute nodes (with OVN controller),
I created 4000 networks from OpenStack, 4000 logical routers with
external GW, add one network to each LR. Port security is disabled on
all networks. Then I see ovn-northd, ovn-controller and ovs-switchd all
take almost 100% CPU. Is this expected?

I revised solution and running test to have 4000 networks, 20 LRs and
200 networks on each LR. Will see if this makes any difference.

Is there any scaling and performance report with the latest OVN release
as my reference?


Thanks!

Tony

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] OVN scale

2020-07-20 Thread Tony Liu
Hi folks,

This is my first email here. Please let me know if there is any rule
or convention I need to follow. Don't want to break it.

I started with OpenStack Ussuri and OVN 20.03.0 recently and currently
running some scaling test. Searched around for scaling info and noticed
some improvements already presented, which is pretty cool.
Wondering whether the "incremental" processing by DDlog has been implemented yet?

With a 3-node OVN DB cluster and 3 compute nodes (with OVN controller),
I created 4000 networks from OpenStack, 4000 logical routers with
external GW, add one network to each LR. Port security is disabled on
all networks. Then I see ovn-northd, ovn-controller and ovs-switchd all
take almost 100% CPU. Is this expected?

I revised solution and running test to have 4000 networks, 20 LRs and
200 networks on each LR. Will see if this makes any difference.

Is there any scaling and performance report with the latest OVN release
as my reference?


Thanks!

Tony

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN Scale with RAFT: how to make ovn-northd more reliable when RAFT leader unstable

2020-07-17 Thread Winson Wang
On Fri, Jul 17, 2020 at 12:54 AM Dumitru Ceara  wrote:

> On 7/17/20 2:58 AM, Winson Wang wrote:
> > Hi Dumitru,
> >
> > most of the flows are in table 19.
>
> This is the ls_in_pre_hairpin table where we add flows for each backend
> of the load balancers.
>
> >
> > -rw-r--r-- 1 root root 142M Jul 16 17:07 br-int.txt (all flows dump file)
> > -rw-r--r-- 1 root root 102M Jul 16 17:43 table-19.txt
> > -rw-r--r-- 1 root root 7.8M Jul 16 17:43 table-11.txt
> > -rw-r--r-- 1 root root 3.7M Jul 16 17:43 table-21.txt
> >
> > # cat table-19.txt |wc -l
> > 408458
> > ]# cat table-19.txt | grep "=9153" | wc -l
> > 124744
> >  cat table-19.txt | grep "=53" | wc -l
> > 249488
> > Coredns pod has svc with port number 53 and 9153.
> >
>
> How many backends do you have for these VIPs (with port number 53 and
> 9153) in your load_balancer config?
>
The backend count is 63, with 63 CoreDNS pods running and exposed behind
Cluster IP 10.96.0.10 on tcp/53, udp/53 and tcp/9153.
lb-list  |grep 10.96.0.10
3b8a468a-44d2-4a34-94ca-626dac936cdeudp
10.96.0.10:53192.168.104.3:53,192.168.105.3:53,192.168.106.3:53,
192.168.107.3:53,192.168.108.3:53,192.168.109.3:53,192.168.110.3:53,
192.168.111.3:53,192.168.112.3:53,192.168.113.3:53,192.168.114.3:53,
192.168.115.3:53,192.168.116.3:53,192.168.118.4:53,192.168.119.3:53,
192.168.120.4:53,192.168.121.3:53,192.168.122.3:53,192.168.123.3:53,
192.168.130.3:53,192.168.131.3:53,192.168.136.3:53,192.168.142.3:53,
192.168.4.3:53,192.168.45.3:53,192.168.46.3:53,192.168.47.3:53,
192.168.48.3:53,192.168.49.3:53,192.168.50.3:53,192.168.51.3:53,
192.168.52.3:53,192.168.53.3:53,192.168.54.3:53,192.168.55.3:53,
192.168.56.3:53,192.168.57.3:53,192.168.58.3:53,192.168.59.4:53,
192.168.60.4:53,192.168.61.3:53,192.168.62.3:53,192.168.63.3:53,
192.168.64.3:53,192.168.65.3:53,192.168.66.3:53,192.168.67.3:53,
192.168.68.4:53,192.168.69.3:53,192.168.70.3:53,192.168.71.3:53,
192.168.72.3:53,192.168.73.3:53,192.168.74.4:53,192.168.75.4:53,
192.168.76.3:53,192.168.77.3:53,192.168.78.3:53,192.168.79.3:53,
192.168.80.4:53,192.168.81.3:53,192.168.82.4:53,192.168.83.4:53
tcp
10.96.0.10:53192.168.104.3:53,192.168.105.3:53,192.168.106.3:53,
192.168.107.3:53,192.168.108.3:53,192.168.109.3:53,192.168.110.3:53,
192.168.111.3:53,192.168.112.3:53,192.168.113.3:53,192.168.114.3:53,
192.168.115.3:53,192.168.116.3:53,192.168.118.4:53,192.168.119.3:53,
192.168.120.4:53,192.168.121.3:53,192.168.122.3:53,192.168.123.3:53,
192.168.130.3:53,192.168.131.3:53,192.168.136.3:53,192.168.142.3:53,
192.168.4.3:53,192.168.45.3:53,192.168.46.3:53,192.168.47.3:53,
192.168.48.3:53,192.168.49.3:53,192.168.50.3:53,192.168.51.3:53,
192.168.52.3:53,192.168.53.3:53,192.168.54.3:53,192.168.55.3:53,
192.168.56.3:53,192.168.57.3:53,192.168.58.3:53,192.168.59.4:53,
192.168.60.4:53,192.168.61.3:53,192.168.62.3:53,192.168.63.3:53,
192.168.64.3:53,192.168.65.3:53,192.168.66.3:53,192.168.67.3:53,
192.168.68.4:53,192.168.69.3:53,192.168.70.3:53,192.168.71.3:53,
192.168.72.3:53,192.168.73.3:53,192.168.74.4:53,192.168.75.4:53,
192.168.76.3:53,192.168.77.3:53,192.168.78.3:53,192.168.79.3:53,
192.168.80.4:53,192.168.81.3:53,192.168.82.4:53,192.168.83.4:53
tcp
10.96.0.10:9153  192.168.104.3:9153,192.168.105.3:9153,
192.168.106.3:9153,192.168.107.3:9153,192.168.108.3:9153,192.168.109.3:9153,
192.168.110.3:9153,192.168.111.3:9153,192.168.112.3:9153,192.168.113.3:9153,
192.168.114.3:9153,192.168.115.3:9153,192.168.116.3:9153,192.168.118.4:9153,
192.168.119.3:9153,192.168.120.4:9153,192.168.121.3:9153,192.168.122.3:9153,
192.168.123.3:9153,192.168.130.3:9153,192.168.131.3:9153,192.168.136.3:9153,
192.168.142.3:9153,192.168.4.3:9153,192.168.45.3:9153,192.168.46.3:9153,
192.168.47.3:9153,192.168.48.3:9153,192.168.49.3:9153,192.168.50.3:9153,
192.168.51.3:9153,192.168.52.3:9153,192.168.53.3:9153,192.168.54.3:9153,
192.168.55.3:9153,192.168.56.3:9153,192.168.57.3:9153,192.168.58.3:9153,
192.168.59.4:9153,192.168.60.4:9153,192.168.61.3:9153,192.168.62.3:9153,
192.168.63.3:9153,192.168.64.3:9153,192.168.65.3:9153,192.168.66.3:9153,
192.168.67.3:9153,192.168.68.4:9153,192.168.69.3:9153,192.168.70.3:9153,
192.168.71.3:9153,192.168.72.3:9153,192.168.73.3:9153,192.168.74.4:9153,
192.168.75.4:9153,192.168.76.3:9153,192.168.77.3:9153,192.168.78.3:9153,
192.168.79.3:9153,192.168.80.4:9153,192.168.81.3:9153,192.168.82.4:9153,
192.168.83.4:9153

>
> Thanks,
> Dumitru
>
> > Please let me know if you need more information.
> >
> >
> > Regards,
> > Winson
> >
> >
> > On Thu, Jul 16, 2020 at 11:23 AM Dumitru Ceara  > > wrote:
> >
> > On 7/15/20 8:02 PM, Winson Wang wrote:
> > > +add ovn-Kubernetes group.
> > >
> > > Hi Dumitru,
> > >
> > > With recent patches from you and Han,  now for k8s basic workload,
> > such
> > > node resources and pod resources are fixed and look good.

Re: [ovs-discuss] OVN Scale with RAFT: how to make ovn-northd more reliable when RAFT leader unstable

2020-07-17 Thread Dumitru Ceara
On 7/17/20 2:58 AM, Winson Wang wrote:
> Hi Dumitru,
> 
> most of the flows are in table 19.

This is the ls_in_pre_hairpin table where we add flows for each backend
of the load balancers.

> 
> -rw-r--r-- 1 root root 142M Jul 16 17:07 br-int.txt (all flows dump file)
> -rw-r--r-- 1 root root 102M Jul 16 17:43 table-19.txt
> -rw-r--r-- 1 root root 7.8M Jul 16 17:43 table-11.txt
> -rw-r--r-- 1 root root 3.7M Jul 16 17:43 table-21.txt
> 
> # cat table-19.txt |wc -l
> 408458
> ]# cat table-19.txt | grep "=9153" | wc -l
> 124744
>  cat table-19.txt | grep "=53" | wc -l
> 249488
> Coredns pod has svc with port number 53 and 9153.
> 

How many backends do you have for these VIPs (with port number 53 and
9153) in your load_balancer config?

Thanks,
Dumitru

> Please let me know if you need more information.
> 
> 
> Regards,
> Winson
> 
> 
> On Thu, Jul 16, 2020 at 11:23 AM Dumitru Ceara  > wrote:
> 
> On 7/15/20 8:02 PM, Winson Wang wrote:
> > +add ovn-Kubernetes group.
> >
> > Hi Dumitru,
> >
> > With recent patches from you and Han,  now for k8s basic workload,
> such
> > node resources and pod resources are fixed and look good.
> > Much thanks!
> 
> Hi Winson,
> 
> Glad to hear that!
> 
> >
> > For k8s workload which exposes as svc IP is every common,  for
> example,
> > the coreDNS pod's deployment.
> > With large cluster size such  as 1000,  there is service to auto scale
> > up coreDNS deployment,  if we use default 16 nodes per coredns, 
> it could be
> > 63 coredns pods.
> > On my 1006 nodes setup,  deployment from coreDNS from 2 to 63.
> > SB raft election 16s is not good for this operation in my test
> > environment, it makes one raft node cannot finish the election in two
> > election slot when making all it's
> > clients disconnect and reconnect to two other raft nodes,  which makes
> > raft clients in an unbalanced state after this operation.
> > This condition might be avoided without larger election timer.
> >
> > For the SB and work node resource side:
> > SB DB size increased 27MB.
> > br-int open flows increased around 369K, 
> > RSS memory of (ovs + ovn-controller) increased more than 600MB.
> 
> This increase on the hypervisor side is most likely because of the
> openflows for hairpin traffic for VIPs (service IP). To confirm, would
> it be possible to take a snapshot of the OVS flow table and see how many
> flows there are per table?
> 
> >
> > So if OVN experts can figure how to optimize it would be very
> great for
> > ovn-k8s scale up to large cluster size I think.
> >
> 
> If the above is due to flows for LB flows to handle hairpin traffic, the
> only idea I have is to use OVS "learn" action to have the flows
> generated as needed. However, I didn't get the chance to try it out yet.
> 
> Thanks,
> Dumitru
> 
> >
> > Regards,
> > Winson
> >
> >
> > On Fri, May 1, 2020 at 1:35 AM Dumitru Ceara  
> > >> wrote:
> >
> >     On 5/1/20 12:00 AM, Winson Wang wrote:
> >     > Hi Han,  Dumitru,
> >     >
> >
> >     Hi Winson,
> >
> >     > With the fix from Dumitru
> >     >
> >   
>  
> https://github.com/ovn-org/ovn/commit/97e82ae5f135a088c9e95b49122d8217718d23f4
> >     >
> >     > It can greatly reduced the OVS SB RAFT workload based on my
> stress
> >     test
> >     > mode with k8s svc with large endpoints.
> >     >
> >     > The DB file size increased much less with fix, so it will
> not trigger
> >     > the leader election with same work load.
> >     >
> >     > Dumitru,  based my test,  logic flows number is fixed with
> cluster
> >     size
> >     > regardless of number of VIP endpoints.
> >
> >     The number of logical flows will be fixed based on number of
> VIPs (2 per
> >     VIP) but the size of the match expression depends on the number of
> >     backends per VIP so the SB DB size will increase when adding
> backends to
> >     existing VIPs.
> >
> >     >
> >     > But the open flow count on each node still have the relationship
> >     of the
> >     > endpoints size.
> >
> >     Yes, this is due to the match expression in the logical flow
> above which
> >     is of the form:
> >
> >     (ip.src == backend-ip1 && ip.dst == backend-ip2) || .. ||
> (ip.src ==
> >     backend-ipn && ip.dst == backend-ipn)
> >
> >     This will get expanded to n openflow rules, one per backend, to
> >     determine if traffic was hairpinned.
> >
> >     > Any idea how to reduce the open flow cnt on each node's br-int?
> >     >
> >     >
> >
> >     Unfortunately I 

Re: [ovs-discuss] OVN Scale with RAFT: how to make ovn-northd more reliable when RAFT leader unstable

2020-07-16 Thread Winson Wang
Hi Dumitru,

most of the flows are in table 19.

-rw-r--r-- 1 root root 142M Jul 16 17:07 br-int.txt (all flows dump file)
-rw-r--r-- 1 root root 102M Jul 16 17:43 table-19.txt
-rw-r--r-- 1 root root 7.8M Jul 16 17:43 table-11.txt
-rw-r--r-- 1 root root 3.7M Jul 16 17:43 table-21.txt

# cat table-19.txt | wc -l
408458
# cat table-19.txt | grep "=9153" | wc -l
124744
# cat table-19.txt | grep "=53" | wc -l
249488
Coredns pod has svc with port number 53 and 9153.
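
As a rough cross-check (my assumption: these hairpin flows get instantiated per
backend, per VIP protocol/port, per logical switch datapath, roughly two rules
per backend): 63 backends x 2 x ~1000 node switches is about 126K flows for the
single tcp/9153 VIP, and twice that for port 53 (tcp + udp VIPs), which is in
the same ballpark as the grep counts above.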

Please let me know if you need more information.


Regards,
Winson


On Thu, Jul 16, 2020 at 11:23 AM Dumitru Ceara  wrote:

> On 7/15/20 8:02 PM, Winson Wang wrote:
> > +add ovn-Kubernetes group.
> >
> > Hi Dumitru,
> >
> > With recent patches from you and Han,  now for k8s basic workload, such
> > node resources and pod resources are fixed and look good.
> > Much thanks!
>
> Hi Winson,
>
> Glad to hear that!
>
> >
> > For k8s workload which exposes as svc IP is every common,  for example,
> > the coreDNS pod's deployment.
> > With large cluster size such  as 1000,  there is service to auto scale
> > up coreDNS deployment,  if we use default 16 nodes per coredns,  it
> could be
> > 63 coredns pods.
> > On my 1006 nodes setup,  deployment from coreDNS from 2 to 63.
> > SB raft election 16s is not good for this operation in my test
> > environment, it makes one raft node cannot finish the election in two
> > election slot when making all it's
> > clients disconnect and reconnect to two other raft nodes,  which makes
> > raft clients in an unbalanced state after this operation.
> > This condition might be avoided without larger election timer.
> >
> > For the SB and work node resource side:
> > SB DB size increased 27MB.
> > br-int open flows increased around 369K,
> > RSS memory of (ovs + ovn-controller) increased more than 600MB.
>
> This increase on the hypervisor side is most likely because of the
> openflows for hairpin traffic for VIPs (service IP). To confirm, would
> it be possible to take a snapshot of the OVS flow table and see how many
> flows there are per table?
>
> >
> > So if OVN experts can figure how to optimize it would be very great for
> > ovn-k8s scale up to large cluster size I think.
> >
>
> If the above is due to flows for LB flows to handle hairpin traffic, the
> only idea I have is to use OVS "learn" action to have the flows
> generated as needed. However, I didn't get the chance to try it out yet.
>
> Thanks,
> Dumitru
>
> >
> > Regards,
> > Winson
> >
> >
> > On Fri, May 1, 2020 at 1:35 AM Dumitru Ceara  > > wrote:
> >
> > On 5/1/20 12:00 AM, Winson Wang wrote:
> > > Hi Han,  Dumitru,
> > >
> >
> > Hi Winson,
> >
> > > With the fix from Dumitru
> > >
> >
> https://github.com/ovn-org/ovn/commit/97e82ae5f135a088c9e95b49122d8217718d23f4
> > >
> > > It can greatly reduced the OVS SB RAFT workload based on my stress
> > test
> > > mode with k8s svc with large endpoints.
> > >
> > > The DB file size increased much less with fix, so it will not
> trigger
> > > the leader election with same work load.
> > >
> > > Dumitru,  based my test,  logic flows number is fixed with cluster
> > size
> > > regardless of number of VIP endpoints.
> >
> > The number of logical flows will be fixed based on number of VIPs (2
> per
> > VIP) but the size of the match expression depends on the number of
> > backends per VIP so the SB DB size will increase when adding
> backends to
> > existing VIPs.
> >
> > >
> > > But the open flow count on each node still have the relationship
> > of the
> > > endpoints size.
> >
> > Yes, this is due to the match expression in the logical flow above
> which
> > is of the form:
> >
> > (ip.src == backend-ip1 && ip.dst == backend-ip2) || .. || (ip.src ==
> > backend-ipn && ip.dst == backend-ipn)
> >
> > This will get expanded to n openflow rules, one per backend, to
> > determine if traffic was hairpinned.
> >
> > > Any idea how to reduce the open flow cnt on each node's br-int?
> > >
> > >
> >
> > Unfortunately I don't think there's a way to determine if traffic was
> > hairpinned because I don't think we can have openflow rules that
> match
> > on "ip.src == ip.dst". So in the worst case, we will probably need
> two
> > openflow rules per backend IP (one for initiator traffic, one for
> > reply).
> >
> > I'll think more about it though.
> >
> > Regards,
> > Dumitru
> >
> > > Regards,
> > > Winson
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Wed, Apr 29, 2020 at 1:42 PM Winson Wang
> > mailto:windson.w...@gmail.com>
> > > >>
> > wrote:
> > >
> > > Hi Han,
> > >
> > > Thanks for quick reply.
> > > Please see my reply below.
> > >
> > > On Wed, Apr 29, 2020 at 12:31 

Re: [ovs-discuss] OVN Scale with RAFT: how to make ovn-northd more reliable when RAFT leader unstable

2020-07-16 Thread Dumitru Ceara
On 7/15/20 8:02 PM, Winson Wang wrote:
> +add ovn-Kubernetes group.
> 
> Hi Dumitru,
> 
> With recent patches from you and Han,  now for k8s basic workload, such
> node resources and pod resources are fixed and look good.
> Much thanks!

Hi Winson,

Glad to hear that!

> 
> Exposing k8s workloads as a service IP is very common; for example, the
> coreDNS pods' deployment.
> With a large cluster size such as 1000 nodes, there is a service that
> auto-scales the coreDNS deployment; with the default of 16 nodes per coredns
> replica, that could be 63 coredns pods.
> On my 1006-node setup, I scaled the coreDNS deployment from 2 to 63 pods.
> An SB raft election timer of 16s is not good for this operation in my test
> environment: one raft node cannot finish the election within two election
> slots, which makes all of its clients disconnect and reconnect to the two
> other raft nodes, leaving the raft clients in an unbalanced state after this
> operation.
> This condition probably cannot be avoided without a larger election timer.
> 
> For the SB and work node resource side:
> SB DB size increased 27MB.
> br-int open flows increased around 369K, 
> RSS memory of (ovs + ovn-controller) increased more than 600MB.

This increase on the hypervisor side is most likely because of the
openflows for hairpin traffic for VIPs (service IP). To confirm, would
it be possible to take a snapshot of the OVS flow table and see how many
flows there are per table?
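
In case it helps, a quick way to get per-table counts from a dump (assuming GNU
grep/sort/uniq on the node):

ovs-ofctl dump-flows br-int > br-int.txt
grep -o 'table=[0-9]*' br-int.txt | sort | uniq -c | sort -rn | head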

> 
> So if OVN experts can figure how to optimize it would be very great for
> ovn-k8s scale up to large cluster size I think.
> 

If the above is due to the LB flows that handle hairpin traffic, the only idea
I have is to use the OVS "learn" action to have the flows generated as needed.
However, I didn't get the chance to try it out yet.
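
Just to illustrate the direction (purely a sketch, not an actual implementation:
the table numbers and the register bit are invented here, and it ignores L4
ports/protocols and IPv6). The idea would be that when a connection has just
been load-balanced to a backend, we learn a narrow flow that flags later packets
with ip.src == ip.dst == that backend as hairpinned, instead of pre-installing
one flow per backend per datapath:

ovs-ofctl add-flow br-int 'table=19,priority=100,ip,ct_state=+new+trk,actions=learn(table=68,idle_timeout=300,priority=100,eth_type=0x800,NXM_OF_IP_SRC[]=NXM_OF_IP_DST[],NXM_OF_IP_DST[]=NXM_OF_IP_DST[],load:1->NXM_NX_REG10[7]),resubmit(,20)'

The hairpin check itself then becomes a single resubmit into the learned table,
and per-backend flows only exist for backends that have recently carried traffic.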

Thanks,
Dumitru

> 
> Regards,
> Winson
> 
> 
> On Fri, May 1, 2020 at 1:35 AM Dumitru Ceara  > wrote:
> 
> On 5/1/20 12:00 AM, Winson Wang wrote:
> > Hi Han,  Dumitru,
> >
> 
> Hi Winson,
> 
> > With the fix from Dumitru
> >
> 
> https://github.com/ovn-org/ovn/commit/97e82ae5f135a088c9e95b49122d8217718d23f4
> >
> > It can greatly reduced the OVS SB RAFT workload based on my stress
> test
> > mode with k8s svc with large endpoints.
> >
> > The DB file size increased much less with fix, so it will not trigger
> > the leader election with same work load.
> >
> > Dumitru,  based my test,  logic flows number is fixed with cluster
> size
> > regardless of number of VIP endpoints.
> 
> The number of logical flows will be fixed based on number of VIPs (2 per
> VIP) but the size of the match expression depends on the number of
> backends per VIP so the SB DB size will increase when adding backends to
> existing VIPs.
> 
> >
> > But the open flow count on each node still have the relationship
> of the
> > endpoints size.
> 
> Yes, this is due to the match expression in the logical flow above which
> is of the form:
> 
> (ip.src == backend-ip1 && ip.dst == backend-ip2) || .. || (ip.src ==
> backend-ipn && ip.dst == backend-ipn)
> 
> This will get expanded to n openflow rules, one per backend, to
> determine if traffic was hairpinned.
> 
> > Any idea how to reduce the open flow cnt on each node's br-int?
> >
> >
> 
> Unfortunately I don't think there's a way to determine if traffic was
> hairpinned because I don't think we can have openflow rules that match
> on "ip.src == ip.dst". So in the worst case, we will probably need two
> openflow rules per backend IP (one for initiator traffic, one for
> reply).
> 
> I'll think more about it though.
> 
> Regards,
> Dumitru
> 
> > Regards,
> > Winson
> >
> >
> >
> >
> >
> >
> >
> > On Wed, Apr 29, 2020 at 1:42 PM Winson Wang
> mailto:windson.w...@gmail.com>
> > >>
> wrote:
> >
> >     Hi Han,
> >
> >     Thanks for quick reply.
> >     Please see my reply below.
> >
> >     On Wed, Apr 29, 2020 at 12:31 PM Han Zhou  
> >     >> wrote:
> >
> >
> >
> >         On Wed, Apr 29, 2020 at 10:29 AM Winson Wang
> >         mailto:windson.w...@gmail.com>
> >> wrote:
> >         >
> >         > Hello Experts,
> >         >
> >         > I am doing stress with k8s cluster with ovn,  one thing I am
> >         seeing is that when raft nodes
> >         > got update for large data in short time from ovn-northd,  3
> >         raft nodes will trigger voting and leader role switched
> from one
> >         node to another.
> >         >
> >         > From ovn-northd side,  I can see ovn-northd will trigger the
> >         BACKOFF, 

Re: [ovs-discuss] OVN Scale with RAFT: how to make ovn-northd more reliable when RAFT leader unstable

2020-07-15 Thread Winson Wang
+add ovn-Kubernetes group.

Hi Dumitru,

With recent patches from you and Han, for basic k8s workloads the node and pod
resources are now fixed and look good.
Many thanks!

Exposing k8s workloads as a service IP is very common; for example, the coreDNS
pods' deployment.
With a large cluster size such as 1000 nodes, there is a service that
auto-scales the coreDNS deployment; with the default of 16 nodes per coredns
replica, that could be 63 coredns pods.
On my 1006-node setup, I scaled the coreDNS deployment from 2 to 63 pods.
An SB raft election timer of 16s is not good for this operation in my test
environment: one raft node cannot finish the election within two election
slots, which makes all of its clients disconnect and reconnect to the two other
raft nodes, leaving the raft clients in an unbalanced state after this operation.
This condition probably cannot be avoided without a larger election timer.
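
If a larger election timer turns out to be the answer, it can be bumped at
runtime on the current SB leader, e.g.:

ovn-appctl -t /run/ovn/ovnsb_db.ctl cluster/change-election-timer OVN_Southbound 20000

This is only a sketch: the ctl socket path depends on packaging, and as far as I
know the new value can be at most roughly double the current one, so reaching a
large timer takes a few invocations.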

On the SB and worker node resource side:
SB DB size increased by 27 MB.
br-int open flows increased by around 369K.
RSS memory of (ovs + ovn-controller) increased by more than 600 MB.

So if OVN experts can figure out how to optimize this, it would be very helpful
for scaling ovn-k8s up to large cluster sizes, I think.


Regards,
Winson


On Fri, May 1, 2020 at 1:35 AM Dumitru Ceara  wrote:

> On 5/1/20 12:00 AM, Winson Wang wrote:
> > Hi Han,  Dumitru,
> >
>
> Hi Winson,
>
> > With the fix from Dumitru
> >
> https://github.com/ovn-org/ovn/commit/97e82ae5f135a088c9e95b49122d8217718d23f4
> >
> > It can greatly reduced the OVS SB RAFT workload based on my stress test
> > mode with k8s svc with large endpoints.
> >
> > The DB file size increased much less with fix, so it will not trigger
> > the leader election with same work load.
> >
> > Dumitru,  based my test,  logic flows number is fixed with cluster size
> > regardless of number of VIP endpoints.
>
> The number of logical flows will be fixed based on number of VIPs (2 per
> VIP) but the size of the match expression depends on the number of
> backends per VIP so the SB DB size will increase when adding backends to
> existing VIPs.
>
> >
> > But the open flow count on each node still have the relationship of the
> > endpoints size.
>
> Yes, this is due to the match expression in the logical flow above which
> is of the form:
>
> (ip.src == backend-ip1 && ip.dst == backend-ip2) || .. || (ip.src ==
> backend-ipn && ip.dst == backend-ipn)
>
> This will get expanded to n openflow rules, one per backend, to
> determine if traffic was hairpinned.
>
> > Any idea how to reduce the open flow cnt on each node's br-int?
> >
> >
>
> Unfortunately I don't think there's a way to determine if traffic was
> hairpinned because I don't think we can have openflow rules that match
> on "ip.src == ip.dst". So in the worst case, we will probably need two
> openflow rules per backend IP (one for initiator traffic, one for reply).
>
> I'll think more about it though.
>
> Regards,
> Dumitru
>
> > Regards,
> > Winson
> >
> >
> >
> >
> >
> >
> >
> > On Wed, Apr 29, 2020 at 1:42 PM Winson Wang  > > wrote:
> >
> > Hi Han,
> >
> > Thanks for quick reply.
> > Please see my reply below.
> >
> > On Wed, Apr 29, 2020 at 12:31 PM Han Zhou  > > wrote:
> >
> >
> >
> > On Wed, Apr 29, 2020 at 10:29 AM Winson Wang
> > mailto:windson.w...@gmail.com>> wrote:
> > >
> > > Hello Experts,
> > >
> > > I am doing stress with k8s cluster with ovn,  one thing I am
> > seeing is that when raft nodes
> > > got update for large data in short time from ovn-northd,  3
> > raft nodes will trigger voting and leader role switched from one
> > node to another.
> > >
> > > From ovn-northd side,  I can see ovn-northd will trigger the
> > BACKOFF, RECONNECT...
> > >
> > > Since ovn-northd only connect to NB/SB leader only and how can
> > we make ovn-northd more available  in most of the time?
> > >
> > > Is it possible to make ovn-northd have established connections
> > to all raft nodes to avoid the
> > > reconnect mechanism?
> > > Since the backoff time 8s is not configurable for now.
> > >
> > >
> > > Test logs:
> > >
> > >
> > 2020-04-29T17:03:08.296Z|41861|ovsdb_idl|INFO|tcp:
> 10.0.2.152:6642 :
> > clustered database server is not cluster leader; trying another
> > server
> > >
> > >
> > 2020-04-29T17:03:08.296Z|41862|reconnect|DBG|tcp:10.0.2.152:6642
> > : entering RECONNECT
> > >
> > >
> > 2020-04-29T17:03:08.304Z|41863|reconnect|DBG|tcp:10.0.2.152:6642
> > : entering BACKOFF
> > >
> > > 2020-04-29T17:03:09.708Z|41867|coverage|INFO|Dropped 2 log
> > messages in last 78 seconds (most recently, 71 seconds ago) due
> >

Re: [ovs-discuss] OVN Scale with RAFT: how to make ovn-northd more reliable when RAFT leader unstable

2020-05-01 Thread Dumitru Ceara
On 5/1/20 12:00 AM, Winson Wang wrote:
> Hi Han,  Dumitru,
> 

Hi Winson,

> With the fix from Dumitru
> https://github.com/ovn-org/ovn/commit/97e82ae5f135a088c9e95b49122d8217718d23f4
> 
> It can greatly reduced the OVS SB RAFT workload based on my stress test
> mode with k8s svc with large endpoints.
> 
> The DB file size increased much less with fix, so it will not trigger
> the leader election with same work load.
> 
> Dumitru,  based my test,  logic flows number is fixed with cluster size
> regardless of number of VIP endpoints.

The number of logical flows will be fixed based on number of VIPs (2 per
VIP) but the size of the match expression depends on the number of
backends per VIP so the SB DB size will increase when adding backends to
existing VIPs.
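
A quick way to see both effects on a live setup (a sketch, using the stage name
mentioned earlier in this thread):

ovn-sbctl lflow-list | grep -c ls_in_pre_hairpin   # hairpin logical flows (~2 per VIP)
ovn-sbctl list Logical_Flow | grep -c '^_uuid'     # total logical flows, for comparison

The first number should stay flat as backends are added, while the match strings
(and hence the SB DB size) keep growing.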

> 
> But the open flow count on each node still have the relationship of the
> endpoints size.

Yes, this is due to the match expression in the logical flow above which
is of the form:

(ip.src == backend-ip1 && ip.dst == backend-ip1) || .. || (ip.src ==
backend-ipn && ip.dst == backend-ipn)

This will get expanded to n openflow rules, one per backend, to
determine if traffic was hairpinned.

> Any idea how to reduce the open flow cnt on each node's br-int?
> 
> 

Unfortunately I don't think there's a way to determine if traffic was
hairpinned because I don't think we can have openflow rules that match
on "ip.src == ip.dst". So in the worst case, we will probably need two
openflow rules per backend IP (one for initiator traffic, one for reply).

I'll think more about it though.

Regards,
Dumitru

> Regards,
> Winson
> 
> 
> 
> 
> 
> 
> 
> On Wed, Apr 29, 2020 at 1:42 PM Winson Wang  > wrote:
> 
> Hi Han,
> 
> Thanks for quick reply.
> Please see my reply below.
> 
> On Wed, Apr 29, 2020 at 12:31 PM Han Zhou  > wrote:
> 
> 
> 
> On Wed, Apr 29, 2020 at 10:29 AM Winson Wang
> mailto:windson.w...@gmail.com>> wrote:
> >
> > Hello Experts,
> >
> > I am doing stress with k8s cluster with ovn,  one thing I am
> seeing is that when raft nodes
> > got update for large data in short time from ovn-northd,  3
> raft nodes will trigger voting and leader role switched from one
> node to another.
> >
> > From ovn-northd side,  I can see ovn-northd will trigger the
> BACKOFF, RECONNECT...
> >
> > Since ovn-northd only connect to NB/SB leader only and how can
> we make ovn-northd more available  in most of the time?
> >
> > Is it possible to make ovn-northd have established connections
> to all raft nodes to avoid the
> > reconnect mechanism?
> > Since the backoff time 8s is not configurable for now.
> >
> >
> > Test logs:
> >
> >
> 2020-04-29T17:03:08.296Z|41861|ovsdb_idl|INFO|tcp:10.0.2.152:6642 
> :
> clustered database server is not cluster leader; trying another
> server
> >
> >
> 2020-04-29T17:03:08.296Z|41862|reconnect|DBG|tcp:10.0.2.152:6642
> : entering RECONNECT
> >
> >
> 2020-04-29T17:03:08.304Z|41863|reconnect|DBG|tcp:10.0.2.152:6642
> : entering BACKOFF
> >
> > 2020-04-29T17:03:09.708Z|41867|coverage|INFO|Dropped 2 log
> messages in last 78 seconds (most recently, 71 seconds ago) due
> to excessive rate
> >
> > 2020-04-29T17:03:09.708Z|41868|coverage|INFO|Skipping details
> of duplicate event coverage for hash=ceada91f
> >
> >
> 2020-04-29T17:03:16.304Z|41869|reconnect|DBG|tcp:10.0.2.153:6642
> : entering CONNECTING
> >
> >
> 2020-04-29T17:03:16.308Z|41870|reconnect|INFO|tcp:10.0.2.153:6642 
> :
> connected
> >
> >
> 2020-04-29T17:03:16.308Z|41871|reconnect|DBG|tcp:10.0.2.153:6642
> : entering ACTIVE
> >
> > 2020-04-29T17:03:16.308Z|41872|ovn_northd|INFO|ovn-northd lock
> lost. This ovn-northd instance is now on standby.
> >
> > 2020-04-29T17:03:16.309Z|41873|ovn_northd|INFO|ovn-northd lock
> acquired. This ovn-northd instance is now active.
> >
> >
> 2020-04-29T17:03:16.311Z|41874|ovsdb_idl|INFO|tcp:10.0.2.153:6642 
> :
> clustered database server is disconnected from cluster; trying
> another server
> >
> >
> 2020-04-29T17:03:16.311Z|41875|reconnect|DBG|tcp:10.0.2.153:6642
> : entering RECONNECT
> >
> >
> 2020-04-29T17:03:16.312Z|41876|reconnect|DBG|tcp:10.0.2.153:6642
> : entering 

Re: [ovs-discuss] OVN Scale with RAFT: how to make ovn-northd more reliable when RAFT leader unstable

2020-04-30 Thread Winson Wang
Hi Han,  Dumitru,

With the fix from Dumitru
https://github.com/ovn-org/ovn/commit/97e82ae5f135a088c9e95b49122d8217718d23f4

It can greatly reduced the OVS SB RAFT workload based on my stress test
mode with k8s svc with large endpoints.

The DB file size increased much less with fix, so it will not trigger the
leader election with same work load.

Dumitru,  based my test,  logic flows number is fixed with cluster size
regardless of number of VIP endpoints.

But the open flow count on each node still have the relationship of the
endpoints size.
Any idea how to reduce the open flow cnt on each node's br-int?


Regards,
Winson







On Wed, Apr 29, 2020 at 1:42 PM Winson Wang  wrote:

> Hi Han,
>
> Thanks for quick reply.
> Please see my reply below.
>
> On Wed, Apr 29, 2020 at 12:31 PM Han Zhou  wrote:
>
>>
>>
>> On Wed, Apr 29, 2020 at 10:29 AM Winson Wang 
>> wrote:
>> >
>> > Hello Experts,
>> >
>> > I am doing stress with k8s cluster with ovn,  one thing I am seeing is
>> that when raft nodes
>> > got update for large data in short time from ovn-northd,  3 raft nodes
>> will trigger voting and leader role switched from one node to another.
>> >
>> > From ovn-northd side,  I can see ovn-northd will trigger the BACKOFF,
>> RECONNECT...
>> >
>> > Since ovn-northd only connect to NB/SB leader only and how can we make
>> ovn-northd more available  in most of the time?
>> >
>> > Is it possible to make ovn-northd have established connections to all
>> raft nodes to avoid the
>> > reconnect mechanism?
>> > Since the backoff time 8s is not configurable for now.
>> >
>> >
>> > Test logs:
>> >
>> > 2020-04-29T17:03:08.296Z|41861|ovsdb_idl|INFO|tcp:10.0.2.152:6642:
>> clustered database server is not cluster leader; trying another server
>> >
>> > 2020-04-29T17:03:08.296Z|41862|reconnect|DBG|tcp:10.0.2.152:6642:
>> entering RECONNECT
>> >
>> > 2020-04-29T17:03:08.304Z|41863|reconnect|DBG|tcp:10.0.2.152:6642:
>> entering BACKOFF
>> >
>> > 2020-04-29T17:03:09.708Z|41867|coverage|INFO|Dropped 2 log messages in
>> last 78 seconds (most recently, 71 seconds ago) due to excessive rate
>> >
>> > 2020-04-29T17:03:09.708Z|41868|coverage|INFO|Skipping details of
>> duplicate event coverage for hash=ceada91f
>> >
>> > 2020-04-29T17:03:16.304Z|41869|reconnect|DBG|tcp:10.0.2.153:6642:
>> entering CONNECTING
>> >
>> > 2020-04-29T17:03:16.308Z|41870|reconnect|INFO|tcp:10.0.2.153:6642:
>> connected
>> >
>> > 2020-04-29T17:03:16.308Z|41871|reconnect|DBG|tcp:10.0.2.153:6642:
>> entering ACTIVE
>> >
>> > 2020-04-29T17:03:16.308Z|41872|ovn_northd|INFO|ovn-northd lock lost.
>> This ovn-northd instance is now on standby.
>> >
>> > 2020-04-29T17:03:16.309Z|41873|ovn_northd|INFO|ovn-northd lock
>> acquired. This ovn-northd instance is now active.
>> >
>> > 2020-04-29T17:03:16.311Z|41874|ovsdb_idl|INFO|tcp:10.0.2.153:6642:
>> clustered database server is disconnected from cluster; trying another
>> server
>> >
>> > 2020-04-29T17:03:16.311Z|41875|reconnect|DBG|tcp:10.0.2.153:6642:
>> entering RECONNECT
>> >
>> > 2020-04-29T17:03:16.312Z|41876|reconnect|DBG|tcp:10.0.2.153:6642:
>> entering BACKOFF
>> >
>> > 2020-04-29T17:03:24.316Z|41877|reconnect|DBG|tcp:10.0.2.151:6642:
>> entering CONNECTING
>> >
>> > 2020-04-29T17:03:24.321Z|41878|reconnect|INFO|tcp:10.0.2.151:6642:
>> connected
>> >
>> > 2020-04-29T17:03:24.321Z|41879|reconnect|DBG|tcp:10.0.2.151:6642:
>> entering ACTIVE
>> >
>> > 2020-04-29T17:03:24.321Z|41880|ovn_northd|INFO|ovn-northd lock lost.
>> This ovn-northd instance is now on standby.
>> >
>> > 2020-04-29T17:03:24.354Z|41881|ovn_northd|INFO|ovn-northd lock
>> acquired. This ovn-northd instance is now active.
>> >
>> > 2020-04-29T17:03:24.358Z|41882|ovsdb_idl|INFO|tcp:10.0.2.151:6642:
>> clustered database server is not cluster leader; trying another server
>> >
>> > 2020-04-29T17:03:24.358Z|41883|reconnect|DBG|tcp:10.0.2.151:6642:
>> entering RECONNECT
>> >
>> > 2020-04-29T17:03:24.360Z|41884|reconnect|DBG|tcp:10.0.2.151:6642:
>> entering BACKOFF
>> >
>> > 2020-04-29T17:03:32.367Z|41885|reconnect|DBG|tcp:10.0.2.152:6642:
>> entering CONNECTING
>> >
>> > 2020-04-29T17:03:32.372Z|41886|reconnect|INFO|tcp:10.0.2.152:6642:
>> connected
>> >
>> > 2020-04-29T17:03:32.372Z|41887|reconnect|DBG|tcp:10.0.2.152:6642:
>> entering ACTIVE
>> >
>> > 2020-04-29T17:03:32.372Z|41888|ovn_northd|INFO|ovn-northd lock lost.
>> This ovn-northd instance is now on standby.
>> >
>> > 2020-04-29T17:03:32.373Z|41889|ovn_northd|INFO|ovn-northd lock
>> acquired. This ovn-northd instance is now active.
>> >
>> > 2020-04-29T17:03:32.376Z|41890|ovsdb_idl|INFO|tcp:10.0.2.152:6642:
>> clustered database server is not cluster leader; trying another server
>> >
>> > 2020-04-29T17:03:32.376Z|41891|reconnect|DBG|tcp:10.0.2.152:6642:
>> entering RECONNECT
>> >
>> > 2020-04-29T17:03:32.378Z|41892|reconnect|DBG|tcp:10.0.2.152:6642:
>> entering BACKOFF
>> >
>> > 2020-04-29T17:03:40.381Z|41893|reconnect|DBG|tcp:10.0.2.153:6642:
>> entering CONNECTING
>> >
>> > 

Re: [ovs-discuss] OVN Scale with RAFT: how to make ovn-northd more reliable when RAFT leader unstable

2020-04-29 Thread Winson Wang
Hi Han,

Thanks for quick reply.
Please see my reply below.

On Wed, Apr 29, 2020 at 12:31 PM Han Zhou  wrote:

>
>
> On Wed, Apr 29, 2020 at 10:29 AM Winson Wang 
> wrote:
> >
> > Hello Experts,
> >
> > I am doing stress with k8s cluster with ovn,  one thing I am seeing is
> that when raft nodes
> > got update for large data in short time from ovn-northd,  3 raft nodes
> will trigger voting and leader role switched from one node to another.
> >
> > From ovn-northd side,  I can see ovn-northd will trigger the BACKOFF,
> RECONNECT...
> >
> > Since ovn-northd only connect to NB/SB leader only and how can we make
> ovn-northd more available  in most of the time?
> >
> > Is it possible to make ovn-northd have established connections to all
> raft nodes to avoid the
> > reconnect mechanism?
> > Since the backoff time 8s is not configurable for now.
> >
> >
> > Test logs:
> >
> > 2020-04-29T17:03:08.296Z|41861|ovsdb_idl|INFO|tcp:10.0.2.152:6642:
> clustered database server is not cluster leader; trying another server
> >
> > 2020-04-29T17:03:08.296Z|41862|reconnect|DBG|tcp:10.0.2.152:6642:
> entering RECONNECT
> >
> > 2020-04-29T17:03:08.304Z|41863|reconnect|DBG|tcp:10.0.2.152:6642:
> entering BACKOFF
> >
> > 2020-04-29T17:03:09.708Z|41867|coverage|INFO|Dropped 2 log messages in
> last 78 seconds (most recently, 71 seconds ago) due to excessive rate
> >
> > 2020-04-29T17:03:09.708Z|41868|coverage|INFO|Skipping details of
> duplicate event coverage for hash=ceada91f
> >
> > 2020-04-29T17:03:16.304Z|41869|reconnect|DBG|tcp:10.0.2.153:6642:
> entering CONNECTING
> >
> > 2020-04-29T17:03:16.308Z|41870|reconnect|INFO|tcp:10.0.2.153:6642:
> connected
> >
> > 2020-04-29T17:03:16.308Z|41871|reconnect|DBG|tcp:10.0.2.153:6642:
> entering ACTIVE
> >
> > 2020-04-29T17:03:16.308Z|41872|ovn_northd|INFO|ovn-northd lock lost.
> This ovn-northd instance is now on standby.
> >
> > 2020-04-29T17:03:16.309Z|41873|ovn_northd|INFO|ovn-northd lock acquired.
> This ovn-northd instance is now active.
> >
> > 2020-04-29T17:03:16.311Z|41874|ovsdb_idl|INFO|tcp:10.0.2.153:6642:
> clustered database server is disconnected from cluster; trying another
> server
> >
> > 2020-04-29T17:03:16.311Z|41875|reconnect|DBG|tcp:10.0.2.153:6642:
> entering RECONNECT
> >
> > 2020-04-29T17:03:16.312Z|41876|reconnect|DBG|tcp:10.0.2.153:6642:
> entering BACKOFF
> >
> > 2020-04-29T17:03:24.316Z|41877|reconnect|DBG|tcp:10.0.2.151:6642:
> entering CONNECTING
> >
> > 2020-04-29T17:03:24.321Z|41878|reconnect|INFO|tcp:10.0.2.151:6642:
> connected
> >
> > 2020-04-29T17:03:24.321Z|41879|reconnect|DBG|tcp:10.0.2.151:6642:
> entering ACTIVE
> >
> > 2020-04-29T17:03:24.321Z|41880|ovn_northd|INFO|ovn-northd lock lost.
> This ovn-northd instance is now on standby.
> >
> > 2020-04-29T17:03:24.354Z|41881|ovn_northd|INFO|ovn-northd lock acquired.
> This ovn-northd instance is now active.
> >
> > 2020-04-29T17:03:24.358Z|41882|ovsdb_idl|INFO|tcp:10.0.2.151:6642:
> clustered database server is not cluster leader; trying another server
> >
> > 2020-04-29T17:03:24.358Z|41883|reconnect|DBG|tcp:10.0.2.151:6642:
> entering RECONNECT
> >
> > 2020-04-29T17:03:24.360Z|41884|reconnect|DBG|tcp:10.0.2.151:6642:
> entering BACKOFF
> >
> > 2020-04-29T17:03:32.367Z|41885|reconnect|DBG|tcp:10.0.2.152:6642:
> entering CONNECTING
> >
> > 2020-04-29T17:03:32.372Z|41886|reconnect|INFO|tcp:10.0.2.152:6642:
> connected
> >
> > 2020-04-29T17:03:32.372Z|41887|reconnect|DBG|tcp:10.0.2.152:6642:
> entering ACTIVE
> >
> > 2020-04-29T17:03:32.372Z|41888|ovn_northd|INFO|ovn-northd lock lost.
> This ovn-northd instance is now on standby.
> >
> > 2020-04-29T17:03:32.373Z|41889|ovn_northd|INFO|ovn-northd lock acquired.
> This ovn-northd instance is now active.
> >
> > 2020-04-29T17:03:32.376Z|41890|ovsdb_idl|INFO|tcp:10.0.2.152:6642:
> clustered database server is not cluster leader; trying another server
> >
> > 2020-04-29T17:03:32.376Z|41891|reconnect|DBG|tcp:10.0.2.152:6642:
> entering RECONNECT
> >
> > 2020-04-29T17:03:32.378Z|41892|reconnect|DBG|tcp:10.0.2.152:6642:
> entering BACKOFF
> >
> > 2020-04-29T17:03:40.381Z|41893|reconnect|DBG|tcp:10.0.2.153:6642:
> entering CONNECTING
> >
> > 2020-04-29T17:03:40.385Z|41894|reconnect|INFO|tcp:10.0.2.153:6642:
> connected
> >
> > 2020-04-29T17:03:40.385Z|41895|reconnect|DBG|tcp:10.0.2.153:6642:
> entering ACTIVE
> >
> > 2020-04-29T17:03:40.385Z|41896|ovn_northd|INFO|ovn-northd lock lost.
> This ovn-northd instance is now on standby.
> >
> > 2020-04-29T17:03:40.385Z|41897|ovn_northd|INFO|ovn-northd lock acquired.
> This ovn-northd instance is now active.
> >
> >
> > --
> > Winson
>
> Hi Winson,
>
> Since northd heavily writes to the SB DB, it is implemented to connect to
> the leader only, for better performance (avoiding the extra cost of a
> follower forwarding writes to the leader). When a leader re-election
> happens, it has to reconnect to the new leader. However, if the cluster is
> unstable, this step can also take longer than expected. I'd suggest tuning
> the election timer to avoid re-election during heavy operations.

Re: [ovs-discuss] OVN Scale with RAFT: how to make ovn-northd more reliable when RAFT leader unstable

2020-04-29 Thread Han Zhou
On Wed, Apr 29, 2020 at 10:29 AM Winson Wang  wrote:
>
> Hello Experts,
>
> I am doing stress testing on a k8s cluster with OVN. One thing I am seeing
> is that when the RAFT nodes receive a large amount of updates from
> ovn-northd in a short time, the 3 RAFT nodes trigger voting and the leader
> role switches from one node to another.
>
> From the ovn-northd side, I can see ovn-northd going through BACKOFF,
> RECONNECT...
>
> Since ovn-northd connects only to the NB/SB leader, how can we make
> ovn-northd more available most of the time?
>
> Is it possible to make ovn-northd keep established connections to all
> RAFT nodes to avoid the reconnect mechanism?
> The backoff time of 8s is not configurable for now.
>
>
> Test logs:
>
> 2020-04-29T17:03:08.296Z|41861|ovsdb_idl|INFO|tcp:10.0.2.152:6642:
clustered database server is not cluster leader; trying another server
>
> 2020-04-29T17:03:08.296Z|41862|reconnect|DBG|tcp:10.0.2.152:6642:
entering RECONNECT
>
> 2020-04-29T17:03:08.304Z|41863|reconnect|DBG|tcp:10.0.2.152:6642:
entering BACKOFF
>
> 2020-04-29T17:03:09.708Z|41867|coverage|INFO|Dropped 2 log messages in
last 78 seconds (most recently, 71 seconds ago) due to excessive rate
>
> 2020-04-29T17:03:09.708Z|41868|coverage|INFO|Skipping details of
duplicate event coverage for hash=ceada91f
>
> 2020-04-29T17:03:16.304Z|41869|reconnect|DBG|tcp:10.0.2.153:6642:
entering CONNECTING
>
> 2020-04-29T17:03:16.308Z|41870|reconnect|INFO|tcp:10.0.2.153:6642:
connected
>
> 2020-04-29T17:03:16.308Z|41871|reconnect|DBG|tcp:10.0.2.153:6642:
entering ACTIVE
>
> 2020-04-29T17:03:16.308Z|41872|ovn_northd|INFO|ovn-northd lock lost. This
ovn-northd instance is now on standby.
>
> 2020-04-29T17:03:16.309Z|41873|ovn_northd|INFO|ovn-northd lock acquired.
This ovn-northd instance is now active.
>
> 2020-04-29T17:03:16.311Z|41874|ovsdb_idl|INFO|tcp:10.0.2.153:6642:
clustered database server is disconnected from cluster; trying another
server
>
> 2020-04-29T17:03:16.311Z|41875|reconnect|DBG|tcp:10.0.2.153:6642:
entering RECONNECT
>
> 2020-04-29T17:03:16.312Z|41876|reconnect|DBG|tcp:10.0.2.153:6642:
entering BACKOFF
>
> 2020-04-29T17:03:24.316Z|41877|reconnect|DBG|tcp:10.0.2.151:6642:
entering CONNECTING
>
> 2020-04-29T17:03:24.321Z|41878|reconnect|INFO|tcp:10.0.2.151:6642:
connected
>
> 2020-04-29T17:03:24.321Z|41879|reconnect|DBG|tcp:10.0.2.151:6642:
entering ACTIVE
>
> 2020-04-29T17:03:24.321Z|41880|ovn_northd|INFO|ovn-northd lock lost. This
ovn-northd instance is now on standby.
>
> 2020-04-29T17:03:24.354Z|41881|ovn_northd|INFO|ovn-northd lock acquired.
This ovn-northd instance is now active.
>
> 2020-04-29T17:03:24.358Z|41882|ovsdb_idl|INFO|tcp:10.0.2.151:6642:
clustered database server is not cluster leader; trying another server
>
> 2020-04-29T17:03:24.358Z|41883|reconnect|DBG|tcp:10.0.2.151:6642:
entering RECONNECT
>
> 2020-04-29T17:03:24.360Z|41884|reconnect|DBG|tcp:10.0.2.151:6642:
entering BACKOFF
>
> 2020-04-29T17:03:32.367Z|41885|reconnect|DBG|tcp:10.0.2.152:6642:
entering CONNECTING
>
> 2020-04-29T17:03:32.372Z|41886|reconnect|INFO|tcp:10.0.2.152:6642:
connected
>
> 2020-04-29T17:03:32.372Z|41887|reconnect|DBG|tcp:10.0.2.152:6642:
entering ACTIVE
>
> 2020-04-29T17:03:32.372Z|41888|ovn_northd|INFO|ovn-northd lock lost. This
ovn-northd instance is now on standby.
>
> 2020-04-29T17:03:32.373Z|41889|ovn_northd|INFO|ovn-northd lock acquired.
This ovn-northd instance is now active.
>
> 2020-04-29T17:03:32.376Z|41890|ovsdb_idl|INFO|tcp:10.0.2.152:6642:
clustered database server is not cluster leader; trying another server
>
> 2020-04-29T17:03:32.376Z|41891|reconnect|DBG|tcp:10.0.2.152:6642:
entering RECONNECT
>
> 2020-04-29T17:03:32.378Z|41892|reconnect|DBG|tcp:10.0.2.152:6642:
entering BACKOFF
>
> 2020-04-29T17:03:40.381Z|41893|reconnect|DBG|tcp:10.0.2.153:6642:
entering CONNECTING
>
> 2020-04-29T17:03:40.385Z|41894|reconnect|INFO|tcp:10.0.2.153:6642:
connected
>
> 2020-04-29T17:03:40.385Z|41895|reconnect|DBG|tcp:10.0.2.153:6642:
entering ACTIVE
>
> 2020-04-29T17:03:40.385Z|41896|ovn_northd|INFO|ovn-northd lock lost. This
ovn-northd instance is now on standby.
>
> 2020-04-29T17:03:40.385Z|41897|ovn_northd|INFO|ovn-northd lock acquired.
This ovn-northd instance is now active.
>
>
> --
> Winson

Hi Winson,

Since northd heavily writes to the SB DB, it is implemented to connect to
the leader only, for better performance (avoiding the extra cost of a
follower forwarding writes to the leader). When a leader re-election
happens, it has to reconnect to the new leader. However, if the cluster is
unstable, this step can also take longer than expected. I'd suggest tuning
the election timer to avoid re-election during heavy operations.
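
As a rough sketch (the ctl socket path depends on the packaging, e.g.
/var/run/ovn/ovnsb_db.ctl or /var/run/openvswitch/ovnsb_db.ctl; the command
must be run on the current leader, and the timer can only be raised
gradually, so repeat the call until you reach the value you want):

  # Raise the SB election timer step by step, e.g. 1000 -> 2000 -> 4000 ms.
  # The socket path below is an assumption; adjust it for your installation.
  ovs-appctl -t /var/run/ovn/ovnsb_db.ctl \
      cluster/change-election-timer OVN_Southbound 2000
  ovs-appctl -t /var/run/ovn/ovnsb_db.ctl \
      cluster/change-election-timer OVN_Southbound 4000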

If the server is overloaded for too long and a longer election timer is
unacceptable, the only way to solve the availability problem is to improve
ovsdb performance. How big is your transaction and what's your election
timer setting? The number of clients also impacts the performance since the
heavy update needs to be synced to 
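
To check what the cluster is currently running with, something along these
lines should work on each DB server (same ctl socket path assumption as
above; exact output fields vary by version):

  # Shows the role, term and the configured election timer of the SB DB.
  ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound

  # Rough idea of database size and the number of monitors/sessions.
  ovs-appctl -t /var/run/ovn/ovnsb_db.ctl memory/show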

[ovs-discuss] OVN Scale with RAFT: how to make ovn-northd more reliable when RAFT leader unstable

2020-04-29 Thread Winson Wang
Hello Experts,

I am doing stress testing on a k8s cluster with OVN. One thing I am seeing is
that when the RAFT nodes receive a large amount of updates from ovn-northd in
a short time, the 3 RAFT nodes trigger voting and the leader role switches
from one node to another.

From the ovn-northd side, I can see ovn-northd going through BACKOFF,
RECONNECT...

Since ovn-northd connects only to the NB/SB leader, how can we make
ovn-northd more available most of the time?

Is it possible to make ovn-northd keep established connections to all RAFT
nodes to avoid the reconnect mechanism? The backoff time of 8s is not
configurable for now.
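
For reference, this is roughly how ovn-northd is started here, with every
cluster member already listed as a remote (the SB addresses are the ones
from the logs below; the NB addresses are assumed to follow the same pattern
on port 6641):

  # NB addresses are an assumption; SB addresses match the logs below.
  ovn-northd \
      --ovnnb-db=tcp:10.0.2.151:6641,tcp:10.0.2.152:6641,tcp:10.0.2.153:6641 \
      --ovnsb-db=tcp:10.0.2.151:6642,tcp:10.0.2.152:6642,tcp:10.0.2.153:6642

Even with all members listed, northd ends up talking to only one server at a
time and requires the SB connection to be the leader, which is what drives
the reconnect/backoff cycle in the logs below.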


Test logs:

2020-04-29T17:03:08.296Z|41861|ovsdb_idl|INFO|tcp:10.0.2.152:6642:
clustered database server is not cluster leader; trying another server

2020-04-29T17:03:08.296Z|41862|reconnect|DBG|tcp:10.0.2.152:6642: entering
RECONNECT

2020-04-29T17:03:08.304Z|41863|reconnect|DBG|tcp:10.0.2.152:6642: entering
BACKOFF

2020-04-29T17:03:09.708Z|41867|coverage|INFO|Dropped 2 log messages in last
78 seconds (most recently, 71 seconds ago) due to excessive rate

2020-04-29T17:03:09.708Z|41868|coverage|INFO|Skipping details of duplicate
event coverage for hash=ceada91f

2020-04-29T17:03:16.304Z|41869|reconnect|DBG|tcp:10.0.2.153:6642: entering
CONNECTING

2020-04-29T17:03:16.308Z|41870|reconnect|INFO|tcp:10.0.2.153:6642: connected

2020-04-29T17:03:16.308Z|41871|reconnect|DBG|tcp:10.0.2.153:6642: entering
ACTIVE

2020-04-29T17:03:16.308Z|41872|ovn_northd|INFO|ovn-northd lock lost. This
ovn-northd instance is now on standby.

2020-04-29T17:03:16.309Z|41873|ovn_northd|INFO|ovn-northd lock acquired.
This ovn-northd instance is now active.

2020-04-29T17:03:16.311Z|41874|ovsdb_idl|INFO|tcp:10.0.2.153:6642:
clustered database server is disconnected from cluster; trying another
server

2020-04-29T17:03:16.311Z|41875|reconnect|DBG|tcp:10.0.2.153:6642: entering
RECONNECT

2020-04-29T17:03:16.312Z|41876|reconnect|DBG|tcp:10.0.2.153:6642: entering
BACKOFF

2020-04-29T17:03:24.316Z|41877|reconnect|DBG|tcp:10.0.2.151:6642: entering
CONNECTING

2020-04-29T17:03:24.321Z|41878|reconnect|INFO|tcp:10.0.2.151:6642: connected

2020-04-29T17:03:24.321Z|41879|reconnect|DBG|tcp:10.0.2.151:6642: entering
ACTIVE

2020-04-29T17:03:24.321Z|41880|ovn_northd|INFO|ovn-northd lock lost. This
ovn-northd instance is now on standby.

2020-04-29T17:03:24.354Z|41881|ovn_northd|INFO|ovn-northd lock acquired.
This ovn-northd instance is now active.

2020-04-29T17:03:24.358Z|41882|ovsdb_idl|INFO|tcp:10.0.2.151:6642:
clustered database server is not cluster leader; trying another server

2020-04-29T17:03:24.358Z|41883|reconnect|DBG|tcp:10.0.2.151:6642: entering
RECONNECT

2020-04-29T17:03:24.360Z|41884|reconnect|DBG|tcp:10.0.2.151:6642: entering
BACKOFF

2020-04-29T17:03:32.367Z|41885|reconnect|DBG|tcp:10.0.2.152:6642: entering
CONNECTING

2020-04-29T17:03:32.372Z|41886|reconnect|INFO|tcp:10.0.2.152:6642: connected

2020-04-29T17:03:32.372Z|41887|reconnect|DBG|tcp:10.0.2.152:6642: entering
ACTIVE

2020-04-29T17:03:32.372Z|41888|ovn_northd|INFO|ovn-northd lock lost. This
ovn-northd instance is now on standby.

2020-04-29T17:03:32.373Z|41889|ovn_northd|INFO|ovn-northd lock acquired.
This ovn-northd instance is now active.

2020-04-29T17:03:32.376Z|41890|ovsdb_idl|INFO|tcp:10.0.2.152:6642:
clustered database server is not cluster leader; trying another server

2020-04-29T17:03:32.376Z|41891|reconnect|DBG|tcp:10.0.2.152:6642: entering
RECONNECT

2020-04-29T17:03:32.378Z|41892|reconnect|DBG|tcp:10.0.2.152:6642: entering
BACKOFF

2020-04-29T17:03:40.381Z|41893|reconnect|DBG|tcp:10.0.2.153:6642: entering
CONNECTING

2020-04-29T17:03:40.385Z|41894|reconnect|INFO|tcp:10.0.2.153:6642: connected

2020-04-29T17:03:40.385Z|41895|reconnect|DBG|tcp:10.0.2.153:6642: entering
ACTIVE

2020-04-29T17:03:40.385Z|41896|ovn_northd|INFO|ovn-northd lock lost. This
ovn-northd instance is now on standby.

2020-04-29T17:03:40.385Z|41897|ovn_northd|INFO|ovn-northd lock acquired.
This ovn-northd instance is now active.

-- 
Winson
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss