Re: [ovs-discuss] [OVN] no response to inactivity probe

2020-08-06 Thread Tony Liu
Hi,

There are still some connection errors from ovn-controller.
Will that connection drop cause flows to be deleted from vswitchd?

..
2020-08-07T03:55:22.269Z|03988|jsonrpc|WARN|tcp:127.0.0.1:6640: send error: 
Broken pipe
..
2020-08-07T03:55:31.551Z|03996|reconnect|WARN|tcp:127.0.0.1:6640: connection 
dropped (Broken pipe)



2020-08-07T03:55:22.268Z|03986|poll_loop|INFO|wakeup due to [POLLIN] on fd 14 
(127.0.0.1:49514<->127.0.0.1:6640) at lib/stream-fd.c:157 (99% CPU usage)
2020-08-07T03:55:22.268Z|03987|poll_loop|INFO|wakeup due to [POLLIN] on fd 19 
(10.6.20.91:42854<->10.6.20.84:6642) at lib/stream-fd.c:157 (99% CPU usage)
2020-08-07T03:55:22.269Z|03988|jsonrpc|WARN|tcp:127.0.0.1:6640: send error: 
Broken pipe
2020-08-07T03:55:31.549Z|03989|timeval|WARN|Unreasonably long 9280ms poll 
interval (9220ms user, 1ms system)
2020-08-07T03:55:31.550Z|03990|timeval|WARN|disk: 0 reads, 8 writes
2020-08-07T03:55:31.550Z|03991|timeval|WARN|context switches: 0 voluntary, 5 
involuntary
2020-08-07T03:55:31.550Z|03992|coverage|INFO|Dropped 4 log messages in last 47 
seconds (most recently, 9 seconds ago) due to excessive rate
2020-08-07T03:55:31.551Z|03993|coverage|INFO|Skipping details of duplicate 
event coverage for hash=824dd6ab
2020-08-07T03:55:31.551Z|03994|poll_loop|INFO|wakeup due to [POLLIN] on fd 20 
(<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (100% CPU usage)
2020-08-07T03:55:31.551Z|03995|poll_loop|INFO|wakeup due to [POLLIN] on fd 19 
(10.6.20.91:42854<->10.6.20.84:6642) at lib/stream-fd.c:157 (100% CPU usage)
2020-08-07T03:55:31.551Z|03996|reconnect|WARN|tcp:127.0.0.1:6640: connection 
dropped (Broken pipe)
2020-08-07T03:55:31.552Z|03997|poll_loop|INFO|wakeup due to 0-ms timeout at 
controller/ovn-controller.c:2123 (100% CPU usage)
2020-08-07T03:55:40.752Z|03998|timeval|WARN|Unreasonably long 9176ms poll 
interval (9118ms user, 0ms system)
2020-08-07T03:55:40.752Z|03999|timeval|WARN|context switches: 0 voluntary, 7 
involuntary
2020-08-07T03:55:40.753Z|04000|poll_loop|INFO|Dropped 2 log messages in last 10 
seconds (most recently, 10 seconds ago) due to excessive rate
2020-08-07T03:55:40.753Z|04001|poll_loop|INFO|wakeup due to 0-ms timeout at 
lib/reconnect.c:643 (99% CPU usage)
2020-08-07T03:55:40.754Z|04002|reconnect|INFO|tcp:127.0.0.1:6640: connecting...
2020-08-07T03:55:40.771Z|04003|reconnect|INFO|tcp:127.0.0.1:6640: connected


Thanks!

Tony
> -Original Message-
> From: discuss  On Behalf Of Tony
> Liu
> Sent: Thursday, August 6, 2020 8:23 PM
> To: Han Zhou; Numan Siddique
> Cc: ovs-dev; ovs-discuss <ovs-disc...@openvswitch.org>
> Subject: Re: [ovs-discuss] [OVN] no response to inactivity probe
> 
> Interesting...
> 
> with this configuration on gateway (chassis) node, 
> external_ids: {ovn-bridge-mappings="physnet1:br-ex", ovn-cms-
> options=enable-chassis-as-gw, ovn-encap-ip="10.6.30.91", ovn-encap-
> type=geneve, ovn-openflow-probe-interval="30", ovn-
> remote="tcp:10.6.20.84:6642,tcp:10.6.20.85:6642,tcp:10.6.20.86:6642",
> ovn-remote-probe-interval="3", system-id="gateway-1"}
> 
> 
> I still see error from ovn-controller.
> 
> 2020-08-07T03:17:48.186Z|02737|reconnect|ERR|tcp:127.0.0.1:6640: no
> response to inactivity probe after 8.74 seconds, disconnecting 
> That tcp:127.0.0.1:6640 is the connection between ovn-controller and
> local ovsdb-server.
> 
> Any settings I missed?
> 
> 
> Thanks!
> 
> Tony
> > -Original Message-
> > From: dev  On Behalf Of Tony Liu
> > Sent: Thursday, August 6, 2020 7:45 PM
> > To: Han Zhou; Numan Siddique
> > Cc: ovs-dev; ovs-discuss <ovs-disc...@openvswitch.org>
> > Subject: Re: [ovs-dev] [ovs-discuss] [OVN] no response to inactivity
> > probe
> >
> > Hi Han and Numan,
> >
> > I'd like to have a few more clarifications.
> >
> > For inactivity probe:
> > From ovn-controller to ovn-sb-db: ovn-remote-probe-interval
> >
> > From ovn-controller to ovs-vswitchd: ovn-openflow-probe-interval
> >
> > From ovn-controller to local ovsdb: which interval?
> >
> > From local ovsdb to ovn-controller: which interval?
> >
> > From ovs-vswitchd to ovn-controller: which interval?
> >
> >
> > Regarding to the connection between ovn-controller and local ovsdb-
> > server, I recall that UNIX socket is lighter than TCP socket and UNIX
> > socket is recommended for local communication.
> > Is that right?
> >
> >
> > Thanks!
> >
> > Tony
> >
> > > -Original Message-
> > > From: Han Zhou 
> > > Sent: Thursday, August 6, 2020 12:42 PM
> > > To: Tony Liu 
> > > Cc: Han Zhou ; Numan Siddique ;
> > > ovs-dev ; ovs-discuss
> > > 
> > > Subject: Re: [ovs-discuss] [OVN] no response to inactivity probe
> > >
> > >
> > >
> > > On Thu, Aug 6, 2020 at 12:07 PM Tony Liu wrote:
> > > >
> > > > Inline...
> > > >
> > > > Thanks!
> > > >
> > > > Tony
> > > > > -Original Message-
> > > > > From: 

Re: [ovs-discuss] [OVN] no response to inactivity probe

2020-08-06 Thread Tony Liu
Interesting...

with this configuration on gateway (chassis) node,

external_ids: {ovn-bridge-mappings="physnet1:br-ex", 
ovn-cms-options=enable-chassis-as-gw, ovn-encap-ip="10.6.30.91", 
ovn-encap-type=geneve, ovn-openflow-probe-interval="30", 
ovn-remote="tcp:10.6.20.84:6642,tcp:10.6.20.85:6642,tcp:10.6.20.86:6642", 
ovn-remote-probe-interval="3", system-id="gateway-1"}


I still see errors from ovn-controller.

2020-08-07T03:17:48.186Z|02737|reconnect|ERR|tcp:127.0.0.1:6640: no response to 
inactivity probe after 8.74 seconds, disconnecting

That tcp:127.0.0.1:6640 is the connection between ovn-controller
and local ovsdb-server.

Any settings I missed?


Thanks!

Tony
> -Original Message-
> From: dev  On Behalf Of Tony Liu
> Sent: Thursday, August 6, 2020 7:45 PM
> To: Han Zhou; Numan Siddique
> Cc: ovs-dev; ovs-discuss <ovs-disc...@openvswitch.org>
> Subject: Re: [ovs-dev] [ovs-discuss] [OVN] no response to inactivity
> probe
> 
> Hi Han and Numan,
> 
> I'd like to have a few more clarifications.
> 
> For inactivity probe:
> From ovn-controller to ovn-sb-db: ovn-remote-probe-interval
> 
> From ovn-controller to ovs-vswitchd: ovn-openflow-probe-interval
> 
> From ovn-controller to local ovsdb: which interval?
> 
> From local ovsdb to ovn-controller: which interval?
> 
> From ovs-vswitchd to ovn-controller: which interval?
> 
> 
> Regarding to the connection between ovn-controller and local ovsdb-
> server, I recall that UNIX socket is lighter than TCP socket and UNIX
> socket is recommended for local communication.
> Is that right?
> 
> 
> Thanks!
> 
> Tony
> 
> > -Original Message-
> > From: Han Zhou 
> > Sent: Thursday, August 6, 2020 12:42 PM
> > To: Tony Liu 
> > Cc: Han Zhou ; Numan Siddique ; ovs-dev
> > ; ovs-discuss 
> > Subject: Re: [ovs-discuss] [OVN] no response to inactivity probe
> >
> >
> >
> > On Thu, Aug 6, 2020 at 12:07 PM Tony Liu wrote:
> > >
> > > Inline...
> > >
> > > Thanks!
> > >
> > > Tony
> > > > -Original Message-
> > > > From: Han Zhou <hz...@ovn.org>
> > > > Sent: Thursday, August 6, 2020 11:37 AM
> > > > To: Tony Liu
> > > > Cc: Han Zhou <hz...@ovn.org>; Numan Siddique <num...@ovn.org>; ovs-dev
> > > > <ovs-...@openvswitch.org>; ovs-discuss
> > > > Subject: Re: [ovs-discuss] [OVN] no response to inactivity probe
> > > >
> > > >
> > > >
> > > > On Thu, Aug 6, 2020 at 11:11 AM Tony Liu wrote:
> > > > >
> > > > > Inline... (please read with monospaced font:))
> > > > >
> > > > > Thanks!
> > > > >
> > > > > Tony
> > > > > > -Original Message-
> > > > > > From: Han Zhou <hz...@ovn.org>
> > > > > > Sent: Wednesday, August 5, 2020 11:48 PM
> > > > > > To: Tony Liu <tonyliu0...@hotmail.com>
> > > > > > Cc: Han Zhou <hz...@ovn.org>; Numan Siddique <num...@ovn.org>;
> > > > > > ovs-dev <ovs-...@openvswitch.org>; ovs-discuss
> > > > > > Subject: Re: [ovs-discuss] [OVN] no response to inactivity
> > > > > > probe
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Aug 5, 2020 at 9:14 PM Tony Liu
> > > > > > <tonyliu0...@hotmail.com> wrote:
> > > > > >
> > > > > >
> > > > > >   I set the connection target="ptcp:6641:10.6.20.84" for ovn-nb-db
> > > > > >   and "ptcp:6642:10.6.20.84" for ovn-sb-db. .84 is the first node
> > > > > >   of cluster. Also ovn-openflow-probe-interval=30 on compute node.
> > > > > >   It seems helping. Not that many connect/drop/reconnect in logging.
> > > > > >   That "commit failure" is also gone.
> > > > > >   The issue I reported in another thread "packet drop" seems gone.
> > > > > >   And launching VM starts working.
> > > > > >
> > > > > >   How should I set connection table for all ovn-nb-db and ovn-sb-db
> 

Re: [ovs-discuss] [OVN] no response to inactivity probe

2020-08-06 Thread Tony Liu
Hi Han and Numan,

I'd like to have a few more clarifications.

For inactivity probe:
From ovn-controller to ovn-sb-db: ovn-remote-probe-interval

From ovn-controller to ovs-vswitchd: ovn-openflow-probe-interval

From ovn-controller to local ovsdb: which interval?

From local ovsdb to ovn-controller: which interval?

From ovs-vswitchd to ovn-controller: which interval?


Regarding the connection between ovn-controller and the local
ovsdb-server, I recall that a UNIX socket is lighter than a TCP socket
and that a UNIX socket is recommended for local communication.
Is that right?


Thanks!

Tony

> -Original Message-
> From: Han Zhou 
> Sent: Thursday, August 6, 2020 12:42 PM
> To: Tony Liu 
> Cc: Han Zhou ; Numan Siddique ; ovs-dev
> ; ovs-discuss 
> Subject: Re: [ovs-discuss] [OVN] no response to inactivity probe
> 
> 
> 
> On Thu, Aug 6, 2020 at 12:07 PM Tony Liu wrote:
> >
> > Inline...
> >
> > Thanks!
> >
> > Tony
> > > -Original Message-
> > > From: Han Zhou <hz...@ovn.org>
> > > Sent: Thursday, August 6, 2020 11:37 AM
> > > To: Tony Liu
> > > Cc: Han Zhou <hz...@ovn.org>; Numan Siddique <num...@ovn.org>; ovs-dev
> > > <ovs-...@openvswitch.org>; ovs-discuss
> > > Subject: Re: [ovs-discuss] [OVN] no response to inactivity probe
> > >
> > >
> > >
> > > On Thu, Aug 6, 2020 at 11:11 AM Tony Liu wrote:
> > > >
> > > > Inline... (please read with monospaced font:))
> > > >
> > > > Thanks!
> > > >
> > > > Tony
> > > > > -Original Message-
> > > > > From: Han Zhou <hz...@ovn.org>
> > > > > Sent: Wednesday, August 5, 2020 11:48 PM
> > > > > To: Tony Liu <tonyliu0...@hotmail.com>
> > > > > Cc: Han Zhou <hz...@ovn.org>; Numan Siddique <num...@ovn.org>;
> > > > > ovs-dev <ovs-...@openvswitch.org>; ovs-discuss
> > > > > Subject: Re: [ovs-discuss] [OVN] no response to inactivity probe
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Aug 5, 2020 at 9:14 PM Tony Liu <tonyliu0...@hotmail.com>
> > > > > wrote:
> > > > >
> > > > >
> > > > >   I set the connection target="ptcp:6641:10.6.20.84" for ovn-nb-db
> > > > >   and "ptcp:6642:10.6.20.84" for ovn-sb-db. .84 is the first node
> > > > >   of cluster. Also ovn-openflow-probe-interval=30 on compute node.
> > > > >   It seems helping. Not that many connect/drop/reconnect in logging.
> > > > >   That "commit failure" is also gone.
> > > > >   The issue I reported in another thread "packet drop" seems gone.
> > > > >   And launching VM starts working.
> > > > >
> > > > >   How should I set connection table for all ovn-nb-db and ovn-sb-db
> > > > >   nodes in the cluster to set inactivity_probe?
> > > > >   One row with address 0.0.0.0 seems not working.
> > > > >
> > > > > You can simply use 0.0.0.0 in the connection table, but don't
> > > > > specify the same connection method on the command line when
> > > > > starting
> > > > > ovsdb- server for NB/SB DB. Otherwise, these are conflicting and
> > > > > that's why you saw "Address already in use" error.
> > > >
> > > > Could you share a bit details how it works?
> > > > I thought the row in connection table only tells nbdb and sbdb the
> > > > probe interval. Isn't that right? Does nbdb and sbdb also create
> > > > socket based on target column?
> > >
> > > >
> > >
> > > In --remote option of ovsdb-server, you can specify either a
> > > connection method directly, or specify the db,table,column which
> > > contains the connection information.
> > > Please see manpage ovsdb-server(1).
> >
> > Here is how one of those 3 nbdb nodes invoked.
> > 
> > ovsdb-server -vconsole:off -vfile:info
> > --log-file=/var/log/kolla/openvswitch/ovn-sb-db.log
> > --remote=punix:/var/run/ovn/ovnsb_db.sock --
> pidfile=/run/ovn/ovnsb_db.pid --unixctl=/var/run/ovn/ovnsb_db.ctl --
> remote=db:OVN_Southbound,SB_Global,connections --private-
> key=db:OVN_Southbound,SSL,private_key --
> 

Re: [ovs-discuss] ovn-k8s scale: how to make new ovn-controller process keep the previous Open Flow in br-int

2020-08-06 Thread Han Zhou
On Thu, Aug 6, 2020 at 10:13 AM Han Zhou  wrote:

>
>
> On Thu, Aug 6, 2020 at 8:54 AM Venugopal Iyer 
> wrote:
>
>> Hi, Han:
>>
>>
>>
>> A comment inline:
>>
>>
>>
>> *From:* ovn-kuberne...@googlegroups.com 
>> *On Behalf Of *Han Zhou
>> *Sent:* Wednesday, August 5, 2020 3:36 PM
>> *To:* Winson Wang 
>> *Cc:* ovs-discuss@openvswitch.org; ovn-kuberne...@googlegroups.com;
>> Dumitru Ceara ; Han Zhou 
>> *Subject:* Re: ovn-k8s scale: how to make new ovn-controller process
>> keep the previous Open Flow in br-int
>>
>>
>>
>> *External email: Use caution opening links or attachments*
>>
>>
>>
>>
>>
>>
>>
>> On Wed, Aug 5, 2020 at 12:58 PM Winson Wang 
>> wrote:
>>
>> Hello OVN Experts,
>>
>>
>> With ovn-k8s,  we need to keep the flows always on br-int which needed by
>> running pods on the k8s node.
>>
>> Is there an ongoing project to address this problem?
>>
>> If not,  I have one proposal not sure if it is doable.
>>
>> Please share your thoughts.
>> The issue:
>>
>> In large scale ovn-k8s cluster there are 200K+ Open Flows on br-int on
>> every K8s node.  When we restart ovn-controller for upgrade using
>> `ovs-appctl -t ovn-controller exit --restart`,  the remaining traffic still
>> works fine since br-int with flows still be Installed.
>>
>>
>>
>> However, when a new ovn-controller starts it will connect OVS IDL and do
>> an engine init run,  clearing all OpenFlow flows and install flows based on
>> SB DB.
>>
>> With open flows count above 200K+,  it took more than 15 seconds to get
>> all the flows installed br-int bridge again.
>>
>>
>> Proposal solution for the issue:
>>
>> When the ovn-controller gets “exit --start”,  it will write a
>> “ovs-cond-seqno” to OVS IDL and store the value to Open vSwitch table in
>> external-ids column. When new ovn-controller starts, it will check if the
>> “ovs-cond-seqno” exists in the Open_vSwitch table,  and get the seqno from
>> OVS IDL to decide if it will force a recomputing process?
>>
>>
>>
>>
>>
>> Hi Winson,
>>
>>
>>
>> Thanks for the proposal. Yes, the connection break during upgrading is a
>> real issue in a large scale environment. However, the proposal doesn't
>> work. The "ovs-cond-seqno" is for the OVSDB IDL for the local conf DB,
>> which is a completely different connection from the ovs-vswitchd open-flow
>> connection.
>>
>> To avoid clearing the open-flow table during ovn-controller startup, we
>> can find a way to postpone clearing the OVS flows after the recomputing in
>> ovn-controller is completed, right before ovn-controller replacing with the
>> new flows.
>>
>> *[vi> ] *
>>
>> *[vi> ] Seems like we force recompute today if the OVS IDL is
>> reconnected. Would it be possible to defer *
>>
>> *decision to  recompute the flows based on  the  SB’s nb_cfg we have
>>  sync’d with? i.e.  If  our nb_cfg is *
>>
>> *in sync with the SB’s global nb_cfg, we can skip the recompute?  At
>> least if nothing has changed since*
>>
>> *the restart, we won’t need to do anything.. We could stash nb_cfg in OVS
>> (once ovn-controller receives*
>>
>> *conformation from OVS that the physical flows for an nb_cfg update are
>> in place), which should be cleared if *
>>
>> *OVS itself is restarted.. (I mean currently, nb_cfg is used to check if
>> NB, SB and Chassis are in sync, we *
>>
>> *could extend this to OVS/physical flows?)*
>>
>
> nb_cfg is already used by ovn-controller to do that, with the help of
> "barrier" of OpenFlow, but I am not sure if it 100% working as expected.
>
> This basic idea should work, but in practice we need to take care of
> generating the "installed" flow table and "desired" flow table in
> ovn-controller.
> I'd start with "postpone clearing OVS flows" which seems a lower hanging
> fruit, and then see if any further improvement is needed.
>
>
(resend using my gmail so that it can reach the ovn-kubernetes group.)

I thought about it again and it seems the idea of remembering nb_cfg
doesn't work for the upgrading scenario. Even if nb_cfg is the same and we
are sure about the flow that's installed in OVS reflects the certain nb_cfg
version, we cannot say the OVS flows doesn't need any change, because the
new version of ovn-controller implementation may translate same SB data
into different OVS flows. So clearing the flow table is still the right
thing to do, in terms of upgrading. (syncing back OVS flows from
ovs-vswitchd to ovn-controller could avoid clearing the whole table, but
that's a different approach as mentioned by Numan, and nb_cfg is not
helpful anyway)

Thanks,
Han

>
>>
>> *Have not thought through this though .. so maybe I am missing something…*
>>
>>
>>
>> *Thanks,*
>>
>>
>>
>> *-venu*
>>
>> This should largely reduce the time of connection broken during
>> upgrading. Some changes in the ofctrl module's state machine are required,
>> but I am not 100% sure if this approach is applicable. Need to check more
>> details.
>>
>>
>>
>> Thanks,
>>
>> Han
>>
>> Test log:
>>
>> Check flow cnt on br-int every second:
>>
>>
>>
>> 

Re: [ovs-discuss] [OVN] no response to inactivity probe

2020-08-06 Thread Han Zhou
On Thu, Aug 6, 2020 at 12:07 PM Tony Liu  wrote:
>
> Inline...
>
> Thanks!
>
> Tony
> > -Original Message-
> > From: Han Zhou 
> > Sent: Thursday, August 6, 2020 11:37 AM
> > To: Tony Liu 
> > Cc: Han Zhou ; Numan Siddique ; ovs-dev
> > ; ovs-discuss 
> > Subject: Re: [ovs-discuss] [OVN] no response to inactivity probe
> >
> >
> >
> > On Thu, Aug 6, 2020 at 11:11 AM Tony Liu wrote:
> > >
> > > Inline... (please read with monospaced font:))
> > >
> > > Thanks!
> > >
> > > Tony
> > > > -Original Message-
> > > > From: Han Zhou <hz...@ovn.org>
> > > > Sent: Wednesday, August 5, 2020 11:48 PM
> > > > To: Tony Liu
> > > > Cc: Han Zhou <hz...@ovn.org>; Numan Siddique <num...@ovn.org>; ovs-dev
> > > > <ovs-...@openvswitch.org>; ovs-discuss
> > > > Subject: Re: [ovs-discuss] [OVN] no response to inactivity probe
> > > >
> > > >
> > > >
> > > > On Wed, Aug 5, 2020 at 9:14 PM Tony Liu wrote:
> > > >
> > > >
> > > >   I set the connection target="ptcp:6641:10.6.20.84" for ovn-nb-db
> > > >   and "ptcp:6642:10.6.20.84" for ovn-sb-db. .84 is the first node
> > > >   of cluster. Also ovn-openflow-probe-interval=30 on compute node.
> > > >   It seems helping. Not that many connect/drop/reconnect in logging.
> > > >   That "commit failure" is also gone.
> > > >   The issue I reported in another thread "packet drop" seems gone.
> > > >   And launching VM starts working.
> > > >
> > > >   How should I set connection table for all ovn-nb-db and ovn-sb-db
> > > >   nodes in the cluster to set inactivity_probe?
> > > >   One row with address 0.0.0.0 seems not working.
> > > >
> > > > You can simply use 0.0.0.0 in the connection table, but don't
> > > > specify the same connection method on the command line when starting
> > > > ovsdb- server for NB/SB DB. Otherwise, these are conflicting and
> > > > that's why you saw "Address already in use" error.
> > >
> > > Could you share a bit details how it works?
> > > I thought the row in connection table only tells nbdb and sbdb the
> > > probe interval. Isn't that right? Does nbdb and sbdb also create
> > > socket based on target column?
> >
> > >
> >
> > In --remote option of ovsdb-server, you can specify either a connection
> > method directly, or specify the db,table,column which contains the
> > connection information.
> > Please see manpage ovsdb-server(1).
>
> Here is how one of those 3 nbdb nodes invoked.
> 
> ovsdb-server -vconsole:off -vfile:info
--log-file=/var/log/kolla/openvswitch/ovn-sb-db.log
--remote=punix:/var/run/ovn/ovnsb_db.sock --pidfile=/run/ovn/ovnsb_db.pid
--unixctl=/var/run/ovn/ovnsb_db.ctl
--remote=db:OVN_Southbound,SB_Global,connections
--private-key=db:OVN_Southbound,SSL,private_key
--certificate=db:OVN_Southbound,SSL,certificate
--ca-cert=db:OVN_Southbound,SSL,ca_cert
--ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols
--ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers
--remote=ptcp:6642:10.6.20.84 /var/lib/openvswitch/ovn-sb/ov sb.db
> 
> It creates UNIX and TCP sockets, and takes configuration from DB.
> Does that look ok?
> Given that, what the target column should be for all nodes of the cluster?
> And whatever target is set, ovsdb-server will create socket, right?
> Oh... Should I do "--remote=ptcp:6642:0.0.0.0"? Then I can set the same
> in connection table, and it won't cause conflict?
> If --remote and connection target are the same, whoever comes in later
> will be ignored, right?
> In coding, does ovsdb-server create a connection object for each of
> --remote and connection target, or it's one single connection object
> for both of them because method:port:address is the same? I'd expect
> the single object.
>

--remote=ptcp:6642:10.6.20.84 should be removed from the command.
You already specify --remote=db:OVN_Southbound,SB_Global,connections,
which should contain ptcp:6642:0.0.0.0.
If you set both, it will result in a conflict when ovsdb-server tries to
create both sockets on the same port.
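
For example, something like this (a sketch; assumes ovn-sbctl is run on a
cluster member, and the Connection table's inactivity_probe is in
milliseconds):

  # ovsdb-server keeps only the DB-driven remote:
  #   --remote=db:OVN_Southbound,SB_Global,connections
  # the actual listener then comes from the connection table:
  ovn-sbctl set-connection ptcp:6642:0.0.0.0
  ovn-sbctl set connection . inactivity_probe=30000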

> > > >   Is "external_ids:ovn-remote-probe-interval" in ovsdb-server on
> > > >   compute node for ovn-controller to probe ovn-sb-db?
> > > >
> > > > OVSDB probe is bidirectional, so you need to set this value, too, if
> > > > you don't want too many probes handled by the SB server. (setting
> > > > the connection table for SB only changes the server side).
> > >
> > > In that case, how do I set probe interval for ovn-controller?
> > > My understanding is that, ovn-controller reads configuration from
> > > ovsdb-server on the local compute node. Isn't that right?
> >
> > >
> >
> > The configuration you mentioned "external_ids:ovn-remote-probe-interval"
> > is exactly the way 

Re: [ovs-discuss] [ovs-dev] packet drop

2020-08-06 Thread Tony Liu
Inline...

Thanks!

Tony
> -Original Message-
> From: Numan Siddique 
> Sent: Thursday, August 6, 2020 11:49 AM
> To: Tony Liu 
> Cc: ovs-...@openvswitch.org; ovs-discuss@openvswitch.org
> Subject: Re: [ovs-discuss] [ovs-dev] packet drop
> 
> On Fri, Aug 7, 2020 at 12:10 AM Tony Liu  wrote:
> >
> > Inline...
> >
> > Thanks!
> >
> > Tony
> > > -Original Message-
> > > From: Numan Siddique 
> > > Sent: Thursday, August 6, 2020 10:03 AM
> > > To: Tony Liu 
> > > Cc: ovs-discuss@openvswitch.org; ovs-...@openvswitch.org
> > > Subject: Re: [ovs-dev] packet drop
> > >
> > >
> > >
> > > On Thu, Aug 6, 2020 at 4:05 AM Tony Liu wrote:
> > >
> > >
> > >
> > >   The drop is caused by flow change.
> > >
> > >   When packet is dropped.
> > >   
> > >
> > > recirc_id(0),tunnel(tun_id=0x19aca,src=10.6.30.92,dst=10.6.30.22,ge
> > > neve({class=0x102,type=0x80,len=4,0x20003/0x7fff}),flags(-
> > > df+csum+key)),in_port(3),eth(src=fa:16:3e:df:1e:85,dst=00:00:00:00:00:00
> > > /01:00:00:00:00:00),eth_type(0x0800),ipv4(proto=1,frag=no),icmp(type=8/0
> > > xf8), packets:14, bytes:1372, used:0.846s, actions:drop
> > > recirc_id(0),in_port(12),eth(src=fa:16:3e:7d:bb:85,dst=fa:16:3e:df:
> > > 1e:85),eth_type(0x0800),ipv4(src=192.168.236.152/255.255.255.252,dst=10.
> > > 6.40.9,proto=1,tos=0/0x3,ttl=64,frag=no),icmp(type=0), packets:6,
> > > bytes:588, used:8.983s, actions:drop
> > >   
> > >
> > >   When packet goes through.
> > >   
> > > recirc_id(0),tunnel(tun_id=0x19aca,src=10.6.30.92,dst=10.6.30.22,ge
> > > neve({class=0x102,type=0x80,len=4,0x20003/0x7fff}),flags(-
> > > df+csum+key)),in_port(3),eth(src=fa:16:3e:df:1e:85,dst=00:00:00:00:00:00
> > > /01:00:00:00:00:00),eth_type(0x0800),ipv4(proto=1,frag=no),icmp(type=8/0
> > > xf8), packets:3, bytes:294, used:0.104s, actions:12
> > > recirc_id(0),in_port(12),eth(src=fa:16:3e:7d:bb:85,dst=fa:16:3e:df:
> > > 1e:85),eth_type(0x0800),ipv4(src=192.168.236.152/255.255.255.252,dst=10.
> > > 6.40.9,proto=1,tos=0/0x3,ttl=64,frag=no),icmp(type=0), packets:3,
> > > bytes:294, used:0.103s,
> > > actions:ct_clear,set(tunnel(tun_id=0x1a8ee,dst=10.6.30.92,ttl=64,tp_dst=
> > > 6081,geneve({class=0x102,type=0x80,len=4,0x1000b}),flags(df|csum|key))),
> > > set(eth(src=fa:16:3e:75:b7:e5,dst=52:54:00:0c:ef:b9)),set(ipv4(ttl=63)),
> > > 3
> > >   
> > >
> > >   Is that flow programmed by ovn-controller via ovs-vswitchd?
> > >
> > > What version of OVN and OVS are you using ?
> >
> > ovn-20.03.0-2.el8.x86_64
> > openvswitch-2.12.0-1.el8.x86_64
> >
> > > Can you share your OVN NB DB ?
> >
> > Yes, I can. Let me know how.
> >
> > > If I understand correctly the packet is received from the patch port
> > > to br-int on the gateway node and then tunnelled to the compute node
> right ?
> > > And the packet is dropped on the compute node ?
> >
> > Yes and yes.
> >
> > > If you could share your NB DB if it's fine with you and tell the
> > > destination logical port, I can try it out locally.
> >
> > Here is what I had.
> > On compute node, ovn-controller is very busy. It keeps saying "commit
> > failed".
> > 
> > 2020-08-05T02:44:23.927Z|04125|reconnect|INFO|tcp:10.6.20.84:6642:
> > connected 2020-08-05T02:44:23.936Z|04126|main|INFO|OVNSB commit failed,
> force recompute next time.
> > 2020-08-05T02:44:23.938Z|04127|ovsdb_idl|INFO|tcp:10.6.20.84:6642:
> > clustered database server is disconnected from cluster; trying another
> > server
> > 2020-08-05T02:44:23.939Z|04128|reconnect|INFO|tcp:10.6.20.84:6642:
> > connection attempt timed out
> > 2020-08-05T02:44:23.939Z|04129|reconnect|INFO|tcp:10.6.20.84:6642:
> > waiting 2 seconds before reconnect 
> >
> > The connection to local OVSDB keeps being dropped, because no probe
> > response.
> > 
> > 2020-08-05T02:47:15.437Z|04351|poll_loop|INFO|wakeup due to [POLLIN]
> > on fd 20 (10.6.20.22:42362<->10.6.20.86:6642) at lib/stream-fd.c:157
> > (100% CPU usage)
> > 2020-08-05T02:47:15.438Z|04352|reconnect|WARN|tcp:127.0.0.1:6640:
> > connection dropped (Broken pipe)
> > 2020-08-05T02:47:15.438Z|04353|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt:
> > connecting...
> > 2020-08-05T02:47:15.449Z|04354|rconn|INFO|unix:/var/run/openvswitch/br
> > -int.mgmt: connected 
> >
> > After set probe interval to 30s, the problem seems gone. I mentioned
> > this in another thread "[OVN] no response to inactivity probe".
> >
> > I can restore probe interval back to default 5s, see if the problem
> > can be reproduced. For me, it's important to understand what happens
> > behind it.
> 
> Ok. This makes sense. Since the 

Re: [ovs-discuss] [OVN] no response to inactivity probe

2020-08-06 Thread Tony Liu
Inline...

Thanks!

Tony
> -Original Message-
> From: Han Zhou 
> Sent: Thursday, August 6, 2020 11:37 AM
> To: Tony Liu 
> Cc: Han Zhou ; Numan Siddique ; ovs-dev
> ; ovs-discuss 
> Subject: Re: [ovs-discuss] [OVN] no response to inactivity probe
> 
> 
> 
> On Thu, Aug 6, 2020 at 11:11 AM Tony Liu wrote:
> >
> > Inline... (please read with monospaced font:))
> >
> > Thanks!
> >
> > Tony
> > > -Original Message-
> > > From: Han Zhou <hz...@ovn.org>
> > > Sent: Wednesday, August 5, 2020 11:48 PM
> > > To: Tony Liu
> > > Cc: Han Zhou <hz...@ovn.org>; Numan Siddique <num...@ovn.org>; ovs-dev
> > > <ovs-...@openvswitch.org>; ovs-discuss
> > > Subject: Re: [ovs-discuss] [OVN] no response to inactivity probe
> > >
> > >
> > >
> > > On Wed, Aug 5, 2020 at 9:14 PM Tony Liu wrote:
> > >
> > >
> > >   I set the connection target="ptcp:6641:10.6.20.84" for ovn-nb-db
> > >   and "ptcp:6642:10.6.20.84" for ovn-sb-db. .84 is the first node
> > >   of cluster. Also ovn-openflow-probe-interval=30 on compute node.
> > >   It seems helping. Not that many connect/drop/reconnect in logging.
> > >   That "commit failure" is also gone.
> > >   The issue I reported in another thread "packet drop" seems gone.
> > >   And launching VM starts working.
> > >
> > >   How should I set connection table for all ovn-nb-db and ovn-sb-db
> > >   nodes in the cluster to set inactivity_probe?
> > >   One row with address 0.0.0.0 seems not working.
> > >
> > > You can simply use 0.0.0.0 in the connection table, but don't
> > > specify the same connection method on the command line when starting
> > > ovsdb- server for NB/SB DB. Otherwise, these are conflicting and
> > > that's why you saw "Address already in use" error.
> >
> > Could you share a bit details how it works?
> > I thought the row in connection table only tells nbdb and sbdb the
> > probe interval. Isn't that right? Does nbdb and sbdb also create
> > socket based on target column?
> 
> >
> 
> In --remote option of ovsdb-server, you can specify either a connection
> method directly, or specify the db,table,column which contains the
> connection information.
> Please see manpage ovsdb-server(1).

Here is how one of those 3 nbdb nodes is invoked.

ovsdb-server -vconsole:off -vfile:info 
--log-file=/var/log/kolla/openvswitch/ovn-sb-db.log 
--remote=punix:/var/run/ovn/ovnsb_db.sock --pidfile=/run/ovn/ovnsb_db.pid 
--unixctl=/var/run/ovn/ovnsb_db.ctl 
--remote=db:OVN_Southbound,SB_Global,connections 
--private-key=db:OVN_Southbound,SSL,private_key 
--certificate=db:OVN_Southbound,SSL,certificate 
--ca-cert=db:OVN_Southbound,SSL,ca_cert 
--ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols 
--ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers --remote=ptcp:6642:10.6.20.84 
/var/lib/openvswitch/ovn-sb/ov sb.db

It creates UNIX and TCP sockets, and takes configuration from the DB.
Does that look ok?
Given that, what should the target column be for all nodes of the cluster?
And whatever target is set, ovsdb-server will create a socket, right?
Oh... Should I do "--remote=ptcp:6642:0.0.0.0"? Then I can set the same
in the connection table, and it won't cause a conflict?
If --remote and the connection target are the same, whichever comes in
later will be ignored, right?
In the code, does ovsdb-server create a connection object for each of
--remote and the connection target, or is it one single connection object
for both of them because method:port:address is the same? I'd expect
the single object.

> > >   Is "external_ids:ovn-remote-probe-interval" in ovsdb-server on
> > >   compute node for ovn-controller to probe ovn-sb-db?
> > >
> > > OVSDB probe is bidirectional, so you need to set this value, too, if
> > > you don't want too many probes handled by the SB server. (setting
> > > the connection table for SB only changes the server side).
> >
> > In that case, how do I set probe interval for ovn-controller?
> > My understanding is that, ovn-controller reads configuration from
> > ovsdb-server on the local compute node. Isn't that right?
> 
> >
> 
> The configuration you mentioned "external_ids:ovn-remote-probe-interval"
> is exactly the way to set the ovn-controller -> SB probe interval.
> (SB -> ovn-controller probe is set in the connection table of SB)
> 
> 
> You are right that ovn-controller reads configuration from the local
> ovsdb-server. This setting is in local ovsdb-server.
> 
> 
> > >   Is "external_ids:ovn-openflow-probe-interval" in ovsdb-server
> on
> > >   compute node for ovn-controller to probe ovsdb-server?
> > >
> > > It is for the OpenFlow connection between ovn-controller and ovs-
> > > vswitchd, which is part of the OpenFlow protocol.

Re: [ovs-discuss] [ovs-dev] packet drop

2020-08-06 Thread Numan Siddique
On Fri, Aug 7, 2020 at 12:10 AM Tony Liu  wrote:
>
> Inline...
>
> Thanks!
>
> Tony
> > -Original Message-
> > From: Numan Siddique 
> > Sent: Thursday, August 6, 2020 10:03 AM
> > To: Tony Liu 
> > Cc: ovs-discuss@openvswitch.org; ovs-...@openvswitch.org
> > Subject: Re: [ovs-dev] packet drop
> >
> >
> >
> > On Thu, Aug 6, 2020 at 4:05 AM Tony Liu wrote:
> >
> >
> >
> >   The drop is caused by flow change.
> >
> >   When packet is dropped.
> >   
> >   recirc_id(0),tunnel(tun_id=0x19aca,src=10.6.30.92,dst=10.6.30.22,ge
> > neve({class=0x102,type=0x80,len=4,0x20003/0x7fff}),flags(-
> > df+csum+key)),in_port(3),eth(src=fa:16:3e:df:1e:85,dst=00:00:00:00:00:00
> > /01:00:00:00:00:00),eth_type(0x0800),ipv4(proto=1,frag=no),icmp(type=8/0
> > xf8), packets:14, bytes:1372, used:0.846s, actions:drop
> >   recirc_id(0),in_port(12),eth(src=fa:16:3e:7d:bb:85,dst=fa:16:3e:df:
> > 1e:85),eth_type(0x0800),ipv4(src=192.168.236.152/255.255.255.252,dst=10.
> > 6.40.9,proto=1,tos=0/0x3,ttl=64,frag=no),icmp(type=0), packets:6,
> > bytes:588, used:8.983s,
> > actions:drop
> >   
> >
> >   When packet goes through.
> >   
> >   recirc_id(0),tunnel(tun_id=0x19aca,src=10.6.30.92,dst=10.6.30.22,ge
> > neve({class=0x102,type=0x80,len=4,0x20003/0x7fff}),flags(-
> > df+csum+key)),in_port(3),eth(src=fa:16:3e:df:1e:85,dst=00:00:00:00:00:00
> > /01:00:00:00:00:00),eth_type(0x0800),ipv4(proto=1,frag=no),icmp(type=8/0
> > xf8), packets:3, bytes:294, used:0.104s, actions:12
> >   recirc_id(0),in_port(12),eth(src=fa:16:3e:7d:bb:85,dst=fa:16:3e:df:
> > 1e:85),eth_type(0x0800),ipv4(src=192.168.236.152/255.255.255.252,dst=10.
> > 6.40.9,proto=1,tos=0/0x3,ttl=64,frag=no),icmp(type=0), packets:3,
> > bytes:294, used:0.103s,
> > actions:ct_clear,set(tunnel(tun_id=0x1a8ee,dst=10.6.30.92,ttl=64,tp_dst=
> > 6081,geneve({class=0x102,type=0x80,len=4,0x1000b}),flags(df|csum|key))),
> > set(eth(src=fa:16:3e:75:b7:e5,dst=52:54:00:0c:ef:b9)),set(ipv4(ttl=63)),
> > 3
> >   
> >
> >   Is that flow programmed by ovn-controller via ovs-vswitchd?
> >
> > What version of OVN and OVS are you using ?
>
> ovn-20.03.0-2.el8.x86_64
> openvswitch-2.12.0-1.el8.x86_64
>
> > Can you share your OVN NB DB ?
>
> Yes, I can. Let me know how.
>
> > If I understand correctly the packet is received from the patch port to
> > br-int on the gateway node and then tunnelled to the compute node right ?
> > And the packet is dropped on the compute node ?
>
> Yes and yes.
>
> > If you could share your NB DB if it's fine with you and tell the
> > destination logical port, I can try it out locally.
>
> Here is what I had.
> On compute node, ovn-controller is very busy. It keeps saying
> "commit failed".
> 
> 2020-08-05T02:44:23.927Z|04125|reconnect|INFO|tcp:10.6.20.84:6642: connected
> 2020-08-05T02:44:23.936Z|04126|main|INFO|OVNSB commit failed, force recompute 
> next time.
> 2020-08-05T02:44:23.938Z|04127|ovsdb_idl|INFO|tcp:10.6.20.84:6642: clustered 
> database server is disconnected from cluster; trying another server
> 2020-08-05T02:44:23.939Z|04128|reconnect|INFO|tcp:10.6.20.84:6642: connection 
> attempt timed out
> 2020-08-05T02:44:23.939Z|04129|reconnect|INFO|tcp:10.6.20.84:6642: waiting 2 
> seconds before reconnect
> 
>
> The connection to local OVSDB keeps being dropped, because no probe
> response.
> 
> 2020-08-05T02:47:15.437Z|04351|poll_loop|INFO|wakeup due to [POLLIN] on fd 20 
> (10.6.20.22:42362<->10.6.20.86:6642) at lib/stream-fd.c:157 (100% CPU usage)
> 2020-08-05T02:47:15.438Z|04352|reconnect|WARN|tcp:127.0.0.1:6640: connection 
> dropped (Broken pipe)
> 2020-08-05T02:47:15.438Z|04353|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt:
>  connecting...
> 2020-08-05T02:47:15.449Z|04354|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt:
>  connected
> 
>
> After set probe interval to 30s, the problem seems gone. I mentioned
> this in another thread "[OVN] no response to inactivity probe".
>
> I can restore probe interval back to default 5s, see if the problem
> can be reproduced. For me, it's important to understand what happens
> behind it.

Ok. This makes sense. Since the connection between ovn-controller and
ovs-vswitchd is breaking, the flows are deleted and reprogrammed,
and this results in a loop. And packets get dropped while flows are missing.

I think increasing the probe interval - ovn-openflow-probe-interval -
seems fine to me. It indicates that ovn-controller is taking more than
5 seconds to do a full recompute.

If you move to ovn-20.06.01, ovn-controller should not disconnect as
often, as the latest version has many optimizations to handle DB changes
more efficiently.
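
For reference, that knob lives in the local Open_vSwitch table and its
value is in seconds, so something like this (illustrative) should do:

  ovs-vsctl set open_vswitch . external_ids:ovn-openflow-probe-interval=30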

Thanks

Re: [ovs-discuss] [ovs-dev] packet drop

2020-08-06 Thread Tony Liu
Inline...

Thanks!

Tony
> -Original Message-
> From: Numan Siddique 
> Sent: Thursday, August 6, 2020 10:03 AM
> To: Tony Liu 
> Cc: ovs-discuss@openvswitch.org; ovs-...@openvswitch.org
> Subject: Re: [ovs-dev] packet drop
> 
> 
> 
> On Thu, Aug 6, 2020 at 4:05 AM Tony Liu wrote:
> 
> 
> 
>   The drop is caused by flow change.
> 
>   When packet is dropped.
>   
>   recirc_id(0),tunnel(tun_id=0x19aca,src=10.6.30.92,dst=10.6.30.22,ge
> neve({class=0x102,type=0x80,len=4,0x20003/0x7fff}),flags(-
> df+csum+key)),in_port(3),eth(src=fa:16:3e:df:1e:85,dst=00:00:00:00:00:00
> /01:00:00:00:00:00),eth_type(0x0800),ipv4(proto=1,frag=no),icmp(type=8/0
> xf8), packets:14, bytes:1372, used:0.846s, actions:drop
>   recirc_id(0),in_port(12),eth(src=fa:16:3e:7d:bb:85,dst=fa:16:3e:df:
> 1e:85),eth_type(0x0800),ipv4(src=192.168.236.152/255.255.255.252,dst=10.
> 6.40.9,proto=1,tos=0/0x3,ttl=64,frag=no),icmp(type=0), packets:6,
> bytes:588, used:8.983s,
> actions:drop
>   
> 
>   When packet goes through.
>   
>   recirc_id(0),tunnel(tun_id=0x19aca,src=10.6.30.92,dst=10.6.30.22,ge
> neve({class=0x102,type=0x80,len=4,0x20003/0x7fff}),flags(-
> df+csum+key)),in_port(3),eth(src=fa:16:3e:df:1e:85,dst=00:00:00:00:00:00
> /01:00:00:00:00:00),eth_type(0x0800),ipv4(proto=1,frag=no),icmp(type=8/0
> xf8), packets:3, bytes:294, used:0.104s, actions:12
>   recirc_id(0),in_port(12),eth(src=fa:16:3e:7d:bb:85,dst=fa:16:3e:df:
> 1e:85),eth_type(0x0800),ipv4(src=192.168.236.152/255.255.255.252,dst=10.
> 6.40.9,proto=1,tos=0/0x3,ttl=64,frag=no),icmp(type=0), packets:3,
> bytes:294, used:0.103s,
> actions:ct_clear,set(tunnel(tun_id=0x1a8ee,dst=10.6.30.92,ttl=64,tp_dst=
> 6081,geneve({class=0x102,type=0x80,len=4,0x1000b}),flags(df|csum|key))),
> set(eth(src=fa:16:3e:75:b7:e5,dst=52:54:00:0c:ef:b9)),set(ipv4(ttl=63)),
> 3
>   
> 
>   Is that flow programmed by ovn-controller via ovs-vswitchd?
> 
> What version of OVN and OVS are you using ?

ovn-20.03.0-2.el8.x86_64
openvswitch-2.12.0-1.el8.x86_64

> Can you share your OVN NB DB ?

Yes, I can. Let me know how.

> If I understand correctly the packet is received from the patch port to
> br-int on the gateway node and then tunnelled to the compute node right ?
> And the packet is dropped on the compute node ?

Yes and yes.

> If you could share your NB DB if it's fine with you and tell the
> destination logical port, I can try it out locally.

Here is what I had.
On compute node, ovn-controller is very busy. It keeps saying
"commit failed".

2020-08-05T02:44:23.927Z|04125|reconnect|INFO|tcp:10.6.20.84:6642: connected
2020-08-05T02:44:23.936Z|04126|main|INFO|OVNSB commit failed, force recompute 
next time.
2020-08-05T02:44:23.938Z|04127|ovsdb_idl|INFO|tcp:10.6.20.84:6642: clustered 
database server is disconnected from cluster; trying another server
2020-08-05T02:44:23.939Z|04128|reconnect|INFO|tcp:10.6.20.84:6642: connection 
attempt timed out
2020-08-05T02:44:23.939Z|04129|reconnect|INFO|tcp:10.6.20.84:6642: waiting 2 
seconds before reconnect


The connection to local OVSDB keeps being dropped, because no probe
response.

2020-08-05T02:47:15.437Z|04351|poll_loop|INFO|wakeup due to [POLLIN] on fd 20 
(10.6.20.22:42362<->10.6.20.86:6642) at lib/stream-fd.c:157 (100% CPU usage)
2020-08-05T02:47:15.438Z|04352|reconnect|WARN|tcp:127.0.0.1:6640: connection 
dropped (Broken pipe)
2020-08-05T02:47:15.438Z|04353|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt:
 connecting...
2020-08-05T02:47:15.449Z|04354|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt:
 connected


After setting the probe interval to 30s, the problem seems to be gone.
I mentioned this in another thread, "[OVN] no response to inactivity probe".

I can restore the probe interval back to the default 5s to see if the
problem can be reproduced. For me, it's important to understand what
happens behind it.

> 
> Thanks
> Numan
> 
> 
> 
> 
> 
>   Thanks!
> 
>   Tony
> 
>   > -Original Message-
>   > From: discuss <ovs-discuss-boun...@openvswitch.org> On Behalf Of Tony
>   > Liu
>   > Sent: Wednesday, August 5, 2020 2:48 PM
>   > To: ovs-discuss@openvswitch.org; ovs-...@openvswitch.org
>   > Subject: [ovs-discuss] packet drop
>   >
>   > Hi,
>   >
>   > I am running ping from external to VM via OVN gateway.
>   > On the compute node, ICMP request packet is consistently coming
> into
>   > interface "ovn-gatewa-1". But there is about 10 out of 25 packet
> loss on
>   > tap interface. It's like the switch pauses 10s after every 15s.
>   >
>   

Re: [ovs-discuss] [OVN] no response to inactivity probe

2020-08-06 Thread Han Zhou
On Thu, Aug 6, 2020 at 11:11 AM Tony Liu  wrote:
>
> Inline... (please read with monospaced font:))
>
> Thanks!
>
> Tony
> > -Original Message-
> > From: Han Zhou 
> > Sent: Wednesday, August 5, 2020 11:48 PM
> > To: Tony Liu 
> > Cc: Han Zhou ; Numan Siddique ; ovs-dev
> > ; ovs-discuss 
> > Subject: Re: [ovs-discuss] [OVN] no response to inactivity probe
> >
> >
> >
> > On Wed, Aug 5, 2020 at 9:14 PM Tony Liu wrote:
> >
> >
> >   I set the connection target="ptcp:6641:10.6.20.84" for ovn-nb-db
> >   and "ptcp:6642:10.6.20.84" for ovn-sb-db. .84 is the first node
> >   of cluster. Also ovn-openflow-probe-interval=30 on compute node.
> >   It seems helping. Not that many connect/drop/reconnect in logging.
> >   That "commit failure" is also gone.
> >   The issue I reported in another thread "packet drop" seems gone.
> >   And launching VM starts working.
> >
> >   How should I set connection table for all ovn-nb-db and ovn-sb-db
> >   nodes in the cluster to set inactivity_probe?
> >   One row with address 0.0.0.0 seems not working.
> >
> > You can simply use 0.0.0.0 in the connection table, but don't specify
> > the same connection method on the command line when starting ovsdb-
> > server for NB/SB DB. Otherwise, these are conflicting and that's why you
> > saw "Address already in use" error.
>
> Could you share a bit details how it works?
> I thought the row in connection table only tells nbdb and sbdb the
> probe interval. Isn't that right? Does nbdb and sbdb also create
> socket based on target column?
>

In the --remote option of ovsdb-server, you can specify either a connection
method directly, or the db,table,column that contains the
connection information.
Please see manpage ovsdb-server(1).
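
Both forms look roughly like this (illustrative only):

  ovsdb-server --remote=ptcp:6642:0.0.0.0 ...                        # connection method given directly
  ovsdb-server --remote=db:OVN_Southbound,SB_Global,connections ...  # remotes read from the DB itself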

> >
> >   Is "external_ids:ovn-remote-probe-interval" in ovsdb-server on
> >   compute node for ovn-controller to probe ovn-sb-db?
> >
> > OVSDB probe is bidirectional, so you need to set this value, too, if you
> > don't want too many probes handled by the SB server. (setting the
> > connection table for SB only changes the server side).
>
> In that case, how do I set probe interval for ovn-controller?
> My understanding is that, ovn-controller reads configuration from
> ovsdb-server on the local compute node. Isn't that right?
>

The configuration you mentioned "external_ids:ovn-remote-probe-interval" is
exactly the way to set the ovn-controller -> SB probe interval.
(SB -> ovn-controller probe is set in the connection table of SB)
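
For example (values illustrative; both ovn-remote-probe-interval and the
Connection table's inactivity_probe are in milliseconds):

  # ovn-controller -> SB, set on each chassis:
  ovs-vsctl set open_vswitch . external_ids:ovn-remote-probe-interval=30000
  # SB -> ovn-controller, set in the SB connection table:
  ovn-sbctl set connection . inactivity_probe=30000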

You are right that ovn-controller reads configuration from the local
ovsdb-server. This setting is in local ovsdb-server.

> >   Is "external_ids:ovn-openflow-probe-interval" in ovsdb-server on
> >   compute node for ovn-controller to probe ovsdb-server?
> >
> > It is for the OpenFlow connection between ovn-controller and ovs-
> > vswitchd, which is part of the OpenFlow protocol.
> >
> >   What's probe interval for ovsdb-server to probe ovn-controller?
> >
> > The local ovsdb connection uses unix socket, which doesn't send probe by
> > default (if I remember correctly).
>
> Here is how ovsdb-server and ovn-controller is invoked on compute node.
> 
> root 41129  0.0  0.0 157556 20532 ?SJul30   1:51
/usr/sbin/ovsdb-server /var/lib/openvswitch/conf.db -vconsole:emer
-vsyslog:err -vfile:info --remote=punix:/run/openvswitch/db.sock
--remote=ptcp:6640:127.0.0.1
--remote=db:Open_vSwitch,Open_vSwitch,manager_options
--log-file=/var/log/kolla/openvswitch/ovsdb-server.log --pidfile
>
> root 63775 55.9  0.4 1477796 1224324 ? Sl   Aug04 1360:55
/usr/bin/ovn-controller --pidfile=/run/ovn/ovn-controller.pid
--log-file=/var/log/kolla/openvswitch/ovn-controller.log tcp:127.0.0.1:6640
> 
> Is that OK? Or UNIX socket method is recommended for ovn-controller
> to connect to ovsdb-server?

If using TCP, the probe interval defaults to 5s. I think it is better to
use a unix socket (but maybe it doesn't matter that much).
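
For example, a sketch reusing the punix path from the ovsdb-server
invocation you pasted:

  /usr/bin/ovn-controller --pidfile=/run/ovn/ovn-controller.pid \
      --log-file=/var/log/kolla/openvswitch/ovn-controller.log \
      unix:/run/openvswitch/db.sock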

>
> Here is the configuration in open_vswitch table in ovsdb-server.
> 
> external_ids: {ovn-encap-ip="10.6.30.22", ovn-encap-type=geneve,
ovn-openflow-probe-interval="30", ovn-remote="tcp:10.6.20.84:6642,tcp:
10.6.20.85:6642,tcp:10.6.20.86:6642", ovn-remote-probe-interval="6",
system-id="compute-3"}
> 
> ovn-controller connects to ovsdb-server and reads this configuration,
> so it knows how to connect to all sbdb nodes, right?
>
Yes, you are right.

> If it's TCP between ovn-controller and ovsdb-server, is that probe
> interval setting will also apply to the probe from ovn-controller to
> ovsdb-server?
>
No. They are not related.

> ovn-controller connects to ovs-vswitchd by UNIX socket to program
> open-flow. ovs-vswitchd and ovsdb-server are connected by UNIX too.
> So, is that ovn-openflow-probe-interval for the probe from ovn-controller
> to ovs-vswitchd via UNIX?
>
> As a summary for the probe 

Re: [ovs-discuss] [OVN] no response to inactivity probe

2020-08-06 Thread Tony Liu
Inline... (please read with monospaced font:))

Thanks!

Tony
> -Original Message-
> From: Han Zhou 
> Sent: Wednesday, August 5, 2020 11:48 PM
> To: Tony Liu 
> Cc: Han Zhou ; Numan Siddique ; ovs-dev
> ; ovs-discuss 
> Subject: Re: [ovs-discuss] [OVN] no response to inactivity probe
> 
> 
> 
> On Wed, Aug 5, 2020 at 9:14 PM Tony Liu wrote:
> 
> 
>   I set the connection target="ptcp:6641:10.6.20.84" for ovn-nb-db
>   and "ptcp:6642:10.6.20.84" for ovn-sb-db. .84 is the first node
>   of cluster. Also ovn-openflow-probe-interval=30 on compute node.
>   It seems helping. Not that many connect/drop/reconnect in logging.
>   That "commit failure" is also gone.
>   The issue I reported in another thread "packet drop" seems gone.
>   And launching VM starts working.
> 
>   How should I set connection table for all ovn-nb-db and ovn-sb-db
>   nodes in the cluster to set inactivity_probe?
>   One row with address 0.0.0.0 seems not working.
> 
> You can simply use 0.0.0.0 in the connection table, but don't specify
> the same connection method on the command line when starting ovsdb-
> server for NB/SB DB. Otherwise, these are conflicting and that's why you
> saw "Address already in use" error.

Could you share a bit of detail on how it works?
I thought the row in the connection table only tells nbdb and sbdb the
probe interval. Isn't that right? Do nbdb and sbdb also create a
socket based on the target column?

> 
>   Is "external_ids:ovn-remote-probe-interval" in ovsdb-server on
>   compute node for ovn-controller to probe ovn-sb-db?
> 
> OVSDB probe is bidirectional, so you need to set this value, too, if you
> don't want too many probes handled by the SB server. (setting the
> connection table for SB only changes the server side).

In that case, how do I set the probe interval for ovn-controller?
My understanding is that ovn-controller reads its configuration from
the ovsdb-server on the local compute node. Isn't that right?

>   Is "external_ids:ovn-openflow-probe-interval" in ovsdb-server on
>   compute node for ovn-controller to probe ovsdb-server?
> 
> It is for the OpenFlow connection between ovn-controller and ovs-
> vswitchd, which is part of the OpenFlow protocol.
> 
>   What's probe interval for ovsdb-server to probe ovn-controller?
> 
> The local ovsdb connection uses unix socket, which doesn't send probe by
> default (if I remember correctly).

Here is how ovsdb-server and ovn-controller are invoked on the compute node.

root 41129  0.0  0.0 157556 20532 ?SJul30   1:51 
/usr/sbin/ovsdb-server /var/lib/openvswitch/conf.db -vconsole:emer -vsyslog:err 
-vfile:info --remote=punix:/run/openvswitch/db.sock 
--remote=ptcp:6640:127.0.0.1 
--remote=db:Open_vSwitch,Open_vSwitch,manager_options 
--log-file=/var/log/kolla/openvswitch/ovsdb-server.log --pidfile

root 63775 55.9  0.4 1477796 1224324 ? Sl   Aug04 1360:55 
/usr/bin/ovn-controller --pidfile=/run/ovn/ovn-controller.pid 
--log-file=/var/log/kolla/openvswitch/ovn-controller.log tcp:127.0.0.1:6640

Is that OK? Or is the UNIX socket method recommended for ovn-controller
to connect to ovsdb-server?

Here is the configuration in open_vswitch table in ovsdb-server.

external_ids: {ovn-encap-ip="10.6.30.22", ovn-encap-type=geneve, 
ovn-openflow-probe-interval="30", 
ovn-remote="tcp:10.6.20.84:6642,tcp:10.6.20.85:6642,tcp:10.6.20.86:6642", 
ovn-remote-probe-interval="6", system-id="compute-3"}

ovn-controller connects to ovsdb-server and reads this configuration,
so it knows how to connect to all sbdb nodes, right?

If it's TCP between ovn-controller and ovsdb-server, will that probe
interval setting also apply to the probe from ovn-controller to
ovsdb-server?

ovn-controller connects to ovs-vswitchd by UNIX socket to program
OpenFlow. ovs-vswitchd and ovsdb-server are connected by UNIX too.
So, is ovn-openflow-probe-interval for the probe from ovn-controller
to ovs-vswitchd over that UNIX socket?

As a summary of the probe settings:

+--------------+  driver configuration
|  ovn-driver  |
+--------------+
      ^|
      |v
+--------------+  inactivity_probe in table "Connection"
|  ovn-nb-db   |
+--------------+
      ^|
      |v
+--------------+  options:northd_probe_interval in table "NB_Global"
|  ovn-northd  |  in nbdb.
+--------------+
      ^|
      |v
+--------------+  inactivity_probe in table "Connection"
|  ovn-sb-db   |
+--------------+
      ^|
      |v
+----------------+            in table "Open_vSwitch" in ovsdb-server
| ovn-controller |            ovn-remote-probe-interval for TCP
+----------------+            probe to ovsdb-server,
   ^|                ^|       ovn-openflow-probe-interval for UNIX
   |v TCP            |v UNIX  probe to ovs-vswitchd
+--------------+  +--------------+
| ovsdb-server |  | ovs-vswitchd |
+--------------+  +--------------+

Is that 

Re: [ovs-discuss] ovn-k8s scale: how to make new ovn-controller process keep the previous Open Flow in br-int

2020-08-06 Thread Han Zhou
On Thu, Aug 6, 2020 at 9:15 AM Numan Siddique  wrote:

>
>
> On Thu, Aug 6, 2020 at 9:25 PM Venugopal Iyer 
> wrote:
>
>> Hi, Han:
>>
>>
>>
>> A comment inline:
>>
>>
>>
>> *From:* ovn-kuberne...@googlegroups.com 
>> *On Behalf Of *Han Zhou
>> *Sent:* Wednesday, August 5, 2020 3:36 PM
>> *To:* Winson Wang 
>> *Cc:* ovs-discuss@openvswitch.org; ovn-kuberne...@googlegroups.com;
>> Dumitru Ceara ; Han Zhou 
>> *Subject:* Re: ovn-k8s scale: how to make new ovn-controller process
>> keep the previous Open Flow in br-int
>>
>>
>>
>> *External email: Use caution opening links or attachments*
>>
>>
>>
>>
>>
>>
>>
>> On Wed, Aug 5, 2020 at 12:58 PM Winson Wang 
>> wrote:
>>
>> Hello OVN Experts,
>>
>>
>> With ovn-k8s,  we need to keep the flows always on br-int which needed by
>> running pods on the k8s node.
>>
>> Is there an ongoing project to address this problem?
>>
>> If not,  I have one proposal not sure if it is doable.
>>
>> Please share your thoughts.
>> The issue:
>>
>> In large scale ovn-k8s cluster there are 200K+ Open Flows on br-int on
>> every K8s node.  When we restart ovn-controller for upgrade using
>> `ovs-appctl -t ovn-controller exit --restart`,  the remaining traffic still
>> works fine since br-int with flows still be Installed.
>>
>>
>>
>> However, when a new ovn-controller starts it will connect OVS IDL and do
>> an engine init run,  clearing all OpenFlow flows and install flows based on
>> SB DB.
>>
>> With open flows count above 200K+,  it took more than 15 seconds to get
>> all the flows installed br-int bridge again.
>>
>>
>> Proposal solution for the issue:
>>
>> When the ovn-controller gets “exit --start”,  it will write a
>> “ovs-cond-seqno” to OVS IDL and store the value to Open vSwitch table in
>> external-ids column. When new ovn-controller starts, it will check if the
>> “ovs-cond-seqno” exists in the Open_vSwitch table,  and get the seqno from
>> OVS IDL to decide if it will force a recomputing process?
>>
>>
>>
>>
>>
>> Hi Winson,
>>
>>
>>
>> Thanks for the proposal. Yes, the connection break during upgrading is a
>> real issue in a large scale environment. However, the proposal doesn't
>> work. The "ovs-cond-seqno" is for the OVSDB IDL for the local conf DB,
>> which is a completely different connection from the ovs-vswitchd open-flow
>> connection.
>>
>> To avoid clearing the open-flow table during ovn-controller startup, we
>> can find a way to postpone clearing the OVS flows after the recomputing in
>> ovn-controller is completed, right before ovn-controller replacing with the
>> new flows.
>>
>> *[vi> ] *
>>
>> *[vi> ] Seems like we force recompute today if the OVS IDL is
>> reconnected. Would it be possible to defer *
>>
>> *decision to  recompute the flows based on  the  SB’s nb_cfg we have
>>  sync’d with? i.e.  If  our nb_cfg is *
>>
>> *in sync with the SB’s global nb_cfg, we can skip the recompute?  At
>> least if nothing has changed since*
>>
>> *the restart, we won’t need to do anything.. We could stash nb_cfg in OVS
>> (once ovn-controller receives*
>>
>> *conformation from OVS that the physical flows for an nb_cfg update are
>> in place), which should be cleared if *
>>
>> *OVS itself is restarted.. (I mean currently, nb_cfg is used to check if
>> NB, SB and Chassis are in sync, we *
>>
>> *could extend this to OVS/physical flows?)*
>>
>>
>>
>> *Have not thought through this though .. so maybe I am missing something…*
>>
>>
>>
>> *Thanks,*
>>
>>
>>
>> *-venu*
>>
>> This should largely reduce the time of connection broken during
>> upgrading. Some changes in the ofctrl module's state machine are required,
>> but I am not 100% sure if this approach is applicable. Need to check more
>> details.
>>
>
>
> We can also think if its possible to do the below way
>- When ovn-controller starts, it will not clear the flows, but instead
> will get the dump of flows  from the br-int and populate these flows in its
> installed flows
> - And then when it connects to the SB DB and computes the desired
> flows, it will anyway sync up with the installed flows with the desired
> flows
> - And if there is no difference between desired flows and installed
> flows, there will be no impact on the datapath at all.
>
> Although this would require a careful thought and proper handling.
>

Numan, as I responded to Girish, this avoids the time spent on the one-time
flow installation after restart (the < 10% part of the connection-broken
time), but I think the major problem currently is that > 90% of the time is
spent waiting for the computation to finish while the OVS flows are already
cleared. It is surely an optimization, but the most important one now is to
avoid that 90%. I will look at postponing the clearing of flows first.
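
(For reference, the per-second flow-count samples in the test log quoted
below can be gathered with something like:
  while true; do ovs-ofctl dump-aggregate br-int; sleep 1; done )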


>
> Thanks
> Numan
>
>
>>
>> Thanks,
>>
>> Han
>>
>> Test log:
>>
>> Check flow cnt on br-int every second:
>>
>>
>>
>> packet_count=0 byte_count=0 flow_count=0
>>
>> packet_count=0 byte_count=0 flow_count=0
>>
>> packet_count=0 byte_count=0 flow_count=0
>>

Re: [ovs-discuss] ovn-k8s scale: how to make new ovn-controller process keep the previous Open Flow in br-int

2020-08-06 Thread Han Zhou
On Thu, Aug 6, 2020 at 8:54 AM Venugopal Iyer  wrote:

> Hi, Han:
>
>
>
> A comment inline:
>
>
>
> *From:* ovn-kuberne...@googlegroups.com  *On
> Behalf Of *Han Zhou
> *Sent:* Wednesday, August 5, 2020 3:36 PM
> *To:* Winson Wang 
> *Cc:* ovs-discuss@openvswitch.org; ovn-kuberne...@googlegroups.com;
> Dumitru Ceara ; Han Zhou 
> *Subject:* Re: ovn-k8s scale: how to make new ovn-controller process keep
> the previous Open Flow in br-int
>
>
>
> *External email: Use caution opening links or attachments*
>
>
>
>
>
>
>
> On Wed, Aug 5, 2020 at 12:58 PM Winson Wang 
> wrote:
>
> Hello OVN Experts,
>
>
> With ovn-k8s, we need to keep the flows that are needed by the running pods
> on the k8s node always installed on br-int.
>
> Is there an ongoing project to address this problem?
>
> If not, I have one proposal, though I am not sure if it is doable.
>
> Please share your thoughts.
> The issue:
>
> In a large-scale ovn-k8s cluster there are 200K+ OpenFlow flows on br-int on
> every K8s node.  When we restart ovn-controller for an upgrade using
> `ovs-appctl -t ovn-controller exit --restart`,  existing traffic still
> works fine since br-int still has the flows installed.
>
>
>
> However, when a new ovn-controller starts it will connect to the OVS IDL
> and do an engine init run,  clearing all OpenFlow flows and installing
> flows based on the SB DB.
>
> With a flow count above 200K,  it took more than 15 seconds to get
> all the flows installed on the br-int bridge again.
>
>
> Proposed solution for the issue:
>
> When the ovn-controller gets “exit --restart”,  it will write an
> “ovs-cond-seqno” to the OVS IDL and store the value in the external-ids
> column of the Open_vSwitch table. When the new ovn-controller starts, it
> will check if the “ovs-cond-seqno” exists in the Open_vSwitch table, and
> get the seqno from the OVS IDL to decide if it should force a recompute.
>
>
>
>
>
> Hi Winson,
>
>
>
> Thanks for the proposal. Yes, the connection break during upgrading is a
> real issue in a large scale environment. However, the proposal doesn't
> work. The "ovs-cond-seqno" is for the OVSDB IDL for the local conf DB,
> which is a completely different connection from the ovs-vswitchd open-flow
> connection.
>
> To avoid clearing the open-flow table during ovn-controller startup, we
> can find a way to postpone clearing the OVS flows until the recomputing in
> ovn-controller is completed, right before ovn-controller replaces them with
> the new flows.
>
> *[vi> ] *
>
> *[vi> ] Seems like we force recompute today if the OVS IDL is reconnected.
> Would it be possible to defer *
>
> *decision to  recompute the flows based on  the  SB’s nb_cfg we have
>  sync’d with? i.e.  If  our nb_cfg is *
>
> *in sync with the SB’s global nb_cfg, we can skip the recompute?  At least
> if nothing has changed since*
>
> *the restart, we won’t need to do anything.. We could stash nb_cfg in OVS
> (once ovn-controller receives*
>
> *confirmation from OVS that the physical flows for an nb_cfg update are in
> place), which should be cleared if *
>
> *OVS itself is restarted.. (I mean currently, nb_cfg is used to check if
> NB, SB and Chassis are in sync, we *
>
> *could extend this to OVS/physical flows?)*
>

nb_cfg is already used by ovn-controller to do that, with the help of the
OpenFlow "barrier", but I am not sure if it is 100% working as expected.

This basic idea should work, but in practice we need to take care of
generating the "installed" flow table and "desired" flow table in
ovn-controller.
I'd start with "postpone clearing OVS flows", which seems the lower-hanging
fruit, and then see if any further improvement is needed.
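
(For anyone who wants to eyeball that handshake, the counters can be read
directly; a sketch assuming the 2020-era schema, where nb_cfg lives in
NB_Global, SB_Global, and Chassis, and <chassis-name> is a placeholder:)

    ovn-nbctl get NB_Global . nb_cfg
    ovn-sbctl get SB_Global . nb_cfg
    ovn-sbctl get Chassis <chassis-name> nb_cfg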


>
> *Have not thought through this though .. so maybe I am missing something…*
>
>
>
> *Thanks,*
>
>
>
> *-venu*
>
> This should largely reduce the time the connection is broken during
> upgrading. Some changes in the ofctrl module's state machine are required,
> but I am not 100% sure if this approach is applicable. Need to check more details.
>
>
>
> Thanks,
>
> Han
>
> Test log:
>
> Check flow cnt on br-int every second:
>
>
>
> packet_count=0 byte_count=0 flow_count=0
>
> packet_count=0 byte_count=0 flow_count=0
>
> packet_count=0 byte_count=0 flow_count=0
>
> packet_count=0 byte_count=0 flow_count=0
>
> packet_count=0 byte_count=0 flow_count=0
>
> packet_count=0 byte_count=0 flow_count=0
>
> packet_count=0 byte_count=0 flow_count=10322
>
> packet_count=0 byte_count=0 flow_count=34220
>
> packet_count=0 byte_count=0 flow_count=60425
>
> packet_count=0 byte_count=0 flow_count=82506
>
> packet_count=0 byte_count=0 flow_count=106771
>
> packet_count=0 byte_count=0 flow_count=131648
>
> packet_count=2 byte_count=120 flow_count=158303
>
> packet_count=29 byte_count=1693 flow_count=185999
>
> packet_count=188 byte_count=12455 flow_count=212764
>
>
>
>
>
> --
>
> Winson
>
> --
> You received this message because you are subscribed to the Google Groups
> "ovn-kubernetes" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to 

Re: [ovs-discuss] [ovs-dev] packet drop

2020-08-06 Thread Numan Siddique
On Thu, Aug 6, 2020 at 4:05 AM Tony Liu  wrote:

>
> The drop is caused by a flow change.
>
> When packet is dropped.
> 
> recirc_id(0),tunnel(tun_id=0x19aca,src=10.6.30.92,dst=10.6.30.22,geneve({class=0x102,type=0x80,len=4,0x20003/0x7fff}),flags(-df+csum+key)),in_port(3),eth(src=fa:16:3e:df:1e:85,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(proto=1,frag=no),icmp(type=8/0xf8),
> packets:14, bytes:1372, used:0.846s, actions:drop
>
> recirc_id(0),in_port(12),eth(src=fa:16:3e:7d:bb:85,dst=fa:16:3e:df:1e:85),eth_type(0x0800),ipv4(src=
> 192.168.236.152/255.255.255.252,dst=10.6.40.9,proto=1,tos=0/0x3,ttl=64,frag=no),icmp(type=0),
> packets:6, bytes:588, used:8.983s, actions:drop
> 
>
> When packet goes through.
> 
> recirc_id(0),tunnel(tun_id=0x19aca,src=10.6.30.92,dst=10.6.30.22,geneve({class=0x102,type=0x80,len=4,0x20003/0x7fff}),flags(-df+csum+key)),in_port(3),eth(src=fa:16:3e:df:1e:85,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(proto=1,frag=no),icmp(type=8/0xf8),
> packets:3, bytes:294, used:0.104s, actions:12
>
> recirc_id(0),in_port(12),eth(src=fa:16:3e:7d:bb:85,dst=fa:16:3e:df:1e:85),eth_type(0x0800),ipv4(src=
> 192.168.236.152/255.255.255.252,dst=10.6.40.9,proto=1,tos=0/0x3,ttl=64,frag=no),icmp(type=0),
> packets:3, bytes:294, used:0.103s,
> actions:ct_clear,set(tunnel(tun_id=0x1a8ee,dst=10.6.30.92,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x1000b}),flags(df|csum|key))),set(eth(src=fa:16:3e:75:b7:e5,dst=52:54:00:0c:ef:b9)),set(ipv4(ttl=63)),3
> 
>
> Is that flow programmed by ovn-controller via ovs-vswitchd?
>
>
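(Not from the thread, but one way to narrow this down is to replay the dropped
packet's headers through br-int's OpenFlow pipeline and see which rule drops
it; the in_port and addresses below are placeholders copied from the second
datapath dump, and tracing the tunnelled direction would additionally need the
tunnel metadata:)

    ovs-appctl ofproto/trace br-int \
        'in_port=<tap-ofport>,icmp,dl_src=fa:16:3e:7d:bb:85,dl_dst=fa:16:3e:df:1e:85,nw_src=192.168.236.152,nw_dst=10.6.40.9'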
What version of OVN and OVS are you using?

Can you share your OVN NB DB?

If I understand correctly, the packet is received from the patch port to
br-int on the gateway node and then tunnelled to the compute node, right?
And the packet is dropped on the compute node?

If it's fine with you, could you share your NB DB and tell me the
destination logical port? Then I can try it out locally.

Thanks
Numan




>
> Thanks!
>
> Tony
>
> > -Original Message-
> > From: discuss  On Behalf Of Tony
> > Liu
> > Sent: Wednesday, August 5, 2020 2:48 PM
> > To: ovs-discuss@openvswitch.org; ovs-...@openvswitch.org
> > Subject: [ovs-discuss] packet drop
> >
> > Hi,
> >
> > I am running ping from external to a VM via the OVN gateway.
> > On the compute node, the ICMP request packet is consistently coming into
> > interface "ovn-gatewa-1". But about 10 out of every 25 packets are lost on
> > the tap interface. It's as if the switch pauses for 10s after every 15s.
> >
> > Has anyone experienced such an issue?
> > Any advice on how to look into it?
> >
> > 
> > 21fed09f-909e-4efc-b117-f5d5fcb636c9
> > Bridge br-int
> > fail_mode: secure
> > datapath_type: system
> > Port "ovn-gatewa-0"
> > Interface "ovn-gatewa-0"
> > type: geneve
> > options: {csum="true", key=flow, remote_ip="10.6.30.91"}
> > bfd_status: {diagnostic="No Diagnostic", flap_count="1",
> > forwarding="true", remote_diagnostic="No Diagnostic", remote_state=up,
> > state=up}
> > Port "tap2588bb4e-35"
> > Interface "tap2588bb4e-35"
> > Port "ovn-gatewa-1"
> > Interface "ovn-gatewa-1"
> > type: geneve
> > options: {csum="true", key=flow, remote_ip="10.6.30.92"}
> > bfd_status: {diagnostic="No Diagnostic", flap_count="1",
> > forwarding="true", remote_diagnostic="No Diagnostic", remote_state=up,
> > state=up}
> > Port "tap37f6b2d7-cc"
> > Interface "tap37f6b2d7-cc"
> > Port "tap2c4b3b0f-8b"
> > Interface "tap2c4b3b0f-8b"
> > Port "tap23245491-a4"
> > Interface "tap23245491-a4"
> > Port "tap51660269-2c"
> > Interface "tap51660269-2c"
> > Port "tap276cd1ef-e1"
> > Interface "tap276cd1ef-e1"
> > Port "tap138526d3-b3"
> > Interface "tap138526d3-b3"
> > Port "tapd1ae48a1-2d"
> > Interface "tapd1ae48a1-2d"
> > Port br-int
> > Interface br-int
> > type: internal
> > Port "tapdd08f476-94"
> > Interface "tapdd08f476-94"
> > 
> >
> >
> > Thanks!
> >
> > Tony
> >
> > ___
> > discuss mailing list
> > disc...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] SB flows not being created in OVN K8 Stateful set

2020-08-06 Thread Brendan Doyle


OK, thanks for the pointers. I think we will eventually move to an OVN CNI,
but for now I need to get this working.


On 06/08/2020 16:49, Girish Moodalbail wrote:



On Thu, Aug 6, 2020 at 8:23 AM Brendan Doyle wrote:




On 06/08/2020 16:19, Girish Moodalbail wrote:



On Thu, Aug 6, 2020 at 7:36 AM Brendan Doyle <brendan.do...@oracle.com> wrote:

OK thanks, perhaps Girish can comment; I'm thinking that the
steps are:

# Create OVN namespace, service accounts, ovnkube-db headless service,
# configmap, and policies
kubectl create -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovn-setup.yaml

# Run ovnkube-db deployment.
kubectl apply -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-db-raft.yaml

# Run ovnkube-master deployment.
kubectl create -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-master.yaml

# Run ovnkube daemonset for nodes
kubectl create -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-node.yaml


Yes, those are the steps to get OVN K8s CNI up and running with
OVN DB running in clustered mode.

However, you also say below


Note I don't want to replace flannel with OVN as the CNI, I just want to
run OVN central in a k8s
StatefulSet that uses flannel as the CNI.


So, my question is - What are you trying to do? How are you
mixing Flannel and OVN DBs?

Do you want to run OVN DBs in clustered mode as a service (or K8s
application) using Flannel as the CNI for your K8s cluster?


Yes, I want to use Flannel as the CNI, and just have the clustered
OVN DBs as a k8s Service, providing
an HA OVN central for ovn-controllers on hypervisors in my network.
So it sounds like the above steps
won't work for me, and I have to hand-craft/modify the raft yaml to
start northd, but not use the
rest of the yamls?


Providing Clustered OVN DBs as a service is not the goal of the 
ovn-kubernetes project. However, you can re-use a lot of the project's 
yamls and container entrypoint scripts to achieve what you want to do.


1. Apply the ovn-setup.yaml and ovnkube-db-raft.yaml like you captured
above
2. Edit the ovnkube-master.yaml to only have the ovn-northd container and
nothing else, and name it ovn-north.yaml. Apply this ovn-north.yaml.
3. Have all the ovn-controller instances in your network point to the
OVN SB DB instances. The OVN DB Pods run with hostNetwork set to
`true`, so they will not be on a flannel network and will therefore be
accessible from your hypervisors directly. (A condensed sketch of these
steps follows.)
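
(A hypothetical condensation of steps 1-3; the ovn-north.yaml name comes from
step 2, and the node addresses are placeholders:)

    kubectl create -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovn-setup.yaml
    kubectl apply -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-db-raft.yaml
    kubectl create -f ovn-north.yaml   # ovnkube-master.yaml trimmed down to ovn-northd only

    # on each hypervisor, point ovn-controller at the clustered SB DB:
    ovs-vsctl set open . external-ids:ovn-remote=tcp:<node1>:6642,tcp:<node2>:6642,tcp:<node3>:6642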





___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] ovn-k8s scale: how to make new ovn-controller process keep the previous Open Flow in br-int

2020-08-06 Thread Numan Siddique
On Thu, Aug 6, 2020 at 9:25 PM Venugopal Iyer  wrote:

> Hi, Han:
>
>
>
> A comment inline:
>
>
>
> *From:* ovn-kuberne...@googlegroups.com  *On
> Behalf Of *Han Zhou
> *Sent:* Wednesday, August 5, 2020 3:36 PM
> *To:* Winson Wang 
> *Cc:* ovs-discuss@openvswitch.org; ovn-kuberne...@googlegroups.com;
> Dumitru Ceara ; Han Zhou 
> *Subject:* Re: ovn-k8s scale: how to make new ovn-controller process keep
> the previous Open Flow in br-int
>
>
>
> *External email: Use caution opening links or attachments*
>
>
>
>
>
>
>
> On Wed, Aug 5, 2020 at 12:58 PM Winson Wang 
> wrote:
>
> Hello OVN Experts,
>
>
> With ovn-k8s, we need to keep the flows that are needed by the running pods
> on the k8s node always installed on br-int.
>
> Is there an ongoing project to address this problem?
>
> If not, I have one proposal, though I am not sure if it is doable.
>
> Please share your thoughts.
> The issue:
>
> In a large-scale ovn-k8s cluster there are 200K+ OpenFlow flows on br-int on
> every K8s node.  When we restart ovn-controller for an upgrade using
> `ovs-appctl -t ovn-controller exit --restart`,  existing traffic still
> works fine since br-int still has the flows installed.
>
>
>
> However, when a new ovn-controller starts it will connect to the OVS IDL
> and do an engine init run,  clearing all OpenFlow flows and installing
> flows based on the SB DB.
>
> With a flow count above 200K,  it took more than 15 seconds to get
> all the flows installed on the br-int bridge again.
>
>
> Proposed solution for the issue:
>
> When the ovn-controller gets “exit --restart”,  it will write an
> “ovs-cond-seqno” to the OVS IDL and store the value in the external-ids
> column of the Open_vSwitch table. When the new ovn-controller starts, it
> will check if the “ovs-cond-seqno” exists in the Open_vSwitch table, and
> get the seqno from the OVS IDL to decide if it should force a recompute.
>
>
>
>
>
> Hi Winson,
>
>
>
> Thanks for the proposal. Yes, the connection break during upgrading is a
> real issue in a large scale environment. However, the proposal doesn't
> work. The "ovs-cond-seqno" is for the OVSDB IDL for the local conf DB,
> which is a completely different connection from the ovs-vswitchd open-flow
> connection.
>
> To avoid clearing the open-flow table during ovn-controller startup, we
> can find a way to postpone clearing the OVS flows until the recomputing in
> ovn-controller is completed, right before ovn-controller replaces them with
> the new flows.
>
> *[vi> ] *
>
> *[vi> ] Seems like we force recompute today if the OVS IDL is reconnected.
> Would it be possible to defer *
>
> *decision to  recompute the flows based on  the  SB’s nb_cfg we have
>  sync’d with? i.e.  If  our nb_cfg is *
>
> *in sync with the SB’s global nb_cfg, we can skip the recompute?  At least
> if nothing has changed since*
>
> *the restart, we won’t need to do anything.. We could stash nb_cfg in OVS
> (once ovn-controller receives*
>
> *confirmation from OVS that the physical flows for an nb_cfg update are in
> place), which should be cleared if *
>
> *OVS itself is restarted.. (I mean currently, nb_cfg is used to check if
> NB, SB and Chassis are in sync, we *
>
> *could extend this to OVS/physical flows?)*
>
>
>
> *Have not thought through this though .. so maybe I am missing something…*
>
>
>
> *Thanks,*
>
>
>
> *-venu*
>
> This should largely reduce the time the connection is broken during
> upgrading. Some changes in the ofctrl module's state machine are required,
> but I am not 100% sure if this approach is applicable. Need to check more details.
>


We can also think about whether it's possible to do it the following way
   - When ovn-controller starts, it will not clear the flows, but instead
will get a dump of the flows from br-int and populate these flows into its
installed-flows table
- And then when it connects to the SB DB and computes the desired
flows, it will sync up the installed flows with the desired flows anyway
- And if there is no difference between the desired flows and the installed
flows, there will be no impact on the datapath at all.

Although this would require careful thought and proper handling.

Thanks
Numan


>
> Thanks,
>
> Han
>
> Test log:
>
> Check flow cnt on br-int every second:
>
>
>
> packet_count=0 byte_count=0 flow_count=0
>
> packet_count=0 byte_count=0 flow_count=0
>
> packet_count=0 byte_count=0 flow_count=0
>
> packet_count=0 byte_count=0 flow_count=0
>
> packet_count=0 byte_count=0 flow_count=0
>
> packet_count=0 byte_count=0 flow_count=0
>
> packet_count=0 byte_count=0 flow_count=10322
>
> packet_count=0 byte_count=0 flow_count=34220
>
> packet_count=0 byte_count=0 flow_count=60425
>
> packet_count=0 byte_count=0 flow_count=82506
>
> packet_count=0 byte_count=0 flow_count=106771
>
> packet_count=0 byte_count=0 flow_count=131648
>
> packet_count=2 byte_count=120 flow_count=158303
>
> packet_count=29 byte_count=1693 flow_count=185999
>
> packet_count=188 byte_count=12455 flow_count=212764
>
>
>
>
>
> --
>
> Winson
>
> --
> You received this message because you are subscribed to 

Re: [ovs-discuss] ovn-k8s scale: how to make new ovn-controller process keep the previous Open Flow in br-int

2020-08-06 Thread Venugopal Iyer
Hi, Han:

A comment inline:

From: ovn-kuberne...@googlegroups.com  On 
Behalf Of Han Zhou
Sent: Wednesday, August 5, 2020 3:36 PM
To: Winson Wang 
Cc: ovs-discuss@openvswitch.org; ovn-kuberne...@googlegroups.com; Dumitru Ceara 
; Han Zhou 
Subject: Re: ovn-k8s scale: how to make new ovn-controller process keep the 
previous Open Flow in br-int

External email: Use caution opening links or attachments



On Wed, Aug 5, 2020 at 12:58 PM Winson Wang <windson.w...@gmail.com> wrote:
Hello OVN Experts,

With ovn-k8s, we need to keep the flows that are needed by the running pods
on the k8s node always installed on br-int.
Is there an ongoing project to address this problem?
If not, I have one proposal, though I am not sure if it is doable.
Please share your thoughts.
The issue:

In a large-scale ovn-k8s cluster there are 200K+ OpenFlow flows on br-int on
every K8s node.  When we restart ovn-controller for an upgrade using `ovs-appctl
-t ovn-controller exit --restart`,  existing traffic still works fine since
br-int still has the flows installed.


However, when a new ovn-controller starts it will connect to the OVS IDL and do
an engine init run,  clearing all OpenFlow flows and installing flows based on
the SB DB.

With a flow count above 200K,  it took more than 15 seconds to get all the
flows installed on the br-int bridge again.

Proposed solution for the issue:

When the ovn-controller gets “exit --restart”,  it will write an “ovs-cond-seqno”
to the OVS IDL and store the value in the external-ids column of the
Open_vSwitch table. When the new ovn-controller starts, it will check if the
“ovs-cond-seqno” exists in the Open_vSwitch table, and get the seqno from the
OVS IDL to decide if it should force a recompute.



Hi Winson,

Thanks for the proposal. Yes, the connection break during upgrading is a real 
issue in a large scale environment. However, the proposal doesn't work. The 
"ovs-cond-seqno" is for the OVSDB IDL for the local conf DB, which is a 
completely different connection from the ovs-vswitchd open-flow connection.
To avoid clearing the open-flow table during ovn-controller startup, we can
find a way to postpone clearing the OVS flows until the recomputing in
ovn-controller is completed, right before ovn-controller replaces them with the
new flows.
[vi> ] Seems like we force a recompute today if the OVS IDL is reconnected.
Would it be possible to defer the decision to recompute the flows based on the
SB’s nb_cfg we have sync’d with? i.e. if our nb_cfg is in sync with the SB’s
global nb_cfg, we can skip the recompute? At least if nothing has changed since
the restart, we won’t need to do anything. We could stash nb_cfg in OVS (once
ovn-controller receives confirmation from OVS that the physical flows for an
nb_cfg update are in place), which should be cleared if OVS itself is
restarted. (I mean currently, nb_cfg is used to check if NB, SB and Chassis
are in sync; we could extend this to OVS/physical flows?)

Have not thought through this though .. so maybe I am missing something…

Thanks,

-venu
This should largely reduce the time the connection is broken during upgrading.
Some changes in the ofctrl module's state machine are required, but I am not
100% sure if this approach is applicable. Need to check more details.

Thanks,
Han

Test log:

Check flow cnt on br-int every second:


packet_count=0 byte_count=0 flow_count=0

packet_count=0 byte_count=0 flow_count=0

packet_count=0 byte_count=0 flow_count=0

packet_count=0 byte_count=0 flow_count=0

packet_count=0 byte_count=0 flow_count=0

packet_count=0 byte_count=0 flow_count=0

packet_count=0 byte_count=0 flow_count=10322

packet_count=0 byte_count=0 flow_count=34220

packet_count=0 byte_count=0 flow_count=60425

packet_count=0 byte_count=0 flow_count=82506

packet_count=0 byte_count=0 flow_count=106771

packet_count=0 byte_count=0 flow_count=131648

packet_count=2 byte_count=120 flow_count=158303

packet_count=29 byte_count=1693 flow_count=185999

packet_count=188 byte_count=12455 flow_count=212764


--
Winson
--
You received this message because you are subscribed to the Google Groups 
"ovn-kubernetes" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to 
ovn-kubernetes+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/ovn-kubernetes/CAMu6iS8eC2EtMJbqBccGD0hyvLFBkzkeJ9sXOsT_TVF3Ltm2hA%40mail.gmail.com.

Re: [ovs-discuss] SB flows not being created in OVN K8 Stateful set

2020-08-06 Thread Girish Moodalbail
On Thu, Aug 6, 2020 at 8:23 AM Brendan Doyle 
wrote:

>
>
> On 06/08/2020 16:19, Girish Moodalbail wrote:
>
>
>
> On Thu, Aug 6, 2020 at 7:36 AM Brendan Doyle 
> wrote:
>
>> OK thanks, perhaps Girish can comment; I'm thinking that the steps are:
>>
>> # Create OVN namespace, service accounts, ovnkube-db headless service, 
>> configmap, and policies
>> kubectl create -f 
>> $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovn-setup.yaml
>>
>> # Run ovnkube-db deployment.
>> kubectl apply -f 
>> $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-db-raft.yaml
>>
>> # Run ovnkube-master deployment.
>> kubectl create -f 
>> $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-master.yaml
>>
>> # Run ovnkube daemonset for nodes
>> kubectl create -f 
>> $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-node.yaml
>>
>>
> Yes, those are the steps to get OVN K8s CNI up and running with OVN DB
> running in clustered mode.
>
> However, you also say below
>
>
>
>> Note I don't want to replace flannel with OVN as the CNI, I just want to
>> run OVN central in a k8s
>> StatefulSet that uses flannel as the CNI.
>>
>> So, my question is - What are you trying to do? How are you mixing
>> Flannel and OVN DBs?
>>
>> Do you want to run OVN DBs in clustered mode as a service (or K8s
>> application) using Flannel as the CNI for your K8s cluster?
>
>
> Yes, I want to use Flannel as the CNI, and just have the clustered OVN DBs
> as a k8s Service, providing
> an HA OVN central for ovn-controllers on hypervisors in my network. So it
> sounds like the above steps
> won't work for me, and I have to hand-craft/modify the raft yaml to start
> northd, but not use the
> rest of the yamls?
>

Providing Clustered OVN DBs as a service is not the goal of the
ovn-kubernetes project. However, you can re-use a lot of the project's
yamls and container entrypoint scripts to achieve what you want to do.

1. Apply the ovn-setup.yaml and ovnkube-db-raft.yaml like you captured above
2. Edit the ovnkube-master.yaml to only have the ovn-northd container and
nothing else, and name it ovn-north.yaml. Apply this ovn-north.yaml.
3. Have all the ovn-controller instances in your network point to the OVN
SB DB instances. The OVN DB Pods run with hostNetwork set to `true`, so
they will not be on a flannel network and therefore will be accessible from
your hypervisors directly.
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] SB flows not being created in OVN K8 Stateful set

2020-08-06 Thread Brendan Doyle



On 06/08/2020 16:19, Girish Moodalbail wrote:



On Thu, Aug 6, 2020 at 7:36 AM Brendan Doyle wrote:


OK thanks, perhaps Girish can comment; I'm thinking that the steps are:

# Create OVN namespace, service accounts, ovnkube-db headless service,
# configmap, and policies
kubectl create -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovn-setup.yaml

# Run ovnkube-db deployment.
kubectl apply -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-db-raft.yaml

# Run ovnkube-master deployment.
kubectl create -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-master.yaml

# Run ovnkube daemonset for nodes
kubectl create -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-node.yaml


Yes, those are the steps to get OVN K8s CNI up and running with OVN DB 
running in clustered mode.


However, you also say below


Note I don't want to replace flannel with OVN as the CNI, I just want to
run OVN central in a k8s
StatefulSet that uses flannel as the CNI.


So, my question is - What are you trying to do? How are you mixing 
Flannel and OVN DBs?


Do you want to run OVN DBs in clustered mode as a service (or K8s 
application) using Flannel as the CNI for your K8s cluster?


Yes, I want to use Flannel as the CNI, and just have the clustered OVN
DBs as a k8s Service, providing
an HA OVN central for ovn-controllers on hypervisors in my network. So it
sounds like the above steps
won't work for me, and I have to hand-craft/modify the raft yaml to start
northd, but not use the
rest of the yamls?





Regards,
~Girish


___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] SB flows not being created in OVN K8 Stateful set

2020-08-06 Thread Girish Moodalbail
On Thu, Aug 6, 2020 at 7:36 AM Brendan Doyle 
wrote:

> OK thanks, perhaps Girish can comment; I'm thinking that the steps are:
>
> # Create OVN namespace, service accounts, ovnkube-db headless service, 
> configmap, and policies
> kubectl create -f 
> $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovn-setup.yaml
>
> # Run ovnkube-db deployment.
> kubectl apply -f 
> $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-db-raft.yaml
>
> # Run ovnkube-master deployment.
> kubectl create -f 
> $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-master.yaml
>
> # Run ovnkube daemonset for nodes
> kubectl create -f 
> $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-node.yaml
>
>
Yes, those are the steps to get OVN K8s CNI up and running with OVN DB
running in clustered mode.

However, you also say below



> Note I don't want to replace flannel with OVN as the CNI, I just want to
> run OVN central in a k8s
> StatefulSet that uses flannel as the CNI.
>
So, my question is - What are you trying to do? How are you mixing Flannel
and OVN DBs?

Do you want to run OVN DBs in clustered mode as a service (or K8s
application) using Flannel as the CNI for your K8s cluster?

Regards,
~Girish
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] SB flows not being created in OVN K8 Stateful set

2020-08-06 Thread Brendan Doyle

OK thanks, perhaps Girish can comment; I'm thinking that the steps are:

# Create OVN namespace, service accounts, ovnkube-db headless service,
# configmap, and policies
kubectl create -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovn-setup.yaml

# Run ovnkube-db deployment.
kubectl apply -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-db-raft.yaml

# Run ovnkube-master deployment.
kubectl create -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-master.yaml

# Run ovnkube daemonset for nodes
kubectl create -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-node.yaml



Brendan


On 06/08/2020 14:44, Dumitru Ceara wrote:

On 8/6/20 2:03 PM, Brendan Doyle wrote:


On 06/08/2020 12:31, Dumitru Ceara wrote:

On 8/6/20 11:54 AM, Brendan Doyle wrote:

I don't see any ovn-northd.log, I only see those when I'm running
OVN outside the k8s cluster.
Before I start the StatefulSet on my k8s nodes I run:

ovn-ctl stop_northd
ovn-ctl stop_ovsdb
rm -rf /usr/etc/ovn/*.db


The only logs I see are ovn-controller.log (I'm running ovn-controller
on the K8s nodes), ovsdb-server-nb.log, and ovsdb-server-sb.log.
These logs look normal.


ovn-northd is the daemon that translates OVN_Northbound DB records to
OVN_Southbound DB records (including logical flows). If your deployment
doesn't start ovn-northd, the SB won't get populated.

Well I use the ovnkube-db-raft.yaml and ovn-setup.yaml as per the
documentation, and they call
ovndb-raft-functions.sh and start the OVN procs as:

run_as_ovs_user_if_needed \
   ${OVNCTL_PATH} run_${db}_ovsdb --no-monitor \
   --db-${db}-cluster-local-addr=[${ovn_db_host}] \
   --db-${db}-cluster-local-port=${raft_port} \
   --db-${db}-cluster-local-proto=${transport} \
   ${db_ssl_opts} \
   --ovn-${db}-log="${ovn_loglevel_db}" &


This seems to be starting the NB/SB DBs only.


Are you saying that there is more to do? As in, invoke run-ovn-northd()

Yes, something has to start ovn-northd.


in ovnkube.sh which is called
from the ovnkube-master.yaml file? I see that
https://github.com/ovn-org/ovn-kubernetes says

Apply OVN DaemonSet and Deployment yamls.

# Create OVN namespace, service accounts, ovnkube-db headless service,
# configmap, and policies
kubectl create -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovn-setup.yaml

# Run ovnkube-db deployment.
kubectl create -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-db.yaml

# Run ovnkube-master deployment.
kubectl create -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-master.yaml

# Run ovnkube daemonset for nodes
kubectl create -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-node.yaml


But it makes no mention of ovnkube-db-raft.yaml, so is the above the
correct procedure?
What then is ovnkube-db-raft.yaml for?

I'm not sure about the ovnkube specifics, I'll let someone from the
ovn-k8s team comment about this. But as mentioned above, something needs
to start ovn-northd, otherwise the SB won't get populated.

Regards,
Dumitru


Note I don't want to replace flannel with OVN as the CNI, I just want to
run OVN central in a k8s
StatefulSet that uses flannel as the CNI.

Is it documented anywhere how I can do this? I would have thought that
ovnkube-db-raft.yaml
was the way to go, and that it would start all that is needed?

Do I need to run all of the above yamls too, or just a subset? And will
they interfere with ovnkube-db-raft.yaml
and/or flannel? Do I need to just drop ovnkube-db-raft.yaml?

Can someone provide the exact yamls I need to do this, please?

Thanks


Brendan.





___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] SB flows not being created in OVN K8 Stateful set

2020-08-06 Thread Dumitru Ceara
On 8/6/20 2:03 PM, Brendan Doyle wrote:
> 
> 
> On 06/08/2020 12:31, Dumitru Ceara wrote:
>> On 8/6/20 11:54 AM, Brendan Doyle wrote:
>>> I don't see any ovn-northd.log, I only see those when I'm running
>>> OVN outside the k8s cluster.
>>> Before I start the StatefulSet on my k8s nodes I run:
>>>
>>> ovn-ctl stop_northd
>>> ovn-ctl stop_ovsdb
>>> rm -rf /usr/etc/ovn/*.db
>>>
>>>
>>> The only logs I see are ovn-controller.log (I'm running ovn-controller
>>> on the K8s nodes), ovsdb-server-nb.log, and ovsdb-server-sb.log.
>>> These logs look normal.
>>>
>> ovn-northd is the daemon that translates OVN_Northbound DB records to
>> OVN_Southbound DB records (including logical flows). If your deployment
>> doesn't start ovn-northd, the SB won't get populated.
> 
> Well I use the ovnkube-db-raft.yaml and ovn-setup.yaml as per the
> documentation, and they call
> ovndb-raft-functions.sh and start the OVN procs as:
> 
> run_as_ovs_user_if_needed \
>   ${OVNCTL_PATH} run_${db}_ovsdb --no-monitor \
>   --db-${db}-cluster-local-addr=[${ovn_db_host}] \
>   --db-${db}-cluster-local-port=${raft_port} \
>   --db-${db}-cluster-local-proto=${transport} \
>   ${db_ssl_opts} \
>   --ovn-${db}-log="${ovn_loglevel_db}" &
> 

This seems to be starting the NB/SB DBs only.

> Are you saying that there is more to do? As in, invoke run-ovn-northd()

Yes, something has to start ovn-northd.
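
(A hedged illustration of one way to do that, reusing the same ovn-ctl script
as the run_${db}_ovsdb call above; the option names follow ovn-ctl's
conventions but should be double-checked against your version, and the DB
addresses are placeholders:)

    ovn-ctl start_northd --no-monitor \
        --ovn-northd-nb-db=tcp:<nb-addr>:6641 \
        --ovn-northd-sb-db=tcp:<sb-addr>:6642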

> in ovnkube.sh which is called
> from the ovnkube-master.yaml file? I see that
> https://github.com/ovn-org/ovn-kubernetes says
> 
> Apply OVN DaemonSet and Deployment yamls.
> 
> # Create OVN namespace, service accounts, ovnkube-db headless service,
> # configmap, and policies
> kubectl create -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovn-setup.yaml
>
> # Run ovnkube-db deployment.
> kubectl create -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-db.yaml
>
> # Run ovnkube-master deployment.
> kubectl create -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-master.yaml
>
> # Run ovnkube daemonset for nodes
> kubectl create -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-node.yaml
> 
> 
> But it makes no mention of ovnkube-db-raft.yaml, so is the above the
> correct procedure?
> What then is ovnkube-db-raft.yaml for?

I'm not sure about the ovnkube specifics, I'll let someone from the
ovn-k8s team comment about this. But as mentioned above, something needs
to start ovn-northd, otherwise the SB won't get populated.

Regards,
Dumitru

> 
> Note I don't want to replace flannel with OVN as the CNI, I just want to
> run OVN central in a k8s
> StatefulSet that uses flannel as the CNI.
> 
> Is it documented anywhere how I can do this? I would have thought that
> ovnkube-db-raft.yaml
> was the way to go, and that it would start all that is needed?
> 
> Do I need to run all of the above yamls too, or just a subset? And will
> they interfere with ovnkube-db-raft.yaml
> and/or flannel? Do I need to just drop ovnkube-db-raft.yaml?
> 
> Can someone provide the exact yamls I need to do this, please?
> 
> Thanks
> 
> 
> Brendan.
> 
> 
> 

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] SB flows not being created in OVN K8 Stateful set

2020-08-06 Thread Brendan Doyle



On 06/08/2020 12:31, Dumitru Ceara wrote:

On 8/6/20 11:54 AM, Brendan Doyle wrote:

I don't see any ovn-northd.log, I only see those when I'm running
OVN outside the k8s cluster.
Before I start the StatefulSet on my k8s nodes I run:

ovn-ctl stop_northd
ovn-ctl stop_ovsdb
rm -rf /usr/etc/ovn/*.db


The only logs I see are ovn-controller.log (I'm running ovn-controller
on the K8s nodes), ovsdb-server-nb.log, and ovsdb-server-sb.log.
These logs look normal.


ovn-northd is the daemon that translates OVN_Northbound DB records to
OVN_Southbound DB records (including logical flows). If your deployment
doesn't start ovn-northd, the SB won't get populated.


Well I use the ovnkube-db-raft.yaml and ovn-setup.yaml as per the 
documentation, and they call

ovndb-raft-functions.sh and start the OVN procs as:

run_as_ovs_user_if_needed \
  ${OVNCTL_PATH} run_${db}_ovsdb --no-monitor \
  --db-${db}-cluster-local-addr=[${ovn_db_host}] \
  --db-${db}-cluster-local-port=${raft_port} \
  --db-${db}-cluster-local-proto=${transport} \
  ${db_ssl_opts} \
  --ovn-${db}-log="${ovn_loglevel_db}" &

Are you saying that there is more to do? As in, invoke run-ovn-northd()
in ovnkube.sh which is called
from the ovnkube-master.yaml file? I see that 
https://github.com/ovn-org/ovn-kubernetes says


Apply OVN DaemonSet and Deployment yamls.

# Create OVN namespace, service accounts, ovnkube-db headless service,
# configmap, and policies
kubectl create -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovn-setup.yaml

# Run ovnkube-db deployment.
kubectl create -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-db.yaml

# Run ovnkube-master deployment.
kubectl create -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-master.yaml

# Run ovnkube daemonset for nodes
kubectl create -f $HOME/work/src/github.com/ovn-org/ovn-kubernetes/dist/yaml/ovnkube-node.yaml



But it makes no mention of ovnkube-db-raft.yaml, so is the above the
correct procedure?

What then is ovnkube-db-raft.yaml for?

Note I don't want to replace flannel with OVN as the CNI, I just want to
run OVN central in a k8s
StatefulSet that uses flannel as the CNI.

Is it documented anywhere how I can do this? I would have thought that
ovnkube-db-raft.yaml
was the way to go, and that it would start all that is needed?

Do I need to run all of the above yamls too, or just a subset? And will
they interfere with ovnkube-db-raft.yaml
and/or flannel? Do I need to just drop ovnkube-db-raft.yaml?

Can someone provide the exact yamls I need to do this, please?

Thanks


Brendan.




Regards,
Dumitru


k8s node 1 logs

2020-08-05T13:44:51.958Z|1|vlog|INFO|opened log file
/var/log/ovn/ovsdb-server-nb.log
2020-08-05T13:44:51.961Z|2|ovsdb_server|INFO|ovsdb-server (Open
vSwitch) 2.13.90
2020-08-05T13:44:51.961Z|3|reconnect|INFO|tcp:[253.255.0.34]:6643:
connecting...
2020-08-05T13:44:51.961Z|4|reconnect|INFO|tcp:[253.255.0.34]:6643:
connected
2020-08-05T13:44:51.962Z|5|raft_rpc|INFO|learned cluster ID 6e05
2020-08-05T13:44:51.962Z|6|raft|INFO|tcp:[253.255.0.34]:6643:
learned server ID c8a5
2020-08-05T13:44:51.962Z|7|raft|INFO|server c8a5 is leader for term 1
2020-08-05T13:44:51.962Z|8|raft|INFO|rejecting append_request
because previous entry 1,5 not in local log (mismatch past end of log)
2020-08-05T13:44:51.964Z|9|raft|INFO|server c8a5 added to configuration
2020-08-05T13:44:51.966Z|00010|raft|INFO|server 0d3d added to configuration
2020-08-05T13:44:51.966Z|00011|raft|INFO|tcp:253.255.0.34:50744: learned
server ID c8a5
2020-08-05T13:44:51.966Z|00012|raft|INFO|tcp:253.255.0.34:50744: learned
remote address tcp:[253.255.0.34]:6643
2020-08-05T13:44:52.015Z|00013|raft|INFO|tcp:253.255.0.35:51780: learned
server ID 941e
2020-08-05T13:44:52.015Z|00014|raft|INFO|tcp:253.255.0.35:51780: learned
remote address tcp:[253.255.0.35]:6643
2020-08-05T13:44:52.015Z|00015|raft|INFO|adding 941e (941e at
tcp:[253.255.0.35]:6643) to cluster 6e05 failed (not leader)
2020-08-05T13:44:52.015Z|00016|raft|INFO|server 941e added to configuration
2020-08-05T13:44:52.016Z|00017|reconnect|INFO|tcp:[253.255.0.35]:6643:
connecting...
2020-08-05T13:44:52.016Z|00018|reconnect|INFO|tcp:[253.255.0.35]:6643:
connected
2020-08-05T13:45:01.962Z|00019|memory|INFO|8424 kB peak resident set
size after 10.0 seconds
2020-08-05T13:45:01.962Z|00020|memory|INFO|cells:40 monitors:0
raft-connections:4
2020-08-05T14:10:03.593Z|00021|reconnect|INFO|tcp:[253.255.0.34]:6643:
connection closed by peer
2020-08-05T14:10:03.595Z|00022|raft|INFO|server 941e is leader for term 2
2020-08-05T14:10:04.594Z|00023|reconnect|INFO|tcp:[253.255.0.34]:6643:
connecting...
2020-08-05T14:10:04.594Z|00024|reconnect|INFO|tcp:[253.255.0.34]:6643:
connection attempt failed (Connection refused)
2020-08-05T14:10:04.594Z|00025|reconnect|INFO|tcp:[253.255.0.34]:6643:
waiting 2 seconds before reconnect

Re: [ovs-discuss] SB flows not being created in OVN K8 Stateful set

2020-08-06 Thread Dumitru Ceara
On 8/6/20 11:54 AM, Brendan Doyle wrote:
> I don't see any ovn-northd.log, I only see those when I'm running
> OVN outside the k8s cluster.
> Before I start the StatefulSet on my k8s nodes I run:
> 
> ovn-ctl stop_northd
> ovn-ctl stop_ovsdb
> rm -rf /usr/etc/ovn/*.db
> 
> 
> The only logs I see are ovn-controller.log (I'm running ovn-controller
> on the K8s nodes), ovsdb-server-nb.log, and ovsdb-server-sb.log.
> These logs look normal.
> 

ovn-northd is the daemon that translates OVN_Northbound DB records to
OVN_Southbound DB records (including logical flows). If your deployment
doesn't start ovn-northd, the SB won't get populated.
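
(A quick, hedged check for this state from any node that can reach the SB DB:
with the DBs up, empty output here means the SB was never populated:)

    ovn-sbctl lflow-list | head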

Regards,
Dumitru

> k8s node 1 logs
> 
> 2020-08-05T13:44:51.958Z|1|vlog|INFO|opened log file
> /var/log/ovn/ovsdb-server-nb.log
> 2020-08-05T13:44:51.961Z|2|ovsdb_server|INFO|ovsdb-server (Open
> vSwitch) 2.13.90
> 2020-08-05T13:44:51.961Z|3|reconnect|INFO|tcp:[253.255.0.34]:6643:
> connecting...
> 2020-08-05T13:44:51.961Z|4|reconnect|INFO|tcp:[253.255.0.34]:6643:
> connected
> 2020-08-05T13:44:51.962Z|5|raft_rpc|INFO|learned cluster ID 6e05
> 2020-08-05T13:44:51.962Z|6|raft|INFO|tcp:[253.255.0.34]:6643:
> learned server ID c8a5
> 2020-08-05T13:44:51.962Z|7|raft|INFO|server c8a5 is leader for term 1
> 2020-08-05T13:44:51.962Z|8|raft|INFO|rejecting append_request
> because previous entry 1,5 not in local log (mismatch past end of log)
> 2020-08-05T13:44:51.964Z|9|raft|INFO|server c8a5 added to configuration
> 2020-08-05T13:44:51.966Z|00010|raft|INFO|server 0d3d added to configuration
> 2020-08-05T13:44:51.966Z|00011|raft|INFO|tcp:253.255.0.34:50744: learned
> server ID c8a5
> 2020-08-05T13:44:51.966Z|00012|raft|INFO|tcp:253.255.0.34:50744: learned
> remote address tcp:[253.255.0.34]:6643
> 2020-08-05T13:44:52.015Z|00013|raft|INFO|tcp:253.255.0.35:51780: learned
> server ID 941e
> 2020-08-05T13:44:52.015Z|00014|raft|INFO|tcp:253.255.0.35:51780: learned
> remote address tcp:[253.255.0.35]:6643
> 2020-08-05T13:44:52.015Z|00015|raft|INFO|adding 941e (941e at
> tcp:[253.255.0.35]:6643) to cluster 6e05 failed (not leader)
> 2020-08-05T13:44:52.015Z|00016|raft|INFO|server 941e added to configuration
> 2020-08-05T13:44:52.016Z|00017|reconnect|INFO|tcp:[253.255.0.35]:6643:
> connecting...
> 2020-08-05T13:44:52.016Z|00018|reconnect|INFO|tcp:[253.255.0.35]:6643:
> connected
> 2020-08-05T13:45:01.962Z|00019|memory|INFO|8424 kB peak resident set
> size after 10.0 seconds
> 2020-08-05T13:45:01.962Z|00020|memory|INFO|cells:40 monitors:0
> raft-connections:4
> 2020-08-05T14:10:03.593Z|00021|reconnect|INFO|tcp:[253.255.0.34]:6643:
> connection closed by peer
> 2020-08-05T14:10:03.595Z|00022|raft|INFO|server 941e is leader for term 2
> 2020-08-05T14:10:04.594Z|00023|reconnect|INFO|tcp:[253.255.0.34]:6643:
> connecting...
> 2020-08-05T14:10:04.594Z|00024|reconnect|INFO|tcp:[253.255.0.34]:6643:
> connection attempt failed (Connection refused)
> 2020-08-05T14:10:04.594Z|00025|reconnect|INFO|tcp:[253.255.0.34]:6643:
> waiting 2 seconds before reconnect
> 2020-08-05T14:10:06.594Z|00026|reconnect|INFO|tcp:[253.255.0.34]:6643:
> connecting...
> 2020-08-05T14:10:06.594Z|00027|reconnect|INFO|tcp:[253.255.0.34]:6643:
> connection attempt failed (Connection refused)
> 2020-08-05T14:10:06.594Z|00028|reconnect|INFO|tcp:[253.255.0.34]:6643:
> waiting 4 seconds before reconnect
> 2020-08-05T14:10:08.453Z|00029|raft|INFO|received leadership transfer
> from 941e in term 2
> 2020-08-05T14:10:08.453Z|00030|raft|INFO|term 3: starting election
> 2020-08-05T14:10:08.453Z|00031|reconnect|INFO|tcp:[253.255.0.35]:6643:
> connection closed by peer
> 
> 2020-08-05T13:44:48.971Z|1|vlog|INFO|opened log file
> /var/log/ovn/ovsdb-server-sb.log
> 2020-08-05T13:44:48.973Z|2|ovsdb_server|INFO|ovsdb-server (Open
> vSwitch) 2.13.90
> 2020-08-05T13:44:48.973Z|3|reconnect|INFO|tcp:[253.255.0.34]:6644:
> connecting...
> 2020-08-05T13:44:48.973Z|4|reconnect|INFO|tcp:[253.255.0.34]:6644:
> connected
> 2020-08-05T13:44:48.973Z|5|raft_rpc|INFO|learned cluster ID 281c
> 2020-08-05T13:44:48.973Z|6|raft|INFO|tcp:[253.255.0.34]:6644:
> learned server ID eeed
> 2020-08-05T13:44:48.974Z|7|raft|INFO|server eeed is leader for term 1
> 2020-08-05T13:44:48.974Z|8|raft|INFO|rejecting append_request
> because previous entry 1,4 not in local log (mismatch past end of log)
> 2020-08-05T13:44:48.976Z|9|raft|INFO|server eeed added to configuration
> 2020-08-05T13:44:48.977Z|00010|raft|INFO|server 5098 added to configuration
> 2020-08-05T13:44:48.977Z|00011|raft|INFO|tcp:253.255.0.34:50628: learned
> server ID eeed
> 2020-08-05T13:44:48.977Z|00012|raft|INFO|tcp:253.255.0.34:50628: learned
> remote address tcp:[253.255.0.34]:6644
> 2020-08-05T13:44:49.044Z|00013|raft|INFO|tcp:253.255.0.35:49594: learned
> server ID b9de
> 2020-08-05T13:44:49.044Z|00014|raft|INFO|tcp:253.255.0.35:49594: learned
> remote address tcp:[253.255.0.35]:6644
> 2020-08-05T13:44:49.044Z|00015|raft|INFO|adding 

[ovs-discuss] Performance drop with conntrack flows

2020-08-06 Thread K Venkata Kiran via discuss
Hi,

We see a 40% throughput drop with UDP traffic over VxLAN and a 20% drop with
UDP traffic over MPLSoGRE when moving from OVS 2.8.2 to OVS 2.12.1.

We narrowed the performance drop in our test down to the commit below;
backing out the commit fixed the problem.

The commit of concern is:
https://github.com/openvswitch/ovs/commit/967bb5c5cd9070112138d74a2f4394c50ae48420

commit 967bb5c5cd9070112138d74a2f4394c50ae48420
Author: Darrell Ball <dlu...@gmail.com>
Date:   Thu May 9 08:15:07 2019 -0700

    conntrack: Add rcu support.

We suspect the 'ct->ct_lock' lock taken in 'conn_update_state' and in
'conn_key_lookup' could be causing the issue.
Has anyone noticed this issue, and are there any pointers to a fix? We could
not find any obvious commit that would solve it. Any guidance in solving this
issue would help.
Thanks
Kiran

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] SB flows not being created in OVN K8 Stateful set

2020-08-06 Thread Brendan Doyle
I don't see any ovn-northd.log, I only see those when I'm running
OVN outside the k8s cluster.

Before I start the StatefulSet on my k8s nodes I run:

ovn-ctl stop_northd
ovn-ctl stop_ovsdb
rm -rf /usr/etc/ovn/*.db


The only logs I see are ovn-controller.log (I'm running ovn-controller
on the K8s nodes), ovsdb-server-nb.log, and ovsdb-server-sb.log.

These logs look normal.

k8s node 1 logs

2020-08-05T13:44:51.958Z|1|vlog|INFO|opened log file 
/var/log/ovn/ovsdb-server-nb.log
2020-08-05T13:44:51.961Z|2|ovsdb_server|INFO|ovsdb-server (Open 
vSwitch) 2.13.90
2020-08-05T13:44:51.961Z|3|reconnect|INFO|tcp:[253.255.0.34]:6643: 
connecting...
2020-08-05T13:44:51.961Z|4|reconnect|INFO|tcp:[253.255.0.34]:6643: 
connected

2020-08-05T13:44:51.962Z|5|raft_rpc|INFO|learned cluster ID 6e05
2020-08-05T13:44:51.962Z|6|raft|INFO|tcp:[253.255.0.34]:6643: 
learned server ID c8a5

2020-08-05T13:44:51.962Z|7|raft|INFO|server c8a5 is leader for term 1
2020-08-05T13:44:51.962Z|8|raft|INFO|rejecting append_request 
because previous entry 1,5 not in local log (mismatch past end of log)

2020-08-05T13:44:51.964Z|9|raft|INFO|server c8a5 added to configuration
2020-08-05T13:44:51.966Z|00010|raft|INFO|server 0d3d added to configuration
2020-08-05T13:44:51.966Z|00011|raft|INFO|tcp:253.255.0.34:50744: learned 
server ID c8a5
2020-08-05T13:44:51.966Z|00012|raft|INFO|tcp:253.255.0.34:50744: learned 
remote address tcp:[253.255.0.34]:6643
2020-08-05T13:44:52.015Z|00013|raft|INFO|tcp:253.255.0.35:51780: learned 
server ID 941e
2020-08-05T13:44:52.015Z|00014|raft|INFO|tcp:253.255.0.35:51780: learned 
remote address tcp:[253.255.0.35]:6643
2020-08-05T13:44:52.015Z|00015|raft|INFO|adding 941e (941e at 
tcp:[253.255.0.35]:6643) to cluster 6e05 failed (not leader)

2020-08-05T13:44:52.015Z|00016|raft|INFO|server 941e added to configuration
2020-08-05T13:44:52.016Z|00017|reconnect|INFO|tcp:[253.255.0.35]:6643: 
connecting...
2020-08-05T13:44:52.016Z|00018|reconnect|INFO|tcp:[253.255.0.35]:6643: 
connected
2020-08-05T13:45:01.962Z|00019|memory|INFO|8424 kB peak resident set 
size after 10.0 seconds
2020-08-05T13:45:01.962Z|00020|memory|INFO|cells:40 monitors:0 
raft-connections:4
2020-08-05T14:10:03.593Z|00021|reconnect|INFO|tcp:[253.255.0.34]:6643: 
connection closed by peer

2020-08-05T14:10:03.595Z|00022|raft|INFO|server 941e is leader for term 2
2020-08-05T14:10:04.594Z|00023|reconnect|INFO|tcp:[253.255.0.34]:6643: 
connecting...
2020-08-05T14:10:04.594Z|00024|reconnect|INFO|tcp:[253.255.0.34]:6643: 
connection attempt failed (Connection refused)
2020-08-05T14:10:04.594Z|00025|reconnect|INFO|tcp:[253.255.0.34]:6643: 
waiting 2 seconds before reconnect
2020-08-05T14:10:06.594Z|00026|reconnect|INFO|tcp:[253.255.0.34]:6643: 
connecting...
2020-08-05T14:10:06.594Z|00027|reconnect|INFO|tcp:[253.255.0.34]:6643: 
connection attempt failed (Connection refused)
2020-08-05T14:10:06.594Z|00028|reconnect|INFO|tcp:[253.255.0.34]:6643: 
waiting 4 seconds before reconnect
2020-08-05T14:10:08.453Z|00029|raft|INFO|received leadership transfer 
from 941e in term 2

2020-08-05T14:10:08.453Z|00030|raft|INFO|term 3: starting election
2020-08-05T14:10:08.453Z|00031|reconnect|INFO|tcp:[253.255.0.35]:6643: 
connection closed by peer


2020-08-05T13:44:48.971Z|1|vlog|INFO|opened log file 
/var/log/ovn/ovsdb-server-sb.log
2020-08-05T13:44:48.973Z|2|ovsdb_server|INFO|ovsdb-server (Open 
vSwitch) 2.13.90
2020-08-05T13:44:48.973Z|3|reconnect|INFO|tcp:[253.255.0.34]:6644: 
connecting...
2020-08-05T13:44:48.973Z|4|reconnect|INFO|tcp:[253.255.0.34]:6644: 
connected

2020-08-05T13:44:48.973Z|5|raft_rpc|INFO|learned cluster ID 281c
2020-08-05T13:44:48.973Z|6|raft|INFO|tcp:[253.255.0.34]:6644: 
learned server ID eeed

2020-08-05T13:44:48.974Z|7|raft|INFO|server eeed is leader for term 1
2020-08-05T13:44:48.974Z|8|raft|INFO|rejecting append_request 
because previous entry 1,4 not in local log (mismatch past end of log)

2020-08-05T13:44:48.976Z|9|raft|INFO|server eeed added to configuration
2020-08-05T13:44:48.977Z|00010|raft|INFO|server 5098 added to configuration
2020-08-05T13:44:48.977Z|00011|raft|INFO|tcp:253.255.0.34:50628: learned 
server ID eeed
2020-08-05T13:44:48.977Z|00012|raft|INFO|tcp:253.255.0.34:50628: learned 
remote address tcp:[253.255.0.34]:6644
2020-08-05T13:44:49.044Z|00013|raft|INFO|tcp:253.255.0.35:49594: learned 
server ID b9de
2020-08-05T13:44:49.044Z|00014|raft|INFO|tcp:253.255.0.35:49594: learned 
remote address tcp:[253.255.0.35]:6644
2020-08-05T13:44:49.044Z|00015|raft|INFO|adding b9de (b9de at 
tcp:[253.255.0.35]:6644) to cluster 281c failed (not leader)

2020-08-05T13:44:49.044Z|00016|raft|INFO|server b9de added to configuration
2020-08-05T13:44:49.044Z|00017|reconnect|INFO|tcp:[253.255.0.35]:6644: 
connecting...
2020-08-05T13:44:49.045Z|00018|reconnect|INFO|tcp:[253.255.0.35]:6644: 
connected
2020-08-05T13:44:58.975Z|00019|memory|INFO|8408 kB peak resident set 
size 

Re: [ovs-discuss] [ovs-dev] [OVN] ovn-northd takes much CPU when no configuration update

2020-08-06 Thread Numan Siddique
On Tue, Aug 4, 2020 at 11:31 PM Han Zhou  wrote:

> On Tue, Aug 4, 2020 at 12:38 AM Numan Siddique  wrote:
>
> >
> >
> > On Tue, Aug 4, 2020 at 9:02 AM Tony Liu  wrote:
> >
> >> The probe wakes up a recompute?
> >> There is a probe every 5 seconds. Without any connection up/down or
> >> failover,
> >> ovn-northd will recompute everything every 5 seconds, no matter what?
> >> Really?
> >>
> >> Anyways, I will increase the probe interval for now, see if that helps.
> >>
> >
> > I think we should optimise this case. I am planning to look into this.
> >
> > Thanks
> > Numan
> >
>
> Thanks Numan.
> I'd like to discuss more on this before we move forward to change anything.
>
> 1) Regarding the problem itself, the CPU cost triggered by OVSDB IDLE probe
> when there is no configuration change to compute, I don't think it matters
> that much in real production. It simply wastes CPU cycles when there is
> nothing to do, so what harm would it do here? For ovn-northd, since it is
> the centralized component, we would always ensure there is enough CPU
> available for ovn-north when computing is needed, and this reservation will
> be wasted anyway when there is no change to compute. So, I'd avoid making
> any change specifically only to address this issue. I could be wrong,
> though. I'd like to hear what would be the real concern if this is not
> addressed.
>

Agree with you.


>
> 2) ovn-northd incremental processing would avoid this CPU problem
> naturally. So let's discuss how to move forward for incremental processing,
> which is much more important because it also solves the CPU efficiency when
> handling the changes, and the IDLE probe problem is just a byproduct. I
> believe the DDlog branch would have solved this problem. However, it seems
> we are not sure about the current status of DDlog. As you proposed at the
> last OVN meeting, an alternative is to implement partial
> incremental-processing using the I-P engine like ovn-controller. While I
> have no objection to this, we'd better check with Ben and Leonid on the
> plan to avoid overlapping and waste of work. @Ben @Leonid, would you mind
> sharing the status here since you were not at the meeting last week?
>
>
My idea is not to address this issue specifically, but to add very
minimalistic I-P support that
avoids unnecessary recomputation to start with. I don't want to spend too
much time on it or
make it complicated, but to start with a simple version which will trigger a
full recompute for any NB DB or SB DB change.


Thanks
Numan




Thanks,
> Han
>
> >
> >
> >>
> >>
> >> Thanks!
> >>
> >> Tony
> >>
> >> > -Original Message-
> >> > From: Han Zhou 
> >> > Sent: Monday, August 3, 2020 8:22 PM
> >> > To: Tony Liu 
> >> > Cc: Han Zhou ; ovs-discuss <
> ovs-discuss@openvswitch.org
> >> >;
> >> > ovs-dev 
> >> > Subject: Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no
> >> > configuration update
> >> >
> >> > Sorry that I didn't make it clear enough. The OVSDB probe itself
> >> > doesn't take much CPU, but the probe wakes up the ovn-northd main loop,
> >> > which recomputes everything, which is why you see the CPU spike.
> >> > It will be solved by incremental-processing, when only delta is
> >> > processed, and in case of probe handling, there is no change in
> >> > configuration, so the delta is zero.
> >> > For now, please follow the steps to adjust probe interval, if the CPU
> of
> >> > ovn-northd (when there is no configuration change) is a concern for
> you.
> >> > But please remember that this has no impact to the real CPU usage for
> >> > handling configuration changes.
> >> >
> >> >
> >> > Thanks,
> >> > Han
> >> >
> >> >
> >> >   On Mon, Aug 3, 2020 at 8:11 PM Tony Liu wrote:
> >> >
> >> >
> >> >   Health check (5 sec interval) taking 30%-100% CPU is definitely not
> >> >   acceptable,
> >> >   if that's really the case. There must be some blocking (and not
> >> >   yielding the CPU)
> >> >   in the code, which is not supposed to be there.
> >> >
> >> >   Could you point me to the coding for such health check?
> >> >   Is it single thread? Does it use any event library?
> >> >
> >> >
> >> >   Thanks!
> >> >
> >> >   Tony
> >> >
> >> >   > -Original Message-
> >> >   > From: Han Zhou <hz...@ovn.org>
> >> >   > Sent: Saturday, August 1, 2020 9:11 PM
> >> >   > To: Tony Liu
> >> >   > Cc: ovs-discuss <ovs-disc...@openvswitch.org>; ovs-dev
> >> >   > <d...@openvswitch.org>
> >> >   > Subject: Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no
> >> >   > configuration update
> >> >   >
> >> >   > On Fri, Jul 31, 2020 at 4:14 PM Tony Liu
> >> >   > <tonyliu0...@hotmail.com> wrote:
> >> >   >

Re: [ovs-discuss] [OVN] no response to inactivity probe

2020-08-06 Thread Han Zhou
On Wed, Aug 5, 2020 at 9:14 PM Tony Liu  wrote:

> I set the connection target="ptcp:6641:10.6.20.84" for ovn-nb-db
> and "ptcp:6642:10.6.20.84" for ovn-sb-db. .84 is the first node
> of cluster. Also ovn-openflow-probe-interval=30 on compute node.
> It seems to help. Not that many connect/drop/reconnects in the logging.
> That "commit failure" is also gone.
> The issue I reported in another thread, "packet drop", seems gone.
> And launching VMs starts working.
>
> How should I set the connection table for all ovn-nb-db and ovn-sb-db
> nodes in the cluster to set the inactivity_probe?
> One row with address 0.0.0.0 seems not to work.
>

You can simply use 0.0.0.0 in the connection table, but don't specify the
same connection method on the command line when starting ovsdb-server for
the NB/SB DB. Otherwise, they conflict, and that's why you saw the "Address
already in use" error.
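
(Concretely, something like this; a sketch where the probe value is just an
example in milliseconds, and the same ptcp target must then not appear on the
ovsdb-server command line:)

    ovn-nbctl -- --inactivity-probe=60000 set-connection ptcp:6641:0.0.0.0
    ovn-sbctl -- --inactivity-probe=60000 set-connection ptcp:6642:0.0.0.0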


> Is "external_ids:ovn-remote-probe-interval" in ovsdb-server on
> compute node for ovn-controller to probe ovn-sb-db?
>
OVSDB probes are bidirectional, so you need to set this value too, if you
don't want too many probes handled by the SB server. (Setting the
connection table for the SB DB only changes the server side.)
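For example, on each chassis (a sketch; this interval is in milliseconds,
so 30000 means 30 seconds):

  ovs-vsctl set open . external_ids:ovn-remote-probe-interval=30000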



> Is "external_ids:ovn-openflow-probe-interval" in ovsdb-server on
> compute node for ovn-controller to probe ovsdb-server?
>
It is for the OpenFlow connection between ovn-controller and ovs-vswitchd;
the probe there is part of the OpenFlow protocol (echo messages).
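It is set the same way, for reference (a sketch; note this interval is in
seconds, matching the ovn-openflow-probe-interval=30 you already have):

  ovs-vsctl set open . external_ids:ovn-openflow-probe-interval=30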


> What's the probe interval for ovsdb-server to probe ovn-controller?
>
The local ovsdb connection uses a unix socket, which doesn't send probes
by default (if I remember correctly).

For ovn-controller, since it is implemented with incremental processing,
it doesn't matter even if there are probes from OpenFlow or the local
ovsdb: if there is no configuration change, ovn-controller simply replies
to the probe and there is no extra cost.
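As an aside, since your ovn-controller reaches the local ovsdb over
tcp:127.0.0.1:6640, pointing it at the unix socket instead would avoid
TCP probes on that connection entirely. A sketch, assuming the usual run
directory (adjust to however your init scripts invoke ovn-controller):

  ovn-controller unix:/var/run/openvswitch/db.sock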


> Thanks!
>
> Tony
> > -Original Message-
> > From: discuss  On Behalf Of Tony
> > Liu
> > Sent: Wednesday, August 5, 2020 4:29 PM
> > To: Han Zhou 
> > Cc: ovs-dev ; ovs-discuss <disc...@openvswitch.org>
> > Subject: Re: [ovs-discuss] [OVN] no response to inactivity probe
> >
> > Hi Han,
> >
> > After setting connection target="ptcp:6642:0.0.0.0" for ovn-sb-db, I see
> > this error:
> >
> > 2020-08-05T23:01:26.819Z|06799|ovsdb_jsonrpc_server|ERR|ptcp:6642:0.0.0.0:
> > listen failed: Address already in use
> >
> > Anything I am missing here?
> >
> >
> > Thanks!
> >
> > Tony
> > > -Original Message-
> > > From: Han Zhou 
> > > Sent: Tuesday, August 4, 2020 4:44 PM
> > > To: Tony Liu 
> > > Cc: Numan Siddique ; Han Zhou ; ovs-discuss ;
> > > ovs-dev 
> > > Subject: Re: [ovs-discuss] [OVN] no response to inactivity probe
> > >
> > >
> > >
> > > On Tue, Aug 4, 2020 at 2:50 PM Tony Liu  > >  > wrote:
> > >
> > >
> > > Hi,
> > >
> > > Since I have 3 OVN DB nodes, should I add 3 rows in the connection
> > > table for the inactivity_probe? Or put 3 addresses into one row?
> > >
> > > "set-connection" sets one row only, and there is no "add-connection".
> > > How should I add 3 rows into the connection table?
> > >
> > >
> > >
> > >
> > > You only need to set one row. Try this command:
> > >
> > > ovn-nbctl -- --id=@conn_uuid create Connection
> > > target="ptcp\:6641\:0.0.0.0" inactivity_probe=0 -- set NB_Global .
> > > connections=@conn_uuid
> > >
> > >
> > >
> > > Thanks!
> > >
> > > Tony
> > >
> > > > -Original Message-
> > > > From: Numan Siddique <num...@ovn.org>
> > > > Sent: Tuesday, August 4, 2020 12:36 AM
> > > > To: Tony Liu
> > > > Cc: ovs-discuss <ovs-disc...@openvswitch.org>; ovs-dev
> > > > <d...@openvswitch.org>
> > > > Subject: Re: [ovs-discuss] [OVN] no response to inactivity probe
> > > >
> > > >
> > > >
> > > > On Tue, Aug 4, 2020 at 9:12 AM Tony Liu wrote:
> > > >
> > > >
> > > >   In my deployment, on each Neutron server, there are 13 Neutron
> > > >   server processes. I see 12 of them (monitor, maintenance, RPC,
> > > >   API) connect to both ovn-nb-db and ovn-sb-db. With 3 Neutron
> > > >   server nodes, that's 36 OVSDB clients. Are that many clients OK?
> > > >
> > > >   Any suggestions on how to figure out which side doesn't respond
> > > >   to the probe, if it's bidirectional? I don't see any activity in
> > > >   the logs other than connect/drop and reconnect...
> > > >
> > > >   BTW, please let me know if this is not the right place to
> > > >   discuss the Neutron OVN ML2 driver.
> > > >
> > > >
>