Re: [ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker

2019-07-08 Thread Daniel Alvarez Sanchez
On Mon, Jul 8, 2019 at 5:43 PM Ben Pfaff  wrote:
>
> Would you mind formally submitting this?  It seems like the best
> immediate solution.

Will do, thanks a lot Ben!
>
> On Mon, Jul 08, 2019 at 02:27:31PM +0200, Daniel Alvarez Sanchez wrote:
> > I tried a simple patch and it fixes the issue (see below). The
> > question now is, do we want to do this? I think it makes sense to drop
> > *all* the connections when the role changes but I'm curious to see
> > what other people think:
> >
> > diff --git a/ovsdb/jsonrpc-server.c b/ovsdb/jsonrpc-server.c
> > index 4dda63a..ddbbc2e 100644
> > --- a/ovsdb/jsonrpc-server.c
> > +++ b/ovsdb/jsonrpc-server.c
> > @@ -365,7 +365,7 @@ ovsdb_jsonrpc_server_set_read_only(struct
> > ovsdb_jsonrpc_server *svr,
> >  {
> >  if (svr->read_only != read_only) {
> >  svr->read_only = read_only;
> > -ovsdb_jsonrpc_server_reconnect(svr, false,
> > +ovsdb_jsonrpc_server_reconnect(svr, true,
> > xstrdup(read_only
> > ? "making server read-only"
> > : "making server 
> > read/write"));
> >
> >
> > $export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
> > $ovn-nbctl ls-add sw0
> > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> > state: active
> > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server
> > tcp:192.0.2.2:6641
> > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server
> > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> > state: backup
> > connecting: tcp:192.0.2.2:6641
> > $ ovn-nbctl ls-add sw1
> > ovn-nbctl: transaction error: {"details":"insert operation not allowed
> > when database server is in read only mode","error":"not allowed"}
> >
> > On Mon, Jul 8, 2019 at 1:25 PM Daniel Alvarez Sanchez
> >  wrote:
> > >
> > > I *think* that it may not a bug in ovsdb-server but a problem with
> > > ovn-controller as it doesn't seem to be a DB change aware client.
> > >
> > > When the role changes from master to backup or viceversa, connections
> > > are expected to be reestablished for all clients except those that are
> > > not aware of db changes [0] (note the 'false' argument). This flag is
> > > explained here [1] and looks like since ovn-controller is not
> > > monitoring the Database table in the _Server database, then the
> > > connection with it is not re-established. This is just a blind guess
> > > but  I can give it a shot :)
> > >
> > > [0] 
> > > https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L368
> > > [1] 
> > > https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L450-L456
> > >
> > > On Mon, Jul 8, 2019 at 12:45 PM Numan Siddique  
> > > wrote:
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Jul 8, 2019 at 3:52 PM Daniel Alvarez Sanchez 
> > > >  wrote:
> > > >>
> > > >> Hi folks,
> > > >>
> > > >> While working with an OpenStack environment running OVN and
> > > >> ovsdb-server in A/P configuration with Pacemaker we hit an issue that
> > > >> has been probably around for a long time. The bug itself seems to be
> > > >> related with ovsdb-server not updating the read-only flag properly.
> > > >>
> > > >> With a 3 nodes cluster running ovsdb-server in active/passive mode,
> > > >> when we restart the master-node, pacemaker promotes another node as
> > > >> master and moves the associated IPAddr2 resource to it.
> > > >> At this point, ovn-controller instances across the cloud reconnect to
> > > >> the new node but there's a window where ovsdb-server is still running
> > > >> as backup.
> > > >>
> > > >> For those ovn-controller instances that reconnect within that window,
> > > >> every attempt to write in the OVSDB will fail with "operation not
> > > >> allowed when database server is in read only mode". This state will
> > > >> remain forever unless a reconnection is forced. Restarting
> > > >> ovn-controller or killing the connection (for example with tcpkill)
> > > >> will make things work again.
> > > >>
> > > >> A workaround in OVN OCF script could be to wait for the
> > > >> ovsdb_server_promote function to wait until we get 'running/active' on
> > > >> that instance.
> > > >>
> > > >> Another open question is what should clients (in this case,
> > > >> ovn-controller) do in such situation? Shall they log an error and
> > > >> attempt a reconnection (rate limited)?
> > > >
> > > >
> > > > Thanks for reporting this issue Daniel.
> > > >
> > > > I can easily  reproduce the issue with the below commands.
> > > >
> > > > $  > > > $export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
> > > > $ovn-nbctl ls-add sw0
> > > > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> > > > state: active
> > > > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server 
> > > > tcp:192.0.2.2:6641
> > > > $ovs-appctl -t 

Re: [ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker

2019-07-08 Thread Ben Pfaff
Would you mind formally submitting this?  It seems like the best
immediate solution.

On Mon, Jul 08, 2019 at 02:27:31PM +0200, Daniel Alvarez Sanchez wrote:
> I tried a simple patch and it fixes the issue (see below). The
> question now is, do we want to do this? I think it makes sense to drop
> *all* the connections when the role changes but I'm curious to see
> what other people think:
> 
> diff --git a/ovsdb/jsonrpc-server.c b/ovsdb/jsonrpc-server.c
> index 4dda63a..ddbbc2e 100644
> --- a/ovsdb/jsonrpc-server.c
> +++ b/ovsdb/jsonrpc-server.c
> @@ -365,7 +365,7 @@ ovsdb_jsonrpc_server_set_read_only(struct
> ovsdb_jsonrpc_server *svr,
>  {
>  if (svr->read_only != read_only) {
>  svr->read_only = read_only;
> -ovsdb_jsonrpc_server_reconnect(svr, false,
> +ovsdb_jsonrpc_server_reconnect(svr, true,
> xstrdup(read_only
> ? "making server read-only"
> : "making server 
> read/write"));
> 
> 
> $export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
> $ovn-nbctl ls-add sw0
> $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> state: active
> $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server
> tcp:192.0.2.2:6641
> $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server
> $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> state: backup
> connecting: tcp:192.0.2.2:6641
> $ ovn-nbctl ls-add sw1
> ovn-nbctl: transaction error: {"details":"insert operation not allowed
> when database server is in read only mode","error":"not allowed"}
> 
> On Mon, Jul 8, 2019 at 1:25 PM Daniel Alvarez Sanchez
>  wrote:
> >
> > I *think* that it may not a bug in ovsdb-server but a problem with
> > ovn-controller as it doesn't seem to be a DB change aware client.
> >
> > When the role changes from master to backup or viceversa, connections
> > are expected to be reestablished for all clients except those that are
> > not aware of db changes [0] (note the 'false' argument). This flag is
> > explained here [1] and looks like since ovn-controller is not
> > monitoring the Database table in the _Server database, then the
> > connection with it is not re-established. This is just a blind guess
> > but  I can give it a shot :)
> >
> > [0] 
> > https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L368
> > [1] 
> > https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L450-L456
> >
> > On Mon, Jul 8, 2019 at 12:45 PM Numan Siddique  wrote:
> > >
> > >
> > >
> > >
> > > On Mon, Jul 8, 2019 at 3:52 PM Daniel Alvarez Sanchez 
> > >  wrote:
> > >>
> > >> Hi folks,
> > >>
> > >> While working with an OpenStack environment running OVN and
> > >> ovsdb-server in A/P configuration with Pacemaker we hit an issue that
> > >> has been probably around for a long time. The bug itself seems to be
> > >> related with ovsdb-server not updating the read-only flag properly.
> > >>
> > >> With a 3 nodes cluster running ovsdb-server in active/passive mode,
> > >> when we restart the master-node, pacemaker promotes another node as
> > >> master and moves the associated IPAddr2 resource to it.
> > >> At this point, ovn-controller instances across the cloud reconnect to
> > >> the new node but there's a window where ovsdb-server is still running
> > >> as backup.
> > >>
> > >> For those ovn-controller instances that reconnect within that window,
> > >> every attempt to write in the OVSDB will fail with "operation not
> > >> allowed when database server is in read only mode". This state will
> > >> remain forever unless a reconnection is forced. Restarting
> > >> ovn-controller or killing the connection (for example with tcpkill)
> > >> will make things work again.
> > >>
> > >> A workaround in OVN OCF script could be to wait for the
> > >> ovsdb_server_promote function to wait until we get 'running/active' on
> > >> that instance.
> > >>
> > >> Another open question is what should clients (in this case,
> > >> ovn-controller) do in such situation? Shall they log an error and
> > >> attempt a reconnection (rate limited)?
> > >
> > >
> > > Thanks for reporting this issue Daniel.
> > >
> > > I can easily  reproduce the issue with the below commands.
> > >
> > > $  > > $export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
> > > $ovn-nbctl ls-add sw0
> > > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> > > state: active
> > > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server 
> > > tcp:192.0.2.2:6641
> > > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server
> > > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> > > state: backup
> > > connecting: tcp:192.0.2.2:6641
> > > $ovn-nbctl ls-add sw1  --> This should have failed. Since OVN_NB_DAEMON 
> > > is set, ovn-nbctl talks to the
> > >

Re: [ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker

2019-07-08 Thread Ben Pfaff
ovn-controller is in fact change-aware, but the _Server database doesn't
report whether a particular database is read-only or read/write.  I
guess that was an oversight when I designed that schema.  That means
that there's no way for clients to monitor whether a particular database
changes between read-only and read/write.

I guess there are two ways to fix it:

1. Add a read/write column to the _Server schema and implement it in
   ovsdb-server and ovn-controller.

2. Make ovsdb-server kill connections when read/write status changes.

#2 is probably what we should do right away.  #1 can wait.

On Mon, Jul 08, 2019 at 01:25:09PM +0200, Daniel Alvarez Sanchez wrote:
> I *think* that it may not a bug in ovsdb-server but a problem with
> ovn-controller as it doesn't seem to be a DB change aware client.
> 
> When the role changes from master to backup or viceversa, connections
> are expected to be reestablished for all clients except those that are
> not aware of db changes [0] (note the 'false' argument). This flag is
> explained here [1] and looks like since ovn-controller is not
> monitoring the Database table in the _Server database, then the
> connection with it is not re-established. This is just a blind guess
> but  I can give it a shot :)
> 
> [0] 
> https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L368
> [1] 
> https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L450-L456
> 
> On Mon, Jul 8, 2019 at 12:45 PM Numan Siddique  wrote:
> >
> >
> >
> >
> > On Mon, Jul 8, 2019 at 3:52 PM Daniel Alvarez Sanchez  
> > wrote:
> >>
> >> Hi folks,
> >>
> >> While working with an OpenStack environment running OVN and
> >> ovsdb-server in A/P configuration with Pacemaker we hit an issue that
> >> has been probably around for a long time. The bug itself seems to be
> >> related with ovsdb-server not updating the read-only flag properly.
> >>
> >> With a 3 nodes cluster running ovsdb-server in active/passive mode,
> >> when we restart the master-node, pacemaker promotes another node as
> >> master and moves the associated IPAddr2 resource to it.
> >> At this point, ovn-controller instances across the cloud reconnect to
> >> the new node but there's a window where ovsdb-server is still running
> >> as backup.
> >>
> >> For those ovn-controller instances that reconnect within that window,
> >> every attempt to write in the OVSDB will fail with "operation not
> >> allowed when database server is in read only mode". This state will
> >> remain forever unless a reconnection is forced. Restarting
> >> ovn-controller or killing the connection (for example with tcpkill)
> >> will make things work again.
> >>
> >> A workaround in OVN OCF script could be to wait for the
> >> ovsdb_server_promote function to wait until we get 'running/active' on
> >> that instance.
> >>
> >> Another open question is what should clients (in this case,
> >> ovn-controller) do in such situation? Shall they log an error and
> >> attempt a reconnection (rate limited)?
> >
> >
> > Thanks for reporting this issue Daniel.
> >
> > I can easily  reproduce the issue with the below commands.
> >
> > $export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
> > $ovn-nbctl ls-add sw0
> > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> > state: active
> > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server 
> > tcp:192.0.2.2:6641
> > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server
> > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> > state: backup
> > connecting: tcp:192.0.2.2:6641
> > $ovn-nbctl ls-add sw1  --> This should have failed. Since OVN_NB_DAEMON is 
> > set, ovn-nbctl talks to the
> >ovn-nbctl daemon and it is able 
> > to create a logical switch even though the db is in backup mode
> > $unset OVN_NB_DAEMON
> > $ovn-nbctl ls-add sw2
> > ovn-nbctl: transaction error: {"details":"insert operation not allowed when 
> > database server is in read only mode","error":"not allowed"}
> >
> >
> > I looked into the ovsdb-server code, when the user changes the state of the 
> > ovsdb-server, the read_only param of  active ovsdb_server_sessions
> > are not updated.
> >
> > Thanks
> > Numan
> >
> >>
> >> Thoughts?
> >>
> >> Thanks a lot,
> >> Daniel
> >> ___
> >> discuss mailing list
> >> disc...@openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker

2019-07-08 Thread Daniel Alvarez Sanchez
I tried a simple patch and it fixes the issue (see below). The
question now is, do we want to do this? I think it makes sense to drop
*all* the connections when the role changes but I'm curious to see
what other people think:

diff --git a/ovsdb/jsonrpc-server.c b/ovsdb/jsonrpc-server.c
index 4dda63a..ddbbc2e 100644
--- a/ovsdb/jsonrpc-server.c
+++ b/ovsdb/jsonrpc-server.c
@@ -365,7 +365,7 @@ ovsdb_jsonrpc_server_set_read_only(struct ovsdb_jsonrpc_server *svr,
 {
     if (svr->read_only != read_only) {
         svr->read_only = read_only;
-        ovsdb_jsonrpc_server_reconnect(svr, false,
+        ovsdb_jsonrpc_server_reconnect(svr, true,
                                        xstrdup(read_only
                                                ? "making server read-only"
                                                : "making server read/write"));


$export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
$ovn-nbctl ls-add sw0
$ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
state: active
$ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server tcp:192.0.2.2:6641
$ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server
$ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
state: backup
connecting: tcp:192.0.2.2:6641
$ ovn-nbctl ls-add sw1
ovn-nbctl: transaction error: {"details":"insert operation not allowed
when database server is in read only mode","error":"not allowed"}

On Mon, Jul 8, 2019 at 1:25 PM Daniel Alvarez Sanchez
 wrote:
>
> I *think* that it may not a bug in ovsdb-server but a problem with
> ovn-controller as it doesn't seem to be a DB change aware client.
>
> When the role changes from master to backup or viceversa, connections
> are expected to be reestablished for all clients except those that are
> not aware of db changes [0] (note the 'false' argument). This flag is
> explained here [1] and looks like since ovn-controller is not
> monitoring the Database table in the _Server database, then the
> connection with it is not re-established. This is just a blind guess
> but  I can give it a shot :)
>
> [0] 
> https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L368
> [1] 
> https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L450-L456
>
> On Mon, Jul 8, 2019 at 12:45 PM Numan Siddique  wrote:
> >
> >
> >
> >
> > On Mon, Jul 8, 2019 at 3:52 PM Daniel Alvarez Sanchez  
> > wrote:
> >>
> >> Hi folks,
> >>
> >> While working with an OpenStack environment running OVN and
> >> ovsdb-server in A/P configuration with Pacemaker we hit an issue that
> >> has been probably around for a long time. The bug itself seems to be
> >> related with ovsdb-server not updating the read-only flag properly.
> >>
> >> With a 3 nodes cluster running ovsdb-server in active/passive mode,
> >> when we restart the master-node, pacemaker promotes another node as
> >> master and moves the associated IPAddr2 resource to it.
> >> At this point, ovn-controller instances across the cloud reconnect to
> >> the new node but there's a window where ovsdb-server is still running
> >> as backup.
> >>
> >> For those ovn-controller instances that reconnect within that window,
> >> every attempt to write in the OVSDB will fail with "operation not
> >> allowed when database server is in read only mode". This state will
> >> remain forever unless a reconnection is forced. Restarting
> >> ovn-controller or killing the connection (for example with tcpkill)
> >> will make things work again.
> >>
> >> A workaround in OVN OCF script could be to wait for the
> >> ovsdb_server_promote function to wait until we get 'running/active' on
> >> that instance.
> >>
> >> Another open question is what should clients (in this case,
> >> ovn-controller) do in such situation? Shall they log an error and
> >> attempt a reconnection (rate limited)?
> >
> >
> > Thanks for reporting this issue Daniel.
> >
> > I can easily  reproduce the issue with the below commands.
> >
> > $export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
> > $ovn-nbctl ls-add sw0
> > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> > state: active
> > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server 
> > tcp:192.0.2.2:6641
> > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server
> > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> > state: backup
> > connecting: tcp:192.0.2.2:6641
> > $ovn-nbctl ls-add sw1  --> This should have failed. Since OVN_NB_DAEMON is 
> > set, ovn-nbctl talks to the
> >ovn-nbctl daemon and it is able 
> > to create a logical switch even though the db is in backup mode
> > $unset OVN_NB_DAEMON
> > $ovn-nbctl ls-add sw2
> > ovn-nbctl: transaction error: {"details":"insert operation not allowed when 
> > database server is in read only mode","error":"not allowed"}
> >
> >
> > I looked into the ovsdb-server code, when 

Re: [ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker

2019-07-08 Thread Daniel Alvarez Sanchez
I *think* that it may not be a bug in ovsdb-server but a problem with
ovn-controller, as it doesn't seem to be a DB-change-aware client.

When the role changes from master to backup or vice versa, connections
are expected to be reestablished for all clients except those that are
not aware of DB changes [0] (note the 'false' argument). This flag is
explained here [1], and it looks like, since ovn-controller is not
monitoring the Database table in the _Server database, the
connection with it is not re-established. This is just a blind guess,
but I can give it a shot :)

[0] 
https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L368
[1] 
https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L450-L456

On Mon, Jul 8, 2019 at 12:45 PM Numan Siddique  wrote:
>
>
>
>
> On Mon, Jul 8, 2019 at 3:52 PM Daniel Alvarez Sanchez  
> wrote:
>>
>> Hi folks,
>>
>> While working with an OpenStack environment running OVN and
>> ovsdb-server in A/P configuration with Pacemaker we hit an issue that
>> has been probably around for a long time. The bug itself seems to be
>> related with ovsdb-server not updating the read-only flag properly.
>>
>> With a 3 nodes cluster running ovsdb-server in active/passive mode,
>> when we restart the master-node, pacemaker promotes another node as
>> master and moves the associated IPAddr2 resource to it.
>> At this point, ovn-controller instances across the cloud reconnect to
>> the new node but there's a window where ovsdb-server is still running
>> as backup.
>>
>> For those ovn-controller instances that reconnect within that window,
>> every attempt to write in the OVSDB will fail with "operation not
>> allowed when database server is in read only mode". This state will
>> remain forever unless a reconnection is forced. Restarting
>> ovn-controller or killing the connection (for example with tcpkill)
>> will make things work again.
>>
>> A workaround in OVN OCF script could be to wait for the
>> ovsdb_server_promote function to wait until we get 'running/active' on
>> that instance.
>>
>> Another open question is what should clients (in this case,
>> ovn-controller) do in such situation? Shall they log an error and
>> attempt a reconnection (rate limited)?
>
>
> Thanks for reporting this issue Daniel.
>
> I can easily  reproduce the issue with the below commands.
>
> $export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
> $ovn-nbctl ls-add sw0
> $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> state: active
> $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server 
> tcp:192.0.2.2:6641
> $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server
> $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> state: backup
> connecting: tcp:192.0.2.2:6641
> $ovn-nbctl ls-add sw1  --> This should have failed. Since OVN_NB_DAEMON is 
> set, ovn-nbctl talks to the
>ovn-nbctl daemon and it is able to 
> create a logical switch even though the db is in backup mode
> $unset OVN_NB_DAEMON
> $ovn-nbctl ls-add sw2
> ovn-nbctl: transaction error: {"details":"insert operation not allowed when 
> database server is in read only mode","error":"not allowed"}
>
>
> I looked into the ovsdb-server code, when the user changes the state of the 
> ovsdb-server, the read_only param of  active ovsdb_server_sessions
> are not updated.
>
> Thanks
> Numan
>
>>
>> Thoughts?
>>
>> Thanks a lot,
>> Daniel


Re: [ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker

2019-07-08 Thread Lucas Alvares Gomes
Hi,

Thanks for reporting, Daniel.

On Mon, Jul 8, 2019 at 11:22 AM Daniel Alvarez Sanchez
 wrote:
>
> Hi folks,
>
> While working with an OpenStack environment running OVN and
> ovsdb-server in A/P configuration with Pacemaker we hit an issue that
> has been probably around for a long time. The bug itself seems to be
> related with ovsdb-server not updating the read-only flag properly.
>
> With a 3 nodes cluster running ovsdb-server in active/passive mode,
> when we restart the master-node, pacemaker promotes another node as
> master and moves the associated IPAddr2 resource to it.
> At this point, ovn-controller instances across the cloud reconnect to
> the new node but there's a window where ovsdb-server is still running
> as backup.
>
> For those ovn-controller instances that reconnect within that window,
> every attempt to write in the OVSDB will fail with "operation not
> allowed when database server is in read only mode". This state will
> remain forever unless a reconnection is forced. Restarting
> ovn-controller or killing the connection (for example with tcpkill)
> will make things work again.
>
> A workaround in OVN OCF script could be to wait for the
> ovsdb_server_promote function to wait until we get 'running/active' on
> that instance.
>
> Another open question is what should clients (in this case,
> ovn-controller) do in such situation? Shall they log an error and
> attempt a reconnection (rate limited)?
>

I would say so: ovn-controller _requires_ a read-write session to
function properly. It can either retry the connection forever, as you
suggested, assert and exit if the connection is read-only, or combine
the two (retry first and then exit).
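
The "retry first and then exit" combination could be sketched roughly as follows. This is only a CLI-level illustration, not ovn-controller code: the function name retry_while_read_only and its arguments are hypothetical, and a fresh CLI invocation stands in for a forced reconnection. The only thing taken from this thread is the "read only mode" error text.

```shell
# Hypothetical wrapper: retry a write while the server answers "read only mode".
# Usage: retry_while_read_only MAX_TRIES INITIAL_DELAY command [args...]
retry_while_read_only() {
    max_tries=$1      # give up after this many attempts
    delay=$2          # initial delay in seconds, doubled after each failure
    shift 2
    attempt=1
    while :; do
        if output=$("$@" 2>&1); then
            if [ -n "$output" ]; then
                printf '%s\n' "$output"
            fi
            return 0
        fi
        case $output in
            *"read only mode"*) ;;                       # retryable error
            *) printf '%s\n' "$output" >&2; return 1 ;;  # any other error: fail now
        esac
        if [ "$attempt" -ge "$max_tries" ]; then
            printf 'ERROR: still read-only after %s attempts\n' "$attempt" >&2
            return 2
        fi
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}
```

For example, "retry_while_read_only 5 1 ovn-nbctl ls-add sw2" would try up to five times with delays of 1, 2, 4 and 8 seconds. Note that this only helps when each attempt really opens a new connection, so it would not help with OVN_NB_DAEMON set.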

Also, we need to improve the logs for such errors. While debugging the
problem it wasn't "easy" to find out why ovn-controller wasn't updating
the database (we were looking into the nb_cfg column of the Chassis
table in the Southbound OVSDB). We checked the state of the
connection (it was stable), the process was healthy, etc. It was only
when we enabled the DBG log level for ovn-controller that we
started seeing messages such as:

2019-07-04T15:11:19.522Z|00148|jsonrpc|DBG|tcp:172.17.1.27:6642: received notification, method="update2", params=[["monid","OVN_Southbound"],{"Chassis":{"cb669c72-0f84-412c-a3bf-482119649d85":{"modify":{"nb_cfg":3300]
2019-07-04T15:11:19.522Z|00149|jsonrpc|DBG|tcp:172.17.1.27:6642: received reply, result=[{"details":"update operation not allowed when database server is in read only mode","error":"not allowed"}], id=8062

So, perhaps logging it as ERROR would be better because without the
DBG level all we could see in the logs was two INFO messages saying
that it reconnected to the Southbound OVSDB.
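
Until the log level is changed, one way to surface these rejections is to scan a log captured at DBG level for the error text. The sketch below is only an illustration under that assumption: the helper name report_read_only_errors is hypothetical, and the match string is simply the error text quoted above.

```shell
# Hypothetical helper: report read-only rejections found in a log file.
# Only useful on a log captured at DBG level, where the replies are visible.
report_read_only_errors() {
    logfile=$1
    # grep -c prints the number of matching lines; "|| true" keeps a zero
    # count (grep exit status 1) from being treated as a failure.
    count=$(grep -c 'read only mode' "$logfile" 2>/dev/null || true)
    count=${count:-0}
    if [ "$count" -gt 0 ]; then
        printf 'ERROR: %s write(s) rejected: database server is in read only mode\n' "$count"
        return 1
    fi
    return 0
}
```

The non-zero return value makes it easy to hook into a health check, which would have pointed at the read-only session much earlier than the two INFO messages did.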

Cheers,
Lucas


Re: [ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker

2019-07-08 Thread Numan Siddique
On Mon, Jul 8, 2019 at 3:52 PM Daniel Alvarez Sanchez 
wrote:

> Hi folks,
>
> While working with an OpenStack environment running OVN and
> ovsdb-server in A/P configuration with Pacemaker we hit an issue that
> has been probably around for a long time. The bug itself seems to be
> related with ovsdb-server not updating the read-only flag properly.
>
> With a 3 nodes cluster running ovsdb-server in active/passive mode,
> when we restart the master-node, pacemaker promotes another node as
> master and moves the associated IPAddr2 resource to it.
> At this point, ovn-controller instances across the cloud reconnect to
> the new node but there's a window where ovsdb-server is still running
> as backup.
>
> For those ovn-controller instances that reconnect within that window,
> every attempt to write in the OVSDB will fail with "operation not
> allowed when database server is in read only mode". This state will
> remain forever unless a reconnection is forced. Restarting
> ovn-controller or killing the connection (for example with tcpkill)
> will make things work again.
>
> A workaround in OVN OCF script could be to wait for the
> ovsdb_server_promote function to wait until we get 'running/active' on
> that instance.
>
> Another open question is what should clients (in this case,
> ovn-controller) do in such situation? Shall they log an error and
> attempt a reconnection (rate limited)?
>

Thanks for reporting this issue Daniel.

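I can easily reproduce the issue with the commands below.

$export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
$ovn-nbctl ls-add sw0
$ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
state: active
$ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server tcp:192.0.2.2:6641
$ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server
$ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
state: backup
connecting: tcp:192.0.2.2:6641
$ovn-nbctl ls-add sw1  --> This should have failed. Since OVN_NB_DAEMON is
set, ovn-nbctl talks to the ovn-nbctl daemon and it is able to create a
logical switch even though the db is in backup mode
$unset OVN_NB_DAEMON
$ovn-nbctl ls-add sw2
ovn-nbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"}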


I looked into the ovsdb-server code: when the user changes the state of the
ovsdb-server, the read_only param of active ovsdb_server_sessions
is not updated.

Thanks
Numan


> Thoughts?
>
> Thanks a lot,
> Daniel


[ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker

2019-07-08 Thread Daniel Alvarez Sanchez
Hi folks,

While working with an OpenStack environment running OVN and
ovsdb-server in an A/P configuration with Pacemaker, we hit an issue that
has probably been around for a long time. The bug itself seems to be
related to ovsdb-server not updating the read-only flag properly.

With a three-node cluster running ovsdb-server in active/passive mode,
when we restart the master node, Pacemaker promotes another node to
master and moves the associated IPAddr2 resource to it.
At this point, ovn-controller instances across the cloud reconnect to
the new node but there's a window where ovsdb-server is still running
as backup.

For those ovn-controller instances that reconnect within that window,
every attempt to write in the OVSDB will fail with "operation not
allowed when database server is in read only mode". This state will
remain forever unless a reconnection is forced. Restarting
ovn-controller or killing the connection (for example with tcpkill)
will make things work again.

A workaround in the OVN OCF script could be for the
ovsdb_server_promote function to wait until we get 'running/active' on
that instance.
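
A minimal sketch of such a wait, assuming the promote step can simply poll ovsdb-server/sync-status until it reports "state: active". The helper name wait_until_active, its arguments and the SYNC_STATUS_CMD override are hypothetical and not part of the existing OCF script.

```shell
# Hypothetical helper: poll sync-status until the server reports "state: active".
# SYNC_STATUS_CMD is an assumption that lets the command be swapped out for testing.
# Usage: wait_until_active CTL_SOCKET [MAX_TRIES] [DELAY_SECONDS]
wait_until_active() {
    ctl=$1            # path to the ovsdb-server control socket
    tries=${2:-30}    # maximum number of polls
    delay=${3:-1}     # seconds between polls
    i=0
    while [ "$i" -lt "$tries" ]; do
        status=$(${SYNC_STATUS_CMD:-ovs-appctl} -t "$ctl" ovsdb-server/sync-status 2>/dev/null || true)
        case $status in
            *"state: active"*) return 0 ;;
        esac
        i=$((i + 1))
        sleep "$delay"
    done
    return 1          # still not active after all polls
}
```

The promote step would then only report success once wait_until_active returns 0, so that the IPAddr2 resource is not moved to a server that is still running as backup.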

Another open question is what clients (in this case,
ovn-controller) should do in such a situation. Should they log an error
and attempt a rate-limited reconnection?

Thoughts?

Thanks a lot,
Daniel