Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-12 Thread aginwala
On Fri, May 11, 2018 at 5:21 PM, Han Zhou  wrote:

> Thanks for the output. It appears to be more complex than I thought
> before. It is good that the new slave doesn't listen on 6641, although I am
> not sure how it is achieved. I guess a stop & start has been triggered
> instead of a simple demote, but I need to spend some time on the pacemaker
> state machine. And please ignore my comment about calling
> ovsdb_server_start() in demote - it would cause a recursive call, since
> ovsdb_server_start() calls demote(), too.
>
> Regarding the change:
>  if [ "x${present_master}" = x ]; then
> +set $@ --db-nb-create-insecure-remote=yes
> +set $@ --db-sb-create-insecure-remote=yes
>  # No master detected, or the previous master is not among the
>  # set starting.
>
> >>> Sure. Makes sense. Thanks for review.


> This "if" branch is when there is no master present, but in fact we want
> it to be set when current node is master. So this change doesn't affect
> anything. It is the below change that made the test work (so that on slave
> node the tcp port is not opened):
>  elif [ ${present_master} != ${host_name} ]; then
> +set $@ --db-nb-create-insecure-remote=no
> +set $@ --db-sb-create-insecure-remote=no
>
> The ovsdb error log should not simply be skipped. We should never bind the
> LB VIP on the ovsdb socket, because that IP is not on the host. I think it
> is related to the code in ovsdb_server_notify():
> ovn-nbctl -- --id=@conn_uuid create Connection \
> target="p${NB_MASTER_PROTO}\:${NB_MASTER_PORT}\:${MASTER_IP}" \
> inactivity_probe=$INACTIVE_PROBE -- set NB_Global . connections=@conn_uuid
>
> 
Thanks for the pointer.
I am able to fix the socket util error by skipping the target for both the
nb and sb db in the LB use case. The target also gets stamped when we use
the OCF heartbeat IPaddr2 virtual IP resource with the existing feature (an
L2 VIP in the same subnet), so it needs to be skipped in both cases anyway.
Should we handle that in the same commit or in a separate one?
+if [ "x${LISTEN_ON_MASTER_IP_ONLY}" = xyes ]; then
+ovn-nbctl -- --id=@conn_uuid create Connection \
+inactivity_probe=$INACTIVE_PROBE -- set NB_Global . connections=@conn_uuid
+else
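+# (Sketch only: the else branch presumably keeps the existing form that
+# stamps the target, as in the current script; shown here for context.)
+ovn-nbctl -- --id=@conn_uuid create Connection \
+target="p${NB_MASTER_PROTO}\:${NB_MASTER_PORT}\:${MASTER_IP}" \
+inactivity_probe=$INACTIVE_PROBE -- set NB_Global . connections=@conn_uuid
+fi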


> When using LB, we should set 0.0.0.0 here.
>
> Also, the failed action is a concern. We may need to dig more into the
> root cause. Thanks for finding these issues.
>
For crm move, I am now able to see the actual error behind the failed
action: the move briefly triggers self-replication.

*2018-05-12T20:08:21.687Z|00021|ovsdb_error|ERR|unexpected ovsdb error:
Server ID check failed: Self replicating is not allowed*

However, functionality is intact. Maybe it is due to some race condition in
the pacemaker state machine, as you pointed out, and the crm resource move
case needs to be handled explicitly? Node reboot, service pacemaker/corosync
restart, etc. do not result in self-replication issues while promoting the
new node. I will also try to see if I can find something more.
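
One way to watch what each instance is replicating from while the move is in
flight (these are existing ovsdb-server appctl commands, mentioned here just
as a debugging aid):
# ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl ovsdb-server/sync-status
# ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl ovsdb-server/get-active-ovsdb-server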


>
> Thanks,
> Han
>
>
> On Fri, May 11, 2018 at 3:29 PM, aginwala  wrote:
>
>> Sure:
>>
>> *VIP_ip* = 10.149.4.252
>> *LB IP* = 10.149.0.40
>> *slave netstat, where it syncs from the master via the LB VIP IP*
>> #netstat -an | grep 6641
>> tcp  0  0 10.169.129.34:47426  10.149.4.252:6641  ESTABLISHED
>> tcp  0  0 10.169.129.34:47444  10.149.4.252:6641  ESTABLISHED
>>
>> *Slave OVS:*
>> # ps aux |grep ovsdb-server
>> root  7388  0.0  0.0  18048   376 ?Ss   14:08   0:00
>> ovsdb-server: monitoring pid 7389 (healthy)
>> root  7389  0.0  0.0  18464  4556 ?S14:08   0:00
>> ovsdb-server -vconsole:off -vfile:info 
>> --log-file=/var/log/openvswitch/ovsdb-server-nb.log
>> --remote=punix:/var/run/openvswitch/ovnnb_db.sock
>> --pidfile=/var/run/openvswitch/ovnnb_db.pid --unixctl=ovnnb_db.ctl
>> --detach --monitor --remote=db:OVN_Northbound,NB_Global,connections
>> --private-key=db:OVN_Northbound,SSL,private_key
>> --certificate=db:OVN_Northbound,SSL,certificate
>> --ca-cert=db:OVN_Northbound,SSL,ca_cert 
>> --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols
>> --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers --sync-from=tcp:
>> 10.149.4.252:6641 /etc/openvswitch/ovnnb_db.db
>> root  7397  0.0  0.0  18048   372 ?Ss   14:08   0:00
>> ovsdb-server: monitoring pid 7398 (healthy)
>> root  7398  0.0  0.0  18868  5280 ?S14:08   0:01
>> ovsdb-server -vconsole:off -vfile:info 
>> --log-file=/var/log/openvswitch/ovsdb-server-sb.log
>> --remote=punix:/var/run/openvswitch/ovnsb_db.sock
>> --pidfile=/var/run/openvswitch/ovnsb_db.pid --unixctl=ovnsb_db.ctl
>> --detach --monitor --remote=db:OVN_Southbound,SB_Global,connections
>> --private-key=db:OVN_Southbound,SSL,private_key
>> --certificate=db:OVN_Southbound,SSL,certificate
>> --ca-cert=db:OVN_Southbound,SSL,ca_cert 
>> --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols
>> --

Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-11 Thread Han Zhou
Thanks for the output. It appears to be more complex than I thought before.
It is good that the new slave doesn't listen on 6641, although I am not
sure how it is achieved. I guess a stop & start has been triggered instead
of a simple demote, but I need to spend some time on the pacemaker state
machine. And please ignore my comment about calling ovsdb_server_start() in
demote - it would cause a recursive call, since ovsdb_server_start() calls
demote(), too.

Regarding the change:
 if [ "x${present_master}" = x ]; then
+set $@ --db-nb-create-insecure-remote=yes
+set $@ --db-sb-create-insecure-remote=yes
 # No master detected, or the previous master is not among the
 # set starting.

This "if" branch is when there is no master present, but in fact we want it
to be set when current node is master. So this change doesn't affect
anything. It is the below change that made the test work (so that on slave
node the tcp port is not opened):
 elif [ ${present_master} != ${host_name} ]; then
+set $@ --db-nb-create-insecure-remote=no
+set $@ --db-sb-create-insecure-remote=no

The ovsdb error log should not simply be skipped. We should never bind the
LB VIP on the ovsdb socket, because that IP is not on the host. I think it
is related to the code in ovsdb_server_notify():
ovn-nbctl -- --id=@conn_uuid create Connection \
target="p${NB_MASTER_PROTO}\:${NB_MASTER_PORT}\:${MASTER_IP}" \
inactivity_probe=$INACTIVE_PROBE -- set NB_Global . connections=@conn_uuid

When using LB, we should set 0.0.0.0 here.
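
A minimal sketch of that suggestion, reusing the LISTEN_ON_MASTER_IP_ONLY
flag proposed elsewhere in this thread (an illustration, not a final patch):

if [ "x${LISTEN_ON_MASTER_IP_ONLY}" = xyes ]; then
    conn_ip=${MASTER_IP}    # existing behavior: bind the master IP
else
    conn_ip=0.0.0.0         # LB case: the VIP is not on this host
fi
ovn-nbctl -- --id=@conn_uuid create Connection \
    target="p${NB_MASTER_PROTO}\:${NB_MASTER_PORT}\:${conn_ip}" \
    inactivity_probe=$INACTIVE_PROBE -- set NB_Global . connections=@conn_uuid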

Also, the failed action is a concern. We may need to dig more into the root
cause. Thanks for finding these issues.

Thanks,
Han

On Fri, May 11, 2018 at 3:29 PM, aginwala  wrote:

> Sure:
>
> *VIP_ip* = 10.149.4.252
> *LB IP* = 10.149.0.40
> *slave netstat, where it syncs from the master via the LB VIP IP*
> #netstat -an | grep 6641
> tcp  0  0 10.169.129.34:47426  10.149.4.252:6641  ESTABLISHED
> tcp  0  0 10.169.129.34:47444  10.149.4.252:6641  ESTABLISHED
>
> *Slave OVS:*
> # ps aux |grep ovsdb-server
> root  7388  0.0  0.0  18048   376 ?Ss   14:08   0:00
> ovsdb-server: monitoring pid 7389 (healthy)
> root  7389  0.0  0.0  18464  4556 ?S14:08   0:00
> ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/
> openvswitch/ovsdb-server-nb.log 
> --remote=punix:/var/run/openvswitch/ovnnb_db.sock
> --pidfile=/var/run/openvswitch/ovnnb_db.pid --unixctl=ovnnb_db.ctl
> --detach --monitor --remote=db:OVN_Northbound,NB_Global,connections
> --private-key=db:OVN_Northbound,SSL,private_key 
> --certificate=db:OVN_Northbound,SSL,certificate
> --ca-cert=db:OVN_Northbound,SSL,ca_cert 
> --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols
> --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers --sync-from=tcp:
> 10.149.4.252:6641 /etc/openvswitch/ovnnb_db.db
> root  7397  0.0  0.0  18048   372 ?Ss   14:08   0:00
> ovsdb-server: monitoring pid 7398 (healthy)
> root  7398  0.0  0.0  18868  5280 ?S14:08   0:01
> ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/
> openvswitch/ovsdb-server-sb.log 
> --remote=punix:/var/run/openvswitch/ovnsb_db.sock
> --pidfile=/var/run/openvswitch/ovnsb_db.pid --unixctl=ovnsb_db.ctl
> --detach --monitor --remote=db:OVN_Southbound,SB_Global,connections
> --private-key=db:OVN_Southbound,SSL,private_key 
> --certificate=db:OVN_Southbound,SSL,certificate
> --ca-cert=db:OVN_Southbound,SSL,ca_cert 
> --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols
> --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers --sync-from=tcp:
> 10.149.4.252:6642 /etc/openvswitch/ovnsb_db.db
>
> *Master netstat, where connections are established with the LB:*
> netstat -an | grep 6641
> tcp  0  0 0.0.0.0:6641        0.0.0.0:*          LISTEN
> tcp  0  0 10.169.129.33:6641  10.149.0.40:47426  ESTABLISHED
> tcp  0  0 10.169.129.33:6641  10.149.0.40:47444  ESTABLISHED
>
> *Master OVS:*
> # ps aux | grep ovsdb-server
> root  3318  0.0  0.0  12940  1012 pts/0S+   15:23   0:00 grep
> --color=auto ovsdb-server
> root 11648  0.0  0.0  18048   372 ?Ss   14:08   0:00
> ovsdb-server: monitoring pid 11649 (healthy)
> root 11649  0.0  0.0  18312  4208 ?S14:08   0:01
> ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/
> openvswitch/ovsdb-server-nb.log 
> --remote=punix:/var/run/openvswitch/ovnnb_db.sock
> --pidfile=/var/run/openvswitch/ovnnb_db.pid --unixctl=ovnnb_db.ctl
> --detach --monitor --remote=db:OVN_Northbound,NB_Global,connections
> --private-key=db:OVN_Northbound,SSL,private_key 
> --certificate=db:OVN_Northbound,SSL,certificate
> --ca-cert=db:OVN_Northbound,SSL,ca_cert 
> --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols
> --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers
> --remote=ptcp:6641:0.0.0.0 --sync-from=tcp:192.0.2.254:6641
> /etc/openvswitch/ovnnb_db.db
> root 11657  0.0  0.0  18048   376 ?Ss   14:08   0:00

Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-11 Thread aginwala
Sure:

*VIP_ip* = 10.149.4.252
*LB IP* = 10.149.0.40
*slave netstat, where it syncs from the master via the LB VIP IP*
#netstat -an | grep 6641
tcp  0  0 10.169.129.34:47426  10.149.4.252:6641  ESTABLISHED
tcp  0  0 10.169.129.34:47444  10.149.4.252:6641  ESTABLISHED

*Slave OVS:*
# ps aux |grep ovsdb-server
root  7388  0.0  0.0  18048   376 ?Ss   14:08   0:00
ovsdb-server: monitoring pid 7389 (healthy)
root  7389  0.0  0.0  18464  4556 ?S14:08   0:00
ovsdb-server -vconsole:off -vfile:info
--log-file=/var/log/openvswitch/ovsdb-server-nb.log
--remote=punix:/var/run/openvswitch/ovnnb_db.sock
--pidfile=/var/run/openvswitch/ovnnb_db.pid --unixctl=ovnnb_db.ctl --detach
--monitor --remote=db:OVN_Northbound,NB_Global,connections
--private-key=db:OVN_Northbound,SSL,private_key
--certificate=db:OVN_Northbound,SSL,certificate
--ca-cert=db:OVN_Northbound,SSL,ca_cert
--ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols
--ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers --sync-from=tcp:
10.149.4.252:6641 /etc/openvswitch/ovnnb_db.db
root  7397  0.0  0.0  18048   372 ?Ss   14:08   0:00
ovsdb-server: monitoring pid 7398 (healthy)
root  7398  0.0  0.0  18868  5280 ?S14:08   0:01
ovsdb-server -vconsole:off -vfile:info
--log-file=/var/log/openvswitch/ovsdb-server-sb.log
--remote=punix:/var/run/openvswitch/ovnsb_db.sock
--pidfile=/var/run/openvswitch/ovnsb_db.pid --unixctl=ovnsb_db.ctl --detach
--monitor --remote=db:OVN_Southbound,SB_Global,connections
--private-key=db:OVN_Southbound,SSL,private_key
--certificate=db:OVN_Southbound,SSL,certificate
--ca-cert=db:OVN_Southbound,SSL,ca_cert
--ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols
--ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers --sync-from=tcp:
10.149.4.252:6642 /etc/openvswitch/ovnsb_db.db

*Master netstat, where connections are established with the LB:*
netstat -an | grep 6641
tcp  0  0 0.0.0.0:6641        0.0.0.0:*          LISTEN
tcp  0  0 10.169.129.33:6641  10.149.0.40:47426  ESTABLISHED
tcp  0  0 10.169.129.33:6641  10.149.0.40:47444  ESTABLISHED

*Master OVS:*
# ps aux | grep ovsdb-server
root  3318  0.0  0.0  12940  1012 pts/0S+   15:23   0:00 grep
--color=auto ovsdb-server
root 11648  0.0  0.0  18048   372 ?Ss   14:08   0:00
ovsdb-server: monitoring pid 11649 (healthy)
root 11649  0.0  0.0  18312  4208 ?S14:08   0:01
ovsdb-server -vconsole:off -vfile:info
--log-file=/var/log/openvswitch/ovsdb-server-nb.log
--remote=punix:/var/run/openvswitch/ovnnb_db.sock
--pidfile=/var/run/openvswitch/ovnnb_db.pid --unixctl=ovnnb_db.ctl --detach
--monitor --remote=db:OVN_Northbound,NB_Global,connections
--private-key=db:OVN_Northbound,SSL,private_key
--certificate=db:OVN_Northbound,SSL,certificate
--ca-cert=db:OVN_Northbound,SSL,ca_cert
--ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols
--ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers --remote=ptcp:6641:0.0.0.0
--sync-from=tcp:192.0.2.254:6641 /etc/openvswitch/ovnnb_db.db
root 11657  0.0  0.0  18048   376 ?Ss   14:08   0:00
ovsdb-server: monitoring pid 11658 (healthy)
root 11658  0.0  0.0  19340  5552 ?S14:08   0:01
ovsdb-server -vconsole:off -vfile:info
--log-file=/var/log/openvswitch/ovsdb-server-sb.log
--remote=punix:/var/run/openvswitch/ovnsb_db.sock
--pidfile=/var/run/openvswitch/ovnsb_db.pid --unixctl=ovnsb_db.ctl --detach
--monitor --remote=db:OVN_Southbound,SB_Global,connections
--private-key=db:OVN_Southbound,SSL,private_key
--certificate=db:OVN_Southbound,SSL,certificate
--ca-cert=db:OVN_Southbound,SSL,ca_cert
--ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols
--ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers --remote=ptcp:6642:0.0.0.0
--sync-from=tcp:192.0.2.254:6642 /etc/openvswitch/ovnsb_db.db



The same applies to 6642 for the sb db. Hope it's clear. Sorry, I did not
post this in the previous message as I thought you already got the point :).



Regards,
Aliasgar


On Fri, May 11, 2018 at 3:16 PM, Han Zhou  wrote:

> Ali, could you share output of "ps | grep ovsdb" and "netstat -lpn | grep
> 6641" on the new slave node after you do "crm resource move"?
>
> On Fri, May 11, 2018 at 2:25 PM, aginwala  wrote:
>
>> Thanks Han for more suggestions:
>>
>>
>> I tested failover by gracefully stopping pacemaker+corosync on the master
>> node, as well as with crm move, and both work as expected: crm move
>> triggers promotion of a new master, so the new master gets elected and
>> the slave gets demoted as expected, syncing from the sync-from node.
>> Hence, the code change I posted earlier holds up.
>>
>> # crm stat
>> Stack: corosync
>> Current DC: test-pace1-2365293 (version 1.1.14-70404b0) - partition with
>> quorum
>> 2 nodes and 2 resources configured
>>
>> Online: [ test-pace1-2365293 test-pace2-2365308 ]
>>
>> Full list of resources:
>>
>>  Master/Slave Set: ovndb_servers-master [ovndb_servers]
>>  Masters: [ test-pace2-2365308 ]
>>  Sla

Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-11 Thread Han Zhou
Ali, could you share output of "ps | grep ovsdb" and "netstat -lpn | grep
6641" on the new slave node after you do "crm resource move"?

On Fri, May 11, 2018 at 2:25 PM, aginwala  wrote:

> Thanks Han for more suggestions:
>
>
> I tested failover by gracefully stopping pacemaker+corosync on the master
> node, as well as with crm move, and both work as expected: crm move
> triggers promotion of a new master, so the new master gets elected and the
> slave gets demoted as expected, syncing from the sync-from node. Hence,
> the code change I posted earlier holds up.
>
> # crm stat
> Stack: corosync
> Current DC: test-pace1-2365293 (version 1.1.14-70404b0) - partition with
> quorum
> 2 nodes and 2 resources configured
>
> Online: [ test-pace1-2365293 test-pace2-2365308 ]
>
> Full list of resources:
>
>  Master/Slave Set: ovndb_servers-master [ovndb_servers]
>  Masters: [ test-pace2-2365308 ]
>  Slaves: [ test-pace1-2365293 ]
>
> #crm --debug resource move ovndb_servers test-pace1-2365293
> DEBUG: pacemaker version: [err: ][out: CRM Version: 1.1.14 (70404b0)]
> DEBUG: found pacemaker version: 1.1.14
> DEBUG: invoke: crm_resource --quiet --move -r 'ovndb_servers'
> --node='test-pace1-2365293'
> # crm stat
>
> Stack: corosync
> Current DC: test-pace1-2365293 (version 1.1.14-70404b0) - partition with
> quorum
> 2 nodes and 2 resources configured
>
> Online: [ test-pace1-2365293 test-pace2-2365308 ]
>
> Full list of resources:
>
>  Master/Slave Set: ovndb_servers-master [ovndb_servers]
>  Masters: [ test-pace1-2365293 ]
>  Slaves: [ test-pace2-2365308 ]
>
> Failed Actions:
> * ovndb_servers_monitor_1 on test-pace2-2365308 'master' (8): call=46,
> status=complete, exitreason='none',
> last-rc-change='Fri May 11 14:08:35 2018', queued=0ms, exec=83ms
>
> Note: the Failed Actions warning only appears for the crm move command,
> not for reboot/kill/service pacemaker/corosync stop/start.
>
> I cleaned up the warning using the below command:
> #crm_resource -P
> Waiting for 1 replies from the CRMd. OK
>
> Also wanted to call out from the above findings that, per the pacemaker
> logs, ocf_attribute_target is not getting called; the code says it will
> not work for older pacemaker versions, though I am not sure which versions
> exactly, as I am on version 1.1.14.
> # pacemaker logs
>  notice: operation_finished: ovndb_servers_monitor_1:7561:stderr [
> /usr/lib/ocf/resource.d/ovn/ovndb-servers: line 31: ocf_attribute_target:
> command not found ]
>
>
> # Also note the nb db logs are showing socket util errors, which I think
> need a code change too to skip stamping it, since functionality is still
> working as expected (maybe in a separate commit since it's an ovsdb change)
> 2018-05-11T21:14:25.958Z|00560|socket_util|ERR|6641:10.149.4.252: bind:
> Cannot assign requested address
> 2018-05-11T21:14:25.958Z|00561|socket_util|ERR|6641:10.149.4.252: bind:
> Cannot assign requested address
> 2018-05-11T21:14:27.859Z|00562|socket_util|ERR|6641:10.149.4.252: bind:
> Cannot assign requested address
>
>
>
> Let me know for any suggestions further.
>
>
> Regards,
> Aliasgar
>
>
> On Thu, May 10, 2018 at 3:49 PM, Han Zhou  wrote:
>
>> Good progress!
>>
>> I think at least one more change is needed to ensure that when demote
>> happens, the TCP port is shut down. Otherwise, the LB will be confused
>> again and can't figure out which one is active. This is the graceful
>> failover scenario, which can be tested with crm resource move instead of
>> rebooting or killing the process.
>>
>> This may be done by the same approach you used for promote, i.e. stop
>> ovsdb and then call ovsdb_server_start() so the parameters are reset
>> correctly before starting. Alternatively, we could add a command in
>> ovsdb-server, in addition to the commands that switch to/from
>> active/backup modes, to open/close the TCP ports, to avoid restarting
>> during failover, but I am not sure if this is valuable. It depends on
>> whether restarting ovsdb-server during failover is sufficient. Could you
>> add the restart logic for demote and try more? Thanks!
>>
>> Thanks,
>> Han
>>
>> On Thu, May 10, 2018 at 1:54 PM, aginwala  wrote:
>>
>>> Hi :
>>>
>>> Just to further update, I am able to re-open the tcp port in the
>>> failover scenario, when the new master is getting promoted, with the
>>> additional code changes below, which do require stopping the ovsdb
>>> services on the newly selected master to reset the tcp settings:
>>>
>>>
>>> diff --git a/ovn/utilities/ovndb-servers.ocf
>>> b/ovn/utilities/ovndb-servers.ocf
>>> index 164b6bc..8cb4c25 100755
>>> --- a/ovn/utilities/ovndb-servers.ocf
>>> +++ b/ovn/utilities/ovndb-servers.ocf
>>> @@ -295,8 +295,8 @@ ovsdb_server_start() {
>>>
>>>  set ${OVN_CTL}
>>>
>>> -set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
>>> -set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
>>> +set $@ --db-nb-port=${NB_MASTER_PORT}
>>> +set $@ --db-sb-port=${SB_MASTER_PORT}
>>>
>>>  if [ "x${NB_MASTER_PROTO}" = xtcp ]; 

Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-11 Thread aginwala
Thanks Han for more suggestions:


I tested failover by gracefully stopping pacemaker+corosync on the master
node, as well as with crm move, and both work as expected: crm move triggers
promotion of a new master, so the new master gets elected and the slave gets
demoted as expected, syncing from the sync-from node. Hence, the code change
I posted earlier holds up.

# crm stat
Stack: corosync
Current DC: test-pace1-2365293 (version 1.1.14-70404b0) - partition with
quorum
2 nodes and 2 resources configured

Online: [ test-pace1-2365293 test-pace2-2365308 ]

Full list of resources:

 Master/Slave Set: ovndb_servers-master [ovndb_servers]
 Masters: [ test-pace2-2365308 ]
 Slaves: [ test-pace1-2365293 ]

#crm --debug resource move ovndb_servers test-pace1-2365293
DEBUG: pacemaker version: [err: ][out: CRM Version: 1.1.14 (70404b0)]
DEBUG: found pacemaker version: 1.1.14
DEBUG: invoke: crm_resource --quiet --move -r 'ovndb_servers'
--node='test-pace1-2365293'
# crm stat

Stack: corosync
Current DC: test-pace1-2365293 (version 1.1.14-70404b0) - partition with
quorum
2 nodes and 2 resources configured

Online: [ test-pace1-2365293 test-pace2-2365308 ]

Full list of resources:

 Master/Slave Set: ovndb_servers-master [ovndb_servers]
 Masters: [ test-pace1-2365293 ]
 Slaves: [ test-pace2-2365308 ]

Failed Actions:
* ovndb_servers_monitor_1 on test-pace2-2365308 'master' (8): call=46,
status=complete, exitreason='none',
last-rc-change='Fri May 11 14:08:35 2018', queued=0ms, exec=83ms

Note: the Failed Actions warning only appears for the crm move command, not
for reboot/kill/service pacemaker/corosync stop/start.

I cleaned up the warning using the below command:
#crm_resource -P
Waiting for 1 replies from the CRMd. OK
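
Note also that crm resource move works by adding a location constraint that
pins the resource to the target node; after the move, something like crm
resource unmove ovndb_servers is normally needed to drop that constraint so
it does not block later failovers (a general pacemaker note, not specific to
this script).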

Also wanted to call out from the above findings that, per the pacemaker
logs, ocf_attribute_target is not getting called; the code says it will not
work for older pacemaker versions, though I am not sure which versions
exactly, as I am on version 1.1.14.
# pacemaker logs
 notice: operation_finished: ovndb_servers_monitor_1:7561:stderr [
/usr/lib/ocf/resource.d/ovn/ovndb-servers: line 31: ocf_attribute_target:
command not found ]
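
A shim along these lines is one common way to stay compatible with older
resource-agents that lack ocf_attribute_target (an assumption on my side,
not the actual fix used here):

type ocf_attribute_target >/dev/null 2>&1 ||
    ocf_attribute_target() { ocf_local_nodename; }

i.e. fall back to the plain node name when the helper is missing.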


# Also note the nb db logs are showing socket util errors, which I think
need a code change too to skip stamping it, since functionality is still
working as expected (maybe in a separate commit since it's an ovsdb change)
2018-05-11T21:14:25.958Z|00560|socket_util|ERR|6641:10.149.4.252: bind:
Cannot assign requested address
2018-05-11T21:14:25.958Z|00561|socket_util|ERR|6641:10.149.4.252: bind:
Cannot assign requested address
2018-05-11T21:14:27.859Z|00562|socket_util|ERR|6641:10.149.4.252: bind:
Cannot assign requested address



Let me know for any suggestions further.


Regards,
Aliasgar


On Thu, May 10, 2018 at 3:49 PM, Han Zhou  wrote:

> Good progress!
>
> I think at least one more change is needed to ensure that when demote
> happens, the TCP port is shut down. Otherwise, the LB will be confused
> again and can't figure out which one is active. This is the graceful
> failover scenario, which can be tested with crm resource move instead of
> rebooting or killing the process.
>
> This may be done by the same approach you used for promote, i.e. stop
> ovsdb and then call ovsdb_server_start() so the parameters are reset
> correctly before starting. Alternatively, we could add a command in
> ovsdb-server, in addition to the commands that switch to/from
> active/backup modes, to open/close the TCP ports, to avoid restarting
> during failover, but I am not sure if this is valuable. It depends on
> whether restarting ovsdb-server during failover is sufficient. Could you
> add the restart logic for demote and try more? Thanks!
>
> Thanks,
> Han
>
> On Thu, May 10, 2018 at 1:54 PM, aginwala  wrote:
>
>> Hi :
>>
>> Just to further update, I am able to re-open the tcp port in the failover
>> scenario, when the new master is getting promoted, with the additional
>> code changes below, which do require stopping the ovsdb services on the
>> newly selected master to reset the tcp settings:
>>
>>
>> diff --git a/ovn/utilities/ovndb-servers.ocf
>> b/ovn/utilities/ovndb-servers.ocf
>> index 164b6bc..8cb4c25 100755
>> --- a/ovn/utilities/ovndb-servers.ocf
>> +++ b/ovn/utilities/ovndb-servers.ocf
>> @@ -295,8 +295,8 @@ ovsdb_server_start() {
>>
>>  set ${OVN_CTL}
>>
>> -set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
>> -set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
>> +set $@ --db-nb-port=${NB_MASTER_PORT}
>> +set $@ --db-sb-port=${SB_MASTER_PORT}
>>
>>  if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
>>  set $@ --db-nb-create-insecure-remote=yes
>> @@ -307,6 +307,8 @@ ovsdb_server_start() {
>>  fi
>>
>>  if [ "x${present_master}" = x ]; then
>> +set $@ --db-nb-create-insecure-remote=yes
>> +set $@ --db-sb-create-insecure-remote=yes
>>  # No master detected, or the previous master is not among the
>>  # set starting.
>>   

Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-10 Thread Han Zhou
Good progress!

I think at least one more change is needed to ensure that when demote
happens, the TCP port is shut down. Otherwise, the LB will be confused again
and can't figure out which one is active. This is the graceful failover
scenario, which can be tested with crm resource move instead of rebooting or
killing the process.

This may be done by the same approach you used for promote, i.e. stop ovsdb
and then call ovsdb_server_start() so the parameters are reset correctly
before starting. Alternatively, we could add a command in ovsdb-server, in
addition to the commands that switch to/from active/backup modes, to
open/close the TCP ports, to avoid restarting during failover, but I am not
sure if this is valuable. It depends on whether restarting ovsdb-server
during failover is sufficient. Could you add the restart logic for demote
and try more? Thanks!

Thanks,
Han

On Thu, May 10, 2018 at 1:54 PM, aginwala  wrote:

> Hi :
>
> Just to further update, I am able to re-open the tcp port in the failover
> scenario, when the new master is getting promoted, with the additional
> code changes below, which do require stopping the ovsdb services on the
> newly selected master to reset the tcp settings:
>
>
> diff --git a/ovn/utilities/ovndb-servers.ocf
> b/ovn/utilities/ovndb-servers.ocf
> index 164b6bc..8cb4c25 100755
> --- a/ovn/utilities/ovndb-servers.ocf
> +++ b/ovn/utilities/ovndb-servers.ocf
> @@ -295,8 +295,8 @@ ovsdb_server_start() {
>
>  set ${OVN_CTL}
>
> -set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
> -set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
> +set $@ --db-nb-port=${NB_MASTER_PORT}
> +set $@ --db-sb-port=${SB_MASTER_PORT}
>
>  if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
>  set $@ --db-nb-create-insecure-remote=yes
> @@ -307,6 +307,8 @@ ovsdb_server_start() {
>  fi
>
>  if [ "x${present_master}" = x ]; then
> +set $@ --db-nb-create-insecure-remote=yes
> +set $@ --db-sb-create-insecure-remote=yes
>  # No master detected, or the previous master is not among the
>  # set starting.
>  #
> @@ -316,6 +318,8 @@ ovsdb_server_start() {
>  set $@ --db-nb-sync-from-addr=${INVALID_IP_ADDRESS}
> --db-sb-sync-from-addr=${INVALID_IP_ADDRESS}
>
>  elif [ ${present_master} != ${host_name} ]; then
> +set $@ --db-nb-create-insecure-remote=no
> +set $@ --db-sb-create-insecure-remote=no
>  # An existing master is active, connect to it
>  set $@ --db-nb-sync-from-addr=${MASTER_IP}
> --db-sb-sync-from-addr=${MASTER_IP}
>  set $@ --db-nb-sync-from-port=${NB_MASTER_PORT}
> @@ -416,6 +420,8 @@ ovsdb_server_promote() {
>  ;;
>  esac
>
> +${OVN_CTL} stop_ovsdb
> +ovsdb_server_start
>  ${OVN_CTL} promote_ovnnb
>  ${OVN_CTL} promote_ovnsb
>
>
>
> Below are the scenarios tested:
> Master          Slave          Scenario        Result
> Reboot master   NA             reboot/failure  New master gets promoted with tcp ports enabled to start taking LB traffic.
> NA              Reboot slave   reboot/failure  No change; the current master continues taking traffic and the slave continues to sync from it.
> Reboot master   Reboot slave   reboot/failure  New master gets promoted with tcp ports enabled to start taking LB traffic.
>
> Also sync on slaves from master works as expected:
> # On master
> ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add  556
> # on slave port is shutdown as expected
> ovn-nbctl --db=tcp:10.169.129.34:6641 show
> ovn-nbctl: tcp:10.169.129.34:6641: database connection failed (Connection
> refused)
> # on slave local unix socket, above lswitch 556 gets replicated too as
> --sync-from=tcp:10.149.4.252:6641
> ovn-nbctl show
> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
>
> # Same testing for sb db too
> # Slave port 6642 is shut down too:
> ovn-sbctl --db=tcp:10.169.129.34:6642 show   # hangs
> # Using the master ip works:
>  ovn-sbctl --db=tcp:10.169.129.33:6642 show
> Chassis "21f12bd6-e9e8-4ee2-afeb-28b331df6715"
> hostname: "test-pace2-2365308.lvs02.dev.ebayc3.com"
> Encap geneve
> ip: "10.169.129.34"
> options: {csum="true"}
>
>
>
> # Accessing via LB vip works fine too as only one member is active:
> for i in `seq 1 500`; do ovn-sbctl --db=tcp:10.149.4.252:6642 show; done
> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
>
>
> Everything works fine as expected. Let me know about any corner cases I
> missed. I will submit a formal patch using LISTEN_ON_MASTER_IP_ONLY for
> the LB-with-tcp use case, to avoid breaking the existing functionality.
>
>
>
> Regards,
> Aliasgar
>
>
>
> On Thu, May 10, 2018 at 9:55 AM, aginwala  wrote:
>
>> Thanks folks for suggestions:
>>
>> For LB vip configurations, I did further testing and yes, it does
>> try to hit the slave db as per the 

Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-10 Thread aginwala
On Thu, May 10, 2018 at 1:54 PM, aginwala  wrote:

> Hi :
>
> Just to further update, I am able to re-open the tcp port in the failover
> scenario, when the new master is getting promoted, with the additional
> code changes below, which do require stopping the ovsdb services on the
> newly selected master to reset the tcp settings:
>
>
> diff --git a/ovn/utilities/ovndb-servers.ocf
> b/ovn/utilities/ovndb-servers.ocf
> index 164b6bc..8cb4c25 100755
> --- a/ovn/utilities/ovndb-servers.ocf
> +++ b/ovn/utilities/ovndb-servers.ocf
> @@ -295,8 +295,8 @@ ovsdb_server_start() {
>
>  set ${OVN_CTL}
>
> -set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
> -set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
> +set $@ --db-nb-port=${NB_MASTER_PORT}
> +set $@ --db-sb-port=${SB_MASTER_PORT}
>
>  if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
>  set $@ --db-nb-create-insecure-remote=yes
> @@ -307,6 +307,8 @@ ovsdb_server_start() {
>  fi
>
>  if [ "x${present_master}" = x ]; then
> +set $@ --db-nb-create-insecure-remote=yes
> +set $@ --db-sb-create-insecure-remote=yes
>  # No master detected, or the previous master is not among the
>  # set starting.
>  #
> @@ -316,6 +318,8 @@ ovsdb_server_start() {
>  set $@ --db-nb-sync-from-addr=${INVALID_IP_ADDRESS}
> --db-sb-sync-from-addr=${INVALID_IP_ADDRESS}
>
>  elif [ ${present_master} != ${host_name} ]; then
> +set $@ --db-nb-create-insecure-remote=no
> +set $@ --db-sb-create-insecure-remote=no
>  # An existing master is active, connect to it
>  set $@ --db-nb-sync-from-addr=${MASTER_IP}
> --db-sb-sync-from-addr=${MASTER_IP}
>  set $@ --db-nb-sync-from-port=${NB_MASTER_PORT}
> @@ -416,6 +420,8 @@ ovsdb_server_promote() {
>  ;;
>  esac
>
> +${OVN_CTL} stop_ovsdb
> +ovsdb_server_start
>  ${OVN_CTL} promote_ovnnb
>  ${OVN_CTL} promote_ovnsb
>
>
>
> Below are the scenarios tested:
>
>>> Updating the test scenario table correctly, as it got skipped in the copy from Confluence:

> Master          Slave          Scenario        Result
> Reboot master   NA             reboot/failure  New master gets promoted with tcp ports enabled to start taking LB traffic.
> NA              Reboot slave   reboot/failure  No change; the current master continues taking traffic and the slave continues to sync from it.
> Reboot master   Reboot slave   reboot/failure  New master gets promoted with tcp ports enabled to start taking LB traffic.
>
> Also sync on slaves from master works as expected:
> # On master
> ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add  556
> # on slave port is shutdown as expected
> ovn-nbctl --db=tcp:10.169.129.34:6641 show
> ovn-nbctl: tcp:10.169.129.34:6641: database connection failed (Connection
> refused)
> # on slave local unix socket, above lswitch 556 gets replicated too as
> --sync-from=tcp:10.149.4.252:6641
> ovn-nbctl show
> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
>
> # Same testing for sb db too
> # Slave port 6642 is shut down too:
> ovn-sbctl --db=tcp:10.169.129.34:6642 show   # hangs
> # Using the master ip works:
>  ovn-sbctl --db=tcp:10.169.129.33:6642 show
> Chassis "21f12bd6-e9e8-4ee2-afeb-28b331df6715"
> hostname: "test-pace2-2365308.lvs02.dev.ebayc3.com"
> Encap geneve
> ip: "10.169.129.34"
> options: {csum="true"}
>
>
>
> # Accessing via LB vip works fine too as only one member is active:
> for i in `seq 1 500`; do ovn-sbctl --db=tcp:10.149.4.252:6642 show; done
>
>>> Typo; it should be: for i in `seq 1 500`; do ovn-nbctl --db=tcp:10.149.4.252:6641 show; done

> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
>
>
> Everything works fine as expected. Let me know about any corner cases I
> missed. I will submit a formal patch using LISTEN_ON_MASTER_IP_ONLY for
> the LB-with-tcp use case, to avoid breaking the existing functionality.
>
>
>
> Regards,
> Aliasgar
>
>
>
> On Thu, May 10, 2018 at 9:55 AM, aginwala  wrote:
>
>> Thanks folks for suggestions:
>>
>> For LB vip configurations, I did further testing and yes, it does try to
>> hit the slave db, as per the logs below, and it fails since the slave
>> does not have write permission, which the LB is not aware of:
>> for i in `seq 1 500`; do ovn-nbctl --db=tcp:10.149.4.252:6641 ls-add
>> $i590;done
>> ovn-nbctl: transaction error: {"details":"insert operation not allowed
>> when database server is in read only mode","error":"not allowed"}
>> ovn-nbctl: transaction error: {"details":"insert operation not allowed
>> when database server is in read only mode","error":"not allowed"}
>> ovn-nbctl: transaction error: {"details":

Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-10 Thread aginwala
Hi :

Just to further update, I am able to re-open the tcp port in the failover
scenario, when the new master is getting promoted, with the additional code
changes below, which do require stopping the ovsdb services on the newly
selected master to reset the tcp settings:


diff --git a/ovn/utilities/ovndb-servers.ocf
b/ovn/utilities/ovndb-servers.ocf
index 164b6bc..8cb4c25 100755
--- a/ovn/utilities/ovndb-servers.ocf
+++ b/ovn/utilities/ovndb-servers.ocf
@@ -295,8 +295,8 @@ ovsdb_server_start() {

 set ${OVN_CTL}

-set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
-set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
+set $@ --db-nb-port=${NB_MASTER_PORT}
+set $@ --db-sb-port=${SB_MASTER_PORT}

 if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
 set $@ --db-nb-create-insecure-remote=yes
@@ -307,6 +307,8 @@ ovsdb_server_start() {
 fi

 if [ "x${present_master}" = x ]; then
+set $@ --db-nb-create-insecure-remote=yes
+set $@ --db-sb-create-insecure-remote=yes
 # No master detected, or the previous master is not among the
 # set starting.
 #
@@ -316,6 +318,8 @@ ovsdb_server_start() {
 set $@ --db-nb-sync-from-addr=${INVALID_IP_ADDRESS}
--db-sb-sync-from-addr=${INVALID_IP_ADDRESS}

 elif [ ${present_master} != ${host_name} ]; then
+set $@ --db-nb-create-insecure-remote=no
+set $@ --db-sb-create-insecure-remote=no
 # An existing master is active, connect to it
 set $@ --db-nb-sync-from-addr=${MASTER_IP}
--db-sb-sync-from-addr=${MASTER_IP}
 set $@ --db-nb-sync-from-port=${NB_MASTER_PORT}
@@ -416,6 +420,8 @@ ovsdb_server_promote() {
 ;;
 esac

+${OVN_CTL} stop_ovsdb
+ovsdb_server_start
 ${OVN_CTL} promote_ovnnb
 ${OVN_CTL} promote_ovnsb



Below are the scenarios tested:
Master          Slave          Scenario        Result
Reboot master   NA             reboot/failure  New master gets promoted with tcp ports enabled to start taking LB traffic.
NA              Reboot slave   reboot/failure  No change; the current master continues taking traffic and the slave continues to sync from it.
Reboot master   Reboot slave   reboot/failure  New master gets promoted with tcp ports enabled to start taking LB traffic.

Also sync on slaves from master works as expected:
# On master
ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add  556
# on slave port is shutdown as expected
ovn-nbctl --db=tcp:10.169.129.34:6641 show
ovn-nbctl: tcp:10.169.129.34:6641: database connection failed (Connection
refused)
# on slave local unix socket, above lswitch 556 gets replicated too as
--sync-from=tcp:10.149.4.252:6641
ovn-nbctl show
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)

# Same testing for sb db too
# Slave port 6642 is shut down too:
ovn-sbctl --db=tcp:10.169.129.34:6642 show   # hangs
# Using the master ip works:
 ovn-sbctl --db=tcp:10.169.129.33:6642 show
Chassis "21f12bd6-e9e8-4ee2-afeb-28b331df6715"
hostname: "test-pace2-2365308.lvs02.dev.ebayc3.com"
Encap geneve
ip: "10.169.129.34"
options: {csum="true"}



# Accessing via LB vip works fine too as only one member is active:
for i in `seq 1 500`; do ovn-sbctl --db=tcp:10.149.4.252:6642 show; done
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)


Everything works fine as expected. Let me know about any corner cases I
missed. I will submit a formal patch using LISTEN_ON_MASTER_IP_ONLY for the
LB-with-tcp use case, to avoid breaking the existing functionality.



Regards,
Aliasgar



On Thu, May 10, 2018 at 9:55 AM, aginwala  wrote:

> Thanks folks for suggestions:
>
> For LB vip configurations, I did further testing and yes, it does try to
> hit the slave db, as per the logs below, and it fails since the slave does
> not have write permission, which the LB is not aware of:
> for i in `seq 1 500`; do ovn-nbctl --db=tcp:10.149.4.252:6641 ls-add
> $i590;done
> ovn-nbctl: transaction error: {"details":"insert operation not allowed
> when database server is in read only mode","error":"not allowed"}
> ovn-nbctl: transaction error: {"details":"insert operation not allowed
> when database server is in read only mode","error":"not allowed"}
> ovn-nbctl: transaction error: {"details":"insert operation not allowed
> when database server is in read only mode","error":"not allowed"}
>
> Hence, with a few more code changes (in the same patch, without the flag
> variable suggestion), I am able to shut down the tcp port on the slave,
> and it works fine as below:
> #Master Node
> # ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add test444
> #Slave Node
> # ovn-nbctl --db=tcp:10.169.129.34:6641 ls-add test444
> ovn-nbctl: tcp:10.169.129.34:6641: database connection failed (Connection
> refused)
>
> Code to shut down the tcp port on the slave db, with only the master
> listening on tcp ports:
> diff --git a/ovn/utilities/ovndb-servers.ocf
> b/ov

Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-10 Thread aginwala
Thanks folks for suggestions:

For LB vip configurations, I did further testing and yes, it does try to
hit the slave db, as per the logs below, and it fails since the slave does
not have write permission, which the LB is not aware of:
for i in `seq 1 500`; do ovn-nbctl --db=tcp:10.149.4.252:6641 ls-add
$i590;done
ovn-nbctl: transaction error: {"details":"insert operation not allowed when
database server is in read only mode","error":"not allowed"}
ovn-nbctl: transaction error: {"details":"insert operation not allowed when
database server is in read only mode","error":"not allowed"}
ovn-nbctl: transaction error: {"details":"insert operation not allowed when
database server is in read only mode","error":"not allowed"}

Hence, with a few more code changes (in the same patch, without the flag
variable suggestion), I am able to shut down the tcp port on the slave, and
it works fine as below:
#Master Node
# ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add test444
#Slave Node
# ovn-nbctl --db=tcp:10.169.129.34:6641 ls-add test444
ovn-nbctl: tcp:10.169.129.34:6641: database connection failed (Connection
refused)

Code to shut down the tcp port on the slave db, with only the master
listening on tcp ports:
diff --git a/ovn/utilities/ovndb-servers.ocf
b/ovn/utilities/ovndb-servers.ocf
index 164b6bc..b265df6 100755
--- a/ovn/utilities/ovndb-servers.ocf
+++ b/ovn/utilities/ovndb-servers.ocf
@@ -295,8 +295,8 @@ ovsdb_server_start() {

 set ${OVN_CTL}

-set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
-set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
+set $@ --db-nb-port=${NB_MASTER_PORT}
+set $@ --db-sb-port=${SB_MASTER_PORT}

 if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
 set $@ --db-nb-create-insecure-remote=yes
@@ -307,6 +307,8 @@ ovsdb_server_start() {
 fi

 if [ "x${present_master}" = x ]; then
+set $@ --db-nb-create-insecure-remote=yes
+set $@ --db-sb-create-insecure-remote=yes
 # No master detected, or the previous master is not among the
 # set starting.
 #
@@ -316,6 +318,8 @@ ovsdb_server_start() {
 set $@ --db-nb-sync-from-addr=${INVALID_IP_ADDRESS}
--db-sb-sync-from-addr=${INVALID_IP_ADDR

 elif [ ${present_master} != ${host_name} ]; then
+set $@ --db-nb-create-insecure-remote=no
+set $@ --db-sb-create-insecure-remote=no


But I noticed that if the slave becomes active post-failover, after the
active node reboots or fails, pacemaker shows it online but I am not able to
access the dbs.

# crm status
Online: [ test-pace2-2365308 ]
OFFLINE: [ test-pace1-2365293 ]

Full list of resources:

 Master/Slave Set: ovndb_servers-master [ovndb_servers]
 Masters: [ test-pace2-2365308 ]
 Stopped: [ test-pace1-2365293 ]


# ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add test444
ovn-nbctl: tcp:10.169.129.33:6641: database connection failed (Connection
refused)
# ovn-nbctl --db=tcp:10.169.129.34:6641 ls-add test444
ovn-nbctl: tcp:10.169.129.34:6641: database connection failed (Connection
refused)

Hence, when failover happens, the slave is already running with
--sync-from=lbVIP:6641/6642 for the nb and sb db respectively. Thus, the tcp
ports for the nb and sb db are not re-opened automatically on the slave that
is getting promoted to master.

Let me know if there is a valid way/approach I am missing to handle this in
the slave promote logic; I will do further code changes accordingly.

Note: the current code changes for use with an LB will need to handle ssl
too. I will handle that separately; I want to get tcp working first, and we
can add ssl support later.


Regards,
Aliasgar

On Wed, May 9, 2018 at 12:19 PM, Numan Siddique  wrote:

>
>
> On Thu, May 10, 2018 at 12:44 AM, Han Zhou  wrote:
>
>>
>>
>> On Wed, May 9, 2018 at 11:51 AM, Numan Siddique 
>> wrote:
>>
>>>
>>>
>>> On Thu, May 10, 2018 at 12:15 AM, Han Zhou  wrote:
>>>
 Thanks Ali for the quick patch. Please see my comments inline.

 On Wed, May 9, 2018 at 9:30 AM, aginwala  wrote:
 >
 > Thanks Han and Numan for the clarity to help sort it out.
 >
 > For making the vip work with an LB in my two-node setup, I had changed
 > the code below to skip setting the master IP when creating the pcs
 > resource for the ovn dbs, and listen on 0.0.0.0 instead. Hence, the
 > discussion is in line with the code change, which is small, as below:
 >
 >
 > diff --git a/ovn/utilities/ovndb-servers.ocf
 b/ovn/utilities/ovndb-servers.ocf
 > index 164b6bc..d4c9ad7 100755
 > --- a/ovn/utilities/ovndb-servers.ocf
 > +++ b/ovn/utilities/ovndb-servers.ocf
 > @@ -295,8 +295,8 @@ ovsdb_server_start() {
 >
 >  set ${OVN_CTL}
 >
 > -set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
 > -set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
 > +set $@ --db-nb-port=${NB_MASTER_PORT}
 > +set $@ --db-sb-port=${SB_MASTER_PORT}
 >
 >  if [ "x

Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-09 Thread Numan Siddique
On Thu, May 10, 2018 at 12:44 AM, Han Zhou  wrote:

>
>
> On Wed, May 9, 2018 at 11:51 AM, Numan Siddique 
> wrote:
>
>>
>>
>> On Thu, May 10, 2018 at 12:15 AM, Han Zhou  wrote:
>>
>>> Thanks Ali for the quick patch. Please see my comments inline.
>>>
>>> On Wed, May 9, 2018 at 9:30 AM, aginwala  wrote:
>>> >
>>> > Thanks Han and Numan for the clarity to help sort it out.
>>> >
>>> > For making the vip work with an LB in my two-node setup, I had changed
>>> > the code below to skip setting the master IP when creating the pcs
>>> > resource for the ovn dbs, and listen on 0.0.0.0 instead. Hence, the
>>> > discussion is in line with the code change, which is small, as below:
>>> >
>>> >
>>> > diff --git a/ovn/utilities/ovndb-servers.ocf
>>> b/ovn/utilities/ovndb-servers.ocf
>>> > index 164b6bc..d4c9ad7 100755
>>> > --- a/ovn/utilities/ovndb-servers.ocf
>>> > +++ b/ovn/utilities/ovndb-servers.ocf
>>> > @@ -295,8 +295,8 @@ ovsdb_server_start() {
>>> >
>>> >  set ${OVN_CTL}
>>> >
>>> > -set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
>>> > -set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
>>> > +set $@ --db-nb-port=${NB_MASTER_PORT}
>>> > +set $@ --db-sb-port=${SB_MASTER_PORT}
>>> >
>>> >  if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
>>> >  set $@ --db-nb-create-insecure-remote=yes
>>> >
>>>
>>> This change solves the IP binding problem. It will just listen on
>>> 0.0.0.0.
>>>
>>
>> One problem I see with this approach is that it would listen on all the
>> IPs. Maybe it's not a good idea, and it may have security issues.
>>
>> Can we instead check the value of the MASTER_IP param, something like below?
>>
>>  if [ "$MASTER_IP" == "0.0.0.0" ]; then
>>  set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
>>  set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
>> else
>>  set $@ --db-nb-port=${NB_MASTER_PORT}
>>  set $@ --db-sb-port=${SB_MASTER_PORT}
>> fi
>>
>> And when you create OVN pacemaker resource in your deployment, you can
>> pass master_ip=0.0.0.0
>>
>> Will this work ?
>>
>>
> Maybe some misunderstanding here. We still need to use master_ip = LB VIP,
> so that the standby nodes can "sync-from" the active node. So we cannot
> pass 0.0.0.0 explicitly.
>

I misunderstood earlier. I thought you wouldn't need master ip at all.
Thanks for the clarification.

>
> I didn't understand your code above either. Why would we specify the
> master_ip if we know it is 0.0.0.0? Or do you mean the other way around,
> with just a typo in the code?
>
> Regarding the security of listening on any IP, I am not quite sure. It
> may be a problem if a node sits on multiple networks and some of them are
> considered insecure, and you want to listen on the secure one only. If
> this is the concern, we can add a parameter, e.g. LISTEN_ON_MASTER_IP_ONLY,
> and set it to true by default. What do you think?
>

I would prefer adding the parameter as you have suggested so that the
existing behavior remains intact.

Thanks
Numan


> Thanks,
> Han
>
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-09 Thread Han Zhou
On Wed, May 9, 2018 at 11:51 AM, Numan Siddique  wrote:

>
>
> On Thu, May 10, 2018 at 12:15 AM, Han Zhou  wrote:
>
>> Thanks Ali for the quick patch. Please see my comments inline.
>>
>> On Wed, May 9, 2018 at 9:30 AM, aginwala  wrote:
>> >
>> > Thanks Han and Numan for the clarity to help sort it out.
>> >
>> > For making the vip work with an LB in my two-node setup, I had changed
>> > the code below to skip setting the master IP when creating the pcs
>> > resource for the ovn dbs, and listen on 0.0.0.0 instead. Hence, the
>> > discussion is in line with the code change, which is small, as below:
>> >
>> >
>> > diff --git a/ovn/utilities/ovndb-servers.ocf
>> b/ovn/utilities/ovndb-servers.ocf
>> > index 164b6bc..d4c9ad7 100755
>> > --- a/ovn/utilities/ovndb-servers.ocf
>> > +++ b/ovn/utilities/ovndb-servers.ocf
>> > @@ -295,8 +295,8 @@ ovsdb_server_start() {
>> >
>> >  set ${OVN_CTL}
>> >
>> > -set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
>> > -set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
>> > +set $@ --db-nb-port=${NB_MASTER_PORT}
>> > +set $@ --db-sb-port=${SB_MASTER_PORT}
>> >
>> >  if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
>> >  set $@ --db-nb-create-insecure-remote=yes
>> >
>>
>> This change solves the IP binding problem. It will just listen on 0.0.0.0.
>>
>
> One problem I see with this approach is that it would listen on all the
> IPs. Maybe it's not a good idea, and it may have security issues.
>
> Can we instead check the value of the MASTER_IP param, something like below?
>
>  if [ "$MASTER_IP" == "0.0.0.0" ]; then
>  set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
>  set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
> else
>  set $@ --db-nb-port=${NB_MASTER_PORT}
>  set $@ --db-sb-port=${SB_MASTER_PORT}
> fi
>
> And when you create OVN pacemaker resource in your deployment, you can
> pass master_ip=0.0.0.0
>
> Will this work ?
>
>
Maybe some misunderstanding here. We still need to use master_ip = LB VIP,
so that the standby nodes can "sync-from" the active node. So we cannot
pass 0.0.0.0 explicitly.

I didn't understand your code above either. Why would we specify the
master_ip if we know it is 0.0.0.0? Or do you mean the other way around,
with just a typo in the code?

Regarding the security of listening on any IP, I am not quite sure. It may
be a problem if a node sits on multiple networks and some of them are
considered insecure, and you want to listen on the secure one only. If this
is the concern, we can add a parameter, e.g. LISTEN_ON_MASTER_IP_ONLY, and
set it to true by default. What do you think?
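
A rough sketch of how that parameter could be wired into ovndb-servers.ocf
(names and default are assumptions, not a final patch):

: ${OCF_RESKEY_listen_on_master_ip_only=yes}
LISTEN_ON_MASTER_IP_ONLY=$OCF_RESKEY_listen_on_master_ip_only

if [ "x${LISTEN_ON_MASTER_IP_ONLY}" = xyes ]; then
    # existing behavior: bind the master IP
    set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
    set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
else
    # LB case: omit the address so ovsdb-server listens on 0.0.0.0
    set $@ --db-nb-port=${NB_MASTER_PORT}
    set $@ --db-sb-port=${SB_MASTER_PORT}
fi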

Thanks,
Han
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-09 Thread Numan Siddique
On Thu, May 10, 2018 at 12:15 AM, Han Zhou  wrote:

> Thanks Ali for the quick patch. Please see my comments inline.
>
> On Wed, May 9, 2018 at 9:30 AM, aginwala  wrote:
> >
> > Thanks Han and Numan for the clarity to help sort it out.
> >
> > For making the vip work with an LB in my two-node setup, I had changed
> > the code below to skip setting the master IP when creating the pcs
> > resource for the ovn dbs, and listen on 0.0.0.0 instead. Hence, the
> > discussion is in line with the code change, which is small, as below:
> >
> >
> > diff --git a/ovn/utilities/ovndb-servers.ocf
> b/ovn/utilities/ovndb-servers.ocf
> > index 164b6bc..d4c9ad7 100755
> > --- a/ovn/utilities/ovndb-servers.ocf
> > +++ b/ovn/utilities/ovndb-servers.ocf
> > @@ -295,8 +295,8 @@ ovsdb_server_start() {
> >
> >  set ${OVN_CTL}
> >
> > -set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
> > -set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
> > +set $@ --db-nb-port=${NB_MASTER_PORT}
> > +set $@ --db-sb-port=${SB_MASTER_PORT}
> >
> >  if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
> >  set $@ --db-nb-create-insecure-remote=yes
> >
>
> This change solves the IP binding problem. It will just listen on 0.0.0.0.
>

One problem I see with this approach is that it would listen on all the
IPs. Maybe it's not a good idea, and it may have security issues.

Can we instead check the value of the MASTER_IP param, something like below?

 if [ "$MASTER_IP" == "0.0.0.0" ]; then
 set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
 set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
else
 set $@ --db-nb-port=${NB_MASTER_PORT}
 set $@ --db-sb-port=${SB_MASTER_PORT}
fi

And when you create the OVN pacemaker resource in your deployment, you can
pass master_ip=0.0.0.0.

Will this work?

Thanks
Numan

> However, another problem is that we should let the LB do a health check
> on the TCP port, and point only to the master. This requires that the
> standby NB/SBs do not listen on the same TCP ports, so we can make one
> more change so that if the NB/SB is a slave, it starts with the unix
> socket only.
>
> >
> > Results:
> > # accessing via LB VIP
> > ovn-nbctl --db=tcp:10.149.7.56:6641 show
> > switch bb130c99-a00d-43cf-b40a-9c6fb1df5ed7 (ls666)
> > ovn-nbctl --db=tcp:10.149.7.56:6641 ls-add ls55
> > # accessing via active node pool member
> > root@test-pace2-2365308:~# ovn-nbctl --db=tcp:10.169.129.33:6641 show
> > switch bb130c99-a00d-43cf-b40a-9c6fb1df5ed7 (ls666)
> > switch 41922d23-3430-436d-b67a-00422367a653 (ls55)
> > # accessing using standby node pool member
> > root@test-pace2-2365308:~# ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add
> lss
> > ovn-nbctl: transaction error: {"details":"insert operation not allowed
> when database serv
> > # using connect string and skip using VIP resource just for reading db
> and not for writing.
> > ovn-nbctl --db=tcp:10.169.129.34:6641,tcp:10.169.129.33:6641 show
> >
> > I am pointing northd and ovn-controller to the db vip which works as
> expected too.
> >
> > For northd, we can use the local unix socket too, which is valid, as I
> > have tested both ways by keeping it running on both nodes. I think it's
> > just a personal preference whether to use the vip or the unix socket, as
> > both are valid for northd. I think we might need to update the
> > documentation with the above details too.
> >
> > I will send a formal patch along with a documentation update. Let me
> > know if there are other suggestions, in case anything is missed.
> >
> >
> > Regards,
> > Aliasgar
> >
>
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-09 Thread Han Zhou
Thanks Ali for the quick patch. Please see my comments inline.

On Wed, May 9, 2018 at 9:30 AM, aginwala  wrote:
>
> Thanks Han and Numan for the clarity to help sort it out.
>
> For making the vip work with an LB in my two-node setup, I had changed
> the code below to skip setting the master IP when creating the pcs
> resource for the ovn dbs, and listen on 0.0.0.0 instead. Hence, the
> discussion is in line with the code change, which is small, as below:
>
>
> diff --git a/ovn/utilities/ovndb-servers.ocf
b/ovn/utilities/ovndb-servers.ocf
> index 164b6bc..d4c9ad7 100755
> --- a/ovn/utilities/ovndb-servers.ocf
> +++ b/ovn/utilities/ovndb-servers.ocf
> @@ -295,8 +295,8 @@ ovsdb_server_start() {
>
>  set ${OVN_CTL}
>
> -set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
> -set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
> +set $@ --db-nb-port=${NB_MASTER_PORT}
> +set $@ --db-sb-port=${SB_MASTER_PORT}
>
>  if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
>  set $@ --db-nb-create-insecure-remote=yes
>

This change solves the IP binding problem. It will just listen on 0.0.0.0.
However, another problem is that we should let the LB do a health check on
the TCP port, and point only to the master. This requires that the standby
NB/SBs do not listen on the same TCP ports, so we can make one more change
so that if the NB/SB is a slave, it starts with the unix socket only.
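
With the slave's TCP port closed, a plain TCP health check is then enough
for the LB to find the master. A haproxy-style sketch, just to illustrate
(the thread does not assume any particular load balancer):

listen ovn-nb
    bind 10.149.7.56:6641
    mode tcp
    server ovn1 10.169.129.33:6641 check
    server ovn2 10.169.129.34:6641 check

Only the active db accepts on 6641, so the check marks the standby down and
the VIP always lands on the master.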

>
> Results:
> # accessing via LB VIP
> ovn-nbctl --db=tcp:10.149.7.56:6641 show
> switch bb130c99-a00d-43cf-b40a-9c6fb1df5ed7 (ls666)
> ovn-nbctl --db=tcp:10.149.7.56:6641 ls-add ls55
> # accessing via active node pool member
> root@test-pace2-2365308:~# ovn-nbctl --db=tcp:10.169.129.33:6641 show
> switch bb130c99-a00d-43cf-b40a-9c6fb1df5ed7 (ls666)
> switch 41922d23-3430-436d-b67a-00422367a653 (ls55)
> # accessing using standby node pool member
> root@test-pace2-2365308:~# ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add
lss
> ovn-nbctl: transaction error: {"details":"insert operation not allowed
when database serv
> # using connect string and skip using VIP resource just for reading db
and not for writing.
> ovn-nbctl --db=tcp:10.169.129.34:6641,tcp:10.169.129.33:6641 show
>
> I am pointing northd and ovn-controller to the db vip which works as
expected too.
>
> For northd, we can use the local unix socket too, which is valid, as I
> have tested both ways by keeping it running on both nodes. I think it's
> just a personal preference whether to use the vip or the unix socket, as
> both are valid for northd. I think we might need to update the
> documentation with the above details too.
>
> I will send a formal patch along with a documentation update. Let me know
> if there are other suggestions, in case anything is missed.
>
>
> Regards,
> Aliasgar
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-09 Thread aginwala
Thanks Han and Numan for the clarity to help sort it out.

For making the vip work with an LB in my two-node setup, I had changed the
code below to skip setting the master IP when creating the pcs resource for
the ovn dbs, and listen on 0.0.0.0 instead. Hence, the discussion is in line
with the code change, which is small, as below:


diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
index 164b6bc..d4c9ad7 100755
--- a/ovn/utilities/ovndb-servers.ocf
+++ b/ovn/utilities/ovndb-servers.ocf
@@ -295,8 +295,8 @@ ovsdb_server_start() {

 set ${OVN_CTL}

-set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
-set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
+set $@ --db-nb-port=${NB_MASTER_PORT}
+set $@ --db-sb-port=${SB_MASTER_PORT}

 if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
 set $@ --db-nb-create-insecure-remote=yes


Results:
# accessing via LB VIP
ovn-nbctl --db=tcp:10.149.7.56:6641 show
switch bb130c99-a00d-43cf-b40a-9c6fb1df5ed7 (ls666)
ovn-nbctl --db=tcp:10.149.7.56:6641 ls-add ls55
# accessing via active node pool member
root@test-pace2-2365308:~# ovn-nbctl --db=tcp:10.169.129.33:6641 show
switch bb130c99-a00d-43cf-b40a-9c6fb1df5ed7 (ls666)
switch 41922d23-3430-436d-b67a-00422367a653 (ls55)
# accessing using standby node pool member
root@test-pace2-2365308:~# ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add lss
ovn-nbctl: transaction error: {"details":"insert operation not allowed when
database serv
# using a connect string and skipping the VIP resource, just for reading
the db and not for writing to it.
ovn-nbctl --db=tcp:10.169.129.34:6641,tcp:10.169.129.33:6641 show

I am pointing northd and ovn-controller to the DB VIP, which works as
expected too.

For northd, we can use the local unix socket too, which is valid, as I have
tested both ways by keeping it running on both nodes. I think it's just a
personal preference to use the VIP or the unix socket, since both are valid
for northd. I think we might need to update the documentation with the
above details too.
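
As a minimal sketch of the unix-socket variant (the socket paths below are
assumptions based on the default OVS run directory, not something taken
from the patch):

# Run northd on both nodes against the local DB sockets; only the node
# whose ovsdb-servers are active will accept writes.
ovn-northd \
    --ovnnb-db=unix:/var/run/openvswitch/ovnnb_db.sock \
    --ovnsb-db=unix:/var/run/openvswitch/ovnsb_db.sock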

I will send a formal patch along with a documentation update. Let me know
if there are other suggestions too, in case anything is missed.


Regards,
Aliasgar


On Wed, May 9, 2018 at 9:18 AM, Han Zhou  wrote:

>
>
> On Wed, May 9, 2018 at 9:02 AM, Numan Siddique 
> wrote:
>
>>
>>
>> On Wed, May 9, 2018 at 9:02 PM, Han Zhou  wrote:
>>
>>> Hi Numan,
>>>
>>> Thank you so much for the detailed answer! Please see my comments
>>> inline.
>>>
>>> On Wed, May 9, 2018 at 7:41 AM, Numan Siddique 
>>> wrote:
>>>
 Hi Han,

 Please see below for inline comments

 On Wed, May 9, 2018 at 5:17 AM, Han Zhou  wrote:

> Hi Babu/Numan,
>
> I have a question regarding OVN pacemaker OCF script.
> I see in the script MASTER_IP is used to start the active DB and
> standby DBs will use that IP to sync from.
>
> In the Documentation/topics/integration.rst it is also mentioned:
>
> `master_ip` is the IP address on which the active database server is
> expected to be listening, the slave node uses it to connect to the master
> node.
>
> However, since the active node will change after failover, I wonder if we
> should provide the IPs of all nodes, and let pacemaker decide which IP is
> the master IP to be used, dynamically.
>



> I see in the documentation it is mentioned about using the IPAddr2
> resource for virtual IP. Does it indicate that we should use the virtual
> IP as the master IP?
>

 That is true. If the master IP is not a virtual IP, then we will not be
 able to figure out which node is the master. We need to configure
 networking-ovn and ovn-controller to point to the right master node so
 that they can do write transactions on the DB.

 Below is how we have configured pacemaker OVN HA dbs in tripleo
 openstack deployment

  - Tripleo deployment creates many virtual IPs (using IPAddr2) and
 these IP addresses are frontend IPs for keystone and all other openstack
 API services and haproxy is used to load balance the traffic (the
 deployment will mostly have 3 controllers and all the openstack API
 services will be running on each node).

  - We choose one of the IPaddr2 virtual IPs and we set a colocation
 constraint when creating the OVN pacemaker HA db resource, i.e., we ask
 pacemaker to promote the ovsdb-servers running on the node configured with
 the virtual IP (i.e., master_ip). Pacemaker will call the promote action [1]
 on the node where the master IP is configured.

 - tripleo configures "ovn_nb_connection=tcp:VIP:6641" and "
 ovn_sb_connection=tcp:VIP:6642" in neutron.conf and runs "ovs-vsctl
 set open . external_ids:ovn-remote=tcp:VIP:6642" on all the nodes
 where ovn-controller service is started.

 - Suppose the master ip node goes down for some reason. Pacemaker
 detects this a

Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-09 Thread Han Zhou
On Wed, May 9, 2018 at 9:02 AM, Numan Siddique  wrote:

>
>
> On Wed, May 9, 2018 at 9:02 PM, Han Zhou  wrote:
>
>> Hi Numan,
>>
>> Thank you so much for the detailed answer! Please see my comments inline.
>>
>> On Wed, May 9, 2018 at 7:41 AM, Numan Siddique 
>> wrote:
>>
>>> Hi Han,
>>>
>>> Please see below for inline comments
>>>
>>> On Wed, May 9, 2018 at 5:17 AM, Han Zhou  wrote:
>>>
 Hi Babu/Numan,

 I have a question regarding OVN pacemaker OCF script.
 I see in the script MASTER_IP is used to start the active DB and
 standby DBs will use that IP to sync from.

 In the Documentation/topics/integration.rst it is also mentioned:

 `master_ip` is the IP address on which the active database server is
 expected to be listening, the slave node uses it to connect to the master
 node.

 However, since the active node will change after failover, I wonder if we
 should provide the IPs of all nodes, and let pacemaker decide which IP is
 the master IP to be used, dynamically.

>>>
>>>
>>>
 I see in the documentation it is mentioned about using the IPAddr2
 resource for virtual IP. Does it indicate that we should use the virtual IP
 as the master IP?

>>>
>>> That is true. If the master IP is not a virtual IP, then we will not be
>>> able to figure out which node is the master. We need to configure
>>> networking-ovn and ovn-controller to point to the right master node so
>>> that they can do write transactions on the DB.
>>>
>>> Below is how we have configured pacemaker OVN HA dbs in tripleo
>>> openstack deployment
>>>
>>>  - Tripleo deployment creates many virtual IPs (using IPAddr2) and these
>>> IP addresses are frontend IPs for keystone and all other openstack API
>>> services and haproxy is used to load balance the traffic (the deployment
>>> will mostly have 3 controllers and all the openstack API services will be
>>> running on each node).
>>>
>>>  - We choose one of the IPaddr2 virtual IPs and we set a colocation
>>> constraint when creating the OVN pacemaker HA db resource, i.e., we ask
>>> pacemaker to promote the ovsdb-servers running on the node configured with
>>> the virtual IP (i.e., master_ip). Pacemaker will call the promote action [1]
>>> on the node where the master IP is configured.
>>>
>>> - tripleo configures "ovn_nb_connection=tcp:VIP:6641" and "
>>> ovn_sb_connection=tcp:VIP:6642" in neutron.conf and runs "ovs-vsctl set
>>> open . external_ids:ovn-remote=tcp:VIP:6642" on all the nodes where
>>> ovn-controller service is started.
>>>
>>> - Suppose the master ip node goes down for some reason. Pacemaker
>>> detects this and moves the virtual ip IPAddr2 resource to another node and
>>> promotes the ovsdb-servers running on that node to master. This way, the
>>> neutron-servers and ovn-controllers can still talk to the same IP without
>>> even noticing that another node has become the master.
>>>
>>>
>>>
>>> Since tripleo was using the IPaddr2 model, we thought this would be the
>>> better way to have a master/slave HA for ovsdb-servers.
>>>
>>> However, this may not work in all scenarios, since the virtual IP works
 only if it can be routed to all nodes, e.g. when all nodes are on the same
 subnet.

>>>
>>> You mean you want to create a pacemaker cluster with nodes belonging to
>>> different subnets? I had a chat with the pacemaker folks and this is
>>> possible. You can also create an IPAddr2 resource. Pacemaker doesn't put
>>> any restrictions. But you need to solve the reachability of that IP from
>>> all the networks/nodes.
>>>
>>
>> Yes, and this is why we can't use IPAddr2 due to the reachability
>> problem. (Not in same L2, no BGP, etc.)
>>
>>
>>> In those cases the IPAddr2 virtual IP won't work. In those cases, for
 the clients to access the DB, we can use a Load-Balancer VIP. But the
 problem is still how to set the master_ip and how to make the standby
 connect to the new active after failover.

>>>
>>> I am a bit confused here. Your setup will still have the pacemaker
>>> cluster, right? Are you talking about having an OVN db servers
>>> active/passive setup on a non-pacemaker cluster setup? If so, I don't
>>> think the OVN OCF script can be used and you have to solve it differently.
>>> Correct me if I am wrong here.
>>>
>>>
>> You mentioned above "However, since the active node will change after
>>> failover, I wonder if we should provide the IPs of all nodes, and let
>>> pacemaker decide which IP is the master IP to be used, dynamically".
>>>
>>> We can definitely add this support. Whenever pacemaker promotes a node,
>>> the other nodes come to know about it and the OVN OCF script can configure
>>> the ovsdb-servers on the slave nodes to connect to the new master. But how
>>> will you configure the neutron-server and ovn-controllers to talk to the
>>> new master?
>>> Are you planning to use the load balancer IP for this purpose? What if the
>>> load balancer IP resolves to a standby server

Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-09 Thread Numan Siddique
On Wed, May 9, 2018 at 9:02 PM, Han Zhou  wrote:

> Hi Numan,
>
> Thank you so much for the detailed answer! Please see my comments inline.
>
> On Wed, May 9, 2018 at 7:41 AM, Numan Siddique 
> wrote:
>
>> Hi Han,
>>
>> Please see below for inline comments
>>
>> On Wed, May 9, 2018 at 5:17 AM, Han Zhou  wrote:
>>
>>> Hi Babu/Numan,
>>>
>>> I have a question regarding OVN pacemaker OCF script.
>>> I see in the script MASTER_IP is used to start the active DB and standby
>>> DBs will use that IP to sync from.
>>>
>>> In the Documentation/topics/integration.rst it is also mentioned:
>>>
>>> `master_ip` is the IP address on which the active database server is
>>> expected to be listening, the slave node uses it to connect to the master
>>> node.
>>>
>>> However, since the active node will change after failover, I wonder if we
>>> should provide the IPs of all nodes, and let pacemaker decide which IP is
>>> the master IP to be used, dynamically.
>>>
>>
>>
>>
>>> I see in the documentation it is mentioned about using the IPAddr2
>>> resource for virtual IP. Does it indicate that we should use the virtual IP
>>> as the master IP?
>>>
>>
>> That is true. If the master IP is not a virtual IP, then we will not be
>> able to figure out which node is the master. We need to configure
>> networking-ovn and ovn-controller to point to the right master node so
>> that they can do write transactions on the DB.
>>
>> Below is how we have configured pacemaker OVN HA dbs in tripleo openstack
>> deployment
>>
>>  - Tripleo deployment creates many virtual IPs (using IPAddr2) and these
>> IP addresses are frontend IPs for keystone and all other openstack API
>> services and haproxy is used to load balance the traffic (the deployment
>> will mostly have 3 controllers and all the openstack API services will be
>> running on each node).
>>
>>  - We choose one of the IPaddr2 virtual IPs and we set a colocation
>> constraint when creating the OVN pacemaker HA db resource, i.e., we ask
>> pacemaker to promote the ovsdb-servers running on the node configured with
>> the virtual IP (i.e., master_ip). Pacemaker will call the promote action [1]
>> on the node where the master IP is configured.
>>
>> - tripleo configures "ovn_nb_connection=tcp:VIP:6641" and "
>> ovn_sb_connection=tcp:VIP:6642" in neutron.conf and runs "ovs-vsctl set
>> open . external_ids:ovn-remote=tcp:VIP:6642" on all the nodes where
>> ovn-controller service is started.
>>
>> - Suppose the master ip node goes down for some reason. Pacemaker detects
>> this and moves the virtual ip IPAddr2 resource to another node and promotes
>> the ovsdb-servers running on that node to master. This way, the
>> neutron-servers and ovn-controllers can still talk to the same IP without
>> even noticing that another node has become the master.
>>
>>
>>
>> Since tripleo was using the IPaddr2 model, we thought this would be the
>> better way to have a master/slave HA for ovsdb-servers.
>>
>> However, this may not work in all scenarios, since the virtual IP works
>>> only if it can be routed to all nodes, e.g. when all nodes are on the same
>>> subnet.
>>>
>>
>> You mean you want to create a pacemaker cluster with nodes belonging to
>> different subnets? I had a chat with the pacemaker folks and this is
>> possible. You can also create an IPAddr2 resource. Pacemaker doesn't put
>> any restrictions. But you need to solve the reachability of that IP from
>> all the networks/nodes.
>>
>
> Yes, and this is why we can't use IPAddr2 due to the reachability problem.
> (Not in same L2, no BGP, etc.)
>
>
>> In those cases the IPAddr2 virtual IP won't work. In those cases, for the
>>> clients to access the DB, we can use a Load-Balancer VIP. But the problem
>>> is still how to set the master_ip and how to make the standby connect to
>>> the new active after failover.
>>>
>>
>> I am a bit confused here. Your setup will still have the pacemaker
>> cluster, right? Are you talking about having an OVN db servers
>> active/passive setup on a non-pacemaker cluster setup? If so, I don't
>> think the OVN OCF script can be used and you have to solve it differently.
>> Correct me if I am wrong here.
>>
>>
> You mentioned above "However, since the active node will change after
>> failover, I wonder if we should provide the IPs of all nodes, and let
>> pacemaker decide which IP is the master IP to be used, dynamically".
>>
>> We can definitely add this support. Whenever pacemaker promotes a node,
>> the other nodes come to know about it and the OVN OCF script can configure
>> the ovsdb-servers on the slave nodes to connect to the new master. But how
>> will you configure the neutron-server and ovn-controllers to talk to the
>> new master?
>> Are you planning to use the load balancer IP for this purpose? What if the
>> load balancer IP resolves to a standby server?
>>
>
> We still have pacemaker to manage the cluster HA, but just don't use
> IPAddr2 for the VIP. To solve the VIP problem, we use a physical/soft
> load-balancer. The VIP is on 

Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-09 Thread Han Zhou
Hi Numan,

Thank you so much for the detailed answer! Please see my comments inline.

On Wed, May 9, 2018 at 7:41 AM, Numan Siddique  wrote:

> Hi Han,
>
> Please see below for inline comments
>
> On Wed, May 9, 2018 at 5:17 AM, Han Zhou  wrote:
>
>> Hi Babu/Numan,
>>
>> I have a question regarding OVN pacemaker OCF script.
>> I see in the script MASTER_IP is used to start the active DB and standby
>> DBs will use that IP to sync from.
>>
>> In the Documentation/topics/integration.rst it is also mentioned:
>>
>> `master_ip` is the IP address on which the active database server is
>> expected to be listening, the slave node uses it to connect to the master
>> node.
>>
>> However, since the active node will change after failover, I wonder if we
>> should provide the IPs of all nodes, and let pacemaker decide which IP is
>> the master IP to be used, dynamically.
>>
>
>
>
>> I see in the documentation it is mentioned about using the IPAddr2
>> resource for virtual IP. Does it indicate that we should use the virtual IP
>> as the master IP?
>>
>
> That is true. If the master IP is not a virtual IP, then we will not be
> able to figure out which node is the master. We need to configure
> networking-ovn and ovn-controller to point to the right master node so
> that they can do write transactions on the DB.
>
> Below is how we have configured pacemaker OVN HA dbs in tripleo openstack
> deployment
>
>  - Tripleo deployment creates many virtual IPs (using IPAddr2) and these
> IP addresses are frontend IPs for keystone and all other openstack API
> services and haproxy is used to load balance the traffic (the deployment
> will mostly have 3 controllers and all the openstack API services will be
> running on each node).
>
>  - We choose one of the IPaddr2 virtual IPs and we set a colocation
> constraint when creating the OVN pacemaker HA db resource, i.e., we ask
> pacemaker to promote the ovsdb-servers running on the node configured with
> the virtual IP (i.e., master_ip). Pacemaker will call the promote action [1]
> on the node where the master IP is configured.
>
> - tripleo configures "ovn_nb_connection=tcp:VIP:6641" and "
> ovn_sb_connection=tcp:VIP:6642" in neutron.conf and runs "ovs-vsctl set
> open . external_ids:ovn-remote=tcp:VIP:6642" on all the nodes where
> ovn-controller service is started.
>
> - Suppose the master ip node goes down for some reason. Pacemaker detects
> this and moves the virtual ip IPAddr2 resource to another node and promotes
> the ovsdb-servers running on that node to master. This way, the
> neutron-servers and ovn-controllers can still talk to the same IP without
> even noticing that another node has become the master.
>
>
>
> Since tripleo was using the IPaddr2 model, we thought this would be the
> better way to have a master/slave HA for ovsdb-servers.
>
> However, this may not work in all scenarios, since the virtual IP works
>> only if it can be routed to all nodes, e.g. when all nodes are on the same
>> subnet.
>>
>
> You mean you want to create a pacemaker cluster with nodes belonging to
> different subnets? I had a chat with the pacemaker folks and this is
> possible. You can also create an IPAddr2 resource. Pacemaker doesn't put
> any restrictions. But you need to solve the reachability of that IP from
> all the networks/nodes.
>

Yes, and this is why we can't use IPAddr2 due to the reachability problem.
(Not in same L2, no BGP, etc.)


> In those cases the IPAddr2 virtual IP won't work. In those cases, for the
>> clients to access the DB, we can use a Load-Balancer VIP. But the problem
>> is still how to set the master_ip and how to make the standby connect to
>> the new active after failover.
>>
>
> I am a bit confused here. Your setup will still have the pacemaker
> cluster, right? Are you talking about having an OVN db servers
> active/passive setup on a non-pacemaker cluster setup? If so, I don't
> think the OVN OCF script can be used and you have to solve it differently.
> Correct me if I am wrong here.
>
>
You mentioned above "However, since the active node will change after
> failover, I wonder if we should provide the IPs of all nodes, and let
> pacemaker decide which IP is the master IP to be used, dynamically".
>
> We can definitely add this support. Whenever pacemaker promotes a node,
> the other nodes come to know about it and the OVN OCF script can configure
> the ovsdb-servers on the slave nodes to connect to the new master. But how
> will you configure the neutron-server and ovn-controllers to talk to the
> new master?
> Are you planning to use the load balancer IP for this purpose? What if the
> load balancer IP resolves to a standby server?
>

We still have pacemaker to manage the cluster HA, but just don't use
IPAddr2 for the VIP. To solve the VIP problem, we use a physical/soft
load-balancer. The VIP is on the LB rather than bound to the OVN central
node interface. There is no problem for the clients, but a little problem
in the OCF script. Since the OCF script relies on the master IP to star

Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-09 Thread Numan Siddique
Hi Han,

Please see below for inline comments

On Wed, May 9, 2018 at 5:17 AM, Han Zhou  wrote:

> Hi Babu/Numan,
>
> I have a question regarding OVN pacemaker OCF script.
> I see in the script MASTER_IP is used to start the active DB and standby
> DBs will use that IP to sync from.
>
> In the Documentation/topics/integration.rst it is also mentioned:
>
> `master_ip` is the IP address on which the active database server is
> expected to be listening, the slave node uses it to connect to the master
> node.
>
> However, since the active node will change after failover, I wonder if we
> should provide the IPs of all nodes, and let pacemaker decide which IP is
> the master IP to be used, dynamically.
>



> I see in the documentation it is mentioned about using the IPAddr2
> resource for virtual IP. Does it indicate that we should use the virtual IP
> as the master IP?
>

That is true. If the master IP is not a virtual IP, then we will not be
able to figure out which node is the master. We need to configure
networking-ovn and ovn-controller to point to the right master node so
that they can do write transactions on the DB.

Below is how we have configured pacemaker OVN HA dbs in tripleo openstack
deployment

 - Tripleo deployment creates many virtual IPs (using IPAddr2) and these IP
addresses are frontend IPs for keystone and all other openstack API
services and haproxy is used to load balance the traffic (the deployment
will mostly have 3 controllers and all the openstack API services will be
running on each node).

 - We choose one of the IPaddr2 virtual IPs and we set a colocation
constraint when creating the OVN pacemaker HA db resource, i.e., we ask
pacemaker to promote the ovsdb-servers running on the node configured with
the virtual IP (i.e., master_ip). Pacemaker will call the promote action [1]
on the node where the master IP is configured.
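
A rough sketch of that wiring (the resource names and VIP are made up here;
the command pattern follows Documentation/topics/integration.rst, so treat
it as illustrative rather than the exact tripleo commands):

# Virtual IP that should always sit on the master node.
pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=10.0.0.100 \
    op monitor interval=30s
# Master/slave OVN DB resource, told which IP the master listens on.
pcs resource create tst-ovndb ocf:ovn:ovndb-servers master_ip=10.0.0.100 master
# Keep the VIP colocated with whichever node holds the master role, and
# promote before moving the VIP.
pcs constraint colocation add VirtualIP with master tst-ovndb score=INFINITY
pcs constraint order promote tst-ovndb then VirtualIP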

- tripleo configures "ovn_nb_connection=tcp:VIP:6641" and "
ovn_sb_connection=tcp:VIP:6642" in neutron.conf and runs "ovs-vsctl set
open . external_ids:ovn-remote=tcp:VIP:6642" on all the nodes where
ovn-controller service is started.

- Suppose the master ip node goes down for some reason. Pacemaker detects
this and moves the virtual ip IPAddr2 resource to another node and promotes
the ovsdb-servers running on that node to master. This way, the
neutron-servers and ovn-controllers can still talk to the same IP without
even noticing that another node has become the master.



Since tripleo was using the IPaddr2 model, we thought this would be the
better way to have a master/slave HA for ovsdb-servers.

However, this may not work in all scenarios, since the virtual IP works
> only if it can be routed to all nodes, e.g. when all nodes are on the same
> subnet.
>

You mean you want to create a pacemaker cluster with nodes belonging to
different subnets? I had a chat with the pacemaker folks and this is
possible. You can also create an IPAddr2 resource. Pacemaker doesn't put
any restrictions. But you need to solve the reachability of that IP from
all the networks/nodes.

In those cases the IPAddr2 virtual IP won't work. In those cases, for the
> clients to access the DB, we can use a Load-Balancer VIP. But the problem
> is still how to set the master_ip and how to make the standby connect to
> the new active after failover.
>

I am a bit confused here. Your setup will still have the pacemaker cluster,
right? Are you talking about having an OVN db servers active/passive setup
on a non-pacemaker cluster setup? If so, I don't think the OVN OCF script
can be used and you have to solve it differently. Correct me if I am wrong
here.

You mentioned above "However, since the active node will change after
failover, I wonder if we should provide the IPs of all nodes, and let
pacemaker decide which IP is the master IP to be used, dynamically".

We can definitely add this support. Whenever pacemaker promotes a node,
the other nodes come to know about it and the OVN OCF script can configure
the ovsdb-servers on the slave nodes to connect to the new master. But how
will you configure the neutron-server and ovn-controllers to talk to the
new master?
Are you planning to use the load balancer IP for this purpose? What if the
load balancer IP resolves to a standby server?

Hope this helps.

If you have a requirement to support this scenario (i.e., without the
master_ip param), it can be done. But care should be taken when
implementing it.


[1] -
https://github.com/openvswitch/ovs/blob/master/ovn/utilities/ovndb-servers.ocf#L505
   http://www.linux-ha.org/doc/dev-guides/_resource_agent_actions.html



> I may have missed something here. Could you help explain the expected way
> for this to work?
>




>
> Thanks,
> Han
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] Question to OVN DB pacemaker script

2018-05-08 Thread Han Zhou
Hi Babu/Numan,

I have a question regarding OVN pacemaker OCF script.
I see in the script MASTER_IP is used to start the active DB and standby
DBs will use that IP to sync from.

In the Documentation/topics/integration.rst it is also mentioned:

`master_ip` is the IP address on which the active database server is
expected to be listening, the slave node uses it to connect to the master
node.

However, since the active node will change after failover, I wonder if we
should provide the IPs of all nodes, and let pacemaker decide which IP is
the master IP to be used, dynamically.

I see in the documentation it is mentioned about using the IPAddr2 resource
for virtual IP. Does it indicate that we should use the virtual IP as the
master IP? However, this may not work in all scenarios, since the virtual
IP works only if it can be routed to all nodes, e.g. when all nodes are on
the same subnet. In those cases the IPAddr2 virtual IP won't work. In those
cases, for the clients to access the DB, we can use a Load-Balancer VIP. But
the problem is still how to set the master_ip and how to make the standby
connect to the new active after failover.
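
(For context, "make the standby connect to the new active" boils down to
re-pointing replication on each slave after a failover; a sketch at the
ovsdb-server level, where the control-socket path and NEW_MASTER_IP are
assumptions:)

# Tell the local NB ovsdb-server which active server to replicate from,
# then (re)connect as a replica; the SB DB needs the same via its own
# control socket and port 6642.
ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl \
    ovsdb-server/set-active-ovsdb-server tcp:NEW_MASTER_IP:6641
ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl \
    ovsdb-server/connect-active-ovsdb-server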

I may have missed something here. Could you help explain the expected way
for this to work?

Thanks,
Han
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss