Thanks for the output. It turns out to be more complex than I thought.
It is good that the new slave doesn't listen on 6641, although I am not
sure how it is achieved. I guess a stop & start was triggered instead of
a simple demote, but I need to spend some time on the pacemaker state
machine. And please ignore my comment about calling ovsdb_server_start() in
demote - it would cause a recursive call, since ovsdb_server_start() calls
demote(), too.

Regarding the change:
     if [ "x${present_master}" = x ]; then
+        set $@ --db-nb-create-insecure-remote=yes
+        set $@ --db-sb-create-insecure-remote=yes
         # No master detected, or the previous master is not among the
         # set starting.

This "if" branch is when there is no master present, but in fact we want it
to be set when current node is master. So this change doesn't affect
anything. It is the below change that made the test work (so that on slave
node the tcp port is not opened):
     elif [ ${present_master} != ${host_name} ]; then
+        set $@ --db-nb-create-insecure-remote=no
+        set $@ --db-sb-create-insecure-remote=no

The ovsdb error log should not be ignored. We should never try to bind the LB
VIP on the ovsdb socket, because that address is not on the host. I think it
is related to this code in ovsdb_server_notify():
            ovn-nbctl -- --id=@conn_uuid create Connection \
target="p${NB_MASTER_PROTO}\:${NB_MASTER_PORT}\:${MASTER_IP}" \
inactivity_probe=$INACTIVE_PROBE -- set NB_Global . connections=@conn_uuid

When using LB, we should set 0.0.0.0 here.
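
As a hedged sketch of that fix (the helper name and the LB flag are hypothetical, not existing ovn-ctl or OCF names), the address published in the Connection target could be chosen like this:

```shell
#!/bin/sh
# Hypothetical helper (not actual ovndb-servers.ocf code): choose the address
# for the Connection row's target.  When an external LB owns the VIP, the VIP
# is not bound on any host interface, so the target must listen on 0.0.0.0.
conn_target_addr() {
    # $1 = MASTER_IP (the LB VIP), $2 = "yes" if an external LB owns the VIP
    if [ "x$2" = xyes ]; then
        echo "0.0.0.0"
    else
        echo "$1"
    fi
}

# The notify code would then use something like:
#   target="p${NB_MASTER_PROTO}\:${NB_MASTER_PORT}\:$(conn_target_addr ${MASTER_IP} ${USES_LB})"
echo "$(conn_target_addr 10.149.4.252 yes)"   # -> 0.0.0.0
```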

Also, the failed action is a concern. We may need to dig more into the root
cause. Thanks for finding these issues.

Thanks,
Han

On Fri, May 11, 2018 at 3:29 PM, aginwala <aginw...@asu.edu> wrote:

> Sure:
>
> *VIP_ip* = 10.149.4.252
> *LB IP* = 10.149.0.40
> *slave netstat where it syncs from master LB VIP IP *
> #netstat -an | grep 6641
> tcp        0      0 10.169.129.34:47426     10.149.4.252:6641
>  ESTABLISHED
> tcp        0      0 10.169.129.34:47444     10.149.4.252:6641
>  ESTABLISHED
>
> *Slave OVS:, *
> # ps aux |grep ovsdb-server
> root      7388  0.0  0.0  18048   376 ?        Ss   14:08   0:00
> ovsdb-server: monitoring pid 7389 (healthy)
> root      7389  0.0  0.0  18464  4556 ?        S    14:08   0:00
> ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/
> openvswitch/ovsdb-server-nb.log 
> --remote=punix:/var/run/openvswitch/ovnnb_db.sock
> --pidfile=/var/run/openvswitch/ovnnb_db.pid --unixctl=ovnnb_db.ctl
> --detach --monitor --remote=db:OVN_Northbound,NB_Global,connections
> --private-key=db:OVN_Northbound,SSL,private_key 
> --certificate=db:OVN_Northbound,SSL,certificate
> --ca-cert=db:OVN_Northbound,SSL,ca_cert 
> --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols
> --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers --sync-from=tcp:
> 10.149.4.252:6641 /etc/openvswitch/ovnnb_db.db
> root      7397  0.0  0.0  18048   372 ?        Ss   14:08   0:00
> ovsdb-server: monitoring pid 7398 (healthy)
> root      7398  0.0  0.0  18868  5280 ?        S    14:08   0:01
> ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/
> openvswitch/ovsdb-server-sb.log 
> --remote=punix:/var/run/openvswitch/ovnsb_db.sock
> --pidfile=/var/run/openvswitch/ovnsb_db.pid --unixctl=ovnsb_db.ctl
> --detach --monitor --remote=db:OVN_Southbound,SB_Global,connections
> --private-key=db:OVN_Southbound,SSL,private_key 
> --certificate=db:OVN_Southbound,SSL,certificate
> --ca-cert=db:OVN_Southbound,SSL,ca_cert 
> --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols
> --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers --sync-from=tcp:
> 10.149.4.252:6642 /etc/openvswitch/ovnsb_db.db
>
> *Master netstat where connections is established with LB :*
> netstat -an | grep 6641
> tcp        0      0 0.0.0.0:6641            0.0.0.0:*               LISTEN
> tcp        0      0 10.169.129.33:6641      10.149.0.40:47426
>  ESTABLISHED
> tcp        0      0 10.169.129.33:6641      10.149.0.40:47444
>  ESTABLISHED
>
> *Master OVS:*
> # ps aux | grep ovsdb-server
> root      3318  0.0  0.0  12940  1012 pts/0    S+   15:23   0:00 grep
> --color=auto ovsdb-server
> root     11648  0.0  0.0  18048   372 ?        Ss   14:08   0:00
> ovsdb-server: monitoring pid 11649 (healthy)
> root     11649  0.0  0.0  18312  4208 ?        S    14:08   0:01
> ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/
> openvswitch/ovsdb-server-nb.log 
> --remote=punix:/var/run/openvswitch/ovnnb_db.sock
> --pidfile=/var/run/openvswitch/ovnnb_db.pid --unixctl=ovnnb_db.ctl
> --detach --monitor --remote=db:OVN_Northbound,NB_Global,connections
> --private-key=db:OVN_Northbound,SSL,private_key 
> --certificate=db:OVN_Northbound,SSL,certificate
> --ca-cert=db:OVN_Northbound,SSL,ca_cert 
> --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols
> --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers
> --remote=ptcp:6641:0.0.0.0 --sync-from=tcp:192.0.2.254:6641
> /etc/openvswitch/ovnnb_db.db
> root     11657  0.0  0.0  18048   376 ?        Ss   14:08   0:00
> ovsdb-server: monitoring pid 11658 (healthy)
> root     11658  0.0  0.0  19340  5552 ?        S    14:08   0:01
> ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/
> openvswitch/ovsdb-server-sb.log 
> --remote=punix:/var/run/openvswitch/ovnsb_db.sock
> --pidfile=/var/run/openvswitch/ovnsb_db.pid --unixctl=ovnsb_db.ctl
> --detach --monitor --remote=db:OVN_Southbound,SB_Global,connections
> --private-key=db:OVN_Southbound,SSL,private_key 
> --certificate=db:OVN_Southbound,SSL,certificate
> --ca-cert=db:OVN_Southbound,SSL,ca_cert 
> --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols
> --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers
> --remote=ptcp:6642:0.0.0.0 --sync-from=tcp:192.0.2.254:6642
> /etc/openvswitch/ovnsb_db.db
>
>
>
> The same applies to 6642 for the sb db. Hope it's clear. Sorry I did not
> post this in the previous message, as I thought you had already got the
> point :).
>
>
>
> Regards,
> Aliasgar
>
>
> On Fri, May 11, 2018 at 3:16 PM, Han Zhou <zhou...@gmail.com> wrote:
>
>> Ali, could you share output of "ps | grep ovsdb" and "netstat -lpn | grep
>> 6641" on the new slave node after you do "crm resource move"?
>>
>> On Fri, May 11, 2018 at 2:25 PM, aginwala <aginw...@asu.edu> wrote:
>>
>>> Thanks Han for more suggestions:
>>>
>>>
>>> I tested failover both by gracefully stopping pacemaker+corosync on the
>>> master node and by crm move, and both work as expected: crm move triggers
>>> promotion of the new master, and the demoted node goes back to syncing
>>> from the sync-from node, as expected. Hence, the code change I posted
>>> earlier holds up.
>>>
>>> # crm stat
>>> Stack: corosync
>>> Current DC: test-pace1-2365293 (version 1.1.14-70404b0) - partition with
>>> quorum
>>> 2 nodes and 2 resources configured
>>>
>>> Online: [ test-pace1-2365293 test-pace2-2365308 ]
>>>
>>> Full list of resources:
>>>
>>>  Master/Slave Set: ovndb_servers-master [ovndb_servers]
>>>      Masters: [ test-pace2-2365308 ]
>>>      Slaves: [ test-pace1-2365293 ]
>>>
>>> #crm --debug resource move ovndb_servers test-pace1-2365293
>>> DEBUG: pacemaker version: [err: ][out: CRM Version: 1.1.14 (70404b0)]
>>> DEBUG: found pacemaker version: 1.1.14
>>> DEBUG: invoke: crm_resource --quiet --move -r 'ovndb_servers'
>>> --node='test-pace1-2365293'
>>> # crm stat
>>>
>>> Stack: corosync
>>> Current DC: test-pace1-2365293 (version 1.1.14-70404b0) - partition with
>>> quorum
>>> 2 nodes and 2 resources configured
>>>
>>> Online: [ test-pace1-2365293 test-pace2-2365308 ]
>>>
>>> Full list of resources:
>>>
>>>  Master/Slave Set: ovndb_servers-master [ovndb_servers]
>>>      Masters: [ test-pace1-2365293 ]
>>>      Slaves: [ test-pace2-2365308 ]
>>>
>>> Failed Actions:
>>> * ovndb_servers_monitor_10000 on test-pace2-2365308 'master' (8):
>>> call=46, status=complete, exitreason='none',
>>>     last-rc-change='Fri May 11 14:08:35 2018', queued=0ms, exec=83ms
>>>
>>> Note: the Failed Actions warning only appears for the crm move command,
>>> not for reboot/kill or service pacemaker/corosync stop/start.
>>>
>>> I cleaned up the warning using the below command:
>>> #crm_resource -P
>>> Waiting for 1 replies from the CRMd. OK
>>>
>>> Also wanted to call out that, per the pacemaker logs, ocf_attribute_target
>>> is not getting called. The code says it will not work for older pacemaker
>>> versions, though I am not sure which versions exactly; I am on version
>>> 1.1.14.
>>> # pacemaker logs
>>>  notice: operation_finished: ovndb_servers_monitor_10000:7561:stderr [
>>> /usr/lib/ocf/resource.d/ovn/ovndb-servers: line 31:
>>> ocf_attribute_target: command not found ]
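
One conventional workaround (a sketch, assuming a plain non-bundle deployment; this shim is not code from the thread) is to define a fallback when the resource-agents shell library predates ocf_attribute_target:

```shell
#!/bin/sh
# Fallback shim: if ocf-shellfuncs does not define ocf_attribute_target
# (older resource-agents, as seen here with pacemaker 1.1.14), fall back to
# the identity mapping, which is correct when not running in a bundle.
if ! type ocf_attribute_target >/dev/null 2>&1; then
    ocf_attribute_target() {
        echo "$1"
    }
fi

ocf_attribute_target test-pace1-2365293   # -> test-pace1-2365293
```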
>>>
>>>
>>> # The nb db logs also show socket_util errors. I think this needs a code
>>> change too, to suppress them, since the functionality still works as
>>> expected (maybe in a separate commit, since it is an ovsdb change):
>>> 2018-05-11T21:14:25.958Z|00560|socket_util|ERR|6641:10.149.4.252: bind:
>>> Cannot assign requested address
>>> 2018-05-11T21:14:25.958Z|00561|socket_util|ERR|6641:10.149.4.252: bind:
>>> Cannot assign requested address
>>> 2018-05-11T21:14:27.859Z|00562|socket_util|ERR|6641:10.149.4.252: bind:
>>> Cannot assign requested address
>>>
>>>
>>>
>>> Let me know for any suggestions further.
>>>
>>>
>>> Regards,
>>> Aliasgar
>>>
>>>
>>> On Thu, May 10, 2018 at 3:49 PM, Han Zhou <zhou...@gmail.com> wrote:
>>>
>>>> Good progress!
>>>>
>>>> I think at least one more change is needed to ensure when demote
>>>> happens, the TCP port is shut down. Otherwise, the LB will be confused
>>>> again and can't figure out which one is active. This is the graceful
>>>> failover scenario which can be tested by crm resource move instead of
>>>> reboot/killing process.
>>>>
>>>> This may be done by the same approach you did for promote, i.e. stop
>>>> ovsdb and then call ovsdb_server_start() so the parameters are reset
>>>> correctly before starting. Alternatively we can add a command in
>>>> ovsdb-server, in addition to the commands that switches to/from
>>>> active/backup modes, to open/close the TCP ports, to avoid restarting
>>>> during failover, but I am not sure if this is valuable. It depends on
>>>> whether restarting ovsdb-server during failover is sufficient. Could you
>>>> add the restart logic for demote and try it further? Thanks!
>>>>
>>>> Thanks,
>>>> Han
>>>>
>>>> On Thu, May 10, 2018 at 1:54 PM, aginwala <aginw...@asu.edu> wrote:
>>>>
>>>>> Hi :
>>>>>
>>>>> Just to further update: I am able to re-open the tcp port in the failover
>>>>> scenario, when the new master is being promoted, with the additional code
>>>>> changes below, which do require stopping the ovs service on the newly
>>>>> selected master to reset the tcp settings:
>>>>>
>>>>>
>>>>> diff --git a/ovn/utilities/ovndb-servers.ocf
>>>>> b/ovn/utilities/ovndb-servers.ocf
>>>>> index 164b6bc..8cb4c25 100755
>>>>> --- a/ovn/utilities/ovndb-servers.ocf
>>>>> +++ b/ovn/utilities/ovndb-servers.ocf
>>>>> @@ -295,8 +295,8 @@ ovsdb_server_start() {
>>>>>
>>>>>      set ${OVN_CTL}
>>>>>
>>>>> -    set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
>>>>> -    set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
>>>>> +    set $@ --db-nb-port=${NB_MASTER_PORT}
>>>>> +    set $@ --db-sb-port=${SB_MASTER_PORT}
>>>>>
>>>>>      if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
>>>>>          set $@ --db-nb-create-insecure-remote=yes
>>>>> @@ -307,6 +307,8 @@ ovsdb_server_start() {
>>>>>      fi
>>>>>
>>>>>      if [ "x${present_master}" = x ]; then
>>>>> +        set $@ --db-nb-create-insecure-remote=yes
>>>>> +        set $@ --db-sb-create-insecure-remote=yes
>>>>>          # No master detected, or the previous master is not among the
>>>>>          # set starting.
>>>>>          #
>>>>> @@ -316,6 +318,8 @@ ovsdb_server_start() {
>>>>>          set $@ --db-nb-sync-from-addr=${INVALID_IP_ADDRESS}
>>>>> --db-sb-sync-from-addr=${INVALID_IP_ADDRESS}
>>>>>
>>>>>      elif [ ${present_master} != ${host_name} ]; then
>>>>> +        set $@ --db-nb-create-insecure-remote=no
>>>>> +        set $@ --db-sb-create-insecure-remote=no
>>>>>          # An existing master is active, connect to it
>>>>>          set $@ --db-nb-sync-from-addr=${MASTER_IP}
>>>>> --db-sb-sync-from-addr=${MASTER_IP}
>>>>>          set $@ --db-nb-sync-from-port=${NB_MASTER_PORT}
>>>>> @@ -416,6 +420,8 @@ ovsdb_server_promote() {
>>>>>              ;;
>>>>>      esac
>>>>>
>>>>> +    ${OVN_CTL} stop_ovsdb
>>>>> +    ovsdb_server_start
>>>>>      ${OVN_CTL} promote_ovnnb
>>>>>      ${OVN_CTL} promote_ovnsb
>>>>>
>>>>>
>>>>>
>>>>> Below are the scenarios tested:
>>>>>
>>>>> - Master node reboot/failure: the new master gets promoted with tcp
>>>>>   ports enabled to start taking LB traffic.
>>>>> - Slave node reboot/failure: no change; the current master continues
>>>>>   taking traffic, and the slave continues to sync from the master.
>>>>> - Reboot/failure of the newly promoted master: again the new master
>>>>>   gets promoted with tcp ports enabled to start taking LB traffic.
>>>>>
>>>>> Also sync on slaves from master works as expected:
>>>>> # On master
>>>>> ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add  556
>>>>> # on slave port is shutdown as expected
>>>>> ovn-nbctl --db=tcp:10.169.129.34:6641 show
>>>>> ovn-nbctl: tcp:10.169.129.34:6641: database connection failed
>>>>> (Connection refused)
>>>>> # on slave local unix socket, above lswitch 556 gets replicated too as
>>>>> --sync-from=tcp:10.149.4.252:6641
>>>>> ovn-nbctl show
>>>>> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
>>>>>
>>>>> # Same testing for sb db too
>>>>> # Slave port 6642 is shutdown too
>>>>> ovn-sbctl --db=tcp:10.169.129.34:6642 show    # hangs
>>>>> # Using the master ip works:
>>>>>  ovn-sbctl --db=tcp:10.169.129.33:6642 show
>>>>> Chassis "21f12bd6-e9e8-4ee2-afeb-28b331df6715"
>>>>>     hostname: "test-pace2-2365308.lvs02.dev.ebayc3.com"
>>>>>     Encap geneve
>>>>>         ip: "10.169.129.34"
>>>>>         options: {csum="true"}
>>>>>
>>>>>
>>>>>
>>>>> # Accessing via LB vip works fine too as only one member is active:
>>>>> for i in `seq 1 500`; do ovn-sbctl --db=tcp:10.149.4.252:6642 show;
>>>>> done
>>>>> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
>>>>> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
>>>>> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
>>>>> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
>>>>> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
>>>>>
>>>>>
>>>>> Everything works as expected. Let me know about any corner case I
>>>>> missed. I will submit a formal patch using LISTEN_ON_MASTER_IP_ONLY for
>>>>> the LB-with-tcp case, to avoid breaking existing functionality.
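
For what it's worth, the parameter handling could look roughly like the sketch below (assuming the option is named LISTEN_ON_MASTER_IP_ONLY and defaults to true to preserve today's behavior; build_addr_args is a made-up helper, not part of any patch):

```shell
#!/bin/sh
# Sketch: choose the ovn-ctl address arguments based on a hypothetical
# LISTEN_ON_MASTER_IP_ONLY parameter.  Defaulting it to true keeps the
# existing behavior of binding the master IP; setting it to false omits
# --db-*-addr, which in the test above resulted in ovsdb-server running
# with --remote=ptcp:6641:0.0.0.0, i.e. listening on all interfaces.
build_addr_args() {
    # $1 = LISTEN_ON_MASTER_IP_ONLY, $2 = MASTER_IP
    if [ "x$1" = xtrue ]; then
        echo "--db-nb-addr=$2 --db-sb-addr=$2"
    else
        echo ""
    fi
}

echo "$(build_addr_args true 10.149.4.252)"
```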
>>>>>
>>>>>
>>>>>
>>>>> Regards,
>>>>> Aliasgar
>>>>>
>>>>>
>>>>>
>>>>> On Thu, May 10, 2018 at 9:55 AM, aginwala <aginw...@asu.edu> wrote:
>>>>>
>>>>>> Thanks folks for suggestions:
>>>>>>
>>>>>> For the LB vip configuration, I did further testing, and yes, per the
>>>>>> logs below, it does try to hit the slave db and fails, since the slave
>>>>>> does not have write permission, which the LB is not aware of:
>>>>>> for i in `seq 1 500`; do ovn-nbctl --db=tcp:10.149.4.252:6641 ls-add
>>>>>> $i590;done
>>>>>> ovn-nbctl: transaction error: {"details":"insert operation not
>>>>>> allowed when database server is in read only mode","error":"not allowed"}
>>>>>> ovn-nbctl: transaction error: {"details":"insert operation not
>>>>>> allowed when database server is in read only mode","error":"not allowed"}
>>>>>> ovn-nbctl: transaction error: {"details":"insert operation not
>>>>>> allowed when database server is in read only mode","error":"not allowed"}
>>>>>>
>>>>>> Hence, with a few more code changes (in the same patch, without the
>>>>>> flag variable suggestion), I am able to shut down the tcp port on the
>>>>>> slave, and it works fine as below:
>>>>>> #Master Node
>>>>>> # ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add test444
>>>>>> #Slave Node
>>>>>> # ovn-nbctl --db=tcp:10.169.129.34:6641 ls-add test444
>>>>>> ovn-nbctl: tcp:10.169.129.34:6641: database connection failed
>>>>>> (Connection refused)
>>>>>>
>>>>>> Code to shut down the tcp port on the slave db, so that only the master
>>>>>> listens on the tcp ports:
>>>>>> diff --git a/ovn/utilities/ovndb-servers.ocf
>>>>>> b/ovn/utilities/ovndb-servers.ocf
>>>>>> index 164b6bc..b265df6 100755
>>>>>> --- a/ovn/utilities/ovndb-servers.ocf
>>>>>> +++ b/ovn/utilities/ovndb-servers.ocf
>>>>>> @@ -295,8 +295,8 @@ ovsdb_server_start() {
>>>>>>
>>>>>>      set ${OVN_CTL}
>>>>>>
>>>>>> -    set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
>>>>>> -    set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
>>>>>> +    set $@ --db-nb-port=${NB_MASTER_PORT}
>>>>>> +    set $@ --db-sb-port=${SB_MASTER_PORT}
>>>>>>
>>>>>>      if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
>>>>>>          set $@ --db-nb-create-insecure-remote=yes
>>>>>> @@ -307,6 +307,8 @@ ovsdb_server_start() {
>>>>>>      fi
>>>>>>
>>>>>>      if [ "x${present_master}" = x ]; then
>>>>>> +        set $@ --db-nb-create-insecure-remote=yes
>>>>>> +        set $@ --db-sb-create-insecure-remote=yes
>>>>>>          # No master detected, or the previous master is not among the
>>>>>>          # set starting.
>>>>>>          #
>>>>>> @@ -316,6 +318,8 @@ ovsdb_server_start() {
>>>>>>          set $@ --db-nb-sync-from-addr=${INVALID_IP_ADDRESS}
>>>>>> --db-sb-sync-from-addr=${INVALID_IP_ADDRESS}
>>>>>>
>>>>>>      elif [ ${present_master} != ${host_name} ]; then
>>>>>> +        set $@ --db-nb-create-insecure-remote=no
>>>>>> +        set $@ --db-sb-create-insecure-remote=no
>>>>>>
>>>>>>
>>>>>> But I noticed that if the slave becomes active after failover,
>>>>>> following the active node's reboot/failure, pacemaker shows it online
>>>>>> but I am not able to access the dbs.
>>>>>>
>>>>>> # crm status
>>>>>> Online: [ test-pace2-2365308 ]
>>>>>> OFFLINE: [ test-pace1-2365293 ]
>>>>>>
>>>>>> Full list of resources:
>>>>>>
>>>>>>  Master/Slave Set: ovndb_servers-master [ovndb_servers]
>>>>>>      Masters: [ test-pace2-2365308 ]
>>>>>>      Stopped: [ test-pace1-2365293 ]
>>>>>>
>>>>>>
>>>>>> # ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add test444
>>>>>> ovn-nbctl: tcp:10.169.129.33:6641: database connection failed
>>>>>> (Connection refused)
>>>>>> # ovn-nbctl --db=tcp:10.169.129.34:6641 ls-add test444
>>>>>> ovn-nbctl: tcp:10.169.129.34:6641: database connection failed
>>>>>> (Connection refused)
>>>>>>
>>>>>> Hence, when failover happens, the slave is already running with
>>>>>> --sync-from=lbVIP:6641/6642 for the nb and sb db respectively, so the
>>>>>> tcp ports are not automatically re-opened on the slave that is being
>>>>>> promoted to master.
>>>>>>
>>>>>> Let me know if there is a valid way/approach I am missing to handle
>>>>>> this in the slave promote logic; I will make further code changes
>>>>>> accordingly.
>>>>>>
>>>>>> Note: the current code changes for use with an LB will need to be
>>>>>> handled for ssl too. I will handle that separately, but want to get tcp
>>>>>> working first; we can add ssl support later.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Aliasgar
>>>>>>
>>>>>> On Wed, May 9, 2018 at 12:19 PM, Numan Siddique <nusid...@redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, May 10, 2018 at 12:44 AM, Han Zhou <zhou...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, May 9, 2018 at 11:51 AM, Numan Siddique <
>>>>>>>> nusid...@redhat.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, May 10, 2018 at 12:15 AM, Han Zhou <zhou...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Ali for the quick patch. Please see my comments inline.
>>>>>>>>>>
>>>>>>>>>> On Wed, May 9, 2018 at 9:30 AM, aginwala <aginw...@asu.edu>
>>>>>>>>>> wrote:
>>>>>>>>>> >
>>>>>>>>>> > Thanks Han and Numan for the clarity to help sort it out.
>>>>>>>>>> >
>>>>>>>>>> > For making the vip work with an LB in my two-node setup, I had
>>>>>>>>>> changed the code below to skip setting the master IP when creating
>>>>>>>>>> the pcs resource for ovndbs, listening on 0.0.0.0 instead. Hence,
>>>>>>>>>> the discussion is in line with this small code change:
>>>>>>>>>> inline with the code change which is small for sure as below:
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > diff --git a/ovn/utilities/ovndb-servers.ocf
>>>>>>>>>> b/ovn/utilities/ovndb-servers.ocf
>>>>>>>>>> > index 164b6bc..d4c9ad7 100755
>>>>>>>>>> > --- a/ovn/utilities/ovndb-servers.ocf
>>>>>>>>>> > +++ b/ovn/utilities/ovndb-servers.ocf
>>>>>>>>>> > @@ -295,8 +295,8 @@ ovsdb_server_start() {
>>>>>>>>>> >
>>>>>>>>>> >      set ${OVN_CTL}
>>>>>>>>>> >
>>>>>>>>>> > -    set $@ --db-nb-addr=${MASTER_IP}
>>>>>>>>>> --db-nb-port=${NB_MASTER_PORT}
>>>>>>>>>> > -    set $@ --db-sb-addr=${MASTER_IP}
>>>>>>>>>> --db-sb-port=${SB_MASTER_PORT}
>>>>>>>>>> > +    set $@ --db-nb-port=${NB_MASTER_PORT}
>>>>>>>>>> > +    set $@ --db-sb-port=${SB_MASTER_PORT}
>>>>>>>>>> >
>>>>>>>>>> >      if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
>>>>>>>>>> >          set $@ --db-nb-create-insecure-remote=yes
>>>>>>>>>> >
>>>>>>>>>>
>>>>>>>>>> This change solves the IP binding problem. It will just listen on
>>>>>>>>>> 0.0.0.0.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> One problem I see with this approach is that it would listen on all
>>>>>>>>> the IPs. It may not be a good idea and may have some security issues.
>>>>>>>>> issues.
>>>>>>>>>
>>>>>>>>> Can we instead check the value of the MASTER_IP param, something
>>>>>>>>> like below?
>>>>>>>>>
>>>>>>>>>  if [ "$MASTER_IP" == "0.0.0.0" ]; then
>>>>>>>>>      set $@ --db-nb-addr=${MASTER_IP}
>>>>>>>>> --db-nb-port=${NB_MASTER_PORT}
>>>>>>>>>      set $@ --db-sb-addr=${MASTER_IP}
>>>>>>>>> --db-sb-port=${SB_MASTER_PORT}
>>>>>>>>> else
>>>>>>>>>      set $@ --db-nb-port=${NB_MASTER_PORT}
>>>>>>>>>      set $@ --db-sb-port=${SB_MASTER_PORT}
>>>>>>>>> fi
>>>>>>>>>
>>>>>>>>> And when you create OVN pacemaker resource in your deployment, you
>>>>>>>>> can pass master_ip=0.0.0.0
>>>>>>>>>
>>>>>>>>> Will this work ?
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Maybe some misunderstanding here. We still need to use master_ip =
>>>>>>>> LB VIP, so that the standby nodes can "sync-from" the active node. So 
>>>>>>>> we
>>>>>>>> cannot pass 0.0.0.0 explicitly.
>>>>>>>>
>>>>>>>
>>>>>>> I misunderstood earlier. I thought you wouldn't need master ip at
>>>>>>> all. Thanks for the clarification.
>>>>>>>
>>>>>>>>
>>>>>>>> I didn't understand your code above either. Why would we specify
>>>>>>>> the master_ip if we know it is 0.0.0.0? Or do you mean the other way 
>>>>>>>> around
>>>>>>>> but just a typo in the code?
>>>>>>>>
>>>>>>>> For the security of listening on any IP, I am not quite sure. It may
>>>>>>>> be a problem if the node sits on multiple networks and some of them
>>>>>>>> are considered insecure, and you want to listen on the secure one
>>>>>>>> only. If
>>>>>>>> this is the concern, we can add a parameter e.g. 
>>>>>>>> LISTEN_ON_MASTER_IP_ONLY,
>>>>>>>> and set it to true by default. What do you think?
>>>>>>>>
>>>>>>>
>>>>>>> I would prefer adding the parameter as you have suggested so that
>>>>>>> the existing behavior remain intact.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Numan
>>>>>>>
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Han
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
