Good progress!

I think at least one more change is needed to ensure that when demote
happens, the TCP port is shut down. Otherwise, the LB will be confused again
and won't be able to tell which node is active. This is the graceful
failover scenario, which can be tested with crm resource move instead of
rebooting or killing the process.

This may be done with the same approach you used for promote, i.e. stop
ovsdb and then call ovsdb_server_start() so the parameters are reset
correctly before starting. Alternatively, we could add a command to
ovsdb-server, in addition to the commands that switch to/from active/backup
modes, to open/close the TCP ports and avoid restarting during failover, but
I am not sure that is valuable. It depends on whether restarting
ovsdb-server during failover is good enough. Could you add the restart
logic for demote and test further? Thanks!
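For illustration only, a minimal sketch of the restart-on-demote idea, mirroring the ${OVN_CTL} stop_ovsdb / ovsdb_server_start pair used in the promote change quoted below. The helper name ovsdb_server_demote_restart is hypothetical, and OVN_CTL and ovsdb_server_start are stubbed with echo so the order of operations can be run in isolation. It also assumes that present_master already names the other node when demote runs, so that the start path takes the create-insecure-remote=no branch:

```shell
#!/bin/sh
# Sketch only: restart ovsdb-server on demote so that ovsdb_server_start()
# recomputes the listener options. OVN_CTL and ovsdb_server_start are
# stubs here; in ovndb-servers.ocf they are the real ovn-ctl path and the
# agent's own start function.

# Stub: print the ovn-ctl command instead of executing it.
OVN_CTL="echo ovn-ctl"

# Stub: the real function rebuilds the ovn-ctl arguments. With the patch
# in this thread, when present_master is another host it passes
# --db-nb-create-insecure-remote=no and --db-sb-create-insecure-remote=no,
# so the TCP ports stay closed on this node.
ovsdb_server_start() {
    echo "ovsdb_server_start"
}

# Hypothetical helper; would be called from ovsdb_server_demote().
ovsdb_server_demote_restart() {
    # Stop ovsdb-server (same ovn-ctl command as in the promote change),
    # which closes its TCP ports. Skip the restart if stopping fails.
    ${OVN_CTL} stop_ovsdb || return 1
    # Start again with freshly computed parameters (backup mode,
    # sync-from MASTER_IP). Its exit status becomes this function's status.
    ovsdb_server_start
}

ovsdb_server_demote_restart
```

Run as-is, it prints "ovn-ctl stop_ovsdb" followed by "ovsdb_server_start"; if the stop fails, the start is skipped and the helper returns 1.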

Thanks,
Han

On Thu, May 10, 2018 at 1:54 PM, aginwala <aginw...@asu.edu> wrote:

> Hi,
>
> Just a further update: I am able to re-open the TCP port in the failover
> scenario when the new master is being promoted, with the additional code
> changes below. These do require stopping ovsdb-server on the newly selected
> master to reset the TCP settings:
>
>
> diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
> index 164b6bc..8cb4c25 100755
> --- a/ovn/utilities/ovndb-servers.ocf
> +++ b/ovn/utilities/ovndb-servers.ocf
> @@ -295,8 +295,8 @@ ovsdb_server_start() {
>
>      set ${OVN_CTL}
>
> -    set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
> -    set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
> +    set $@ --db-nb-port=${NB_MASTER_PORT}
> +    set $@ --db-sb-port=${SB_MASTER_PORT}
>
>      if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
>          set $@ --db-nb-create-insecure-remote=yes
> @@ -307,6 +307,8 @@ ovsdb_server_start() {
>      fi
>
>      if [ "x${present_master}" = x ]; then
> +        set $@ --db-nb-create-insecure-remote=yes
> +        set $@ --db-sb-create-insecure-remote=yes
>          # No master detected, or the previous master is not among the
>          # set starting.
>          #
> @@ -316,6 +318,8 @@ ovsdb_server_start() {
>          set $@ --db-nb-sync-from-addr=${INVALID_IP_ADDRESS} --db-sb-sync-from-addr=${INVALID_IP_ADDRESS}
>
>      elif [ ${present_master} != ${host_name} ]; then
> +        set $@ --db-nb-create-insecure-remote=no
> +        set $@ --db-sb-create-insecure-remote=no
>          # An existing master is active, connect to it
>          set $@ --db-nb-sync-from-addr=${MASTER_IP} --db-sb-sync-from-addr=${MASTER_IP}
>          set $@ --db-nb-sync-from-port=${NB_MASTER_PORT}
> @@ -416,6 +420,8 @@ ovsdb_server_promote() {
>              ;;
>      esac
>
> +    ${OVN_CTL} stop_ovsdb
> +    ovsdb_server_start
>      ${OVN_CTL} promote_ovnnb
>      ${OVN_CTL} promote_ovnsb
>
>
>
> Below are the scenarios tested:
>
>   Scenario         Result
>   reboot/failure   New master gets promoted with TCP ports enabled and starts
>                    taking LB traffic.
>   reboot/failure   No change; the current master continues taking traffic and
>                    the slave continues to sync from the master.
>   reboot/failure   New master gets promoted with TCP ports enabled and starts
>                    taking LB traffic.
>
> Syncing from the master to the slaves also works as expected:
> # On the master
> ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add 556
> # On the slave, the TCP port is shut down as expected
> ovn-nbctl --db=tcp:10.169.129.34:6641 show
> ovn-nbctl: tcp:10.169.129.34:6641: database connection failed (Connection refused)
> # On the slave's local unix socket, logical switch 556 from above is
> # replicated too, since the slave runs with --sync-from=tcp:10.149.4.252:6641
> ovn-nbctl show
> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
>
> # Same test for the SB DB: slave port 6642 is shut down too, so this hangs:
> ovn-sbctl --db=tcp:10.169.129.34:6642 show
> # Using the master IP works:
> ovn-sbctl --db=tcp:10.169.129.33:6642 show
> Chassis "21f12bd6-e9e8-4ee2-afeb-28b331df6715"
>     hostname: "test-pace2-2365308.lvs02.dev.ebayc3.com"
>     Encap geneve
>         ip: "10.169.129.34"
>         options: {csum="true"}
>
>
>
> # Accessing via the LB VIP works fine too, as only one member is active:
> for i in $(seq 1 500); do ovn-nbctl --db=tcp:10.149.4.252:6641 show; done
> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
> switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
>
>
> Everything works as expected. Let me know if I missed any corner cases.
> I will submit a formal patch that uses LISTEN_ON_MASTER_IP_ONLY for using an
> LB with TCP, so that existing functionality is not broken.
>
>
>
> Regards,
> Aliasgar
>
>
>
> On Thu, May 10, 2018 at 9:55 AM, aginwala <aginw...@asu.edu> wrote:
>
>> Thanks, folks, for the suggestions.
>>
>> For the LB VIP configuration, I tested further and, yes, the LB does try to
>> hit the slave DB, as the logs below show, and the write fails because the
>> slave does not have write permission, which the LB is not aware of:
>> for i in $(seq 1 500); do ovn-nbctl --db=tcp:10.149.4.252:6641 ls-add ${i}590; done
>> ovn-nbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"}
>> ovn-nbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"}
>> ovn-nbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"}
>>
>> Hence, with a few more code changes (in the same patch, without the
>> suggested flag variable), I am able to shut down the TCP port on the slave,
>> and it works as shown below:
>> # Master node
>> # ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add test444
>> # Slave node
>> # ovn-nbctl --db=tcp:10.169.129.34:6641 ls-add test444
>> ovn-nbctl: tcp:10.169.129.34:6641: database connection failed (Connection refused)
>>
>> Code to shut down the TCP port on the slave DB, so that only the master
>> listens on TCP ports:
>> diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
>> index 164b6bc..b265df6 100755
>> --- a/ovn/utilities/ovndb-servers.ocf
>> +++ b/ovn/utilities/ovndb-servers.ocf
>> @@ -295,8 +295,8 @@ ovsdb_server_start() {
>>
>>      set ${OVN_CTL}
>>
>> -    set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
>> -    set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
>> +    set $@ --db-nb-port=${NB_MASTER_PORT}
>> +    set $@ --db-sb-port=${SB_MASTER_PORT}
>>
>>      if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
>>          set $@ --db-nb-create-insecure-remote=yes
>> @@ -307,6 +307,8 @@ ovsdb_server_start() {
>>      fi
>>
>>      if [ "x${present_master}" = x ]; then
>> +        set $@ --db-nb-create-insecure-remote=yes
>> +        set $@ --db-sb-create-insecure-remote=yes
>>          # No master detected, or the previous master is not among the
>>          # set starting.
>>          #
>> @@ -316,6 +318,8 @@ ovsdb_server_start() {
>>          set $@ --db-nb-sync-from-addr=${INVALID_IP_ADDRESS} --db-sb-sync-from-addr=${INVALID_IP_ADDR
>>
>>      elif [ ${present_master} != ${host_name} ]; then
>> +        set $@ --db-nb-create-insecure-remote=no
>> +        set $@ --db-sb-create-insecure-remote=no
>>
>>
>> But I noticed that if the slave becomes active after a failover caused by
>> reboot/failure of the active node, Pacemaker shows it online, but I am not
>> able to access the DBs.
>>
>> # crm status
>> Online: [ test-pace2-2365308 ]
>> OFFLINE: [ test-pace1-2365293 ]
>>
>> Full list of resources:
>>
>>  Master/Slave Set: ovndb_servers-master [ovndb_servers]
>>      Masters: [ test-pace2-2365308 ]
>>      Stopped: [ test-pace1-2365293 ]
>>
>>
>> # ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add test444
>> ovn-nbctl: tcp:10.169.129.33:6641: database connection failed (Connection refused)
>> # ovn-nbctl --db=tcp:10.169.129.34:6641 ls-add test444
>> ovn-nbctl: tcp:10.169.129.34:6641: database connection failed (Connection refused)
>>
>> That is, when failover happens, the slave is already running with
>> --sync-from=lbVIP:6641/6642 for the NB and SB DBs respectively, so the TCP
>> ports for the NB and SB DBs are not re-opened automatically on the slave
>> that is being promoted to master.
>>
>> Let me know if there is a valid approach I am missing for handling this in
>> the slave promote logic. I will make further code changes accordingly.
>>
>> Note: The current code changes for use with an LB will need to be handled
>> for SSL too. I will handle that separately, but I want to get TCP working
>> first; we can add SSL support later.
>>
>>
>> Regards,
>> Aliasgar
>>
>> On Wed, May 9, 2018 at 12:19 PM, Numan Siddique <nusid...@redhat.com> wrote:
>>
>>>
>>>
>>> On Thu, May 10, 2018 at 12:44 AM, Han Zhou <zhou...@gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On Wed, May 9, 2018 at 11:51 AM, Numan Siddique <nusid...@redhat.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, May 10, 2018 at 12:15 AM, Han Zhou <zhou...@gmail.com> wrote:
>>>>>
>>>>>> Thanks Ali for the quick patch. Please see my comments inline.
>>>>>>
>>>>>> On Wed, May 9, 2018 at 9:30 AM, aginwala <aginw...@asu.edu> wrote:
>>>>>> >
>>>>>> > Thanks, Han and Numan, for the clarification that helped sort this out.
>>>>>> >
>>>>>> > To make the VIP work with an LB in my two-node setup, I changed the
>>>>>> > code below to skip setting the master IP when creating the pcs resource
>>>>>> > for ovndbs, so that it listens on 0.0.0.0 instead. The discussion seems
>>>>>> > in line with this code change, which is certainly small:
>>>>>> >
>>>>>> >
>>>>>> > diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
>>>>>> > index 164b6bc..d4c9ad7 100755
>>>>>> > --- a/ovn/utilities/ovndb-servers.ocf
>>>>>> > +++ b/ovn/utilities/ovndb-servers.ocf
>>>>>> > @@ -295,8 +295,8 @@ ovsdb_server_start() {
>>>>>> >
>>>>>> >      set ${OVN_CTL}
>>>>>> >
>>>>>> > -    set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
>>>>>> > -    set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
>>>>>> > +    set $@ --db-nb-port=${NB_MASTER_PORT}
>>>>>> > +    set $@ --db-sb-port=${SB_MASTER_PORT}
>>>>>> >
>>>>>> >      if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
>>>>>> >          set $@ --db-nb-create-insecure-remote=yes
>>>>>> >
>>>>>>
>>>>>> This change solves the IP binding problem. It will just listen on
>>>>>> 0.0.0.0.
>>>>>>
>>>>>
>>>>> One problem I see with this approach is that it would listen on all IPs.
>>>>> Maybe that's not a good idea, as it may have security implications.
>>>>>
>>>>> Can we instead check the value of the MASTER_IP param, something like
>>>>> below?
>>>>>
>>>>>  if [ "$MASTER_IP" == "0.0.0.0" ]; then
>>>>>      set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
>>>>>      set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
>>>>> else
>>>>>      set $@ --db-nb-port=${NB_MASTER_PORT}
>>>>>      set $@ --db-sb-port=${SB_MASTER_PORT}
>>>>> fi
>>>>>
>>>>> And when you create the OVN pacemaker resource in your deployment, you
>>>>> can pass master_ip=0.0.0.0
>>>>>
>>>>> Will this work?
>>>>>
>>>>>
>>>> Maybe there is some misunderstanding here. We still need to use
>>>> master_ip = LB VIP, so that the standby nodes can "sync-from" the active
>>>> node. So we cannot pass 0.0.0.0 explicitly.
>>>>
>>>
>>> I misunderstood earlier. I thought you wouldn't need the master IP at all.
>>> Thanks for the clarification.
>>>
>>>>
>>>> I didn't understand your code above either. Why would we specify the
>>>> master_ip if we know it is 0.0.0.0? Or did you mean the other way around,
>>>> and it's just a typo in the code?
>>>>
>>>> As for the security of listening on any IP, I am not quite sure. It may be
>>>> a problem if the nodes sit on multiple networks, some of which are
>>>> considered insecure, and you want to listen on the secure one only. If
>>>> this is the concern, we can add a parameter, e.g. LISTEN_ON_MASTER_IP_ONLY,
>>>> and set it to true by default. What do you think?
>>>>
>>>
>>> I would prefer adding the parameter as you suggested, so that the
>>> existing behavior remains intact.
>>>
>>> Thanks
>>> Numan
>>>
>>>
>>>> Thanks,
>>>> Han
>>>>
>>>>
>>>
>>
>
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
