On Wed, Aug 2, 2017 at 1:18 AM, Russell Bryant <russ...@ovn.org> wrote:

> On Tue, Aug 1, 2017 at 3:26 PM, Han Zhou <zhou...@gmail.com> wrote:
> >
> >
> > On Tue, Aug 1, 2017 at 9:19 AM, Russell Bryant <russ...@ovn.org> wrote:
> >>
> >> Add native support for active-standby HA in ovn-northd by having each
> >> instance attempt to acquire an OVSDB lock.  Only the instance of
> >> ovn-northd that currently holds the lock will make active changes to
> >> the OVN databases.
> >>
> >> Signed-off-by: Russell Bryant <russ...@ovn.org>
> >> ---
> >>  NEWS                        |  1 +
> >>  ovn/northd/ovn-northd.8.xml |  9 +++++++++
> >>  ovn/northd/ovn-northd.c     | 40 ++++++++++++++++++++++++++++++
> +---------
> >>  3 files changed, 41 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/NEWS b/NEWS
> >> index facea0228..f3cdd2443 100644
> >> --- a/NEWS
> >> +++ b/NEWS
> >> @@ -49,6 +49,7 @@ Post-v2.7.0
> >>         one chassis is specified, OVN will manage high availability for
> >> that
> >>         gateway.
> >>       * Add support for ACL logging.
> >> +     * ovn-northd now has native support for active-standby high
> >> availability.
> >>     - Tracing with ofproto/trace now traces through recirculation.
> >>     - OVSDB:
> >>       * New support for role-based access control (see ovsdb-server(1)).
> >> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
> >> index 1527e8a60..0d85ec0d2 100644
> >> --- a/ovn/northd/ovn-northd.8.xml
> >> +++ b/ovn/northd/ovn-northd.8.xml
> >> @@ -72,6 +72,15 @@
> >>        </dl>
> >>      </p>
> >>
> >> +    <h1>Active-Standby for High Availability</h1>
> >> +    <p>
> >> +      You may run <code>ovn-northd</code> more than once in an OVN
> >> deployment.
> >> +      OVN will automatically ensure that only one of them is active at
> a
> >> time.
> >> +      If multiple instances of <code>ovn-northd</code> are running and
> >> the
> >> +      active <code>ovn-northd</code> fails, one of the hot standby
> >> instances
> >> +      of <code>ovn-northd</code> will automatically take over.
> >> +    </p>
> >> +
> >>      <h1>Logical Flow Table Structure</h1>
> >>
> >>      <p>
> >> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> >> index 10e0c7ce0..3d2be4267 100644
> >> --- a/ovn/northd/ovn-northd.c
> >> +++ b/ovn/northd/ovn-northd.c
> >> @@ -6531,6 +6531,12 @@ main(int argc, char *argv[])
> >>      ovsdb_idl_add_column(ovnsb_idl_loop.idl,
> &sbrec_chassis_col_nb_cfg);
> >>      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name);
> >>
> >> +    /* Ensure that only a single ovn-northd is active in the deployment
> >> by
> >> +     * acquiring a lock called "ovn_northd" on the southbound database
> >> +     * and then only performing DB transactions if the lock is held. */
> >> +    ovsdb_idl_set_lock(ovnsb_idl_loop.idl, "ovn_northd");
> >> +    bool had_lock = false;
> >> +
> >>      /* Main loop. */
> >>      exiting = false;
> >>      while (!exiting) {
> >> @@ -6541,15 +6547,29 @@ main(int argc, char *argv[])
> >>              .ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
> >>          };
> >>
> >> -        struct chassis_index chassis_index;
> >> -        chassis_index_init(&chassis_index, ctx.ovnsb_idl);
> >> +        if (!had_lock && ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
> >> +            VLOG_INFO("ovn-northd lock acquired. "
> >> +                      "This ovn-northd instance is now active.");
> >> +            had_lock = true;
> >> +        } else if (had_lock && !ovsdb_idl_has_lock(ovnsb_idl_loop.idl))
> {
> >> +            VLOG_INFO("ovn-northd lock lost. "
> >> +                      "This ovn-northd instance is now on standby.");
> >
> > Should it try lock again, if we want it to be standby? Otherwise, this
> > instance won't have a chance to be active any more.
>
> Good question ... I was assuming this scenario was due to a lost
> connection, and that the IDL would automatically try to re-acquire the
> lock.
>
> I tested to make sure I saw a second ovn-northd go from standby to
> active, but I have not tested active -> standby -> active again.
>
> I'll take a closer look at this before applying the patch.
>
>
I tested it and it works fine. active -> standby -> active scenario also
works fine.
I also tested by restarting southbound ovsdb-server. Once ovsdb-server is
up again, the idl clients  try to
acquire the lock and one of the ovn-northd instance becomes active again.
I don't think it is required to try lock again as idl client takes care of
it.

How about starting another instance of ovn-northd in the sandbox/test
environment so that active/standby
scenario gets tested for all the ovn tests ?

Acked-by: Numan Siddique <nusid...@redhat.com>

Thanks
Numan

>
> >> +            had_lock = false;
> >> +        }
> >>
> >> -        ovnnb_db_run(&ctx, &chassis_index, &ovnsb_idl_loop);
> >> -        ovnsb_db_run(&ctx, &ovnsb_idl_loop);
> >> -        if (ctx.ovnsb_txn) {
> >> -            check_and_add_supported_dhcp_opts_to_sb_db(&ctx);
> >> -            check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx);
> >> -            check_and_update_rbac(&ctx);
> >> +        struct chassis_index chassis_index;
> >> +        bool destroy_chassis_index = false;
> >> +        if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
> >> +            chassis_index_init(&chassis_index, ctx.ovnsb_idl);
> >> +            destroy_chassis_index = true;
> >> +
> >> +            ovnnb_db_run(&ctx, &chassis_index, &ovnsb_idl_loop);
> >> +            ovnsb_db_run(&ctx, &ovnsb_idl_loop);
> >> +            if (ctx.ovnsb_txn) {
> >> +                check_and_add_supported_dhcp_opts_to_sb_db(&ctx);
> >> +                check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx);
> >> +                check_and_update_rbac(&ctx);
> >> +            }
> >>          }
> >>
> >>          unixctl_server_run(unixctl);
> >> @@ -6565,7 +6585,9 @@ main(int argc, char *argv[])
> >>              exiting = true;
> >>          }
> >>
> >> -        chassis_index_destroy(&chassis_index);
> >> +        if (destroy_chassis_index) {
> >> +            chassis_index_destroy(&chassis_index);
> >> +        }
> >>      }
> >>
> >>      unixctl_server_destroy(unixctl);
> >> --
> >> 2.13.3
> >>
> >
> > Acked-by: Han Zhou <zhou...@gmail.com>
>
> Thanks for the review!
>
> --
> Russell Bryant
>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to