The branch, master has been updated
       via  f5536ce ctdb: add test script for ctdb_mutex_ceph_rados_helper
       via  8aba284 ctdb/doc: man page for Ceph RADOS cluster mutex helper
       via  d8b6186 ctdb: cluster mutex helper using Ceph RADOS
       via  cbc81dd ctdb-build: configure time switch for etcd support
       via  8327117 ctdb-build: move ctdb_etcd_lock to utils/etcd
       via  c7c2f15 ctdb-build: Generate pre-built documentation in wscript itself
       via  27bd4c9 ctdb-build: Avoid duplicate list of man pages
      from  ee0475d lib/util: Fix indentation within routine description for dbghdrclass

https://git.samba.org/?p=samba.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit f5536ce1f6ab4e0764e4e806fb0fca5c43051f86
Author: David Disseldorp <dd...@samba.org>
Date:   Tue Dec 6 13:03:27 2016 +0100

    ctdb: add test script for ctdb_mutex_ceph_rados_helper

    This standalone test script performs the following:
    - using ctdb_mutex_ceph_rados_helper, take a lock on the Ceph RADOS object
      at CLUSTER/$POOL/$OBJECT using the Ceph keyring for $USER
      + confirm that lock is obtained, via ctdb_mutex_ceph_rados_helper "0"
        output
    - check RADOS object lock state, using the "rados lock info" command
    - attempt to obtain the lock again, using ctdb_mutex_ceph_rados_helper
      + confirm that the lock is not successfully taken
    - tell the first locker to drop the lock and exit, via SIGTERM
    - once the first locker has exited, attempt to get the lock again
      + confirm that this attempt succeeds

    Signed-off-by: David Disseldorp <dd...@samba.org>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

    Autobuild-User(master): Amitay Isaacs <ami...@samba.org>
    Autobuild-Date(master): Fri Dec  9 07:59:33 CET 2016 on sn-devel-144

commit 8aba284fc493815592de77847c6a990866a7afce
Author: David Disseldorp <dd...@samba.org>
Date:   Thu Dec 1 14:22:45 2016 +0100

    ctdb/doc: man page for Ceph RADOS cluster mutex helper

    Signed-off-by: David Disseldorp <dd...@samba.org>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit d8b61863ece6c5c231ac3e5b783c725864cfdad0
Author: David Disseldorp <dd...@samba.org>
Date:   Thu Dec 1 13:33:22 2016 +0100

    ctdb: cluster mutex helper using Ceph RADOS

    ctdb_mutex_ceph_rados_helper implements the cluster mutex helper API
    atop Ceph using the librados rados_lock_exclusive()/rados_unlock()
    functionality.

    Once configured, split brain avoidance during CTDB recovery will be
    handled using locks against an object located in a Ceph RADOS pool.

    Signed-off-by: David Disseldorp <dd...@samba.org>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit cbc81dd78e4fe3c54e5930db0d1b89d1cdca367d
Author: David Disseldorp <dd...@samba.org>
Date:   Tue Dec 6 13:52:47 2016 +0100

    ctdb-build: configure time switch for etcd support

    Disable generation/installation of the etcd cluster mutex helper by
    default. Support can be explicitly enabled at configure time with
    --enable-etcd-reclock.

    Signed-off-by: David Disseldorp <dd...@samba.org>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>
    Reviewed-by: Martin Schwenke <mar...@meltin.net>

commit 832711718e12d31b1cec185ed39227cfdf932c81
Author: David Disseldorp <dd...@samba.org>
Date:   Tue Dec 6 13:38:45 2016 +0100

    ctdb-build: move ctdb_etcd_lock to utils/etcd

    Signed-off-by: David Disseldorp <dd...@samba.org>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>
    Reviewed-by: Martin Schwenke <mar...@meltin.net>

commit c7c2f1588366e344fe8d909bb3d85e167a4eaa5f
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Thu Dec 8 16:47:16 2016 +1100

    ctdb-build: Generate pre-built documentation in wscript itself

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>
    Reviewed-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: David Disseldorp <dd...@samba.org>

commit 27bd4c9eebc66387499aa0ce45a197ee57169690
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Thu Dec 8 15:38:36 2016 +1100

    ctdb-build: Avoid duplicate list of man pages

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>
    Reviewed-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: David Disseldorp <dd...@samba.org>

-----------------------------------------------------------------------

Summary of changes:
 ctdb/doc/Makefile                                  |  23 --
 ...er.1.xml => ctdb_mutex_ceph_rados_helper.7.xml} |  72 ++---
 ctdb/utils/ceph/ctdb_mutex_ceph_rados_helper.c     | 328 +++++++++++++++++++++
 ctdb/utils/ceph/test_ceph_rados_reclock.sh         | 151 ++++++++++
 ctdb/{tools => utils/etcd}/ctdb_etcd_lock          |   0
 ctdb/wscript                                       | 112 +++++--
 6 files changed, 595 insertions(+), 91 deletions(-)
 delete mode 100644 ctdb/doc/Makefile
 copy ctdb/doc/{ctdbd_wrapper.1.xml => ctdb_mutex_ceph_rados_helper.7.xml} (55%)
 create mode 100644 ctdb/utils/ceph/ctdb_mutex_ceph_rados_helper.c
 create mode 100755 ctdb/utils/ceph/test_ceph_rados_reclock.sh
 rename ctdb/{tools => utils/etcd}/ctdb_etcd_lock (100%)


Changeset truncated at 500 lines:

diff --git a/ctdb/doc/Makefile b/ctdb/doc/Makefile
deleted file mode 100644
index 50ab719..0000000
--- a/ctdb/doc/Makefile
+++ /dev/null
@@ -1,23 +0,0 @@
-DOCS = ctdb.1 ctdb.1.html \
-	ctdb_diagnostics.1 ctdb_diagnostics.1.html \
-	ctdbd.1 ctdbd.1.html \
-	ctdbd_wrapper.1 ctdbd_wrapper.1.html \
-	onnode.1 onnode.1.html \
-	ltdbtool.1 ltdbtool.1.html \
-	ping_pong.1 ping_pong.1.html \
-	ctdbd.conf.5 ctdbd.conf.5.html \
-	ctdb.7 ctdb.7.html \
-	ctdb-statistics.7 ctdb-statistics.7.html \
-	ctdb-etcd.7 ctdb-etcd.7.html \
-	ctdb-tunables.7 ctdb-tunables.7.html
-
-all: $(DOCS)
-
-%: %.xml
-	xsltproc -o $@ http://docbook.sourceforge.net/release/xsl/current/manpages/docbook.xsl $<
-
-%.html: %.xml
-	xsltproc -o $@ http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl $<
-
-distclean:
-	rm -f $(DOCS)
diff --git a/ctdb/doc/ctdbd_wrapper.1.xml b/ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml
similarity index 55%
copy from ctdb/doc/ctdbd_wrapper.1.xml
copy to ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml
index b119681..bde5d72 100644
--- a/ctdb/doc/ctdbd_wrapper.1.xml
+++ b/ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml
@@ -2,68 +2,55 @@
 <!DOCTYPE refentry
 	PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
 	"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
-
-<refentry id="ctdbd_wrapper.1">
+<refentry id="ctdb_mutex_ceph_rados_helper.7">
   <refmeta>
-    <refentrytitle>ctdbd_wrapper</refentrytitle>
-    <manvolnum>1</manvolnum>
+    <refentrytitle>Ceph RADOS Mutex</refentrytitle>
+    <manvolnum>7</manvolnum>
     <refmiscinfo class="source">ctdb</refmiscinfo>
     <refmiscinfo class="manual">CTDB - clustered TDB database</refmiscinfo>
   </refmeta>
   <refnamediv>
-    <refname>ctdbd_wrapper</refname>
-    <refpurpose>Wrapper for ctdbd</refpurpose>
+    <refname>ctdb_mutex_ceph_rados_helper</refname>
+    <refpurpose>Ceph RADOS cluster mutex helper</refpurpose>
   </refnamediv>
-  <refsynopsisdiv>
-    <cmdsynopsis>
-      <command>ctdbd_wrapper</command>
-      <arg choice="req"><replaceable>PIDFILE</replaceable></arg>
-      <group choice="req">
-	<arg choice="plain">start</arg>
-	<arg choice="plain">stop</arg>
-      </group>
-    </cmdsynopsis>
-  </refsynopsisdiv>
-
   <refsect1>
     <title>DESCRIPTION</title>
     <para>
-      ctdbd_wrapper is used to start or stop the main CTDB daemon.
-    </para>
-
-    <para>
-      <replaceable>PIDFILE</replaceable> specifies the location of the
-      file containing the PID of the main CTDB daemon.
-    </para>
-
-    <para>
-      ctdbd_wrapper constructs command-line options for ctdbd from
-      configuration variables specified in
-      <citerefentry><refentrytitle>ctdbd.conf</refentrytitle>
-      <manvolnum>5</manvolnum></citerefentry>.
+      ctdb_mutex_ceph_rados_helper can be used as a recovery lock provider
+      for CTDB. When configured, split brain avoidance during CTDB recovery
+      will be handled using locks against an object located in a Ceph RADOS
+      pool.
+      To enable this functionality, include the following line in your CTDB
+      config file:
     </para>
+    <screen format="linespecific">
+CTDB_RECOVERY_LOCK="!ctdb_mutex_ceph_rados_helper [Cluster] [User] [Pool] [Object]"
+Cluster: Ceph cluster name (e.g. ceph)
+User: Ceph cluster user name (e.g. client.admin)
+Pool: Ceph RADOS pool name
+Object: Ceph RADOS object name
+    </screen>
     <para>
-      See <citerefentry><refentrytitle>ctdb</refentrytitle>
-      <manvolnum>7</manvolnum></citerefentry> for an overview of CTDB.
+      The Ceph cluster <parameter>Cluster</parameter> must be up and running,
+      with a configuration, and keyring file for <parameter>User</parameter>
+      located in a librados default search path (e.g. /etc/ceph/).
+      <parameter>Pool</parameter> must already exist.
     </para>
   </refsect1>
   <refsect1>
     <title>SEE ALSO</title>
     <para>
-      <citerefentry><refentrytitle>ctdbd</refentrytitle>
-      <manvolnum>1</manvolnum></citerefentry>,
-
-      <citerefentry><refentrytitle>ctdbd.conf</refentrytitle>
-      <manvolnum>5</manvolnum></citerefentry>,
       <citerefentry><refentrytitle>ctdb</refentrytitle>
       <manvolnum>7</manvolnum></citerefentry>,
+      <citerefentry><refentrytitle>ctdbd</refentrytitle>
+      <manvolnum>1</manvolnum></citerefentry>,
+
       <ulink url="http://ctdb.samba.org/"/>
     </para>
   </refsect1>
@@ -71,16 +58,13 @@
   <refentryinfo>
     <author>
       <contrib>
-	This documentation was written by
-	Amitay Isaacs,
-	Martin Schwenke
+	This documentation was written by David Disseldorp
       </contrib>
     </author>
     <copyright>
-      <year>2007</year>
-      <holder>Andrew Tridgell</holder>
-      <holder>Ronnie Sahlberg</holder>
+      <year>2016</year>
+      <holder>David Disseldorp</holder>
     </copyright>
     <legalnotice>
       <para>
diff --git a/ctdb/utils/ceph/ctdb_mutex_ceph_rados_helper.c b/ctdb/utils/ceph/ctdb_mutex_ceph_rados_helper.c
new file mode 100644
index 0000000..326a0b0
--- /dev/null
+++ b/ctdb/utils/ceph/ctdb_mutex_ceph_rados_helper.c
@@ -0,0 +1,328 @@
+/*
+   CTDB mutex helper using Ceph librados locks
+
+   Copyright (C) David Disseldorp 2016
+
+   Based on ctdb_mutex_fcntl_helper.c, which is:
+   Copyright (C) Martin Schwenke 2015
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include "replace.h"
+
+#include "tevent.h"
+#include "talloc.h"
+#include "rados/librados.h"
+
+#define CTDB_MUTEX_CEPH_LOCK_NAME	"ctdb_reclock_mutex"
+#define CTDB_MUTEX_CEPH_LOCK_COOKIE	CTDB_MUTEX_CEPH_LOCK_NAME
+#define CTDB_MUTEX_CEPH_LOCK_DESC	"CTDB recovery lock"
+
+#define CTDB_MUTEX_STATUS_HOLDING "0"
+#define CTDB_MUTEX_STATUS_CONTENDED "1"
+#define CTDB_MUTEX_STATUS_TIMEOUT "2"
+#define CTDB_MUTEX_STATUS_ERROR "3"
+
+static char *progname = NULL;
+
+static int ctdb_mutex_rados_ctx_create(const char *ceph_cluster_name,
+				       const char *ceph_auth_name,
+				       const char *pool_name,
+				       rados_t *_ceph_cluster,
+				       rados_ioctx_t *_ioctx)
+{
+	rados_t ceph_cluster = NULL;
+	rados_ioctx_t ioctx = NULL;
+	int ret;
+
+	ret = rados_create2(&ceph_cluster, ceph_cluster_name, ceph_auth_name, 0);
+	if (ret < 0) {
+		fprintf(stderr, "%s: failed to initialise Ceph cluster %s as %s"
+			" - (%s)\n", progname, ceph_cluster_name, ceph_auth_name,
+			strerror(-ret));
+		return ret;
+	}
+
+	/* path=NULL tells librados to use default locations */
+	ret = rados_conf_read_file(ceph_cluster, NULL);
+	if (ret < 0) {
+		fprintf(stderr, "%s: failed to parse Ceph cluster config"
+			" - (%s)\n", progname, strerror(-ret));
+		rados_shutdown(ceph_cluster);
+		return ret;
+	}
+
+	ret = rados_connect(ceph_cluster);
+	if (ret < 0) {
+		fprintf(stderr, "%s: failed to connect to Ceph cluster %s as %s"
+			" - (%s)\n", progname, ceph_cluster_name, ceph_auth_name,
+			strerror(-ret));
+		rados_shutdown(ceph_cluster);
+		return ret;
+	}
+
+
+	ret = rados_ioctx_create(ceph_cluster, pool_name, &ioctx);
+	if (ret < 0) {
+		fprintf(stderr, "%s: failed to create Ceph ioctx for pool %s"
+			" - (%s)\n", progname, pool_name, strerror(-ret));
+		rados_shutdown(ceph_cluster);
+		return ret;
+	}
+
+	*_ceph_cluster = ceph_cluster;
+	*_ioctx = ioctx;
+
+	return 0;
+}
+
+static void ctdb_mutex_rados_ctx_destroy(rados_t ceph_cluster,
+					 rados_ioctx_t ioctx)
+{
+	rados_ioctx_destroy(ioctx);
+	rados_shutdown(ceph_cluster);
+}
+
+static int ctdb_mutex_rados_lock(rados_ioctx_t *ioctx,
+				 const char *oid)
+{
+	int ret;
+
+	ret = rados_lock_exclusive(ioctx, oid,
+				   CTDB_MUTEX_CEPH_LOCK_NAME,
+				   CTDB_MUTEX_CEPH_LOCK_COOKIE,
+				   CTDB_MUTEX_CEPH_LOCK_DESC,
+				   NULL, /* infinite duration */
+				   0);
+	if ((ret == -EEXIST) || (ret == -EBUSY)) {
+		/* lock contention */
+		return ret;
+	} else if (ret < 0) {
+		/* unexpected failure */
+		fprintf(stderr,
+			"%s: Failed to get lock on RADOS object '%s' - (%s)\n",
+			progname, oid, strerror(-ret));
+		return ret;
+	}
+
+	/* lock obtained */
+	return 0;
+}
+
+static int ctdb_mutex_rados_unlock(rados_ioctx_t *ioctx,
+				   const char *oid)
+{
+	int ret;
+
+	ret = rados_unlock(ioctx, oid,
+			   CTDB_MUTEX_CEPH_LOCK_NAME,
+			   CTDB_MUTEX_CEPH_LOCK_COOKIE);
+	if (ret < 0) {
+		fprintf(stderr,
+			"%s: Failed to drop lock on RADOS object '%s' - (%s)\n",
+			progname, oid, strerror(-ret));
+		return ret;
+	}
+
+	return 0;
+}
+
+struct ctdb_mutex_rados_state {
+	bool holding_mutex;
+	const char *ceph_cluster_name;
+	const char *ceph_auth_name;
+	const char *pool_name;
+	const char *object;
+	int ppid;
+	struct tevent_context *ev;
+	struct tevent_signal *sig_ev;
+	struct tevent_timer *timer_ev;
+	rados_t ceph_cluster;
+	rados_ioctx_t ioctx;
+};
+
+static void ctdb_mutex_rados_sigterm_cb(struct tevent_context *ev,
+					struct tevent_signal *se,
+					int signum,
+					int count,
+					void *siginfo,
+					void *private_data)
+{
+	struct ctdb_mutex_rados_state *cmr_state = private_data;
+	int ret;
+
+	if (!cmr_state->holding_mutex) {
+		fprintf(stderr, "Sigterm callback invoked without mutex!\n");
+		ret = -EINVAL;
+		goto err_ctx_cleanup;
+	}
+
+	ret = ctdb_mutex_rados_unlock(cmr_state->ioctx, cmr_state->object);
+err_ctx_cleanup:
+	ctdb_mutex_rados_ctx_destroy(cmr_state->ceph_cluster,
+				     cmr_state->ioctx);
+	talloc_free(cmr_state);
+	exit(ret ? 1 : 0);
+}
+
+static void ctdb_mutex_rados_timer_cb(struct tevent_context *ev,
+				      struct tevent_timer *te,
+				      struct timeval current_time,
+				      void *private_data)
+{
+	struct ctdb_mutex_rados_state *cmr_state = private_data;
+	int ret;
+
+	if (!cmr_state->holding_mutex) {
+		fprintf(stderr, "Timer callback invoked without mutex!\n");
+		ret = -EINVAL;
+		goto err_ctx_cleanup;
+	}
+
+	if ((kill(cmr_state->ppid, 0) == 0) || (errno != ESRCH)) {
+		/* parent still around, keep waiting */
+		cmr_state->timer_ev = tevent_add_timer(cmr_state->ev, cmr_state,
+					tevent_timeval_current_ofs(5, 0),
+					ctdb_mutex_rados_timer_cb,
+					cmr_state);
+		if (cmr_state->timer_ev == NULL) {
+			fprintf(stderr, "Failed to create timer event\n");
+			/* rely on signal cb */
+		}
+		return;
+	}
+
+	/* parent ended, drop lock and exit */
+	ret = ctdb_mutex_rados_unlock(cmr_state->ioctx, cmr_state->object);
+err_ctx_cleanup:
+	ctdb_mutex_rados_ctx_destroy(cmr_state->ceph_cluster,
+				     cmr_state->ioctx);
+	talloc_free(cmr_state);
+	exit(ret ? 1 : 0);
+}
+
+int main(int argc, char *argv[])
+{
+	int ret;
+	struct ctdb_mutex_rados_state *cmr_state;
+
+	progname = argv[0];
+
+	if (argc != 5) {
+		fprintf(stderr, "Usage: %s <Ceph Cluster> <Ceph user> "
+				"<RADOS pool> <RADOS object>\n",
+			progname);
+		ret = -EINVAL;
+		goto err_out;
+	}
+
+	ret = setvbuf(stdout, NULL, _IONBF, 0);
+	if (ret != 0) {
+		fprintf(stderr, "Failed to configure unbuffered stdout I/O\n");
+	}
+
+	cmr_state = talloc_zero(NULL, struct ctdb_mutex_rados_state);
+	if (cmr_state == NULL) {
+		fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+		ret = -ENOMEM;
+		goto err_out;
+	}
+
+	cmr_state->ceph_cluster_name = argv[1];
+	cmr_state->ceph_auth_name = argv[2];
+	cmr_state->pool_name = argv[3];
+	cmr_state->object = argv[4];
+
+	cmr_state->ppid = getppid();
+	if (cmr_state->ppid == 1) {
+		/*
+		 * The original parent is gone and the process has
+		 * been reparented to init.  This can happen if the
+		 * helper is started just as the parent is killed
+		 * during shutdown.  The error message doesn't need to
+		 * be stellar, since there won't be anything around to
+		 * capture and log it...
+		 */
+		fprintf(stderr, "%s: PPID == 1\n", progname);
+		ret = -EPIPE;
+		goto err_state_free;
+	}
+
+	cmr_state->ev = tevent_context_init(cmr_state);
+	if (cmr_state->ev == NULL) {
+		fprintf(stderr, "tevent_context_init failed\n");
+		fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+		ret = -ENOMEM;
+		goto err_state_free;
+	}
+
+	/* wait for sigterm */
+	cmr_state->sig_ev = tevent_add_signal(cmr_state->ev, cmr_state, SIGTERM, 0,
+					      ctdb_mutex_rados_sigterm_cb,
+					      cmr_state);
+	if (cmr_state->sig_ev == NULL) {
+		fprintf(stderr, "Failed to create signal event\n");
+		fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+		ret = -ENOMEM;
+		goto err_state_free;
+	}
+
+	/* periodically check parent */
+	cmr_state->timer_ev = tevent_add_timer(cmr_state->ev, cmr_state,
+					       tevent_timeval_current_ofs(5, 0),
+					       ctdb_mutex_rados_timer_cb,
+					       cmr_state);
+	if (cmr_state->timer_ev == NULL) {
+		fprintf(stderr, "Failed to create timer event\n");
+		fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+		ret = -ENOMEM;
+		goto err_state_free;
+	}
+
+	ret = ctdb_mutex_rados_ctx_create(cmr_state->ceph_cluster_name,
+					  cmr_state->ceph_auth_name,
+					  cmr_state->pool_name,
+					  &cmr_state->ceph_cluster,
+					  &cmr_state->ioctx);
+	if (ret < 0) {
+		fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+		goto err_state_free;
+	}
+
+	ret = ctdb_mutex_rados_lock(cmr_state->ioctx, cmr_state->object);
+	if ((ret == -EEXIST) || (ret == -EBUSY)) {
+		fprintf(stdout, CTDB_MUTEX_STATUS_CONTENDED);
+		goto err_ctx_cleanup;
+	} else if (ret < 0) {
+		fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+		goto err_ctx_cleanup;
+	}
+
+	cmr_state->holding_mutex = true;
+	fprintf(stdout, CTDB_MUTEX_STATUS_HOLDING);
+
+	/* wait for the signal / timer events to do their work */
+	ret = tevent_loop_wait(cmr_state->ev);
+	if (ret < 0) {
+		goto err_ctx_cleanup;
+	}
+err_ctx_cleanup:
+	ctdb_mutex_rados_ctx_destroy(cmr_state->ceph_cluster,
+				     cmr_state->ioctx);
+err_state_free:
+	talloc_free(cmr_state);
+err_out:
+	return ret ? 1 : 0;
+}
diff --git a/ctdb/utils/ceph/test_ceph_rados_reclock.sh b/ctdb/utils/ceph/test_ceph_rados_reclock.sh
new file mode 100755
index 0000000..1adacf6
--- /dev/null
+++ b/ctdb/utils/ceph/test_ceph_rados_reclock.sh
@@ -0,0 +1,151 @@
+#!/bin/bash
+# standalone test for ctdb_mutex_ceph_rados_helper
+#
+# Copyright (C) David Disseldorp 2016
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#


-- 
Samba Shared Repository
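
Reader's note (not part of the commit): the helper in the diff above reports its result as a single character on stdout ("0" holding, "1" contended, "2" timeout, "3" error) and keeps the lock only while its parent process is still around. The following is a minimal sketch of those two pieces of logic, assuming a POSIX system. The function names mutex_status_describe(), parent_still_around() and sketch_self_check() are hypothetical and do not appear in the Samba tree; only the status values and the kill(ppid, 0) / ESRCH test are taken from the diff.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the one-character status that the mutex
 * helper writes to stdout onto a short description. The values mirror
 * the CTDB_MUTEX_STATUS_* string macros in the diff.
 */
static const char *mutex_status_describe(char status)
{
	switch (status) {
	case '0':
		return "holding";
	case '1':
		return "contended";
	case '2':
		return "timeout";
	case '3':
		return "error";
	default:
		return "unknown";
	}
}

/*
 * Hypothetical helper mirroring the check in the timer callback:
 * signal 0 delivers nothing but still performs the existence and
 * permission checks. Only ESRCH means the process is gone; any other
 * error (e.g. EPERM) is treated as "still around".
 */
static bool parent_still_around(pid_t ppid)
{
	if (kill(ppid, 0) == 0) {
		return true;
	}
	return errno != ESRCH;
}

/* Small self-check exercising both helpers against the current process. */
static void sketch_self_check(void)
{
	assert(strcmp(mutex_status_describe('0'), "holding") == 0);
	assert(strcmp(mutex_status_describe('1'), "contended") == 0);
	assert(parent_still_around(getpid()));
}
```

Treating every errno other than ESRCH as "parent alive" errs on the side of holding the lock; the commit itself does not state the reasoning, so that reading is an interpretation.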