On 14.08.2019 18:47, Michele Baldessari wrote:
> In some of our destructive testing of ovn-dbs inside containers managed
> by pacemaker we reached a situation where /var/run/openvswitch had
> empty .pid files. The current code does not handle them well:
> pidfile_is_running() returns true in such a case, which confuses
> the OCF resource agent.
>
> - Before this change:
> Inside a container run:
> killall ovsdb-server;
> echo -n '' > /var/run/openvswitch/ovnnb_db.pid;
> echo -n '' > /var/run/openvswitch/ovnsb_db.pid
What about whitespace?
I mean, if you write ' ' instead of '', the 'test -n "$1"' check
will succeed and the test will fail.
To handle this case we need to trim the whitespace with the 'tr' utility
or change the proc check to something like 'test -f /proc/"$1"/status'.
What do you think?
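
A minimal sketch of the two options, just to make the suggestion concrete.
The function names pid_exists_trimmed and pid_exists_status are hypothetical
and not part of ovs-lib; the sketch assumes a Linux system with /proc mounted.

```shell
# Hypothetical variant 1: strip whitespace with 'tr' before checking.
# (The name pid_exists_trimmed is illustrative only, not from ovs-lib.)
pid_exists_trimmed () {
    # Remove all whitespace characters (spaces, tabs, newlines).
    pid=$(printf '%s' "$1" | tr -d '[:space:]')
    # Reject an empty result, then look for the /proc entry as before.
    test -n "$pid" && test -d /proc/"$pid"
}

# Hypothetical variant 2: check for a file that only exists inside a
# per-process directory, so an empty argument cannot match /proc itself.
pid_exists_status () {
    test -f /proc/"$1"/status
}
```

Note that the 'tr' variant removes whitespace anywhere in the string, not
only at the ends, so it may be stricter or looser than intended depending
on what can end up in the pidfile.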
>
> We will observe that the cluster is unable to ever recover because
> it believes the ovn processes to be running when they really aren't and
> eventually just fails:
> podman container set: ovn-dbs-bundle
> [192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest]
> ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Master controller-0
> ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Stopped controller-1
> ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave controller-2
>
> Let's make sure pid_exists() returns false when the pid is an empty
> string.
>
> - After this change the cluster is able to recover from this state and
> correctly start the resource:
> podman container set: ovn-dbs-bundle
> [192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest]
> ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Master controller-0
> ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Slave controller-1
> ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave controller-2
>
> Fixes: 3028ce2595c8 ("ovs-lib: Allow "status" command to work as non-root.")
>
> Signed-off-by: Michele Baldessari <[email protected]>
> ---
> v1 -> v2
> ========
> - Implemented Ilya's suggestion and moved the check from
> pidfile_is_running() to pid_exists() and re-run my tests
> ---
> utilities/ovs-lib.in | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in
> index fa840ec637f5..dc485413ef0c 100644
> --- a/utilities/ovs-lib.in
> +++ b/utilities/ovs-lib.in
> @@ -127,7 +127,7 @@ fi
> pid_exists () {
> # This is better than "kill -0" because it doesn't require permission to
> # send a signal (so daemon_status in particular works as non-root).
> - test -d /proc/"$1"
> + test -n "$1" && test -d /proc/"$1"
> }
>
> pid_comm_check () {
>
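
For reference, the empty-string case that the patch addresses can be
reproduced outside of ovs-lib. This is only a sketch: the names
pid_exists_old and pid_exists_new are made up here to hold the one-line
bodies from before and after the diff, and it assumes a Linux /proc.

```shell
# Body of pid_exists() before the patch (hypothetical name).
# With an empty argument this tests /proc/ itself, which is a directory.
pid_exists_old () {
    test -d /proc/"$1"
}

# Body of pid_exists() after the patch (hypothetical name).
# The empty string is rejected before /proc is consulted.
pid_exists_new () {
    test -n "$1" && test -d /proc/"$1"
}

pid_exists_old "" && echo "old: empty pid is treated as running"
pid_exists_new "" || echo "new: empty pid is treated as not running"
```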