[ClusterLabs Developers] resource-agents send_arp pull request, fixes RH Bugz#1250728

2015-09-11 Thread Lars Ellenberg
Just to make you all aware of

IPaddr2 send_arp causes a buffer overflow on infiniband devices
https://bugzilla.redhat.com/show_bug.cgi?id=1250728

send_arp: fix for infiniband, re-merge from upstream iputils arping #654 
https://github.com/ClusterLabs/resource-agents/pull/654

If no-one has any objections, I'll merge that next week.

We can then discuss how to best clean up the IPaddr* RA
(drop references to "ipoibarping").

Lars


___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] Proposed future feature: multiple notification scripts

2015-12-07 Thread Lars Ellenberg
On Thu, Dec 03, 2015 at 12:56:36PM -0600, Ken Gaillot wrote:
> On 12/03/2015 10:58 AM, Klaus Wenninger wrote:
> > On 12/03/2015 05:39 PM, Jan Pokorný wrote:
> >> On 03/12/15 17:06 +0100, Klaus Wenninger wrote:
> >>> On 12/03/2015 04:45 PM, Jan Pokorný wrote:
> >>>> What I am up to is a proposal of an alternative/parallel mechanism
> >>>> that better fits the asymmetric (and asynchronous from cluster life
> >>>> cycle POV) use cases: old good drop-in files.  There would simply
> >>>> be a dedicated directory (say /usr/share/pacemaker/notify.d) where
> >>>> the software interested in notifications would craft its own
> >>>> listener script (or a symlink thereof), script is then discovered
> >>>> by Pacemaker upon subsequent dir rescan or inotify event, done.
> 
> I wouldn't want to introduce a dependency on inotify (especially since
> we support non-Linux OSes), and scanning a directory is an expensive
> operation. We definitely wouldn't want to rescan for every cluster
> event. Possibly every recheck-interval would be OK. So one question is
> how do we trigger a rescan, and whether that might lead to confusion as
> to why a drop-in isn't (immediately) working.
> 
> But I do think it's an interesting approach worth exploring.

Why? Don't put that burden on Pacemaker.

If someone wants this,
have pacemaker call out to one "master" script,
which then does something similar to "run-parts",
with whatever is "there" at that point.

Passing through arguments/environment/whatever.

Implement one such "example" script,
and maybe ship it together with agents, maybe at
/usr/lib/ocf/notification/pacemaker/run-parts

Craft a few other example notify scripts which would just send a
standardized email blob, or an snmp trap, or whatever; these could be used
either directly (instead of the "run-parts" thing), or via the "run-parts"
thing, by "symlinking" them into place.
Should work just the same,
if we cleanly pass through environment and arguments.

Have users contribute their solutions.
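
A rough, untested sketch of what that "run-parts" dispatcher could look
like (the paths are just the examples mentioned above, nothing agreed upon):

-- /usr/lib/ocf/notification/pacemaker/run-parts (sketch) --
#!/bin/sh
# call every executable dropped into the notify.d directory,
# passing our own arguments and environment straight through
NOTIFY_D=${NOTIFY_D:-/usr/share/pacemaker/notify.d}

for hook in "$NOTIFY_D"/*; do
    [ -x "$hook" ] || continue
    # run in the background, so one slow or hanging hook
    # cannot delay the others (or pacemaker itself)
    "$hook" "$@" &
done
wait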

> >>>> --> no configuration needed (or is external to the CIB, or is
> >>>> interspersed in a non-invasive way there), install and go

Exactly.
If you want this, you call just one script, which is part of the
pacemaker infrastructure, and as such in place on every node.
If you then choose to put different "sub" scripts in place
on different nodes, that will be your responsibility.

> >>> Why a dedicated directory so that you can't see in the cib that
> >>> some kind of notification is enabled?
> >> But you cannot see it truthfully in CIB either!

Now you can see in the CIB that notifications are enabled;
but whether the one script being called notifies directly,
or calls a bunch of other scripts sequentially/in parallel/in the
background, is your business, and not pacemaker's.

Keep it simple on the CIB and Pacemaker side.

If you want to multiplex in the notification script,
do so.

If not, don't.

Pacemaker should not care.
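
For illustration, one of those "standardized email blob" example scripts
could be as trivial as the sketch below; the CRM_notify_* variable names
are an assumption about what the notification feature would export,
so adjust as needed:

-- notify.d/mail-blob (sketch) --
#!/bin/sh
# format whatever pacemaker passed along in the environment and mail it
recipient=${NOTIFY_MAILTO:-root}
{
    echo "node:     ${CRM_notify_node:-unknown}"
    echo "resource: ${CRM_notify_rsc:-n/a}"
    echo "task:     ${CRM_notify_task:-n/a}"
    echo "rc:       ${CRM_notify_rc:-n/a}"
    echo "desc:     ${CRM_notify_desc:-}"
} | mail -s "pacemaker notification from $(uname -n)" "$recipient"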


-- 
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA  and  Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] [booth][sbd] GPLv2.1+ clarification request

2016-03-20 Thread Lars Ellenberg
On Thu, Mar 17, 2016 at 07:12:19PM +0100, Dejan Muhamedagic wrote:
> Hi Jan,
> 
> On Thu, Mar 17, 2016 at 06:47:37PM +0100, Jan Pokorný wrote:
> > Hello all,
> > 
> > during latest reviews of packages building on core cluster
> > infrastructure, it turned out there is a frequented (viral?) issue
> > with source files declaring unusual licence: GPLv2.1+.
> 
> Yes, I'd say that it's all coming from a single source. I suspect
> that nobody's looking at the license, just copies another source
> file from the same project. Anyway, that's what I did in booth.
> 
> Who created the first file with this non-existent license is
> anybody's guess. It could probably be traced, but I doubt that
> it'd help in any way.

Actually it might.

I think that what happened was this:

in the early days of heartbeat, way back when,
source code got "batch tagged" with the license statement:
http://hg.linux-ha.org/heartbeat-STABLE_3_0/rev/4a67fde00b0b#l1.10
2000/07/26 05:17:18

Most stuff got tagged with the LGPL 2.1.

Some time later, someone noticed that in some cases
a "program" is not a "library", and tried to re-tag
e.g. "api_test.c" with the GPL 2,
but instead of using the actual suggested GPL 2 stanza,
they simply dropped "Lesser" and changed "library" to "software".
http://hg.linux-ha.org/heartbeat-STABLE_3_0/rev/bc508513c4dc#l2.10
2000/08/31 05:23:36

 :-(

Both changes predate the GPLv3 by seven years.

From there it propagated to ipfail.c and attrd.c, which both became
*the* template files to start from when writing daemons and extensions
using the API.

Developers quickly browse their "template",
their "auto-correct" filter reads "GPL 2",
which they are content with,
and in good faith they hack away.

I think it is safe to assume that any developer copying from there meant
to "stay in project" regarding the licensing.

So I move to change it to GPLv2+, for everything that is a "program",
and LGPLv2.1 for everything that may be viewed as a library.

At least that's how I will correct the wording in the
affected files in the heartbeat mercurial.

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT

___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


[ClusterLabs Developers] RFC PATCH crm_mon consistently report ms resource states

2016-06-01 Thread Lars Ellenberg

I had sent this as a pull request,
but I seem to have github troubles right now :-/

Thanks,
Lars
From 4acd6327a949a3836fa7bb1851f758d4474cd05d Mon Sep 17 00:00:00 2001
From: Lars Ellenberg 
Date: Wed, 1 Jun 2016 14:16:49 +0200
Subject: [PATCH] crm_mon: consistently print ms resource state

 * consistently use 'Slave', not sometimes 'Slave', sometimes 'Started'
 * only report target role, if it is the limitation (target-role:Slave)
   Both target-role Master and target-role Started
   do not limit us in any way.
---
 lib/pengine/clone.c  |  2 +-
 lib/pengine/native.c | 19 ++++++++++---------
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/lib/pengine/clone.c b/lib/pengine/clone.c
index af37a9d..663a935 100644
--- a/lib/pengine/clone.c
+++ b/lib/pengine/clone.c
@@ -528,7 +528,7 @@ clone_print(resource_t * rsc, const char *pre_text, long options, void *print_da
 if(rsc->variant == pe_master) {
 enum rsc_role_e role = configured_role(rsc);
 
-if(role > RSC_ROLE_STOPPED && role < RSC_ROLE_MASTER) {
+if(role == RSC_ROLE_SLAVE) {
 short_print(list_text, child_text, "Slaves (target-role)", NULL, options, print_data);
 } else {
 short_print(list_text, child_text, "Slaves", NULL, options, print_data);
diff --git a/lib/pengine/native.c b/lib/pengine/native.c
index 151b235..d25de74 100644
--- a/lib/pengine/native.c
+++ b/lib/pengine/native.c
@@ -440,6 +440,7 @@ native_print(resource_t * rsc, const char *pre_text, long options, void *print_d
 const char *class = crm_element_value(rsc->xml, XML_AGENT_ATTR_CLASS);
 const char *kind = crm_element_value(rsc->xml, XML_ATTR_TYPE);
 const char *target_role = NULL;
+enum rsc_role_e role = rsc->role;
 
 int offset = 0;
 int flagOffset = 0;
@@ -458,6 +459,10 @@ native_print(resource_t * rsc, const char *pre_text, long options, void *print_d
 target_role = g_hash_table_lookup(rsc->meta, XML_RSC_ATTR_TARGET_ROLE);
 }
 
+if(role == RSC_ROLE_STARTED && uber_parent(rsc)->variant == pe_master) {
+role = RSC_ROLE_SLAVE;
+}
+
 if (pre_text == NULL && (options & pe_print_printf)) {
 pre_text = " ";
 }
@@ -508,18 +513,17 @@ native_print(resource_t * rsc, const char *pre_text, long options, void *print_d
 if(is_set(rsc->flags, pe_rsc_orphan)) {
 offset += snprintf(buffer + offset, LINE_MAX - offset, " ORPHANED ");
 }
-if(rsc->role > RSC_ROLE_SLAVE && is_set(rsc->flags, pe_rsc_failed)) {
-offset += snprintf(buffer + offset, LINE_MAX - offset, "FAILED %s", role2text(rsc->role));
+if(role > RSC_ROLE_SLAVE && is_set(rsc->flags, pe_rsc_failed)) {
+offset += snprintf(buffer + offset, LINE_MAX - offset, "FAILED %s", role2text(role));
 } else if(is_set(rsc->flags, pe_rsc_failed)) {
 offset += snprintf(buffer + offset, LINE_MAX - offset, "FAILED");
 } else {
 const char *rsc_state = NULL;
-
 if (options & pe_print_pending) {
 rsc_state = native_pending_state(rsc);
 }
 if (rsc_state == NULL) {
-rsc_state = role2text(rsc->role);
+rsc_state = role2text(role);
 }
 offset += snprintf(buffer + offset, LINE_MAX - offset, "%s", rsc_state);
 }
@@ -545,16 +549,13 @@ native_print(resource_t * rsc, const char *pre_text, long options, void *print_d
 
 /* Ignore target role Started, as it is the default anyways
  * (and would also allow a Master to be Master).
- * Show if current role differs from target role,
- * or if target role limits our abilities. */
+ * Show if target role limits our abilities. */
 if (target_role_e == RSC_ROLE_STOPPED) {
 flagOffset += snprintf(flagBuffer + flagOffset, LINE_MAX - flagOffset, "%sdisabled", flagOffset?", ":"");
 rsc->cluster->disabled_resources++;
 
 } else if (uber_parent(rsc)->variant == pe_master
-   && target_role_e > RSC_ROLE_STOPPED
-   && target_role_e < RSC_ROLE_MASTER
-   && safe_str_neq(target_role, role2text(rsc->role))) {
+   && target_role_e == RSC_ROLE_SLAVE) {
 flagOffset += snprintf(flagBuffer + flagOffset, LINE_MAX - flagOffset, "%starget-role:%s", flagOffset?", ":"", target_role);
 rsc->cluster->disabled_resources++;
 }
-- 
1.9.1

___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] Resurrecting OCF

2016-07-19 Thread Lars Ellenberg
On Mon, Jul 18, 2016 at 11:13:42AM -0500, Ken Gaillot wrote:
> A suggestion came up recently to formalize a new version of the OCF
> resource agent API standard[1].
> 
> The main goal would be to formalize the API as it is actually used
> today, and to replace the "unique" meta-data attribute with two new
> attributes indicating uniqueness and reloadability. We could also add
> the fence agent API as a new spec, or expand the RA spec to cover both.
> 
> The Open Cluster Framework (OCF) was originally created in 2003 as a
> working group of the Free Standards Group, which itself was absorbed
> into the Linux Foundation in 2007. Today, OCF isn't even listed among
> the Linux Foundation's workgroups[2].
> 
> The opencf.org domain name (and, IIRC, website) is owned by Linbit.

To link with the last few times this came up:

 From: Lars Ellenberg, 2015-02-10 20:44:40 GMT
 Reviving the OCF OpenCF.org Open cluster framework
http://thread.gmane.org/gmane.comp.clustering.opencf.general/31/focus=31
http://thread.gmane.org/gmane.comp.clustering.opencf.general/31/focus=46

^^ that one seems relevant.
Basically "we all" agreed that $someone should do $something,
and we would point all available pointers we still
have control over to the result.

As it frequently goes, no-one felt like being $someone,
and so $something still needs to be done :-)

Thanks for picking it up again!

Re "LF work group": I'm very bad with bureaucracy myself...

Older:

 From: Marek "marx" Grac, 2014-10-27 10:23:05 GMT
 Who will maintain OCF standard? Proposal for changes in DTD for fence/resource 
agents
http://thread.gmane.org/gmane.comp.clustering.opencf.general/21
(in which Linbit was given opencf and linux-ha domains)

 ...
http://thread.gmane.org/gmane.comp.clustering.opencf.general/

> Does anyone have opinions about this, or suggestions about where to
> start? Should we keep it on the mailing lists, and focus on the RA API?
> Or should we figure out the status of OCF itself as an entity, and
> resurrect it somehow?
> 
> [1]
> http://www.opencf.org/cgi-bin/viewcvs.cgi/specs/ra/resource-agent-api.txt?rev=HEAD
> [2] http://www.linuxfoundation.org/collaborate/workgroups
> -- 
> Ken Gaillot 

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT

___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] [Linux-ha-dev] moving cluster-glue to github

2016-10-10 Thread Lars Ellenberg
On Mon, Oct 10, 2016 at 12:07:48PM +0200, Kristoffer Grönlund wrote:
> Adam Spiers  writes:
> 
> > Kristoffer Gronlund  wrote:
> >> We've discussed moving cluster-glue to github.com and the ClusterLabs
> >> organization, but no one has actually done it yet. ;)
> >
> > Out of curiosity what needs to be done for this, other than the
> > obvious "git push" to github, and maybe updating a README / wiki page
> > or two?
> >
> 
> The main thing would be to ensure that everyone who maintains it agrees
> to the move. AFAIK at least Lars Ellenberg and Dejan are both in favor,
> but I am not sure who else might be considered an owner of
> cluster-glue.

Apart from LINBIT, SuSE is the only relevant entity here, I'd say.
We are apparently the only ones that (still) ship cluster-glue.

I'd argue to "let cluster-glue die in peace" [*]

Contributions, if any, will probably only be new stonith/external scripts.
Bring them over to "fence_agents".

It cannot be that difficult to generically bring the "stonith/*",
or even only the "stonith/external", agents over to the
"fence_agents"-compatible API.

They would not even necessarily have to be converted to python;
one python wrapper would probably be good enough
for all stonith/external style agents.
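
A rough sketch (in shell rather than python, and with both interfaces
simplified) of how such a wrapper could map fence-agent style stdin
options onto a stonith/external plugin; the plugin path and option names
are assumptions:

-- fence_external_wrapper (sketch) --
#!/bin/sh
plugin=${WRAPPER_PLUGIN:-/usr/lib/stonith/plugins/external/example}

action=reboot
target=
# fence agents read "key=value" options from stdin; export them so the
# stonith/external plugin can pick them up from its environment
while IFS='=' read -r key value; do
    case $key in
    ''|'#'*)            ;;                      # skip blanks and comments
    action)             action=$value ;;
    port|plug|nodename) target=$value; export "$key=$value" ;;
    *)                  export "$key=$value" ;;
    esac
done

# map fence-agent actions onto stonith/external verbs
case $action in
    on)             exec "$plugin" on    "$target" ;;
    off)            exec "$plugin" off   "$target" ;;
    reboot)         exec "$plugin" reset "$target" ;;
    status|monitor) exec "$plugin" status ;;
    list)           exec "$plugin" gethosts ;;
    metadata)       exec "$plugin" getinfo-xml ;;
    *)              echo "unsupported action: $action" >&2; exit 1 ;;
esac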

[*] the same way you all let "heartbeat"
(the membership and comm layer, not the concept)
die in (mostly) peace.
Yes, still working and maintained and all.  Still alive.
But not really kicking.



It's not that there is a steady stream of changes
and contributions for cluster glue.

The only relevant commits in the last six years have been from Linbit
(almost exclusively me), or from SuSE (almost exclusively by Dejan), in
part inspired by, or done on behalf of, work or contributions by NTT.

Apart from a few cosmetic patches, the only changes have been to
"hb_report" (which pacemaker feels is obsoleted by crm_report, which
was a fork of hb_report; probably any feature in hb_report that is not
(yet/anymore) in crm_report could and should be merged back, and
hb_report be dropped; maybe that has already happened).

Now, with Dejan no longer being at SuSE, that probably leaves lge, lmb, and
krig to "agree" on anything there.

I have no problem using git-remote-hg or mercurial-git to push it over,
or even keep them in sync, having one reflect the other.

Not that it makes any difference, really.
But as I said, I'd prefer to leave it alone if possible.



Some historical background for those interested:
the original idea of "stonith plugins" was to have
stonith agents available as a collection of
binary shared objects that could be loaded and mlock'ed
early on, so fencing would work even in "bad situations",
without requiring additional IO or (much) allocations.

Then that was weakened for the "stonith/external" agents,
which would usually be (shell) scripts.

The result was that usually only those scripts would be used.

Now, the "fence agents" are a collection of python scripts,
with a few agents still being in C, but the idea of having the
stonith mechanism be "preloaded" has been abandoned completely.

Arguably, if you do not have enough resources left to timely execute some
fencing script, you should not be in control of the cluster, and
certainly not try to take over more resources of the to-be-fenced node.

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT

___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] moving cluster-glue to github

2016-10-19 Thread Lars Ellenberg
On Tue, Oct 11, 2016 at 08:59:45AM +0200, Kristoffer Grönlund wrote:
> Andrew Beekhof  writes:
> 
> >> On 10 Oct 2016, at 8:45 PM, Adam Spiers  wrote:
> >> 
> >> Kristoffer Gronlund  wrote:
> >>> We've discussed moving cluster-glue to github.com and the ClusterLabs
> >>> organization, but no one has actually done it yet. ;)
> >
> > I’m not sure I realised that.
> > I’ve created a blank repo, just:
> >
> > git remote add origin g...@github.com:ClusterLabs/cluster-glue.git
> > git push -u origin master
> >
> > I cheated and used the same team as crmsh, let me know if you’d prefer a 
> > separate one.
> >
> 
> Thanks!
> 
> I have imported the cluster-glue repository, so it should now be in the
> same state as the mercurial repository. I also updated the README to
> actually contain a brief description of the project (stolen from the
> linux-ha.org wiki).
> 
> https://github.com/ClusterLabs/cluster-glue

The import-by-github of the mercurial repository resulted in a usable tree,
but in a very bogus history:
bogus as in needless merges, a number of commits without parents
just popping into existence out of nowhere, and other similar crap.

Just compare the resulting commit history using gitk, qgit,
git log --graph or whatever you like, and also compare that
to the original mercurial using hg log -G or similar.

I re-imported using git-remote-hg
(git remote add lha hg::http://hg.linux-ha.org/glue)
and force-pushed that over to the github ClusterLabs/cluster-glue master.

I kept the old github-imported master for reference for now as
master-as-originally-imported-by-github-hg-importer
(08bec629045c285746046796c84ae71f2ef34016).

But I intend to delete that branch eventually.
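
For reference, roughly the commands involved (the exact ref name the
remote-hg helper uses for the hg default branch may differ, so take this
as a sketch only):

git clone git@github.com:ClusterLabs/cluster-glue.git
cd cluster-glue
git branch master-as-originally-imported-by-github-hg-importer master

git remote add lha hg::http://hg.linux-ha.org/glue
git fetch lha
git reset --hard lha/master     # assumed mapping of the hg default branch
git push --force origin master
git push origin master-as-originally-imported-by-github-hg-importer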

Every interested party:
please re-clone, or hard reset,
and rebase anything you may have against that new clusterlabs master.
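
For an existing clone, that boils down to something like this
(branch names are placeholders):

git fetch origin
git checkout master
git reset --hard origin/master
# then transplant any local work onto the new history, e.g.:
git rebase --onto origin/master <old-imported-master> <my-branch>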

This should also make it "easy" to keep the hg repo in sync
for as long as we want to do that, before we finally close it down
completely.

> What remains is to update the mercurial repository and the linux-ha.org
> wiki to point to the github repository instead of the hg one. Dejan or
> Lars, could you take care of this part?

I'll look into that "later".
Feel free to poke me in case it falls through...

Thanks,

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT

___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] checking all procs on system enough during stop action?

2017-04-24 Thread Lars Ellenberg
On Mon, Apr 24, 2017 at 04:34:07PM +0200, Jehan-Guillaume de Rorthais wrote:
> Hi all,
> 
> In the PostgreSQL Automatic Failover (PAF) project, one of the most frequent
> pieces of negative feedback we got is how difficult it is to experiment with it,
> because of
> fencing occurring way too frequently. I am currently hunting this kind of
> useless fencing to make life easier.
> 
> It occurs to me that a frequent reason for fencing is that, during the stop
> action, we check the status of the PostgreSQL instance using our monitor
> function before trying to stop the resource. If the function does not return
> OCF_NOT_RUNNING, OCF_SUCCESS or OCF_RUNNING_MASTER, we just raise an error,
> leading to a fencing. See:
> https://github.com/dalibo/PAF/blob/d50d0d783cfdf5566c3b7c8bd7ef70b11e4d1043/script/pgsqlms#L1291-L1301
> 
> I am considering adding a check to define if the instance is stopped even if 
> the
> monitor action returns an error. The idea would be to parse **all** the local
> processes looking for at least one pair of "/proc//{comm,cwd}" related to
> the PostgreSQL instance we want to stop. If none are found, we consider the
> instance is not running. Gracefully or not, we just know it is down and we can
> return OCF_SUCCESS.
> 
> Just for completeness, the piece of code would be:
> 
>my @pids;
>foreach my $f (glob "/proc/[0-9]*") {
>push @pids => basename($f)
>if -r $f
>and basename( readlink( "$f/exe" ) ) eq "postgres"
>and readlink( "$f/cwd" ) eq $pgdata;
>}
> 
> It feels safe enough to me. The only risk I could think of is in a shared disk
> cluster with multiple nodes accessing the same data in RW (such setup can
> fail in so many ways :)). However, PAF is not supposed to work in such 
> context,
> so I can live with this.
> 
> Do you guys have some advices? Do you see some drawbacks? Hazards?

Isn't that the wrong place to "fix" it?
Why did your _monitor  return something "weird"?
What did it return?
Should you not fix it there?

Just thinking out loud.

Cheers,
Lars


___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2017-09-26 Thread Lars Ellenberg
On Mon, May 22, 2017 at 12:26:36PM -0500, Ken Gaillot wrote:
> Resurrecting an old thread, because I stumbled on something relevant ...

/me too :-)

> There had been some discussion about having the ability to run a more
> useful monitor operation on an otherwise systemd-based resource. We had
> talked about a couple approaches with advantages and disadvantages.
> 
> I had completely forgotten about an older capability of pacemaker that
> could be repurposed here: the (undocumented) "container" meta-attribute.

Which is nice to know.

The wrapper approach is appealing as well, though.

I have just implemented a PoC ocf:pacemaker:systemd "wrapper" RA,
to give my brain something different to do for a change.

Takes two parameters,
unit=(systemd unit), and
monitor_hook=(some executable)

The monitor_hook has access to the environment, obviously,
in case it needs that.  For monitor, it will only be called
if "systemctl is-active" thinks the thing is active.

It is expected to return 0 (OCF_SUCCESS) for "running",
and 7 (OCF_NOT_RUNNING) for "not running".  It can return anything else as well;
all exit codes are directly propagated for the "monitor" action.
"Unexpected" exit codes will be logged with ocf_exit_reason
(does that make sense?).

systemctl start and stop commands apparently are "synchronous"
(have always been? only nowadays? is that relevant?)
but to be so, they need properly written unit files.
If there is an ExecStop command defined which will only trigger
stopping, but not wait for it, systemd cannot wait, either
(it has no way to know what it should wait for in that case),
and no-one should blame systemd for that.

That's why you would need to fix such systemd units,
but that's also why I added the additional _monitor loops
after systemctl start / stop.

Maybe it should not be named systemd, but systemd-wrapper.

Other comments?

Lars


So here is my RFC,
tested only "manually" via

for x in monitor stop monitor start monitor ; do
  for try in 1 2; do
OCF_ROOT=/usr/lib/ocf \
OCF_RESKEY_monitor_hook=/usr/local/bin/my-monitoring-hook \
OCF_RESKEY_unit=postfix@- ./systemd $x ; echo $try. $x $?
  done
done

-- /usr/local/bin/my-monitoring-hook 
#!/bin/sh
echo quit | nc 127.0.0.1 25  2>/dev/null | grep -q ^220 || exit 7

- /usr/lib/ocf/resource.d/pacemaker/systemd -
#!/bin/bash

: ${OCF_FUNCTIONS=${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs}
. ${OCF_FUNCTIONS}
: ${__OCF_ACTION=$1}


meta_data() {
cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="systemd" version="1.0">
<version>1.0</version>

<longdesc lang="en">
This Resource Agent delegates start and stop to systemctl start and stop,
but monitor will in addition to systemctl status also run the monitor_hook you
specify.
</longdesc>
<shortdesc lang="en">systemd service with monitor hook</shortdesc>

<parameters>

<parameter name="unit" unique="0" required="1">
<longdesc lang="en">
What systemd unit to manage.
</longdesc>
<shortdesc lang="en">systemd unit</shortdesc>
<content type="string"/>
</parameter>

<parameter name="monitor_hook" unique="0" required="1">
<longdesc lang="en">
What executable to run in addition to systemctl status.
</longdesc>
<shortdesc lang="en">monitor hook</shortdesc>
<content type="string"/>
</parameter>

</parameters>

<actions>
<action name="start" timeout="100s"/>
<action name="stop" timeout="100s"/>
<action name="monitor" timeout="20s" interval="10s" depth="0"/>
<action name="meta-data" timeout="5s"/>
<action name="validate-all" timeout="20s"/>
</actions>
</resource-agent>
END
}

_monitor()
{
    local ex check
    if [[ -n "$OCF_RESKEY_monitor_hook" ]] &&
       [[ -x "$OCF_RESKEY_monitor_hook" ]]; then
        "$OCF_RESKEY_monitor_hook"
        ex=$?
        : $__OCF_ACTION/$ex
        case $__OCF_ACTION/$ex in
        stop/7)     : "not running after stop: expected" ;;
        stop/*)     ocf_exit_reason "returned exit code $ex after stop: $OCF_RESKEY_monitor_hook" ;;
        start/0)    : "running after start: expected" ;;
        start/*)    ocf_exit_reason "returned exit code $ex after start: $OCF_RESKEY_monitor_hook" ;;
        monitor/0|monitor/7) : "expected running (0) or not running (7)" ;;
        monitor/*)  ocf_exit_reason "returned exit code $ex during monitor: $OCF_RESKEY_monitor_hook" ;;
        esac
        return $ex
    else
        ocf_exit_reason "missing or not executable: $OCF_RESKEY_monitor_hook"
    fi
    return $OCF_ERR_GENERIC
}

case $__OCF_ACTION in
meta-data)      meta_data ;;
validate-all)   : "Tbd. Maybe." ;;
stop)   systemctl stop $OCF_RESKEY_unit || exit $OCF_ERR_GENERIC
        # TODO make time/retries of monitor after stop configurable
        while _monitor; do sleep 1; done
        exit $OCF_SUCCESS
        ;;
start)  systemctl start $OCF_RESKEY_unit || exit $OCF_ERR_GENERIC
        # TODO make time/retries of monitor after start configurable
        while ! _monitor; do sleep 1; done
        exit $OCF_SUCCESS
        ;;
monitor)
        systemctl is-active --quiet $OCF_RESKEY_unit || exit $OCF_NOT_RUNNING
        _monitor
        ;;
*)
        ocf_exit_reason "not implemented: $__OCF_ACTION"
        exit $OCF_ERR_GENERIC
esac

exit $?

___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


[ClusterLabs Developers] libqb: Re: 633f262 logging: Remove linker 'magic' and just use statics for logging callsites (#322)

2019-01-16 Thread Lars Ellenberg
Back then when this "dynamic" logging was introduced,
I thought the whole point was that "quiet" callsites
are "cheap".

So I think you want to
qb_log_callsite_get() only *once* per callsite,
assign that to a static pointer (as you say in the commit message).
And do the actual qb_log_real_() function call
only conditionally based on if (cs->targets). 

That way, a disabled trace logging boils down to a
if (cs && cs->targets)
;
Much cheaper than what you have now.

Or was always calling into both qb_log_callsite_get() and qb_log_real_()
intentional for some obscure (to me) reason,
even for disabled call sites?

Below I also added a test for (cs->tags & QB_LOG_TAG_LIBQB_MSG),
in case someone used qb_util_set_log_function().
If that special tag flag could be folded into cs->targets (e.g. bit 0),
I'd like it even better.

Cheers,
Lars


diff --git a/include/qb/qblog.h b/include/qb/qblog.h
index 1943b94..f63f4ad 100644
--- a/include/qb/qblog.h
+++ b/include/qb/qblog.h
@@ -485,11 +485,17 @@ void qb_log_from_external_source_va(const char *function,
qb_log_real_(&descriptor, ##args);  \
 } while(0)
 #else
-#define qb_logt(priority, tags, fmt, args...) do { \
-   struct qb_log_callsite* descriptor_pt = \
-   qb_log_callsite_get(__func__, __FILE__, fmt,\
-   priority, __LINE__, tags);  \
-   qb_log_real_(descriptor_pt, ##args);\
+#define qb_logt(priority, tags, fmt, args...) do { \
+   static struct qb_log_callsite* descriptor_pt;   \
+   if (!descriptor_pt) {   \
+   descriptor_pt = \
+   qb_log_callsite_get(__func__, __FILE__, fmt,\
+   priority, __LINE__, tags);  \
+   }   \
+   if (descriptor_pt->targets ||   \
+   qb_bit_is_set(descriptor_pt->tags,  \
+   QB_LOG_TAG_LIBQB_MSG_BIT))  \
+   qb_log_real_(descriptor_pt, ##args);\
 } while(0)
 #endif /* QB_HAVE_ATTRIBUTE_SECTION */
___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] libqb: Re: 633f262 logging: Remove linker 'magic' and just use statics for logging callsites (#322)

2019-01-16 Thread Lars Ellenberg
On Wed, Jan 16, 2019 at 03:44:22PM +0100, Lars Ellenberg wrote:
> Back then when this "dynamic" logging was introduced,
> I thought the whole point was that "quiet" callsites
> are "cheap".
> 
> So I think you want to
> qb_log_callsite_get() only *once* per callsite,
> assign that to a static pointer (as you say in the commit message).
> And do the actual qb_log_real_() function call
> only conditionally based on if (cs->targets). 
> 
> That way, a disabled trace logging boils down to a
> if (cs && cs->targets)
> ;
> Much cheaper than what you have now.
> 
> Or was always calling into both qb_log_callsite_get() and qb_log_real_()
> intentional for some obscure (to me) reason,
> even for disabled call sites?
> 
> Below I also added a test for (cs->tags & QB_LOG_TAG_LIBQB_MSG),
> in case someone used qb_util_set_log_function().
> If that special tag flag could be folded into cs->targets (e.g. bit 0),
> I'd like it even better.
> 
> Cheers,
> Lars
> 
> 
> diff --git a/include/qb/qblog.h b/include/qb/qblog.h
> index 1943b94..f63f4ad 100644

Oops, that patch was against *before* the 633f262 commit :-)
and that's why I did not notice the clash in macro argument "tags"
and struct member "tags" when compile testing...
I forgot I jumped between checkouts :-(

anyways, the (now even make-check tested)
patch for *after* 633f262 commit:

diff --git a/include/qb/qblog.h b/include/qb/qblog.h
index 31981b8..ae1d25c 100644
--- a/include/qb/qblog.h
+++ b/include/qb/qblog.h
@@ -340,11 +340,17 @@ void qb_log_from_external_source_va(const char *function,
  * @param fmt usual printf style format specifiers
  * @param args usual printf style args
  */
-#define qb_logt(priority, tags, fmt, args...) do { \
-   struct qb_log_callsite* descriptor_pt = \
-   qb_log_callsite_get(__func__, __FILE__, fmt,\
-   priority, __LINE__, tags);  \
-   qb_log_real_(descriptor_pt, ##args);\
+#define qb_logt(priority, tags_, fmt, args...) do {\
+   static struct qb_log_callsite* descriptor_pt;   \
+   if (!descriptor_pt) {   \
+   descriptor_pt = \
+   qb_log_callsite_get(__func__, __FILE__, fmt,\
+   priority, __LINE__, tags_); \
+   }   \
+   if (descriptor_pt && (descriptor_pt->targets || \
+   qb_bit_is_set(descriptor_pt->tags, \
+   QB_LOG_TAG_LIBQB_MSG_BIT))) \
+   qb_log_real_(descriptor_pt, ##args);\
 } while(0)
 
___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] libqb: Re: 633f262 logging: Remove linker 'magic' and just use statics for logging callsites (#322)

2019-01-18 Thread Lars Ellenberg
On Thu, Jan 17, 2019 at 09:09:11AM +1100, Andrew Beekhof wrote:
> 
> 
> > On 17 Jan 2019, at 2:59 am, Ken Gaillot  wrote:
> > 
> > I'm not familiar with the reasoning for the current setup, but
> > pacemaker's crm_crit(), crm_error(), etc. use qb_logt(), while
> > crm_debug() and crm_trace() (which won't be used in ordinary runs) do
> > something similar to what you propose.
> > 
> > Pacemaker has about 1,700 logging calls that would be affected (not
> > counting another 2,000 debug/trace). Presumably that means Pacemaker
> > currently has about +16KB of memory overhead and binary size for
> > debug/trace logging static pointers, and that would almost double using
> > them for all logs. Not a big deal today? Or meaningful in an embedded
> > context?
> > 
> > Not sure if that overhead vs runtime trade-off is the original
> > motivation or not, but that's the first thing that comes to mind.
> 
> I believe my interest was the ability to turn them on dynamically in a
> running program (yes, i used it plenty back in the day) and have the
> overhead be minimal for the normal case when they weren't in use.

Also, with libqb before the commit mentioned in the subject (633f262),
which is what pacemaker is using right now,
you'd get one huge static array of "struct callsites" (in a special
linker section; that's the linker magic that patch removes).

Note: the whole struct was statically allocated,
it is an array of structs, not just an array of pointers.

sizeof(struct qb_log_callsite) is 40

Now, those structs get dynamically allocated,
and put into some lineno-based lookup hash
(so already at least 16 additional bytes),
not counting malloc overhead for all the tiny objects.

The additional 8 byte static pointer
is certainly not "doubling" that overhead.

But it can be used to skip the additional lookup,
sprintf, memcpy and whatnot, and even the function call,
if the callsite at hand is currently disabled,
which is probably the case for most >= trace
callsites most of the time.

Any volunteers to benchmark the cpu usage?
I think we'd need
(trace logging: {enabled, disabled})
x ({before 633f262,
after 633f262,
after 633f262 + lars patch})

BTW,
I think without the "linker magic"
(static array of structs),
the _log_filter_apply() becomes a no-op?
That's qb_log_filter_ctl2() at runtime.
It would have to iterate over all the collision lists in all the buckets
of the dynamically allocated callsites, instead of iterating the
(now non-existing) static array of callsites.

One side-effect of not using a static pointer,
but *always* doing the lookup (qb_log_callsite_get()) again,
is that a potentially set _custom_filter_fn() would be called
and that filter applied to the callsite, at each invocation.

But I don't think that that is intentional?

Anyways.
"just saying" :-)

Cheers,

Lars

___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


[ClusterLabs Developers] strange migration-threshold overflow, and fail-count update aborting its own recovery transition

2019-04-05 Thread Lars Ellenberg


As mentioned in #clusterlabs,
but I think I'll post it here, so it won't get lost:

pacemaker 1.1.19, in case that matters.

"all good".

Provoking a resource monitoring failure
(manually umounting some file system)

The monitoring failure triggers a pengine run
(the input here has no fail-count in the status section yet,
but does have the failed monitoring operation)
and results in a "Recovery" transition.


which is then aborted by the fail-count=1 update of the very failure
that this recovery transition is about.

Meanwhile, the "stop" operation was already scheduled,
and results in "OK", so the second pengine run
now has as input a fail-count=1, and a stopped resource.

The second pengine run would usually come to the same result,
minus already completed actions, and no-one would notice.
I assume it has been like that for a long time?

But in this case, someone tried to be smart
and set a migration-threshold of "very large";
the string in the xml was: ,
and that probably is "parsed" into some negative value,
which means the fail-count=1 now results in "forcing away ...",
different resource placements,
and the file system placement elsewhere now results in much more
actions, demoting/role changes/movement of other dependent resources ...


So I think we have two issues here:

a) I think the fail-count update should be visible as input in the cib
   before the pengine calculates the recovery transition.

b) migration-threshold (and possibly other scores) should be properly
   parsed/converted/capped/scaled/rejected


What do you think?

"someone" probably finds the relevant lines of code faster than I do ;-)

Cheers,

Lars

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/developers

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs Developers] strange migration-threshold overflow, and fail-count update aborting its own recovery transition

2019-04-05 Thread Lars Ellenberg
On Fri, Apr 05, 2019 at 09:56:51AM -0500, Ken Gaillot wrote:
> On Fri, 2019-04-05 at 09:44 -0500, Ken Gaillot wrote:
> > On Fri, 2019-04-05 at 15:50 +0200, Lars Ellenberg wrote:
> > > As mentioned in #clusterlabs,
> > > but I think I post it here, so it won't get lost:
> > > 
> > > pacemaker 1.1.19, in case that matters.
> > > 
> > > "all good".
> > > 
> > > provoking resource monitoring failure
> > > (manually umount of some file system)
> > > 
> > > monitoring failure triggers pengine run,
> > > (input here is no fail-count in status section yet,
> > > but failed monitoring operation)
> > > results in "Recovery" transition.
> 
> Ah, I misunderstood ... I was thinking the two writes were the fail
> count and the last failure time node attributes, but that's not it (and
> probably already handled sufficiently).
> 
> The two writes are the recording of the operation status result,
> followed by the recording of the fail count and last failure time node
> attributes.
> 
> That's more challenging. The controller needs to see the status result
> to determine whether it was a failure or not, at which point it sends
> the fail count change to the attribute manager, which writes it to the
> CIB, and the controller can only act on it then.
> 
> A possible solution would be to skip running the scheduler for the
> status result if we're sending a fail count change. But that's risky
> because it assumes everything involved in the fail count change will
> succeed, and in a reasonable time. That's why the current approach is
> to run the scheduler immediately and then re-run when the fail-count
> change is confirmed.

I see.  So there is that.
We can live with that, I guess,
it is very unlikely to cause issues "in the real world".

> > > which is then aborted by the fail-count=1 update of the very
> > > failure
> > > that this recovery transition is about.
> > > 
> > > Meanwhile, the "stop" operation was already scheduled,
> > > and results in "OK", so the second pengine run
> > > now has as input a fail-count=1, and a stopped resource.
> > > 
> > > The second pengine run would usually come to the same result,
> > > minus already completed actions, and no-one would notice.
> > > I assume it has been like that for a long time?
> > 
> > Yep, that's how it's always worked
> > 
> > > But in this case, someone tried to be smart
> > > and set a migration-threshold of "very large",
> > > in this case the string in xml was: , 
> > > and that probably is "parsed" into some negative value,
> > 
> > Anything above "INFINITY" (actually 1,000,000) should be mapped to
> > INFINITY. If that's not what happens, there's a bug. Running
> > crm_simulate in verbose mode should be helpful.

I think I found it already.

char2score() does crm_parse_int(),
and reasonably assumes that the result is the parsed int.
Which it is not, if the result is -1, and errno is set to EINVAL or
ERANGE ;-)

 char2score -> crm_parse_int 
 ""  -> result of strtoll is > INT_MAX,
 result -1, errno ERANGE
 migration_threshold = -1;

Not sure what to do there, though.
Yet another helper,
mapping ERANGE to appropriate MIN/MAX for the conversion?

But any "sane" configuration would not even trigger that.
Where and how would we point out the "in-sane-ness" to the user, though?

> > > which means the fail-count=1 now results in "forcing away ...",
> > > different resource placements,
> > > and the file system placement elsewhere now results in much more
> > > actions, demoting/role changes/movement of other dependent
> > > resources
> > > ...
> > > 
> > > 
> > > So I think we have two issues here:
> > > 
> > > a) I think the fail-count update should be visible as input in the
> > > cib
> > >before the pengine calculates the recovery transition.
> > 
> > Agreed, but that's a nontrivial change to the current design.
> > 
> > Currently, once the controller detects a failure, it sends a request
> > to the attribute manager to update the fail-count attribute, then
> > immediately sends another request to update the last-failure time.
> > The attribute manager writes these out to the CIB as it gets them,
> > the CIB notifies the controller as each write completes, and the