Re: [ClusterLabs] multiple action= lines sent to STDIN of fencing agents - why?

2015-10-15 Thread Jan Pokorný
On 15/10/15 12:25 +0100, Adam Spiers wrote:
> I inserted some debugging into fencing.py and found that stonithd
> sends stuff like this to STDIN of the fencing agents it forks:
> 
> action=list
> param1=value1
> param2=value2
> param3=value3
> action=list
> 
> where paramX and valueX come from the configuration of the primitive
> for the fencing agent.
> 
> As a corollary, if the primitive for the fencing agent has 'action'
> defined as one of its parameters, this means that there will be three
> 'action=' lines, and the middle one could have a different value to
> the two sandwiching it.
> 
> When I first saw this, I had an extended #wtf moment and thought it
> was a bug.  But on closer inspection, it seems very deliberate, e.g.
> 
>   
> https://github.com/ClusterLabs/pacemaker/commit/bfd620645f151b71fafafa279969e9d8bd0fd74f
> 
> The "regardless of what the admin configured" comment suggests to me
> that there is an underlying assumption that any fencing agent will
> ensure that if the same parameter is duplicated on STDIN, the final
> value will override any previous ones.  And indeed fencing.py ensures
> this, but presumably it is possible to write agents which don't use
> fencing.py.
> 
> Is my understanding correct?  If so:
> 
> 1) Is the first 'action=' line intended in order to set some kind of
>default action, in the case that the admin didn't configure the
>primitive with an 'action=' parameter *and* _action wasn't one of
>list/status/monitor/metadata?  In what circumstances would this
>happen?
> 
> 2) Is this assumption of the agents always being order-sensitive
>(i.e. last value always wins) documented anywhere?  The best
>documentation on the API I could find was here:
> 
>   https://fedorahosted.org/cluster/wiki/FenceAgentAPI
> 
>but it doesn't mention this.

Agreed that this implicit and undocumented arrangement smells fishy.

In fact, I raised this point within the RH team three months ago, but
there was no feedback at that time.  I haven't filed a bug because it's
unclear to me what is intentional and what is not on the two
interacting sides.  Back then, I proposed that fencing.py should, at
the very least, emit a message when a later parameter overrides an
earlier one.

Note that "port" parameter (or whatever is the correct way to specify
a victim in the particular case) is affected as well.
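For illustration, a minimal sketch (not the actual fencing.py code) of
the last-value-wins parsing that fencing.py-based agents effectively
apply to the key=value lines on stdin; a hand-rolled agent that stopped
at the first match would behave differently:

    import sys

    def parse_stdin_options(stream=sys.stdin):
        """Parse key=value lines from a fence agent's stdin.

        A later line silently overrides an earlier one ("last value
        wins"), which is what fencing.py-based agents effectively do.
        """
        options = {}
        for line in stream:
            line = line.strip()
            if not line or "=" not in line:
                continue
            key, value = line.split("=", 1)
            options[key.strip()] = value.strip()
        return options

    # With the example input above, options["action"] ends up as "list"
    # regardless of what the admin-configured middle 'action=' line said.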

-- 
Jan (Poki)


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] multiple action= lines sent to STDIN of fencing agents - why?

2015-10-15 Thread Ken Gaillot
On 10/15/2015 06:25 AM, Adam Spiers wrote:
> I inserted some debugging into fencing.py and found that stonithd
> sends stuff like this to STDIN of the fencing agents it forks:
> 
> action=list
> param1=value1
> param2=value2
> param3=value3
> action=list
> 
> where paramX and valueX come from the configuration of the primitive
> for the fencing agent.
> 
> As a corollary, if the primitive for the fencing agent has 'action'
> defined as one of its parameters, this means that there will be three
> 'action=' lines, and the middle one could have a different value to
> the two sandwiching it.
> 
> When I first saw this, I had an extended #wtf moment and thought it
> was a bug.  But on closer inspection, it seems very deliberate, e.g.
> 
>   
> https://github.com/ClusterLabs/pacemaker/commit/bfd620645f151b71fafafa279969e9d8bd0fd74f
> 
> The "regardless of what the admin configured" comment suggests to me
> that there is an underlying assumption that any fencing agent will
> ensure that if the same parameter is duplicated on STDIN, the final
> value will override any previous ones.  And indeed fencing.py ensures
> this, but presumably it is possible to write agents which don't use
> fencing.py.
> 
> Is my understanding correct?  If so:

Yes, good sleuthing.

> 1) Is the first 'action=' line intended in order to set some kind of
>default action, in the case that the admin didn't configure the
>primitive with an 'action=' parameter *and* _action wasn't one of
>list/status/monitor/metadata?  In what circumstances would this
>happen?

The first action line is usually the only one.

Ideally, admins don't configure "action" as a parameter of a fence
device. They either specify nothing (in which case the cluster does what
it thinks should be done -- reboot, off, etc.), or they specify
pcmk_*_action to override the cluster's choice. For example,
pcmk_reboot_action=off tells the cluster to actually send the fence
agent "action=off" when a reboot is desired. (Perhaps the admin prefers
that flaky nodes stay down until investigated, or the fence device
doesn't handle reboots well.)

So the first action line is the result of that: if the admin configured
a pcmk_*_action for the requested action, the agent gets that value;
otherwise it gets the requested action itself.
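As a rough sketch (hypothetical helper name, not the actual stonithd
code), that first step amounts to something like:

    def resolve_agent_action(requested_action, device_params):
        """Pick the value for the first 'action=' line sent to the agent.

        device_params: dict of the fence device's configured parameters.
        A pcmk_<action>_action entry (e.g. pcmk_reboot_action=off)
        overrides the cluster's requested action for that kind of
        request only.
        """
        return device_params.get("pcmk_%s_action" % requested_action,
                                 requested_action)

    # resolve_agent_action("reboot", {"pcmk_reboot_action": "off"})  -> "off"
    # resolve_agent_action("monitor", {"pcmk_reboot_action": "off"}) -> "monitor"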

Second, any parameters in the device configuration are copied to the
agent. So if the admin did specify "action" there, it will get copied
(as a second instance, possibly different from the first).

But that would override *all* requested actions, which is a bad idea. No
one wants a recurring monitor action to shoot a node! :) So that last
step is a failsafe: if the admin did supply an "action", the original
line is re-sent whenever the requested action is informational
(list/status/monitor/metadata) rather than a "real" fencing action
(off/reboot).
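Putting those three steps together, a hedged sketch (again hypothetical,
not the real implementation) of how the stdin payload ends up looking:

    INFORMATIONAL_ACTIONS = ("list", "status", "monitor", "metadata")

    def build_agent_stdin(agent_action, device_params):
        """Compose the key=value lines written to the fence agent's stdin.

        agent_action: the value chosen for the first 'action=' line
        (the requested action or its pcmk_*_action override).
        device_params: configured parameters of the fence device
        primitive, possibly including an admin-supplied 'action'.
        """
        lines = ["action=%s" % agent_action]
        for key, value in device_params.items():
            lines.append("%s=%s" % (key, value))
        # Failsafe: if the admin configured 'action' and this request is
        # only informational, repeat the first line so "last value wins"
        # restores it.
        if "action" in device_params and agent_action in INFORMATIONAL_ACTIONS:
            lines.append("action=%s" % agent_action)
        return "\n".join(lines) + "\n"

    # build_agent_stdin("list", {"action": "off", "ipaddr": "10.0.0.1"})
    # yields the three-'action=' pattern quoted at the top of this thread.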

> 2) Is this assumption of the agents always being order-sensitive
>(i.e. last value always wins) documented anywhere?  The best
>documentation on the API I could find was here:
> 
>   https://fedorahosted.org/cluster/wiki/FenceAgentAPI
> 
>but it doesn't mention this.

Good point. That is worth adding to the API documentation since it's
established practice, but it would probably also be a good idea for
Pacemaker to send only the final value of any parameter.
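For what it's worth, a minimal sketch (hypothetical helper, not actual
Pacemaker code) of what "send only the final value" could look like on
the sending side:

    def final_values(pairs):
        """Collapse (key, value) pairs so only the last value per key
        survives, preserving the order in which keys first appeared."""
        result = {}
        for key, value in pairs:
            result[key] = value  # later values silently replace earlier ones
        return result

    # final_values([("action", "list"), ("action", "off"), ("action", "list")])
    # -> {"action": "list"}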



Re: [ClusterLabs] heartbeat/pacemaker: stonith-ng does a broadcast that doesnt reach it back resulting stonith_async_timeout_handler() called

2015-10-15 Thread Shyam
Hi Dejan,

Thanks a lot for your input!

I cherry-picked this commit and it solves the problem. I will raise an
Ubuntu Launchpad bug asking them to pull this fix into Trusty stable.
Thanks!

--Shyam

On Thu, Oct 15, 2015 at 1:21 PM, Dejan Muhamedagic wrote:

> Hi,
>
> On Wed, Oct 14, 2015 at 10:31:40PM +0530, Shyam wrote:
> > Hi all,
> >
> > We were previously using Pacemaker 1.1.6 on Ubuntu Precise. We are
> > shifting to Ubuntu Trusty, which has Pacemaker 1.1.10.
> >
> > We have our own STONITH resource, and with the Pacemaker upgrade it is
> > broken. The problem I am hitting is not new; it is exactly the one
> > reported in
> > http://clusterlabs.org/pipermail/users/2015-June/000686.html
> >
> > Basically, the Heartbeat + Pacemaker stack suffers from this problem
> > (even with the GitHub mainline code), and since most people are using
> > Corosync, I think the problem simply goes unnoticed.
> >
> > The basic problem is this. Between Pacemaker 1.1.6 and 1.1.10, the
> > following change was made:
> >
> > fencing/remote.c
> >
> > static void remote_op_done(remote_fencing_op_t * op, xmlNode * data,
> >                            int rc, int dup)
> >
> >     if (dup == FALSE && safe_str_neq(subt, "broadcast")) {
> >         /* Defer notification until the bcast message arrives */
> >         bcast_result_to_peers(op, rc);
> >         goto remote_op_done_cleanup;
> >
> > remote_op_done() previously sent a direct reply. It was changed to
> > broadcast the result instead, and this broadcast is supposed to come
> > back to the same fencing daemon (stonithd), which then finishes the
> > operation. However, the broadcast sent by stonithd never reaches it
> > again. This eventually causes stonith_async_timeout_handler() in crmd
> > to be triggered, and crmd keeps retrying STONITH, so essentially
> > STONITH is broken.
>
> I can vaguely recall some stonith/heartbeat issue which got
> fixed by Lars:
>
> commit cc34288a7b2276aa238546f4aa92fa79b8cbcf88
> Author: Lars Ellenberg 
> Date:   Tue Dec 9 10:48:33 2014 +0100
>
> Sounds like the same problem.
>
> Thanks,
>
> Dejan
>
>
> > I tried setting up bcast in /etc/ha.d/ha.cf, but it doesn't help. To
> > confirm the issue, I removed the bcast_result_to_peers() path above,
> > and then STONITH works perfectly fine. Any pointers/help on how to
> > resolve this issue the right way would be appreciated.
> >
> >
> > From logs
> >
> > CRMD issues STONITH reboot to stonith-ng
> >
> > Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG: Dumping message with 19 fields
> > Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[0] : [__name__=stonith_command]
> > Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[1] : [__name__=stonith_command]
> > Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[2] : [t=stonith-ng]
> > Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[3] : [st_async_id=617dce02-057f-42b8-9df2-33bc4e64780d]
> > Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[4] : [st_op=st_fence]
> > Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[5] : [st_callid=2]
> > Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[6] : [st_callopt=0]
> > Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[7] : [st_remote_op=617dce02-057f-42b8-9df2-33bc4e64780d]
> > Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[8] : [st_target=node1]
> > Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[9] : [st_device_action=reboot]
> > Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[10] : [st_origin=node0]
> > Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[11] : [st_clientid=346fea6b-a55d-4873-b630-14287d5bc71e]
> > Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[12] : [st_clientname=crmd.14483]
> > Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[13] : [st_timeout=90]
> > Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[14] : [st_mode=smart]
> > Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[15] : [dest=node0]
> > Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[16] : [oseq=2]
> > Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[17] : [from_id=stonith-ng]
> > Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[18] : [to_id=stonith-ng]
> >
> >
> > stonith-ng successfully finishes the STONITH
> > Oct 14 14:57:03 node0 stonith-ng[14481]:   notice: log_operation: Operation 'reboot' [14657] (call 2 from crmd.14483) for host 'node1' with device 'Z-FENCE:0' returned: 0 (OK)
> > Oct 14 14:57:03 node0 stonith-ng[14481]:    debug: stonith_send_async_reply: Directed reply to node0
> > Oct 14 14:57:03 node0 stonith-ng[14481]:    debug: send_ha_message: outbound> st_device_id="Z-FENCE:0" st_remote_op="617dce02-057f-42b8-9df2-33bc4e64780d" st_clientid="346fea6b-a55d-4873-b630-14287d5bc71e" st_clientname="crmd.14483" st_target="node1" st_device_action="reboot" st_callid="2" st_callopt="0" st_rc="0" st_output="Performing: