Re: [ClusterLabs] Early VM resource migration

2015-12-17 Thread Klechomir
Hi Ken,

I've tried with and without colocation. The rule was:
colocation bla2 inf: VM_VM1 AA_Filesystem_CDrive1

In both cases VM_VM1 tries to live-migrate back to the node coming out of
standby while the cloned AA_Filesystem_CDrive1 isn't up on it yet.
Same result with Pacemaker 1.1.14-rc2.

Regards,

On 16.12.2015 11:08:35 Ken Gaillot wrote:
> On 12/16/2015 10:30 AM, Klechomir wrote:
> > On 16.12.2015 17:52, Ken Gaillot wrote:
> >> On 12/16/2015 02:09 AM, Klechomir wrote:
> >>> Hi list,
> >>> I have a cluster with VM resources on a cloned active-active storage.
> >>> 
> >>> The VirtualDomain resource migrates properly during failover (node standby),
> >>> but tries to migrate back too early during failback, ignoring the "order"
> >>> constraint that tells it to start only after the cloned storage is up.
> >>> This causes an unnecessary VM restart.
> >>> 
> >>> Is there any way to make it wait until its storage resource is up?
> >> 
> >> Hi Klecho,
> >> 
> >> If you have an order constraint, the cluster will not try to start the
> >> VM until the storage resource agent returns success for its start. If
> >> the storage isn't fully up at that point, then the agent is faulty, and
> >> should be modified to wait until the storage is truly available before
> >> returning success.
> >> 
> >> If you post all your constraints, I can look for anything that might
> >> affect the behavior.
> > 
> > Thanks for the reply, Ken
> > 
> > Seems to me that the constraints for cloned resources act a bit
> > differently.
> > 
> > Here is my config:
> > 
> > primitive p_AA_Filesystem_CDrive1 ocf:heartbeat:Filesystem \
> >     params device="/dev/CSD_CDrive1/AA_CDrive1" directory="/volumes/AA_CDrive1" \
> >         fstype="ocfs2" options="rw,noatime"
> > primitive VM_VM1 ocf:heartbeat:VirtualDomain \
> >     params config="/volumes/AA_CDrive1/VM_VM1/VM1.xml" hypervisor="qemu:///system" \
> >         migration_transport="tcp" \
> >     meta allow-migrate="true" target-role="Started"
> > clone AA_Filesystem_CDrive1 p_AA_Filesystem_CDrive1 \
> >     meta interleave="true" resource-stickiness="0" target-role="Started"
> > order VM_VM1_after_AA_Filesystem_CDrive1 inf: AA_Filesystem_CDrive1 VM_VM1
> > 
> > Every time a node comes back from standby, the VM tries to live-migrate to
> > it long before the filesystem is up.
> 
> In most cases (including this one), when you have an order constraint,
> you also need a colocation constraint.
> 
> colocation = two resources must be run on the same node
> 
> order = one resource must be started/stopped/whatever before another
> 
> Or you could use a group, which is essentially a shortcut for specifying
> colocation and order constraints for any sequence of resources.


Re: [ClusterLabs] Antw: Re: Early VM resource migration

2015-12-17 Thread Klechomir
Hi Ulrich,
This is only the part of the config that concerns the problem.
Even with dummy resources the behaviour is identical, so I don't think the
dlm/clvmd resource config will help solve the problem.

Regards,
Klecho

On 17.12.2015 08:19:43 Ulrich Windl wrote:
> >>> Klechomir  wrote on 16.12.2015 at 17:30 in message
> <5671918e.40...@gmail.com>:
> > On 16.12.2015 17:52, Ken Gaillot wrote:
> >> On 12/16/2015 02:09 AM, Klechomir wrote:
> >>> Hi list,
> >>> I have a cluster with VM resources on a cloned active-active storage.
> >>> 
> >>> The VirtualDomain resource migrates properly during failover (node standby),
> >>> but tries to migrate back too early during failback, ignoring the "order"
> >>> constraint that tells it to start only after the cloned storage is up.
> >>> This causes an unnecessary VM restart.
> >>> 
> >>> Is there any way to make it wait until its storage resource is up?
> >> 
> >> Hi Klecho,
> >> 
> >> If you have an order constraint, the cluster will not try to start the
> >> VM until the storage resource agent returns success for its start. If
> >> the storage isn't fully up at that point, then the agent is faulty, and
> >> should be modified to wait until the storage is truly available before
> >> returning success.
> >> 
> >> If you post all your constraints, I can look for anything that might
> >> affect the behavior.
> > 
> > Thanks for the reply, Ken
> > 
> > Seems to me that the constraints for cloned resources act a bit
> > differently.
> > 
> > Here is my config:
> > 
> > primitive p_AA_Filesystem_CDrive1 ocf:heartbeat:Filesystem \
> >     params device="/dev/CSD_CDrive1/AA_CDrive1" directory="/volumes/AA_CDrive1" \
> >         fstype="ocfs2" options="rw,noatime"
> > primitive VM_VM1 ocf:heartbeat:VirtualDomain \
> >     params config="/volumes/AA_CDrive1/VM_VM1/VM1.xml" hypervisor="qemu:///system" \
> >         migration_transport="tcp" \
> >     meta allow-migrate="true" target-role="Started"
> > clone AA_Filesystem_CDrive1 p_AA_Filesystem_CDrive1 \
> >     meta interleave="true" resource-stickiness="0" target-role="Started"
> > order VM_VM1_after_AA_Filesystem_CDrive1 inf: AA_Filesystem_CDrive1 VM_VM1
> > 
> > Every time a node comes back from standby, the VM tries to live-migrate to
> > it long before the filesystem is up.
> 
> Hi!
> 
> To me your config looks rather incomplete: What about DLM, O2CB, cLVM, etc.?




Re: [ClusterLabs] Antw: Re: Antw: Re: Early VM resource migration

2015-12-17 Thread Klechomir
Here is what Pacemaker says right after node1 comes back from standby:

Dec 16 16:11:41 [4512] CLUSTER-2 pengine: debug: native_assign_node: All nodes for resource VM_VM1 are unavailable, unclean or shutting down (CLUSTER-1: 1, -100)
Dec 16 16:11:41 [4512] CLUSTER-2 pengine: debug: native_assign_node: Could not allocate a node for VM_VM1
Dec 16 16:11:41 [4512] CLUSTER-2 pengine: debug: native_assign_node: Processing VM_VM1_monitor_1
Dec 16 16:11:41 [4512] CLUSTER-2 pengine: info: native_color: Resource VM_VM1 cannot run anywhere



VM_VM1 gets stopped immediately as soon as node1 reappears, and it stays down
until the AA resource it is ordered/colocated with comes up on node1.

The curious part is that in the opposite case (node2 coming back from standby),
the failback is OK.
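
(For reference, the allocation scores behind this decision should be
reproducible from the live CIB with something like

crm_simulate -sL | grep VM_VM1

assuming crm_simulate is available on the nodes.)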

Regards,

On 17.12.2015 14:51:21 Ulrich Windl wrote:
> >>> Klechomir  wrote on 17.12.2015 at 14:16 in message
> <2102747.TPh6pTdk8c@bobo>:
> > Hi Ulrich,
> > This is only the part of the config that concerns the problem.
> > Even with dummy resources the behaviour is identical, so I don't think the
> > dlm/clvmd resource config will help solve the problem.
> 
> You could send logs with the actual startup sequence then.
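> 
> Something along these lines (a rough sketch; adjust the log file path and
> resource names to your setup) usually captures the relevant transition:
> 
> grep -E 'pengine|crmd|VM_VM1|Filesystem' /var/log/messages
> 
> or attach the output of crm_report for the time window around the failback,
> if that tool is available on your build.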
> 

[ClusterLabs] successful ipmi stonith still times out

2015-12-17 Thread Ron Kerry
I have a customer (running SLE 11 SP4 HAE) who is seeing the following
stonith behavior when running the ipmi stonith plugin.


Dec 15 14:21:43 test4 pengine[24002]:  warning: pe_fence_node: Node 
test3 will be fenced because termination was requested
Dec 15 14:21:43 test4 pengine[24002]:  warning: determine_online_status: 
Node test3 is unclean
Dec 15 14:21:43 test4 pengine[24002]:  warning: stage6: Scheduling Node 
test3 for STONITH


... it issues the reset and it is noted ...
Dec 15 14:21:45 test4 external/ipmi(STONITH-test3)[177184]: [177197]: 
debug: ipmitool output: Chassis Power Control: Reset
Dec 15 14:21:46 test4 stonith-ng[23999]:   notice: log_operation: 
Operation 'reboot' [177179] (call 2 from crmd.24003) for host 'test3' 
with device 'STONITH-test3' returned: 0 (OK)


... test3 does go down ...
Dec 15 14:22:21 test4 kernel: [90153.906461] Cell 2 (test3) left the 
membership


... but the stonith operation times out (it said OK earlier) ...
Dec 15 14:22:56 test4 stonith-ng[23999]:   notice: remote_op_timeout: 
Action reboot (a399a8cb-541a-455e-8d7c-9072d48667d1) for test3 
(crmd.24003) timed out
Dec 15 14:23:05 test4 external/ipmi(STONITH-test3)[177667]: [177678]: 
debug: ipmitool output: Chassis Power is on


Dec 15 14:23:56 test4 crmd[24003]: error:
stonith_async_timeout_handler: Async call 2 timed out after 132000ms
Dec 15 14:23:56 test4 crmd[24003]:   notice: tengine_stonith_callback: 
Stonith operation 2/51:100:0:f43dc87c-faf0-4034-8b51-be0c13c95656: Timer 
expired (-62)
Dec 15 14:23:56 test4 crmd[24003]:   notice: tengine_stonith_callback: 
Stonith operation 2 for test3 failed (Timer expired): aborting transition.
Dec 15 14:23:56 test4 crmd[24003]:   notice: abort_transition_graph: 
Transition aborted: Stonith failed (source=tengine_stonith_callback:697, 0)


This looks like a bug but a quick search did not turn up anything. Does 
anyone recognize this problem?


--

Ron Kerry




Re: [ClusterLabs] successful ipmi stonith still times out

2015-12-17 Thread Ken Gaillot
On 12/17/2015 10:32 AM, Ron Kerry wrote:
> I have a customer (running SLE 11 SP4 HAE) who is seeing the following
> stonith behavior when running the ipmi stonith plugin.
> 
> Dec 15 14:21:43 test4 pengine[24002]:  warning: pe_fence_node: Node
> test3 will be fenced because termination was requested
> Dec 15 14:21:43 test4 pengine[24002]:  warning: determine_online_status:
> Node test3 is unclean
> Dec 15 14:21:43 test4 pengine[24002]:  warning: stage6: Scheduling Node
> test3 for STONITH
> 
> ... it issues the reset and it is noted ...
> Dec 15 14:21:45 test4 external/ipmi(STONITH-test3)[177184]: [177197]:
> debug: ipmitool output: Chassis Power Control: Reset
> Dec 15 14:21:46 test4 stonith-ng[23999]:   notice: log_operation:
> Operation 'reboot' [177179] (call 2 from crmd.24003) for host 'test3'
> with device 'STONITH-test3' returned: 0 (OK)
> 
> ... test3 does go down ...
> Dec 15 14:22:21 test4 kernel: [90153.906461] Cell 2 (test3) left the
> membership
> 
> ... but the stonith operation times out (it said OK earlier) ...
> Dec 15 14:22:56 test4 stonith-ng[23999]:   notice: remote_op_timeout:
> Action reboot (a399a8cb-541a-455e-8d7c-9072d48667d1) for test3
> (crmd.24003) timed out
> Dec 15 14:23:05 test4 external/ipmi(STONITH-test3)[177667]: [177678]:
> debug: ipmitool output: Chassis Power is on
> 
> Dec 15 14:23:56 test4 crmd[24003]: error:
> stonith_async_timeout_handler: Async call 2 timed out after 132000ms
> Dec 15 14:23:56 test4 crmd[24003]:   notice: tengine_stonith_callback:
> Stonith operation 2/51:100:0:f43dc87c-faf0-4034-8b51-be0c13c95656: Timer
> expired (-62)
> Dec 15 14:23:56 test4 crmd[24003]:   notice: tengine_stonith_callback:
> Stonith operation 2 for test3 failed (Timer expired): aborting transition.
> Dec 15 14:23:56 test4 crmd[24003]:   notice: abort_transition_graph:
> Transition aborted: Stonith failed (source=tengine_stonith_callback:697, 0)
> 
> This looks like a bug but a quick search did not turn up anything. Does
> anyone recognize this problem?

Fence timeouts can be tricky to troubleshoot because there are multiple
timeouts involved. The process goes like this:

1. crmd asks the local stonithd to do the fence.

2. The local stonithd queries all stonithd's to ensure it has the latest
status of all fence devices.

3. The local stonithd chooses a fence device (or possibly devices, if
topology is involved) and picks the best stonithd (or stonithd's) to
actually execute the fencing.

4. The chosen stonithd (or stonithd's) runs the fence agent to do the
actual fencing, then replies to the original stonithd, which replies to
the original requester.

So the crmd can time out waiting for a reply from stonithd, the local
stonithd can time out waiting for query replies from all stonithd's, the
local stonithd can time out waiting for a reply from one or more
executing stonithd's, or an executing stonithd can time out waiting for a
reply from the fence device.

Another factor is that some reboots can be remapped to off then on. This
will happen, for example, if the fence device doesn't have a reboot
command, or if it's in a fence topology level with other devices. So in
that case, there's the possibility of a timeout for both the off command and
the on command.

In this case, one thing that's odd is that the "Async call 2 timed out"
message is the timeout for the crmd waiting for a reply from stonithd.
The crmd timeout is always a minute longer than stonithd's timeout,
which should be more than enough time for stonithd to reply. I'm not
sure what's going on there.

I'd look closely at the entire fence configuration (is topology
involved? what are the configured timeouts? are the configuration
options correct?), and trace through the logs to see what step or steps
are actually timing out.
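
As a rough sketch (not the customer's actual configuration; the device
parameters below are placeholders), the knobs I'd check look something like
this in crm shell:

property stonith-timeout="120s"
primitive STONITH-test3 stonith:external/ipmi \
    params hostname="test3" ipaddr="..." userid="..." passwd="..." \
        interface="lan" pcmk_reboot_timeout="90s" \
    op monitor interval="600" timeout="60"

stonith-timeout is the cluster-wide default for how long a fencing operation
may take, and pcmk_reboot_timeout (plus pcmk_off_timeout/pcmk_on_timeout if
the reboot gets remapped to off+on) overrides it for the individual actions
on that device.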

I do see here that the reboot times out before the "Chassis Power is on"
message, so it's possible the reboot timeout is too short to account for
a full cycle. But I'm not sure why it would report OK before that,
unless maybe that was for one step of the larger process.
