Re: [ClusterLabs] Corosync+Pacemaker error during failover

2016-01-15 Thread priyanka

On 2015-10-08 20:52, emmanuel segura wrote:

please check if your DRBD is configured to call the fence handler
https://drbd.linbit.com/users-guide/s-pacemaker-fencing.html
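
For reference, the fence-handler wiring that guide describes looks roughly
like this in the DRBD resource configuration (a sketch of the stock example
from the linked page, not of the attached configuration):

  resource <resource> {
    disk {
      fencing resource-only;
    }
    handlers {
      fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    ...
  }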


yes.


2015-10-08 17:16 GMT+02:00 priyanka :

Hi,

We are trying to build an HA setup for our servers using the DRBD +
Corosync + Pacemaker stack.

Attached are the configuration files for corosync/pacemaker and DRBD.

We are getting errors while testing this setup.
1. When we stop corosync on the master machine, say server1(lock), it is
   Stonith'ed. In this case the slave, server2(sher), is promoted to master.
   But when server1(lock) reboots, res_exportfs_export1 is started on both
   servers and that resource goes into a failed state, followed by the
   servers going into an unclean state.
   Then server1(lock) reboots and server2(sher) is master, but in an
   unclean state. After server1(lock) comes up, server2(sher) is Stonith'ed
   and server1(lock) is the slave (the only online node).
   When server2(sher) comes up, both servers are slaves and the resource
   group (rg_export) is stopped. Then server2(sher) becomes master,
   server1(lock) is the slave, and the resource group is started.
   At this point the configuration becomes stable.

Please find attached the logs (syslog) of server2(sher) from when it is
promoted to master until it is first rebooted, when the exportfs resource
goes into a failed state.

Please let us know if the configuration is appropriate. From the logs we
could not figure out the exact reason for the resource failure.
Your comments on this scenario will be very helpful.

Thanks,
Priyanka


--
Regards,
Priyanka
MTech3 Sysad
IIT Powai



Re: [ClusterLabs] Corosync+Pacemaker error during failover

2016-01-15 Thread priyanka

On 2015-10-08 21:05, Digimer wrote:

On 08/10/15 11:16 AM, priyanka wrote:

fencing resource-only;


This needs to be 'fencing resource-and-stonith;'.
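
With STONITH enabled in the cluster, the corresponding disk section would
read roughly as follows (a sketch, not the attached configuration):

  disk {
    fencing resource-and-stonith;
  }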

I did set the suggested parameter, but the error persists. Apparently the
node that comes back after failover is not able to detect res_exportfs_root
on the current master. Following is the log trace:



Jan 14 16:37:18 sher pengine[1383]:   notice: unpack_config: On loss of 
CCM Quorum: Ignore
Jan 14 16:37:18 sher pengine[1383]:  warning: unpack_rsc_op: Processing 
failed op monitor for res_exportfs_root:0 on sher: not running (7)
Jan 14 16:37:18 sher pengine[1383]:  warning: unpack_rsc_op: Processing 
failed op monitor for fence_lock on sher: unknown error (1)
Jan 14 16:37:18 sher pengine[1383]:error: native_create_actions: 
Resource res_exportfs_export1 (ocf::exportfs) is active on 2 nodes 
attempting recovery
Jan 14 16:37:18 sher pengine[1383]:  warning: native_create_actions: 
See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more 
information.
Jan 14 16:37:18 sher pengine[1383]:   notice: LogActions: Start   
fence_sher#011(lock)
Jan 14 16:37:18 sher pengine[1383]:   notice: LogActions: Start   
res_drbd_export:1#011(lock)
Jan 14 16:37:18 sher pengine[1383]:   notice: LogActions: Restart 
res_exportfs_export1#011(Started sher)
Jan 14 16:37:18 sher pengine[1383]:   notice: LogActions: Start   
res_nfsserver:1#011(lock)
Jan 14 16:37:18 sher pengine[1383]:error: process_pe_message: 
Calculated Transition 7: /var/lib/pacemaker/pengine/pe-error-352.bz2
Jan 14 16:37:18 sher crmd[1384]:   notice: te_rsc_command: Initiating 
action 11: start fence_sher_start_0 on lock
Jan 14 16:37:18 sher crmd[1384]:   notice: te_rsc_command: Initiating 
action 50: stop res_exportfs_export1_stop_0 on lock
Jan 14 16:37:18 sher crmd[1384]:   notice: te_rsc_command: Initiating 
action 49: stop res_exportfs_export1_stop_0 on sher (local)
Jan 14 16:37:18 sher crmd[1384]:   notice: te_rsc_command: Initiating 
action 68: monitor res_exportfs_root_monitor_3 on lock
Jan 14 16:37:18 sher crmd[1384]:   notice: te_rsc_command: Initiating 
action 76: notify res_drbd_export_pre_notify_start_0 on sher (local)
Jan 14 16:37:18 sher crmd[1384]:   notice: te_rsc_command: Initiating 
action 58: start res_nfsserver_start_0 on lock
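
The "active on 2 nodes" error in the transition above is what forces the
stop/restart of res_exportfs_export1. Once the configuration issue behind it
is resolved, the failed operations recorded against these resources can be
cleared so the policy engine recalculates from a clean state, for example
(standard crm_resource usage; resource names taken from the log above):

  crm_resource --cleanup --resource res_exportfs_export1
  crm_resource --cleanup --resource res_exportfs_root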



I have Pacemaker 1.1.10 installed in my setup; should I try upgrading?

--
Regards,
Priyanka




Re: [ClusterLabs] Corosync+Pacemaker error during failover

2015-10-08 Thread Ken Gaillot
On 10/08/2015 10:16 AM, priyanka wrote:
> Hi,
> 
> We are trying to build an HA setup for our servers using the DRBD +
> Corosync + Pacemaker stack.
> 
> Attached are the configuration files for corosync/pacemaker and DRBD.

A few things I noticed:

* Don't set become-primary-on in the DRBD configuration in a Pacemaker
cluster; Pacemaker should handle all promotions to primary.

* I'm no NFS expert, but why is res_exportfs_root cloned? Can both
servers export it at the same time? I would expect it to be in the group
before res_exportfs_export1.

* Your constraints need some adjustment. Partly it depends on the answer
to the previous question, but currently res_fs (via the group) is
ordered after res_exportfs_root, and I don't see how that could work.
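
Along the lines of the last two points, one possible arrangement is sketched
below in crm shell syntax. This is only an illustration: ms_drbd_export is an
assumed name for the master/slave wrapper around res_drbd_export, and the
actual member list of rg_export in the attached configuration may differ.

  # exports live in one group; root export before the individual export
  group rg_export res_fs res_exportfs_root res_exportfs_export1

  # DRBD master/slave resource (wrapper name assumed)
  ms ms_drbd_export res_drbd_export \
      meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

  # the export group runs only on, and only after, the DRBD master
  colocation col_export_on_drbd inf: rg_export ms_drbd_export:Master
  order ord_drbd_before_export inf: ms_drbd_export:promote rg_export:start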

> We are getting errors while testing this setup.
> 1. When we stop corosync on the master machine, say server1(lock), it is
> Stonith'ed. In this case the slave, server2(sher), is promoted to master.
> But when server1(lock) reboots, res_exportfs_export1 is started on both
> servers and that resource goes into a failed state, followed by the
> servers going into an unclean state.
> Then server1(lock) reboots and server2(sher) is master, but in an unclean
> state. After server1(lock) comes up, server2(sher) is Stonith'ed and
> server1(lock) is the slave (the only online node).
> When server2(sher) comes up, both servers are slaves and the resource
> group (rg_export) is stopped. Then server2(sher) becomes master,
> server1(lock) is the slave, and the resource group is started.
> At this point the configuration becomes stable.
> 
> Please find attached the logs (syslog) of server2(sher) from when it is
> promoted to master until it is first rebooted, when the exportfs resource
> goes into a failed state.
> 
> Please let us know if the configuration is appropriate. From the logs we
> could not figure out the exact reason for the resource failure.
> Your comments on this scenario will be very helpful.
> 
> Thanks,
> Priyanka


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org