On 2013-03-19 16:02, Fredrik Hudner wrote:
> Just wanted to change which document it's been built from. It should be
> "LINBIT DRBD 8.4 Configuration Guide: NFS on RHEL 6".
There is again that fencing constraint in your configuration ... what does
"drbdadm dump all" look like? Any chance you only specified a fence-peer
handler in your resource configuration but don't override the
after-resync-target handler you specified in your global_common.conf?
That would explain the dangling constraint that will prevent a failover.

Regards,
Andreas

--
Need help with Pacemaker? http://www.hastexo.com/now

> ---------- Forwarded message ----------
> From: Fredrik Hudner <fredrik.hud...@gmail.com>
> Date: Mon, Mar 18, 2013 at 11:06 AM
> Subject: Re: [Linux-HA] Problem promoting slave to master
> To: General Linux-HA mailing list <linux-ha@lists.linux-ha.org>
>
> On Fri, Mar 15, 2013 at 1:04 AM, Andreas Kurz <andr...@hastexo.com> wrote:
>
>> On 2013-03-14 15:52, Fredrik Hudner wrote:
>>> I set no-quorum-policy to ignore and removed the constraint you
>>> mentioned. It then managed to fail over once to the slave node, but I
>>> still have these:
>>>
>>> Failed actions:
>>>     p_exportfs_root:0_monitor_30000 (node=testclu01, call=12, rc=7,
>>>         status=complete): not running
>>>     p_exportfs_root:1_monitor_30000 (node=testclu02, call=12, rc=7,
>>>         status=complete): not running
>>
>> This only tells you that monitoring of these resources found them not
>> running once ... the logs should tell you what happened, and when.
>
> I have attached the logs from master and slave. I can see that it stops,
> but not really why (my knowledge is too limited to read the logs).
>
>>> I then stopped the new master node to see if it failed over to the
>>> other node, with no success. It remains slave.
>>
>> Hard to say without seeing the current cluster state, e.g. "crm_mon
>> -1frA", "cat /proc/drbd" and some logs ... not enough information ...
>
> I have attached the output from crm_mon, the new crm configure and
> /proc/drbd.
>
>>> I also noticed that the constraint drbd-fence-by-handler-nfs-ms_drbd_nfs
>>> was back in the crm configure. It seems cib does a replace.
>>
>> This constraint is added by the DRBD primary if it loses connection to
>> its peer and is perfectly fine if you stopped one node.
>
> It seems the cluster has a problem attaching to the cluster node IP, but
> I'm not sure why.
>
> I would like to add that I took over this configuration from a guy who
> has left, but I know it was set up using the LINBIT technical
> documentation "Highly available NFS storage with DRBD and Pacemaker".
>
>>> Mar 14 15:06:18 [1786] tdtestclu02 crmd: info:
>>>     abort_transition_graph: te_update_diff:126 - Triggered transition
>>>     abort (complete=1, tag=diff, id=(null), magic=NA, cib=0.781.1) :
>>>     Non-status change
>>> Mar 14 15:06:18 [1786] tdtestclu02 crmd: notice:
>>>     do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [
>>>     input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
>>> Mar 14 15:06:18 [1781] tdtestclu02 cib: info:
>>>     cib_replace_notify: Replaced: 0.780.39 -> 0.781.1 from tdtestclu01
>>>
>>> So I'm not sure how to remove that constraint on a permanent basis;
>>> it's gone as long as I don't stop pacemaker.
>>
>> Once the DRBD resync is finished it will be removed from the cluster
>> configuration again automatically ... you typically never need to remove
>> such drbd-fence constraints manually, only in some rare failure
>> scenarios.
>>
>> Regards,
>> Andreas
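The automatic removal described above is done by DRBD's after-resync-target
handler; the constraint itself is added by the fence-peer handler. As a
rough sketch of the handler pair this kind of setup uses in
global_common.conf (assuming the stock DRBD 8.4 helper scripts under
/usr/lib/drbd; your paths and resource files may differ):

    # global_common.conf -- sketch, assuming stock DRBD 8.4 scripts
    common {
            handlers {
                    # adds the drbd-fence-by-handler-* constraint when
                    # the primary loses its peer
                    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                    # removes that constraint again once resync finishes;
                    # if this handler is missing or shadowed by a
                    # per-resource handlers block, the constraint dangles
                    # and blocks failover
                    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
            }
            disk {
                    # without a fencing policy the fence-peer handler is
                    # never invoked
                    fencing resource-only;
            }
    }

Since "drbdadm dump all" prints the merged configuration, a per-resource
handlers section that overrides the common one would be visible there,
which is what the question above is probing for.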
>>
>>> But it used to work, both with no-quorum-policy=freeze and with that
>>> constraint.
>>>
>>> Kind regards
>>> /Fredrik
>>>
>>> On Thu, Mar 14, 2013 at 2:49 PM, Andreas Kurz <andr...@hastexo.com>
>>> wrote:
>>>
>>>> On 2013-03-14 13:30, Fredrik Hudner wrote:
>>>>> Hi all,
>>>>>
>>>>> I have a problem after I removed a node with the force command from
>>>>> my crm config.
>>>>>
>>>>> Originally I had 2 nodes running an HA cluster (corosync 1.4.1-7.el6,
>>>>> pacemaker 1.1.7-6.el6).
>>>>>
>>>>> Then I wanted to add a third node acting as a quorum node, but was
>>>>> not able to get it to work (probably because I don't understand how
>>>>> to set it up). So I removed the 3rd node, but had to use the force
>>>>> command, as crm complained when I tried to remove it.
>>>>>
>>>>> Now when I start up Pacemaker, the resources don't look like they
>>>>> come up correctly:
>>>>>
>>>>> Online: [ testclu01 testclu02 ]
>>>>>
>>>>> Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
>>>>>     Masters: [ testclu01 ]
>>>>>     Slaves:  [ testclu02 ]
>>>>> Clone Set: cl_lsb_nfsserver [p_lsb_nfsserver]
>>>>>     Started: [ tdtestclu01 tdtestclu02 ]
>>>>> Resource Group: g_nfs
>>>>>     p_lvm_nfs    (ocf::heartbeat:LVM):        Started testclu01
>>>>>     p_fs_shared  (ocf::heartbeat:Filesystem): Started testclu01
>>>>>     p_fs_shared2 (ocf::heartbeat:Filesystem): Started testclu01
>>>>>     p_ip_nfs     (ocf::heartbeat:IPaddr2):    Started testclu01
>>>>> Clone Set: cl_exportfs_root [p_exportfs_root]
>>>>>     Started: [ testclu01 testclu02 ]
>>>>>
>>>>> Failed actions:
>>>>>     p_exportfs_root:0_monitor_30000 (node=testclu01, call=12, rc=7,
>>>>>         status=complete): not running
>>>>>     p_exportfs_root:1_monitor_30000 (node=testclu02, call=12, rc=7,
>>>>>         status=complete): not running
>>>>>
>>>>> The filesystems mount correctly on the master at this stage and can
>>>>> be written to. When I stop the services on the master node so that
>>>>> it fails over, it doesn't work; it loses cluster-IP connectivity.
>>>>
>>>> Fix your "no-quorum-policy": you want to "ignore" the quorum in a
>>>> two-node cluster to allow failover ... and if your drbd device is
>>>> already in sync, remove that drbd-fence-by-handler-nfs-ms_drbd_nfs
>>>> constraint.
>>>>
>>>> Regards,
>>>> Andreas
>>>>
>>>> --
>>>> Need help with Pacemaker? http://www.hastexo.com/now
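In crm shell syntax, the two fixes suggested here would look roughly like
this (a sketch; the constraint id is the one quoted above, and deleting it
is only safe once /proc/drbd shows UpToDate/UpToDate on both sides):

    # allow failover in a two-node cluster, where losing one node
    # always means losing quorum
    crm configure property no-quorum-policy=ignore

    # only once DRBD is fully synced (check with: cat /proc/drbd)
    crm configure delete drbd-fence-by-handler-nfs-ms_drbd_nfs

    # clear the stale "not running" monitor failures from the status
    # section so the "Failed actions" entries above go away
    crm resource cleanup cl_exportfs_root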
>>>>>
>>>>> Corosync.log from the master after I stopped pacemaker on the master
>>>>> node: see attached file.
>>>>>
>>>>> Additional files (attached): crm configure show, corosync.conf,
>>>>> global_common.conf
>>>>>
>>>>> I'm not sure how to proceed to get it back into a fair state now, so
>>>>> if anyone could help me it would be much appreciated.
>>>>>
>>>>> Kind regards
>>>>>
>>>>> /Fredrik

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems