Re: [ClusterLabs] Major problem with iSCSITarget resource on top of DRBD M/S resource.

2015-09-27 Thread Digimer
On 27/09/15 11:02 AM, Alex Crow wrote:
> 
> 
> On 27/09/15 15:54, Digimer wrote:
>> On 27/09/15 10:40 AM, Alex Crow wrote:
>>> Hi List,
>>>
>>> I'm trying to set up a failover iSCSI storage system for oVirt using a
>>> self-hosted engine. I've set up DRBD in Master-Slave for two iSCSI
>>> targets, one for the self-hosted engine and one for the VMs. I had this
>>> all working perfectly; then, after trying to move the engine's LUN to the
>>> opposite host, all hell broke loose. The VMs LUN is still fine, starts
>> I'm guessing no fencing?
> 
> Hi Digimer,
> 
> No, but I've tried turning off one machine and still no success as a
> single node :-(

You *must* have working fencing anyway, so now strikes me as a
fantastic time to add it. Turning a node off by itself doesn't help.
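For reference, a minimal two-node fencing setup with pcs could look like the
sketch below. This assumes IPMI-capable BMCs; the device names, addresses,
and credentials are placeholders, not values from your cluster:

```shell
# Sketch only: one IPMI-based stonith device per node. All addresses and
# credentials below are placeholders. On a cman-based stack like yours,
# cluster.conf must also route fencing to Pacemaker via fence_pcmk.
pcs stonith create fence-granby fence_ipmilan \
    pcmk_host_list="granby" ipaddr="192.168.1.10" \
    login="admin" passwd="secret" lanplus=1 \
    op monitor interval=60s
pcs stonith create fence-glenrock fence_ipmilan \
    pcmk_host_list="glenrock" ipaddr="192.168.1.11" \
    login="admin" passwd="secret" lanplus=1 \
    op monitor interval=60s

# Keep each fence device off the node it is meant to fence.
pcs constraint location fence-granby avoids granby
pcs constraint location fence-glenrock avoids glenrock

pcs property set stonith-enabled=true
```

With fencing in place, a failed stop (like your iscsi-engine-target timeout)
gets the node fenced and the resource recovered elsewhere, instead of leaving
it unmanaged.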

>>> and migrates as it should. However the engine LUN always seems to try to
>>> launch the target on the host that is *NOT* the master of the DRBD
>>> resource. My constraints look fine, and should be self-explanatory about
>>> which is which:
>>>
>>> [root@granby ~]# pcs constraint --full
>>> Location Constraints:
>>> Ordering Constraints:
>>>promote drbd-vms-iscsi then start iscsi-vms-ip (kind:Mandatory)
>>> (id:vm_iscsi_ip_after_drbd)
>>>start iscsi-vms-target then start iscsi-vms-lun (kind:Mandatory)
>>> (id:vms_lun_after_target)
>>>promote drbd-vms-iscsi then start iscsi-vms-target (kind:Mandatory)
>>> (id:vms_target_after_drbd)
>>>promote drbd-engine-iscsi then start iscsi-engine-ip (kind:Mandatory)
>>> (id:ip_after_drbd)
>>>start iscsi-engine-target then start iscsi-engine-lun
>>> (kind:Mandatory)
>>> (id:lun_after_target)
>>>promote drbd-engine-iscsi then start iscsi-engine-target
>>> (kind:Mandatory) (id:target_after_drbd)
>>> Colocation Constraints:
>>>iscsi-vms-ip with drbd-vms-iscsi (score:INFINITY) (rsc-role:Started)
>>> (with-rsc-role:Master) (id:vms_ip-with-drbd)
>>>iscsi-vms-lun with drbd-vms-iscsi (score:INFINITY) (rsc-role:Started)
>>> (with-rsc-role:Master) (id:vms_lun-with-drbd)
>>>iscsi-vms-target with drbd-vms-iscsi (score:INFINITY)
>>> (rsc-role:Started) (with-rsc-role:Master) (id:vms_target-with-drbd)
>>>iscsi-engine-ip with drbd-engine-iscsi (score:INFINITY)
>>> (rsc-role:Started) (with-rsc-role:Master) (id:ip-with-drbd)
>>>iscsi-engine-lun with drbd-engine-iscsi (score:INFINITY)
>>> (rsc-role:Started) (with-rsc-role:Master) (id:lun-with-drbd)
>>>iscsi-engine-target with drbd-engine-iscsi (score:INFINITY)
>>> (rsc-role:Started) (with-rsc-role:Master) (id:target-with-drbd)
>>>
>>> But see this from pcs status: the iSCSI target has FAILED on glenrock,
>>> while the DRBD master is on granby!
>>>
>>> [root@granby ~]# pcs status
>>> Cluster name: storage
>>> Last updated: Sun Sep 27 15:30:08 2015
>>> Last change: Sun Sep 27 15:20:58 2015
>>> Stack: cman
>>> Current DC: glenrock - partition with quorum
>>> Version: 1.1.11-97629de
>>> 2 Nodes configured
>>> 10 Resources configured
>>>
>>>
>>> Online: [ glenrock granby ]
>>>
>>> Full list of resources:
>>>
>>>  Master/Slave Set: drbd-vms-iscsi [drbd-vms]
>>>      Masters: [ glenrock ]
>>>      Slaves: [ granby ]
>>>  iscsi-vms-target (ocf::heartbeat:iSCSITarget): Started glenrock
>>>  iscsi-vms-lun (ocf::heartbeat:iSCSILogicalUnit): Started glenrock
>>>  iscsi-vms-ip (ocf::heartbeat:IPaddr2): Started glenrock
>>>  Master/Slave Set: drbd-engine-iscsi [drbd-engine]
>>>      Masters: [ granby ]
>>>      Slaves: [ glenrock ]
>>>  iscsi-engine-target (ocf::heartbeat:iSCSITarget): FAILED glenrock
>>> (unmanaged)
>>>  iscsi-engine-ip (ocf::heartbeat:IPaddr2): Stopped
>>>  iscsi-engine-lun (ocf::heartbeat:iSCSILogicalUnit): Stopped
>>>
>>> Failed actions:
>>>  iscsi-engine-target_stop_0 on glenrock 'unknown error' (1):
>>> call=177, status=Timed Out, last-rc-change='Sun Sep 27 15:20:59 2015',
>>> queued=0ms, exec=10003ms
>>>  iscsi-engine-target_stop_0 on glenrock 'unknown error' (1):
>>> call=177, status=Timed Out, last-rc-change='Sun Sep 27 15:20:59 2015',
>>> queued=0ms, exec=10003ms
>>>
>>> I have tried various combinations of pcs resource clear and cleanup, but
>>> they all result in the same outcome - apart from some occasions when
>>> one or the other of the two hosts suddenly reboots!
>>>
>>> Here is a log right after a "pcs resource cleanup" - first on the master
>>> for the DRBD m/s resource:
>>> [root@granby ~]# pcs resource cleanup; tail -f /var/log/messages
>>> All resources/stonith devices successfully cleaned up
>>> Sep 27 15:33:42 granby crmd[3358]:   notice: process_lrm_event:
>>> granby-drbd-engine_monitor_0:117 [ \n ]
>>> Sep 27 15:33:42 granby attrd[3356]:   notice: attrd_trigger_update:
>>> Sending flush op to all hosts for: probe_complete (true)
>>> Sep 27 15:33:42 granby attrd[3356]:   notice: attrd_perform_update: Sent
>>> update 54: probe_complete=true
>>> Sep 27 15:33:42 granby crmd[3358]:   notice: process_lrm_event:
>>> Operation drbd-engine_monitor_1: master (node=granby, call=131,
>>> rc=8, cib-update=83, confirmed=false)
>>> Sep 27 15:33:42 granby crmd[3358]:   notice: process_lrm_event:
>>> granby-drbd-engine_monitor_1:131 [ \n ]
>>> Sep 27 15:33:42 granby crmd[3358]:   notice: process_lrm_event:
>>> Operation drbd-vms_monitor_2: ok (node=granby, call=130, rc=0,
>>> cib-update=84, confirmed=false)
>>> Sep 27 15:34:46 granby crmd[3358]:   notice: do_lrm_invoke: Forcing the
>>> status of all resources to be redetected
>>> Sep 27 15:34:46 granby attrd[3356]:   notice: attrd_trigger_update: