Re: [ClusterLabs] Antw: RES: Pacemaker and OCFS2 on stand alone mode

2016-07-09 Thread Andrei Borzenkov
08.07.2016 09:11, Ulrich Windl wrote:
 "Carlos Xavier"  schrieb am 07.07.2016 um 18:57 
 in
> Nachricht <00e901d1d870$ae418000$0ac48000$@com.br>:
>> Thank you for the fast reply.
>>
>>>
>>> have you configured the stonith and drbd stonith handler?
>>>
>>
>> Yes, they were configured.
>> The cluster was running fine for more than 4 years, until we lost one host
>> to a power supply failure.
>> Now I need to access the files on the host that is still working.
> 
> Hi,
> 
> MHO: Have you ever tested the configuration? I wonder why the cluster did not
> do everything it could to continue.
> 

Stonith most likely failed if the node experienced a complete power failure. We
were not shown the cluster state, so this is just a guess; but normally the way
to recover is to manually declare the node as down. That only covers pacemaker,
though; I do not know how to do the same for DRBD (unless pacemaker somehow
forwards this information to it).
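
If fencing cannot succeed because the node lost power completely, that manual
confirmation is the usual way out. A minimal sketch (the node name is a
placeholder; exact syntax depends on your pacemaker/crmsh versions):

  # tell pacemaker the failed node is known to be safely down
  stonith_admin --confirm <failed-node>

  # roughly the same thing via crmsh
  crm node clearstate <failed-node>

After that pacemaker should treat the node as cleanly offline and go on
recovering resources on the surviving node.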

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Antw: RES: Pacemaker and OCFS2 on stand alone mode

2016-07-08 Thread Ulrich Windl
>>> "Carlos Xavier"  schrieb am 07.07.2016 um 18:57 
>>> in
Nachricht <00e901d1d870$ae418000$0ac48000$@com.br>:
> Thank you for the fast reply.
> 
>> 
>> have you configured the stonith and drbd stonith handler?
>> 
> 
> Yes, they were configured.
> The cluster was running fine for more than 4 years, until we lost one host
> to a power supply failure.
> Now I need to access the files on the host that is still working.

Hi,

MHO: Have you ever tested the configuration? I wonder why the cluster did not
do everything it could to continue.
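
For reference, the "drbd stonith handler" mentioned above is usually wired up
roughly like this; the resource name, handler paths and stonith parameters are
only illustrative, not necessarily what Carlos has configured:

  # drbd.conf (per resource), DRBD 8.3/8.4 style
  resource backup {
      disk {
          # resource-and-stonith is usually recommended for dual-primary setups
          fencing resource-only;
      }
      handlers {
          fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
          after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
  }

  # plus a stonith resource on the pacemaker side (hypothetical parameters)
  primitive stonith-apolo stonith:external/ipmi \
      params hostname="apolo" ipaddr="<bmc-ip>" userid="<user>" passwd="<password>" \
      op monitor interval="60s"

If fencing cannot reach a node that lost power completely, the cluster cannot
confirm it is down and will not recover on its own.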

Regards,
Ulrich

> 
>> 2016-07-07 16:43 GMT+02:00 Carlos Xavier :
>> > Hi.
>> > We had a Pacemaker cluster running an OCFS2 filesystem over a DRBD device
>> > and we completely lost one of the hosts.
>> > Now I need some help to recover the data on the remaining machine.
>> > I was able to load the DRBD module by hand and bring up the devices using
>> > the drbdadm command line:
>> > apolo:~ # modprobe drbd
>> > apolo:~ # cat /proc/drbd
>> > version: 8.3.9 (api:88/proto:86-95)
>> > srcversion: A67EB2D25C5AFBFF3D8B788
>> >
>> > apolo:~ # drbd-overview
>> >   0:backup
>> >   1:export
>> > apolo:~ # drbdadm attach backup
>> > apolo:~ # drbdadm attach export
>> > apolo:~ # drbd-overview
>> >   0:backup  StandAlone Secondary/Unknown UpToDate/DUnknown r-
>> >   1:export  StandAlone Secondary/Unknown UpToDate/DUnknown r-
>> > apolo:~ # drbdadm primary backup
>> > apolo:~ # drbdadm primary export
>> > apolo:~ # drbd-overview
>> >   0:backup  StandAlone Primary/Unknown UpToDate/DUnknown r-
>> >   1:export  StandAlone Primary/Unknown UpToDate/DUnknown r-
>> >
>> > We have these resources and constraints configured:
>> > primitive resDLM ocf:pacemaker:controld \
>> > op monitor interval="120s"
>> > primitive resDRBD_0 ocf:linbit:drbd \
>> > params drbd_resource="backup" \
>> > operations $id="resDRBD_0-operations" \
>> > op start interval="0" timeout="240" \
>> > op stop interval="0" timeout="100" \
>> > op monitor interval="20" role="Master" timeout="20" \
>> > op monitor interval="30" role="Slave" timeout="20"
>> > primitive resDRBD_1 ocf:linbit:drbd \
>> > params drbd_resource="export" \
>> > operations $id="resDRBD_1-operations" \
>> > op start interval="0" timeout="240" \
>> > op stop interval="0" timeout="100" \
>> > op monitor interval="20" role="Master" timeout="20" \
>> > op monitor interval="30" role="Slave" timeout="20"
>> > primitive resFS_BACKUP ocf:heartbeat:Filesystem \
>> > params device="/dev/drbd/by-res/backup" directory="/backup" \
>> > fstype="ocfs2" options="rw,noatime" \
>> > op monitor interval="120s"
>> > primitive resFS_EXPORT ocf:heartbeat:Filesystem \
>> > params device="/dev/drbd/by-res/export" directory="/export" \
>> > fstype="ocfs2" options="rw,noatime" \
>> > op monitor interval="120s"
>> > primitive resO2CB ocf:ocfs2:o2cb \
>> > op monitor interval="120s"
>> > group DRBD_01 resDRBD_0 resDRBD_1
>> > ms msDRBD_01 DRBD_01 \
>> > meta resource-stickiness="100" notify="true" master-max="2" \
>> > interleave="true" target-role="Started"
>> > clone cloneDLM resDLM \
>> > meta globally-unique="false" interleave="true" target-role="Started"
>> > clone cloneFS_BACKUP resFS_BACKUP \
>> > meta interleave="true" ordered="true" target-role="Started"
>> > clone cloneFS_EXPORT resFS_EXPORT \
>> > meta interleave="true" ordered="true" target-role="Started"
>> > clone cloneO2CB resO2CB \
>> > meta globally-unique="false" interleave="true" target-role="Started"
>> > colocation colDLMDRBD inf: cloneDLM msDRBD_01:Master
>> > colocation colFS_BACKUP-O2CB inf: cloneFS_BACKUP cloneO2CB
>> > colocation colFS_EXPORT-O2CB inf: cloneFS_EXPORT cloneO2CB
>> > colocation colO2CBDLM inf: cloneO2CB cloneDLM
>> > order ordDLMO2CB 0: cloneDLM cloneO2CB
>> > order ordDRBDDLM 0: msDRBD_01:promote cloneDLM:start
>> > order ordO2CB-FS_BACKUP 0: cloneO2CB cloneFS_BACKUP
>> > order ordO2CB-FS_EXPORT 0: cloneO2CB cloneFS_EXPORT
>> >
>> > As the DRBD devices were brought up by hand, Pacemaker doesn't recognize
>> > that they are up, so it doesn't start the DLM resource and all resources
>> > that depend on it stay stopped.
>> > Is there any way I can circumvent this issue?
>> > Is it possible to bring the OCFS2 resources up in standalone mode?
>> > Please, any help will be very welcome.
>> >
>> > Best regards,
>> > Carlos.
>> >
>> >



