[ClusterLabs] Antw: RES: Pacemaker and OCFS2 on stand alone mode

2016-07-10 Thread Ulrich Windl
>>> "Carlos Xavier"  schrieb am 09.07.2016 um 00:43 
>>> in
Nachricht <00f201d1d96a$38b76980$aa263c80$@com.br>:
> Thank you very much to everyone who tried to help me.
> 
>> 
>> "Carlos Xavier"  writes:
>> 
>> > 1467918891 Is dlm missing from kernel? No misc devices found.
>> > 1467918891 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
>> > 1467918891 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
>> > 1467918891 No /sys/kernel/config, is configfs loaded?
>> > 1467918891 shutdown
>> 
>> Try following the above hints:
>> 
>> modprobe configfs
>> modprobe dlm
>> mount -t configfs configfs /sys/kernel/config
>> 
> 
> I tried those tips; they helped me get further, but it wasn't enough to get
> OCFS2 started in stand-alone mode in order to recover the data.
> 
>> and then start the control daemon again.  But this is pretty much what the
>> controld resource should do anyway.  The main question is why your cluster
>> does not do it by itself.  If you give up after all, try this:
>> https://www.drbd.org/en/doc/users-guide-83/s-ocfs2-legacy 
>> --
> 
> I decided to do a complete install of another machine, to take the place of
> the burned one, just to recover the data.

One pitfall I ran into was this: the OCFS2 cluster stack has to be up when you
FORMAT the OCFS2 filesystem; otherwise it won't mount later.
(Maybe some post-formatting tweaks could work around that, but I just wanted to
mention it.)
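(Not part of the original mail, just a hedged sketch of such a tweak: on many
ocfs2-tools versions a volume's mount type can be set to "local", which lets it
be formatted and mounted without any cluster stack. The device path is taken
from the configuration quoted later in the thread; verify the -M option in
mkfs.ocfs2(8) and tunefs.ocfs2(8) on your version before relying on it.)

# Format a volume as single-node ("local"), so no cluster stack is needed.
# WARNING: mkfs destroys existing data; only for a fresh volume.
mkfs.ocfs2 -M local /dev/drbd/by-res/backup

# Or convert an existing, unmounted cluster volume to the local mount type and
# mount it standalone (assumes tunefs.ocfs2 supports -M on this version):
tunefs.ocfs2 -M local /dev/drbd/by-res/backup
mount -t ocfs2 /dev/drbd/by-res/backup /backup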

> 
> Once again, many thanks to you.
> 
> Regards,
> Carlos


Re: [ClusterLabs] Antw: RES: Pacemaker and OCFS2 on stand alone mode

2016-07-09 Thread Andrei Borzenkov
08.07.2016 09:11, Ulrich Windl wrote:
 "Carlos Xavier"  schrieb am 07.07.2016 um 18:57 
 in
> Nachricht <00e901d1d870$ae418000$0ac48000$@com.br>:
>> Thank you for the fast reply
>>
>>>
>>> have you configured the stonith and drbd stonith handler?
>>>
>>
>> Yes, they were configured.
>> The cluster was running fine for more than 4 years, until we lost one host
>> to a power supply failure.
>> Now I need to access the files on the host that is still working.
> 
> Hi,
> 
> MHO: Have you ever tested the configuration? I wonder why the cluster did not 
> do everything to continue.
> 

Stonith most likely failed if the node experienced a complete power failure. We
were not shown the cluster state, so this is just a guess; but normally the way
to recover is to manually declare the node as down. That covers Pacemaker only,
though; I do not know how to do the same for DRBD (unless Pacemaker somehow
forwards this information to it).
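(Not from the thread, just an illustrative sketch: on Pacemaker 1.1.x the usual
way to manually declare a node down is stonith_admin; "deadnode" below is a
placeholder for the real node name.)

# Tell the cluster that the failed node is known to be safely down, so the
# pending fencing action is treated as complete and recovery can proceed:
stonith_admin --confirm deadnode

# crmsh offers a similar shortcut on many versions (assumption):
# crm node clearstate deadnode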



[ClusterLabs] Antw: RES: Pacemaker and OCFS2 on stand alone mode

2016-07-07 Thread Ulrich Windl
>>> "Carlos Xavier"  schrieb am 07.07.2016 um 18:57 
>>> in
Nachricht <00e901d1d870$ae418000$0ac48000$@com.br>:
> Thank you for the fast reply
> 
>> 
>> have you configured the stonith and drbd stonith handler?
>> 
> 
> Yes, they were configured.
> The cluster was running fine for more than 4 years, until we lost one host
> to a power supply failure.
> Now I need to access the files on the host that is still working.

Hi,

MHO: Have you ever tested the configuration? I wonder why the cluster did not 
do everything to continue.

Regards,
Ulrich

> 
>> 2016-07-07 16:43 GMT+02:00 Carlos Xavier :
>> > Hi.
>> > We had a Pacemaker cluster running an OCFS2 filesystem over a DRBD device
>> > and we completely lost one of the hosts.
>> > Now I need some help to recover the data on the remaining machine.
>> > I was able to load the DRBD module by hand and bring up the devices using
>> > the drbdadm command line:
>> > apolo:~ # modprobe drbd
>> > apolo:~ # cat /proc/drbd
>> > version: 8.3.9 (api:88/proto:86-95)
>> > srcversion: A67EB2D25C5AFBFF3D8B788
>> >
>> > apolo:~ # drbd-overview
>> >   0:backup
>> >   1:export
>> > apolo:~ # drbdadm attach backup
>> > apolo:~ # drbdadm attach export
>> > apolo:~ # drbd-overview
>> >   0:backup  StandAlone Secondary/Unknown UpToDate/DUnknown r-
>> >   1:export  StandAlone Secondary/Unknown UpToDate/DUnknown r-
>> > apolo:~ # drbdadm primary backup
>> > apolo:~ # drbdadm primary export
>> > apolo:~ # drbd-overview
>> >   0:backup  StandAlone Primary/Unknown   UpToDate/DUnknown r-
>> >   1:export  StandAlone Primary/Unknown UpToDate/DUnknown r-
>> >
>> > We have these resources and constraints configured:
>> > primitive resDLM ocf:pacemaker:controld \
>> > op monitor interval="120s"
>> > primitive resDRBD_0 ocf:linbit:drbd \
>> > params drbd_resource="backup" \
>> > operations $id="resDRBD_0-operations" \
>> > op start interval="0" timeout="240" \
>> > op stop interval="0" timeout="100" \
>> > op monitor interval="20" role="Master" timeout="20" \
>> > op monitor interval="30" role="Slave" timeout="20"
>> > primitive resDRBD_1 ocf:linbit:drbd \
>> > params drbd_resource="export" \
>> > operations $id="resDRBD_1-operations" \
>> > op start interval="0" timeout="240" \
>> > op stop interval="0" timeout="100" \
>> > op monitor interval="20" role="Master" timeout="20" \
>> > op monitor interval="30" role="Slave" timeout="20"
>> > primitive resFS_BACKUP ocf:heartbeat:Filesystem \
>> > params device="/dev/drbd/by-res/backup" directory="/backup"
>> > fstype="ocfs2" options="rw,noatime" \
>> > op monitor interval="120s"
>> > primitive resFS_EXPORT ocf:heartbeat:Filesystem \
>> > params device="/dev/drbd/by-res/export" directory="/export"
>> > fstype="ocfs2" options="rw,noatime" \
>> > op monitor interval="120s"
>> > primitive resO2CB ocf:ocfs2:o2cb \
>> > op monitor interval="120s"
>> > group DRBD_01 resDRBD_0 resDRBD_1
>> > ms msDRBD_01 DRBD_01 \
>> > meta resource-stickiness="100" notify="true" master-max="2" \
>> > interleave="true" target-role="Started"
>> > clone cloneDLM resDLM \
>> > meta globally-unique="false" interleave="true"
>> > target-role="Started"
>> > clone cloneFS_BACKUP resFS_BACKUP \
>> > meta interleave="true" ordered="true" target-role="Started"
>> > clone cloneFS_EXPORT resFS_EXPORT \
>> > meta interleave="true" ordered="true" target-role="Started"
>> > clone cloneO2CB resO2CB \
>> > meta globally-unique="false" interleave="true"
>> > target-role="Started"
>> > colocation colDLMDRBD inf: cloneDLM msDRBD_01:Master
>> > colocation colFS_BACKUP-O2CB inf: cloneFS_BACKUP cloneO2CB
>> > colocation colFS_EXPORT-O2CB inf: cloneFS_EXPORT cloneO2CB
>> > colocation colO2CBDLM inf: cloneO2CB cloneDLM
>> > order ordDLMO2CB 0: cloneDLM cloneO2CB
>> > order ordDRBDDLM 0: msDRBD_01:promote cloneDLM:start
>> > order ordO2CB-FS_BACKUP 0: cloneO2CB cloneFS_BACKUP
>> > order ordO2CB-FS_EXPORT 0: cloneO2CB cloneFS_EXPORT
>> >
>> > As the DRBD devices were brought up by hand, Pacemaker doesn't
>> > recognize that they are up, so it doesn't start the DLM resource, and all
>> > resources that depend on it stay stopped.
>> > Is there any way I can circumvent this issue?
>> > Is it possible to bring the OCFS2 resources working on standalone mode?
>> > Please, any help will be very welcome.
>> >
>> > Best regards,
>> > Carlos.
>> >
>> >
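(Not part of the original thread, but as a hedged pointer for the question
quoted above: making Pacemaker re-probe resources that were started by hand is
normally done by clearing their recorded state. The resource name is taken from
the quoted configuration; exact option spelling varies between versions.)

# Forget the stored status of the DRBD master/slave set so Pacemaker probes
# the manually activated devices again:
crm_resource --resource msDRBD_01 --cleanup

# crmsh equivalent (assumption):
# crm resource cleanup msDRBD_01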