30.11.2011 14:08, Vadim Bulst wrote:
> Hello,
>
> first of all I'd like to ask you a general question:
>
> Has somebody successfully set up a clvm cluster with pacemaker and
> run it in production?
I will say yes, once I finally resolve the remaining dlm and fencing
issues.

> Now back to the concrete problem:
>
> I configured two interfaces for corosync:
>
> root@bbzclnode04:~# corosync-cfgtool -s
> Printing ring status.
> Local node ID 897624256
> RING ID 0
>         id      = 192.168.128.53
>         status  = ring 0 active with no faults
> RING ID 1
>         id      = 192.168.129.23
>         status  = ring 1 active with no faults
>
> RRP set to passive
>
> I also made some changes to my cib:
>
> node bbzclnode04
> node bbzclnode06
> node bbzclnode07
> primitive clvm ocf:lvm2:clvmd \
>         params daemon_timeout="30" \
>         meta target-role="Started"

Please instruct clvmd to use the corosync stack instead of openais
(-I corosync); otherwise it uses the LCK service, which is not mature
and with which I have observed major problems.

> primitive dlm ocf:pacemaker:controld \
>         meta target-role="Started"
> group dlm-clvm dlm clvm
> clone dlm-clvm-clone dlm-clvm \
>         meta interleave="true" ordered="true"
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="3" \
>         no-quorum-policy="ignore" \
>         stonith-enabled="false" \
>         last-lrm-refresh="1322643084"
>
> I cleaned up and restarted the resources - nothing:
>
> crm(live)resource# cleanup dlm-clvm-clone
> Cleaning up dlm:0 on bbzclnode04
> Cleaning up dlm:0 on bbzclnode06
> Cleaning up dlm:0 on bbzclnode07
> Cleaning up clvm:0 on bbzclnode04
> Cleaning up clvm:0 on bbzclnode06
> Cleaning up clvm:0 on bbzclnode07
> Cleaning up dlm:1 on bbzclnode04
> Cleaning up dlm:1 on bbzclnode06
> Cleaning up dlm:1 on bbzclnode07
> Cleaning up clvm:1 on bbzclnode04
> Cleaning up clvm:1 on bbzclnode06
> Cleaning up clvm:1 on bbzclnode07
> Cleaning up dlm:2 on bbzclnode04
> Cleaning up dlm:2 on bbzclnode06
> Cleaning up dlm:2 on bbzclnode07
> Cleaning up clvm:2 on bbzclnode04
> Cleaning up clvm:2 on bbzclnode06
> Cleaning up clvm:2 on bbzclnode07
> Waiting for 19 replies from the CRMd................... OK
>
> crm_mon:
>
> ============
> Last updated: Wed Nov 30 10:15:09 2011
> Stack: openais
> Current DC: bbzclnode04 - partition with quorum
> Version: 1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
> 3 Nodes configured, 3 expected votes
> 1 Resources configured.
> ============
>
> Online: [ bbzclnode04 bbzclnode06 bbzclnode07 ]
>
>
> Failed actions:
>     clvm:1_start_0 (node=bbzclnode06, call=11, rc=1, status=complete):
> unknown error
>     clvm:0_start_0 (node=bbzclnode04, call=11, rc=1, status=complete):
> unknown error
>     clvm:2_start_0 (node=bbzclnode07, call=11, rc=1, status=complete):
> unknown error
>
>
> When I look in the log, there is a message telling me that another
> clvmd process may already be running - but that is not the case:
>
> "clvmd could not create local socket: Another clvmd is probably
> already running"
>
> Or is it a permission problem - writing to the filesystem? Is there a
> way to get rid of it?

You can try to run it manually under strace. It will show you what
happens.

> Shall I use a different distro - or install from source?
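
To illustrate the two suggestions above, roughly (this is a sketch; the
daemon_options parameter name and the socket path are assumptions,
please verify them against your ocf:lvm2:clvmd metadata and your clvmd
build):

  # Run clvmd in the foreground against the corosync stack; -I selects
  # the cluster interface (corosync, openais, cman, singlenode):
  clvmd -d -I corosync

  # If it still complains about the local socket, run it under strace
  # to see which syscall actually fails (stale socket, permissions, ...):
  strace -f -o /tmp/clvmd.trace clvmd -d -I corosync
  grep -E 'socket|bind|unlink|EADDRINUSE|EACCES' /tmp/clvmd.trace

  # A socket left over from a crashed instance produces exactly the
  # "could not create local socket" message; the path below is an
  # assumption, it depends on how your clvmd was built:
  ls -l /var/run/lvm/clvmd.sock

Once it starts by hand, something along these lines should make the
resource agent pass the flag as well (check "crm ra info
ocf:lvm2:clvmd" for the exact parameter name):

  primitive clvm ocf:lvm2:clvmd \
          params daemon_timeout="30" daemon_options="-I corosync" \
          meta target-role="Started"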
>
> On 24.11.2011 22:59, Andreas Kurz wrote:
>> Hello,
>>
>> On 11/24/2011 10:12 PM, Vadim Bulst wrote:
>>> Hi Andreas,
>>>
>>> I changed my cib:
>>>
>>> node bbzclnode04
>>> node bbzclnode06
>>> node bbzclnode07
>>> primitive clvm ocf:lvm2:clvmd \
>>>         params daemon_timeout="30"
>>> primitive dlm ocf:pacemaker:controld
>>> group g_lock dlm clvm
>>> clone g_lock-clone g_lock \
>>>         meta interleave="true"
>>> property $id="cib-bootstrap-options" \
>>>         dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
>>>         cluster-infrastructure="openais" \
>>>         expected-quorum-votes="3" \
>>>         no-quorum-policy="ignore" \
>>>         stonith-enabled="false" \
>>>         last-lrm-refresh="1322049979"
>>>
>>> but no luck at all.
>> I assume you did at least a cleanup on clvm and it still does not
>> work ... next step would be to grep for ERROR in your cluster log and
>> look for other suspicious messages to find out why clvm is not that
>> motivated to start.
>>
>>> "And use Corosync 1.4.x with redundant rings and automatic ring
>>> recovery feature enabled."
>>>
>>> I have two interfaces per server - they are bonded together and
>>> bridged for virtualization. Only one untagged VLAN. I tried to give
>>> a tagged VLAN bridge an address, but that didn't work. My network
>>> config looks like this:
>> One or two extra NICs are quite affordable today, e.g. to build a
>> direct connection between the nodes (if possible).
>>
>> Regards,
>> Andreas
>
> --
> Kind regards
>
> Vadim Bulst
> Systemadministrator BBZ
>
> Biotechnologisch-Biomedizinisches Zentrum
> Universität Leipzig
> Deutscher Platz 5, 04103 Leipzig
> Tel.: 0341 97 - 31 307
> Fax : 0341 97 - 31 309
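
P.S. On Andreas' earlier point about redundant rings: with Corosync
1.4.x a passive-RRP totem section looks roughly like the excerpt below.
The bindnetaddr values match the two subnets from your corosync-cfgtool
output; the multicast addresses and ports are placeholders, pick your
own:

  totem {
          version: 2
          rrp_mode: passive
          interface {
                  ringnumber: 0
                  bindnetaddr: 192.168.128.0
                  mcastaddr: 239.255.1.1
                  mcastport: 5405
          }
          interface {
                  ringnumber: 1
                  bindnetaddr: 192.168.129.0
                  mcastaddr: 239.255.2.1
                  mcastport: 5407
          }
  }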