I think it's better to run clvmd with cman; I don't know why you are using the LSB script for clvmd.
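For example, the usual cman arrangement on Red Hat looks roughly like this (a sketch only; service and package names assume RHEL 6 with the lvm2-cluster package, adjust to your distribution):

    # enable clustered LVM locking (sets locking_type = 3 in lvm.conf)
    lvmconf --enable-cluster
    # bring up the membership/fencing layer first, then clvmd from its own init script
    service cman start
    service clvmd start
    chkconfig cman on
    chkconfig clvmd on

With that, pacemaker only has to manage whatever sits on top of the clustered VG, instead of starting clvmd itself through an LSB resource.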
On Red Hat, clvmd needs cman, and you are trying to run it under pacemaker. I am not sure this is the problem, but this type of configuration is strange; I built a virtual cluster with KVM and did not find any problems.

On 24 March 2012 at 13:09, William Seligman <selig...@nevis.columbia.edu> wrote:

> On 3/24/12 4:47 AM, emmanuel segura wrote:
>> How do you configure clvmd?
>> With cman or with pacemaker?
>
> Pacemaker. Here's the output of 'crm configure show':
> <http://pastebin.com/426CdVwN>
>
>> On 23 March 2012 at 22:14, William Seligman <selig...@nevis.columbia.edu> wrote:
>>> On 3/23/12 5:03 PM, emmanuel segura wrote:
>>>> Sorry, but I would like to know if you can show me your /etc/cluster/cluster.conf.
>>>
>>> Here it is: <http://pastebin.com/GUr0CEgZ>
>>>
>>>> On 23 March 2012 at 21:50, William Seligman <selig...@nevis.columbia.edu> wrote:
>>>>> On 3/22/12 2:43 PM, William Seligman wrote:
>>>>>> On 3/20/12 4:55 PM, Lars Ellenberg wrote:
>>>>>>> On Fri, Mar 16, 2012 at 05:06:04PM -0400, William Seligman wrote:
>>>>>>>> On 3/16/12 12:12 PM, William Seligman wrote:
>>>>>>>>> On 3/16/12 7:02 AM, Andreas Kurz wrote:
>>>>>>>>>>
>>>>>>>>>> s----- ... DRBD suspended I/O, most likely because of its fencing policy. For valid dual-primary setups you have to use the "resource-and-stonith" policy and a working "fence-peer" handler. In this mode I/O is suspended until fencing of the peer is successful. The question is why the peer does _not_ also suspend its I/O, because obviously fencing was not successful ...
>>>>>>>>>>
>>>>>>>>>> So with a correct DRBD configuration one of your nodes should already have been fenced because of connection loss between nodes (on the drbd replication link).
>>>>>>>>>>
>>>>>>>>>> You can use e.g. that nice fencing script:
>>>>>>>>>>
>>>>>>>>>> http://goo.gl/O4N8f
>>>>>>>>>
>>>>>>>>> This is the output of "drbdadm dump admin": <http://pastebin.com/kTxvHCtx>
>>>>>>>>>
>>>>>>>>> So I've got resource-and-stonith. I gather from an earlier thread that obliterate-peer.sh is more-or-less equivalent in functionality with stonith_admin_fence_peer.sh:
>>>>>>>>>
>>>>>>>>> <http://www.gossamer-threads.com/lists/linuxha/users/78504#78504>
>>>>>>>>>
>>>>>>>>> At the moment I'm pursuing the possibility that I'm returning the wrong return codes from my fencing agent:
>>>>>>>>>
>>>>>>>>> <http://www.gossamer-threads.com/lists/linuxha/users/78572>
>>>>>>>>
>>>>>>>> I cleaned up my fencing agent, making sure its return code matched those returned by other agents in /usr/sbin/fence_, and allowing for some delay issues in reading the UPS status. But...
>>>>>>>>
>>>>>>>>> After that, I'll look at another suggestion with lvm.conf:
>>>>>>>>>
>>>>>>>>> <http://www.gossamer-threads.com/lists/linuxha/users/78796#78796>
>>>>>>>>>
>>>>>>>>> Then I'll try DRBD 8.4.1. Hopefully one of these is the source of the issue.
>>>>>>>>
>>>>>>>> Failure on all three counts.
>>>>>>>
>>>>>>> May I suggest you double check the permissions on your fence-peer script? I suspect you may simply have forgotten the "chmod +x".
>>>>>>>
>>>>>>> Test with "drbdadm fence-peer minor-0" from the command line.
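A quick way to check both of Lars's points by hand on the surviving node (the handler path below is only an example; use whatever the fence-peer line in your drbd.conf actually points to):

    # the script named by the fence-peer handler must be executable
    ls -l /usr/lib/drbd/stonith_admin-fence-peer.sh
    chmod +x /usr/lib/drbd/stonith_admin-fence-peer.sh
    # then invoke the handler the way DRBD itself would
    drbdadm fence-peer minor-0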
>>>>>> I still haven't solved the problem, but this advice has gotten me further than before.
>>>>>>
>>>>>> First, Lars was correct: I did not have execute permissions set on my fence-peer scripts. (D'oh!) I turned them on, but that did not change anything: cman+clvmd still hung on the vgdisplay command if I crashed the peer node.
>>>>>>
>>>>>> I started up both nodes again (cman+pacemaker+drbd+clvmd) and tried Lars' suggested command. I didn't save the response for this message (d'oh again!) but it said that the fence-peer script had failed.
>>>>>>
>>>>>> Hmm. The peer was definitely shutting down, so my fencing script is working. I went over it, comparing the return codes to those of the existing scripts, and made some changes. Here's my current script: <http://pastebin.com/nUnYVcBK>.
>>>>>>
>>>>>> Up until now my fence-peer scripts had either been Lon Hohberger's obliterate-peer.sh or Digimer's rhcs_fence. I decided to try the stonith_admin-fence-peer.sh that Andreas Kurz recommended; unlike the first two scripts, which fence using fence_node, the latter script just calls stonith_admin.
>>>>>>
>>>>>> When I tried the stonith_admin-fence-peer.sh script, it worked:
>>>>>>
>>>>>> # drbdadm fence-peer minor-0
>>>>>> stonith_admin-fence-peer.sh[10886]: stonith_admin successfully fenced peer orestes-corosync.nevis.columbia.edu.
>>>>>>
>>>>>> Power was cut on the peer, the remaining node stayed up. Then I brought up the peer with:
>>>>>>
>>>>>> stonith_admin -U orestes-corosync.nevis.columbia.edu
>>>>>>
>>>>>> BUT: when the restored peer came up and started to run cman, clvmd hung on the main node again.
>>>>>>
>>>>>> After cycling through some more tests, I found that if I brought down the peer with drbdadm, then brought up the peer with no HA services, then started drbd and then cman, the cluster remained intact.
>>>>>>
>>>>>> If I crashed the peer, the scheme in the previous paragraph didn't work. I bring up drbd, check that the disks are both UpToDate, then bring up cman. At that point the vgdisplay on the main node takes so long to run that clvmd will time out:
>>>>>>
>>>>>> vgdisplay
>>>>>> Error locking on node orestes-corosync.nevis.columbia.edu: Command timed out
>>>>>>
>>>>>> I timed how long it took vgdisplay to run. I might be able to work around this by setting the timeout on my clvmd resource to 300s, but that seems to be a band-aid for an underlying problem. Any suggestions on what else I could check?
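If the 300s band-aid is worth trying while the real cause is tracked down, something along these lines should do it (the resource name "clvmd" is an assumption; take the real name from 'crm configure show'):

    # inspect the current definition and operation timeouts of the clvmd primitive
    crm configure show clvmd
    # then raise the timeouts with "crm configure edit clvmd", e.g. adding
    #   op start timeout="300s" \
    #   op monitor interval="30s" timeout="300s"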
>>>>> I've done some more tests. Still no solution, just an observation. The "death mode" appears to be:
>>>>>
>>>>> - Two nodes running cman+pacemaker+drbd+clvmd.
>>>>> - Take one node down = one remaining node with cman+pacemaker+drbd+clvmd.
>>>>> - Start up the dead node. If it ever gets into a state in which it's running cman but not clvmd, clvmd on the uncrashed node hangs.
>>>>> - Conversely, if I bring up drbd, make it primary, and start cman+clvmd, there's no problem on the uncrashed node.
>>>>>
>>>>> My guess is that clvmd is getting the number of nodes it expects from cman. When the formerly-dead node starts running cman, the number of cluster nodes goes to 2 (I checked with 'cman_tool status'), but the number of nodes running clvmd is still 1, hence the crash.
>>>>>
>>>>> Does this guess make sense?
>
> --
> Bill Seligman             | mailto://selig...@nevis.columbia.edu
> Nevis Labs, Columbia Univ | http://www.nevis.columbia.edu/~seligman/
> PO Box 137                |
> Irvington NY 10533 USA    | Phone: (914) 591-2823
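One rough way to test that guess from the surviving node, while the rebooted peer is running cman but not yet clvmd (dlm_tool assumes the cluster's DLM utilities are installed):

    # what the membership layer thinks
    cman_tool status     # overall state, votes, node count
    cman_tool nodes      # per-node membership as cman sees it
    # what the lock manager thinks; the "clvmd" lockspace only shows up
    # once clvmd on that node has actually joined
    dlm_tool ls

If cman already counts two members but the clvmd lockspace still has only one, that would match the behaviour described above.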