Please keep all replies on the list.

On Apr 12, 2010, at 2:44 PM, Jürgen Herrmann wrote:
>
> On Mon, 12 Apr 2010 14:25:55 +0200, Andrew Beekhof <and...@beekhof.net> wrote:
>> What versions of openais (corosync?) and pacemaker are you using?
>
> app1a:~# apt-show-versions |grep pacemaker
> pacemaker/sid upgradeable from 1.0.8-3~bpo50+1 to 1.0.8+hg15494-2
>
> app1a:~# apt-show-versions |grep openais
> libopenais-dev/lenny uptodate 1.1.2-1~bpo50+1
> libopenais3/lenny uptodate 1.1.2-1~bpo50+1
> openais/lenny uptodate 1.1.2-1~bpo50+1

Looks ok.
Perhaps ping the ocfs2 guys to see what control device it's trying to open.

>
>>
>> On Mon, Apr 12, 2010 at 2:00 PM, Jürgen Herrmann
>> <juergen.herrm...@xlhost.de> wrote:
>>>
>>> hi!
>>>
>>> i'm on debian lenny and trying to run ocfs2 on a dual primary
>>> drbd device. the drbd device is already set up as msDRBD0.
>>>
>>> to get dlm_controld.pcmk i installed it from source (from
>>> cluster-suite-3.0.10)
>>> now i configured a resource "resDLM" with 2 clones:
>>> primitive resDLM ocf:pacemaker:controld op monitor interval="120s"
>>> clone cloneDLM resDLM meta globally-unique="false" interleave="true"
>>> colocation colDLM_DRBD0 inf: cloneDLM msDRBD0:Master
>>> order ordDRBD0_DLM inf: msDRBD0:promote cloneDLM:start
>>> -> seems to work.
>>>
>>>
>>> to get ocfs2_controld.pcmk i installed ocfs2-tools-1.4.3 from source.
>>> after adding the resource:
>>> primitive resO2CB ocf:pacemaker:o2cb op monitor interval="120s"
>>> clone cloneO2CB resO2CB meta globally-unique="false" interleave="true"
>>> colocation colO2CB_DLM inf: cloneO2CB cloneDLM
>>> order ordDLM_O2CB inf: cloneDLM cloneO2CB
>>>
>>> i get the following errors in crm_mon:
>>> ======================================
>>> Failed actions:
>>> resO2CB:0_start_0 (node=app1b.xlhost.de, call=28, rc=1, status=complete): unknown error
>>> resO2CB:0_start_0 (node=app1a.xlhost.de, call=38, rc=1, status=complete): unknown error
>>>
>>>
>>> the relevant syslog entries:
>>> ============================
>>> Apr 12 13:15:18 app1a corosync[4638]: [pcmk ] info: pcmk_notify: Enabling node notifications for child 8311 (0xd83090)
>>> Apr 12 13:15:18 app1a ocfs2_controld.pcmk: Error opening control device: Unable to access cluster service
>>>
>>>
>>>
>>> if i start "ocfs2_controld.pcmk -D" i get:
>>> ==========================================
>>> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: init_ais_connection: Creating connection to our AIS plugin
>>> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: init_ais_connection: AIS connection established
>>> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: get_ais_nodeid: Server details: id=569559765 uname=app1a.xlhost.de cname=pcmk
>>> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node app1a.xlhost.de now has id: 569559765
>>> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node 569559765 is now known as app1a.xlhost.de
>>> 1271072439 setup_st...@168: Cluster connection established. Local node id: 569559765
>>> 1271072439 setup_st...@172: Added Pacemaker as client 1 with fd 5
>>> 1271072439 setup_c...@609: Initializing CKPT service (try 1)
>>> 1271072439 setup_c...@615: Connected to CKPT service with handle 0x327b23c600000000
>>> 1271072439 call_ckpt_o...@160: Opening checkpoint "ocfs2:controld:21f2cad5" (try 1)
>>> 1271072439 call_ckpt_o...@170: Opened checkpoint "ocfs2:controld:21f2cad5" with handle 0x6633487300000000
>>> 1271072439 call_section_wr...@340: Writing to section "daemon_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" (try 1)
>>> 1271072439 call_section_cre...@292: Creating section "daemon_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" (try 1)
>>> 1271072439 call_section_cre...@300: Created section "daemon_max_protocol" on checkpoint "ocfs2:controld:21f2cad5"
>>> 1271072439 call_section_wr...@340: Writing to section "ocfs2_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" (try 1)
>>> 1271072439 call_section_cre...@292: Creating section "ocfs2_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" (try 1)
>>> 1271072439 call_section_cre...@300: Created section "ocfs2_max_protocol" on checkpoint "ocfs2:controld:21f2cad5"
>>> 1271072439 start_j...@588: Starting join for group "ocfs2:controld"
>>> 1271072439 start_j...@592: cpg_join succeeded
>>> 1271072439 l...@975: setup done
>>> ocfs2_controld[18489]: 2010/04/12_13:40:39 notice: ais_dispatch: Membership 156: quorum acquired
>>> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_update_peer: Node app1a.xlhost.de: id=569559765 state=member (new) addr=r(0) ip(213.202.242.161) (new) votes=1 (new) born=156 seen=156 proc=00000000000000000000000000013312 (new)
>>> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node app1b.xlhost.de now has id: 586336981
>>> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node 586336981 is now known as app1b.xlhost.de
>>> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_update_peer: Node app1b.xlhost.de: id=586336981 state=member (new) addr=r(0) ip(213.202.242.162) votes=1 born=148 seen=156 proc=00000000000000000000000000013312
>>> 1271072439 confchg...@495: confchg called
>>> 1271072439 daemon_cha...@398: ocfs2_controld (group "ocfs2:controld") confchg: members 1, left 0, joined 1
>>> 1271072439 cpg_joi...@909: CPG is live, we are the first daemon
>>> 1271072439 call_ckpt_o...@160: Opening checkpoint "ocfs2:controld" (try 1)
>>> 1271072439 call_ckpt_o...@170: Opened checkpoint "ocfs2:controld" with handle 0x2ae8944a00000001
>>> 1271072439 call_section_wr...@340: Writing to section "daemon_protocol" on checkpoint "ocfs2:controld" (try 1)
>>> 1271072439 call_section_cre...@292: Creating section "daemon_protocol" on checkpoint "ocfs2:controld" (try 1)
>>> 1271072439 call_section_cre...@300: Created section "daemon_protocol" on checkpoint "ocfs2:controld"
>>> 1271072439 call_section_wr...@340: Writing to section "ocfs2_protocol" on checkpoint "ocfs2:controld" (try 1)
>>> 1271072439 call_section_cre...@292: Creating section "ocfs2_protocol" on checkpoint "ocfs2:controld" (try 1)
>>> 1271072439 call_section_cre...@300: Created section "ocfs2_protocol" on checkpoint "ocfs2:controld"
>>> 1271072439 cpg_joi...@923: Daemon protocol is 1.0
>>> 1271072439 cpg_joi...@925: fs protocol is 1.0
>>> 1271072439 cpg_joi...@927: Connecting to dlm_controld
>>> >>>>>>>>>>>>>>>>>>>>>>>> here's the error <<<<<<<<<<<<<<<<<<<<<<
>>> 1271072439 cpg_joi...@934: Opening control device
>>> 1271072439 cpg_joi...@938: Error opening control device: Unable to access cluster service
>>> 1271072439 exit_dlmcont...@363: Closing dlm_controld connection
>>> 1271072439 start_le...@613: leaving group "ocfs2:controld"
>>> 1271072439 start_le...@626: cpg_leave succeeded
>>> 1271072439 exit_...@760: closing cpg connection
>>> 1271072439 call_ckpt_cl...@240: Closing checkpoint "ocfs2:controld:21f2cad5" (try 1)
>>> 1271072439 call_ckpt_cl...@246: Closed checkpoint "ocfs2:controld:21f2cad5"
>>> 1271072439 exit_c...@643: Disconnecting from CKPT service (try 1)
>>> 1271072439 exit_c...@647: Disconnected from CKPT service
>>> 1271072439 exit_st...@144: closing pacemaker connection
>>> ocfs2_controld[18489]: 2010/04/12_13:40:39 notice: terminate_ais_connection: Disconnected from AIS
>>>
>>>
>>> obviously ocfs2_controld.pcmk can connect to the openais CKPT service and
>>> to dlm_controld.pcmk, which then terminates the connection.
>>> here's the output from dlm_controld.pcmk -q 0 -D:
>>> (the last 6 lines show 3 connection attempts from ocfs2_controld.pcmk!)
>>> =======================================================================
>>> 1271072755 dlm_controld 3.0.10 started
>>> cluster-dlm[20608]: 2010/04/12_13:45:55 info: init_ais_connection: Creating connection to our AIS plugin
>>> cluster-dlm[20608]: 2010/04/12_13:45:55 info: init_ais_connection: AIS connection established
>>> cluster-dlm[20608]: 2010/04/12_13:45:55 info: get_ais_nodeid: Server details: id=569559765 uname=app1a.xlhost.de cname=pcmk
>>> cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node app1a.xlhost.de now has id: 569559765
>>> cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node 569559765 is now known as app1a.xlhost.de
>>> 1271072755 found /dev/misc/dlm-control minor 58
>>> 1271072755 found /dev/misc/dlm-monitor minor 57
>>> 1271072755 found /dev/misc/dlm_plock minor 56
>>> 1271072755 /dev/misc/dlm-monitor fd 9
>>> 1271072755 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
>>> 1271072755 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
>>> 1271072755 confdb_key_get error 11
>>> 1271072755 group_mode 3 compat 0
>>> 1271072755 setup_cpg_daemon 11
>>> 1271072755 dlm:controld conf 2 1 0 memb 569559765 586336981 join 569559765 left
>>> 1271072755 run protocol from nodeid 586336981
>>> 1271072755 daemon run 1.1.1 max 1.1.1 kernel run 1.1.1 max 1.1.1
>>> 1271072755 plocks 13
>>> 1271072755 plock cpg message size: 104 bytes
>>> cluster-dlm[20608]: 2010/04/12_13:45:55 notice: ais_dispatch: Membership 156: quorum acquired
>>> cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_update_peer: Node app1a.xlhost.de: id=569559765 state=member (new) addr=r(0) ip(213.202.242.161) (new) votes=1 (new) born=156 seen=156 proc=00000000000000000000000000013312 (new)
>>> cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node app1b.xlhost.de now has id: 586336981
>>> cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node 586336981 is now known as app1b.xlhost.de
>>> cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_update_peer: Node app1b.xlhost.de: id=586336981 state=member (new) addr=r(0) ip(213.202.242.162) votes=1 born=148 seen=156 proc=00000000000000000000000000013312
>>> 1271072755 Processing membership 156
>>> 1271072755 Adding address ip(213.202.242.161) to configfs for node 569559765
>>> 1271072755 set_configfs_node 569559765 213.202.242.161 local 1
>>> 1271072755 Added active node 569559765: born-on=156, last-seen=156, this-event=156, last-event=0
>>> 1271072755 Adding address ip(213.202.242.162) to configfs for node 586336981
>>> 1271072755 set_configfs_node 586336981 213.202.242.162 local 0
>>> 1271072755 Added active node 586336981: born-on=148, last-seen=156, this-event=156, last-event=0
>>> 1271072763 client connection 5 fd 14
>>> 1271072763 connection 5 read error -1
>>> 1271072776 client connection 5 fd 14
>>> 1271072776 connection 5 read error -1
>>> 1271072779 client connection 5 fd 14
>>> 1271072779 connection 5 read error -1
>>>
>>>
>>>
>>> i'm pretty lost at the moment, as there's nothing i can find via google
>>> regarding the "core" problem:
>>> 1271072439 cpg_joi...@934: Opening control device
>>> 1271072439 cpg_joi...@938: Error opening control device:
>>> Unable to access cluster service
>>>
>>>
>>> any help would be greatly appreciated.
>>>
>>> best regards,
>>> jürgen herrmann
>>> --
>>> >> XLhost.de - eXperts in Linux hosting ® <<
>>>
>>> XLhost.de GmbH
>>> Jürgen Herrmann, Managing Director
>>> Boelckestrasse 21, 93051 Regensburg, Germany
>>>
>>> Managing Directors: Volker Geith, Jürgen Herrmann
>>> Registered under: HRB9918
>>> VAT identification number: DE245931218
>>>
>>> Phone: +49 (0)800 XLHOSTDE [0800 95467833]
>>> Fax: +49 (0)800 95467830
>>>
>>> WEB: http://www.XLhost.de
>>> IRC: #xlh...@irc.quakenet.org
>>> _______________________________________________
>>> Openais mailing list
>>> Openais@lists.linux-foundation.org
>>> https://lists.linux-foundation.org/mailman/listinfo/openais

--
Andrew