Some more info:
root 14170 14166 0 12:23 ? 00:00:00 /usr/lib64/heartbeat/stonithd
nobody 14172 14166 0 12:23 ? 00:00:00 /usr/lib64/heartbeat/lrmd
82 14173 14166 0 12:23 ? 00:00:00 /usr/lib64/heartbeat/attrd
82 14174 14166 0 12:23 ? 00:00:00 /usr/lib64/heartbeat/pengine
82 14175 14166 0 12:23 ? 00:00:00 /usr/lib64/heartbeat/crmd

--lrmd is running as nobody when it should have been root.

I'm not sure why that would happen.

Thanks
Shravan

On Wed, Sep 29, 2010 at 10:29 AM, Shravan Mishra <shravan.mis...@gmail.com> wrote:
> Hi,
>
> I did a bt on the core, this is what I found:
>
> ==========
> Core was generated by `/usr/lib64/heartbeat/cib'.
> Program terminated with signal 11, Segmentation fault.
> [New process 12340]
> #0 0x00007f23acc553fa in strncmp () from /lib64/libc.so.6
> (gdb) bt
> #0 0x00007f23acc553fa in strncmp () from /lib64/libc.so.6
> #1 0x00007f23acf87c39 in __xmlParserInputBufferCreateFilename () from /usr/lib64/libxml2.so.2
> #2 0x00007f23acf6147b in xmlNewInputFromFile () from /usr/lib64/libxml2.so.2
> #3 0x00007f23acf641d4 in xmlCreateURLParserCtxt () from /usr/lib64/libxml2.so.2
> #4 0x00007f23acf78f3a in xmlReadFile () from /usr/lib64/libxml2.so.2
> #5 0x00007f23ad0167b1 in xmlRelaxNGParse () from /usr/lib64/libxml2.so.2
> #6 0x00007f23ae967321 in validate_with_relaxng (doc=0x626020, to_logs=1, relaxng_file=0x7f23ae97ba10 "/usr/share/pacemaker/pacemaker-1.2.rng") at xml.c:2222
> #7 0x00007f23ae967769 in validate_with (xml=0x6260d0, method=6, to_logs=1) at xml.c:2287
> #8 0x00007f23ae967b9f in validate_xml (xml_blob=0x6260d0, validation=0x626910 "pacemaker-1.2", to_logs=1) at xml.c:2373
> #9 0x0000000000405b23 in readCibXmlFile (dir=0x41b580 "/var/lib/heartbeat/crm", file=0x41c40a "cib.xml", discard_status=1) at io.c:396
> #10 0x0000000000412285 in startCib (filename=0x41c40a "cib.xml") at main.c:613
> #11 0x0000000000411309 in cib_init () at main.c:408
> #12 0x000000000041064a in main (argc=1, argv=0x7fff942e0f58) at main.c:218
>
> ==========
>
> If it's a fresh install, let's say, then cib.xml will not exist.
> Then why is it looking for this file on startup?
>
> Sincerely
> Shravan
>
> On Tue, Sep 28, 2010 at 10:24 AM, Shravan Mishra <shravan.mis...@gmail.com> wrote:
>> Sorry, forgot to attach my corosync.conf.
>>
>> =========
>> totem {
>>     version: 2
>>     # token: 3000
>>     # token_retransmits_before_loss_const: 10
>>     # join: 60
>>     # consensus: 1500
>>     # vsftype: none
>>     # max_messages: 20
>>     # clear_node_high_bit: yes
>>     secauth: off
>>     threads: 0
>>     # rrp_mode: passive
>>
>>     interface {
>>         ringnumber: 0
>>         bindnetaddr: 192.168.2.0
>>         #mcastaddr: 226.94.1.1
>>         broadcast: yes
>>         mcastport: 5405
>>     }
>>     # interface {
>>     # ringnumber: 1
>>     # bindnetaddr: 172.20.20.0
>>     #mcastaddr: 226.94.1.1
>>     # broadcast: yes
>>     # mcastport: 5405
>>     # }
>> }
>>
>> logging {
>>     fileline: off
>>     to_stderr: yes
>>     to_logfile: yes
>>     to_syslog: yes
>>     logfile: /tmp/corosync.log
>>     debug: off
>>     timestamp: on
>>     logger_subsys {
>>         subsys: AMF
>>         debug: off
>>     }
>> }
>>
>> service {
>>     name: pacemaker
>>     ver: 0
>> }
>>
>> aisexec {
>>     user:root
>>     group: root
>> }
>>
>> amf {
>>     mode: disabled
>> }
>> =========
>>
>> On Tue, Sep 28, 2010 at 10:10 AM, Shravan Mishra <shravan.mis...@gmail.com> wrote:
>>> Hi Andrew,
>>>
>>> I'm attaching another log file as I reflashed my machine and started
>>> everything from scratch.
>>> Looks like my old system got a little messed up as I was trying to
>>> install old HA libraries - corosync/pacemaker that was initially
>>> working for me.
>>>
>>> Here are the details:
>>>
>>> As of now I just want to see cib/attrd up so I have only one machine
>>> where I want to see things in a sane state.
>>>
>>> [r...@ha2 ~]# /usr/sbin/corosync -v
>>> Corosync Cluster Engine, version '1.2.8' SVN revision '3035'
>>> Copyright (c) 2006-2009 Red Hat, Inc.
>>>
>>> [r...@ha2 ~]# /usr/lib64/heartbeat/crmd version
>>> CRM Version: 1.1.2 (e0d731c2b1be446b27a73327a53067bf6230fb6a)
>>>
>>> Pacemaker version is 1.1, the release based on the above output is
>>> 1.1.2 if I correctly understand.
>>>
>>> This one is showing --
>>>
>>> Sep 27 12:30:45 corosync [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=9216, core=false)
>>>
>>> Please find corosync logs attached.
>>>
>>> Thanks
>>> Shravan
>>>
>>> On Tue, Sep 28, 2010 at 5:47 AM, Andrew Beekhof <and...@beekhof.net> wrote:
>>>> On Mon, Sep 27, 2010 at 6:26 AM, Shravan Mishra <shravan.mis...@gmail.com> wrote:
>>>>> Thanks Raoul for the response.
>>>>>
>>>>> Changing the permission to hacluster:haclient did stop that error.
>>>>>
>>>>> Now I'm hitting another problem whereby cib is failing to start
>>>>
>>>> Very strange logs.
>>>> Which distribution is this?
>>>> What does your corosync.conf look like?
>>>>
>>>>> =====
>>>>> Sep 27 00:16:29 corosync [pcmk ] info: update_member: Node ha2.itactics.com now has process list: 00000000000000000000000000110012 (1114130)
>>>>> Sep 27 00:16:29 corosync [pcmk ] info: update_member: Node ha2.itactics.com now has 1 quorum votes (was 0)
>>>>> Sep 27 00:16:29 corosync [pcmk ] info: send_member_notification: Sending membership update 100 to 0 children
>>>>> Sep 27 00:16:29 corosync [MAIN ] Completed service synchronization, ready to provide service.
>>>>> Sep 27 00:16:30 corosync [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib exited (pid=14889, rc=127)
>>>>> Sep 27 00:16:30 corosync [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
>>>>> Sep 27 00:16:30 corosync [pcmk ] info: spawn_child: Forked child 14896 for process cib
>>>>> crmd[14893]: 2010/09/27_00:16:30 WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
>>>>> Sep 27 00:16:31 corosync [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib exited (pid=14896, rc=127)
>>>>> Sep 27 00:16:31 corosync [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
>>>>> Sep 27 00:16:31 corosync [pcmk ] info: spawn_child: Forked child 14901 for process cib
>>>>> Sep 27 00:16:32 corosync [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib exited (pid=14901, rc=1
>>>>> ======
>>>>>
>>>>> I have attached the full logs.
>>>>>
>>>>> We are using corosync 1.2.8 and pacemaker 1.1.3.
>>>>>
>>>>> Thanks.
>>>>> Shravan
>>>>>
>>>>> On Sat, Sep 25, 2010 at 4:36 AM, Raoul Bhatia [IPAX] <r.bha...@ipax.at> wrote:
>>>>>> On 24.09.2010 21:41, Shravan Mishra wrote:
>>>>>>>
>>>>>>> crmd[20612]: 2010/09/24_15:29:57 ERROR: crm_log_init_worker: Cannot change active directory to /var/lib/heartbeat/cores/hacluster: Permission denied (13)
>>>>>>
>>>>>> ls -ald /var/lib/heartbeat/cores/hacluster /var/lib/heartbeat/cores/ /var/lib/heartbeat/ /var/lib/ /var/
>>>>>>
>>>>>> is haclient allowed to cd all the way into
>>>>>> /var/lib/heartbeat/cores/hacluster ?
>>>>>>
>>>>>> cheers,
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
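
Raoul's question in the thread (whether the cluster user can cd all the way into /var/lib/heartbeat/cores/hacluster) can also be checked mechanically, one directory at a time. Below is a minimal Python sketch of that idea. It assumes plain Unix mode bits only (it ignores ACLs, SELinux and capabilities), and `can_traverse` and `traversal_report` are hypothetical helper names made up for this illustration; they are not part of Pacemaker, Heartbeat or corosync.

```python
import os
import stat


def can_traverse(st, uid, gids):
    """Return True if the mode bits in `st` grant search (execute) permission.

    Simplified model: owner bits if uid matches, else group bits if one of
    the gids matches, else "other" bits. uid 0 is assumed to always pass.
    """
    if uid == 0:
        return True
    if st.st_uid == uid:
        return bool(st.st_mode & stat.S_IXUSR)
    if st.st_gid in gids:
        return bool(st.st_mode & stat.S_IXGRP)
    return bool(st.st_mode & stat.S_IXOTH)


def traversal_report(path, uid, gids):
    """Yield (directory, ok) for `path` and each of its ancestors, root first."""
    path = os.path.abspath(path)
    chain = []
    while True:
        chain.append(path)
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    for directory in reversed(chain):
        yield directory, can_traverse(os.stat(directory), uid, gids)
```

To use it for the case in this thread, one would look up the ids with `pwd.getpwnam("hacluster").pw_uid` and `grp.getgrnam("haclient").gr_gid`, pass them as `uid` and `gids={gid}`, and print each `(directory, ok)` pair; the first directory reported as not ok is the one to fix. The script has to be run by a user who can stat every directory on the path (typically root).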