Dejan,

>>On Sun, Nov 25, 2007 at 02:50:34PM -0500, Scott Mann wrote:
>> Hi,
>>
>> I started getting this message on 1 system in a 2 node hb
>> cluster AFTER installing 2.1.2 via the fc8 rpms (yum install
>> heartbeat*, so both heartbeat and heartbeat-devel). I actually
>> installed the rpms on two freshly installed FC8 systems. Also
>> installed: libnet and glib-devel. I basically did the same
>> thing a few weeks ago when these systems were FC7 (but got hb
>> 2.0.8 via the rpms).
>>
>> I found an earlier email from Alan R regarding this and 2.0.5,
>> but could find no resolution. I'm certainly a newbie with this
>> product and it may be something I'm doing. I've written an app
>> to the API that seems to be working on 2.0.8. It uses
>> "azClient" as its "signon" name. The problem didn't appear on
>> wiley-coyote until after I'd started the app (although, it
>> could be that I simply did not see the messages until after the
>> app started). The problem DID NOT and still does not appear on
>> the other node, beauregard. I ran the app on it also, and it
>> signed on properly, etc.
>>
>> Having said all that, when starting heartbeat, here are the messages in the
>> log file:
>>
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26165]: info: Version 2 support: no
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26165]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26165]: info: **************************
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26165]: info: Configuration validated. Starting heartbeat 2.1.2
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: heartbeat: version 2.1.2
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: Heartbeat generation: 1196015782
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: G_main_add_TriggerHandler: Added signal manual handler
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: G_main_add_TriggerHandler: Added signal manual handler
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: glib: ucast: bound send socket to device: eth0
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: glib: ucast: bound receive socket to device: eth0
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: glib: ucast: started on port 694 interface eth0 to 192.168.0.11
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: G_main_add_SignalHandler: Added signal handler for signal 17
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: Local status now set to: 'up'
>> Nov 25 12:32:00 wiley-coyote heartbeat: [26166]: info: Link beauregard:eth0 up.
>> Nov 25 12:32:00 wiley-coyote heartbeat: [26166]: info: Status update for node beauregard: status active
>> Nov 25 12:32:00 wiley-coyote harc[26173]: info: Running /etc/ha.d/rc.d/status status
>> Nov 25 12:33:04 wiley-coyote heartbeat: [26166]: info: all clients are now paused
>> Nov 25 12:33:37 wiley-coyote heartbeat: [26166]: ERROR: Message hist queue is filling up (151 messages in queue)
>> <above ERROR message continues to repeat>
>>
>> It is also worth noting that when I execute "cl_status nodestatus
>> wiley-coyote" on wiley-coyote I get:
>>
>> cl_status[26192]: 2007/11/25_12:33:22 ERROR: Cannot signon with heartbeat
>> cl_status[26192]: 2007/11/25_12:33:22 ERROR: REASON: hb_api_signon: Can't initiate connection to heartbeat
>
>Strange case. Did you check permissions? HB clients connect
>typically through /var/run/heartbeat/register, but that's a
>unix domain socket and dynamically created. Anyway, perhaps it
>would be worth comparing permissions on both systems.

They're the same on both systems:

srwxrwxrwx 1 root root 0 2007-11-26 13:28 /var/run/heartbeat/register

>> which seems to indicate a problem with the socket? Or pipe?
>> BTW, this command works correctly on beauregard, returning
>> "alive" for beauregard and "dead" for wiley-coyote.
>
>Can you try strace on cl_status too?

I ran strace on cl_status on both systems. Let me know if you'd like
the entire output, but the difference is that on wiley-coyote (the
system that gets the ERROR from cl_status), "connect" returns
"Connection refused.":

connect(3, {sa_family=AF_FILE, path="/var/run/heartbeat/register"}, 110) = -1 ECONNREFUSED (Connection refused)

At this point, I am going to replace wiley-coyote with another system
and see if that resolves the problem.

>
>Thanks,
>
>Dejan

Thank you!

>
> Anyway, please point me to whatever you think appropriate for
> me to look at (especially source as I'd like to learn more). My
> config file is simple and is below (comments mostly removed).
> Also, the only resource I'm managing is an IP address. I'm not
> using CRM, so I've got an haresources file which contains
> exactly:
>
> wiley-coyote 192.168.0.98/24/eth0
>
>
> Any help would be greatly appreciated!
> TIA
>
> Scott Mann
> Sr Software Engineer
> Aztek Networks
>
> ha.cf (identical on both systems except for the change in ucast)
> ----------------------------------------------------------------
> # Facility to use for syslog()/logger
> #
> logfacility local0
> #
> #
> keepalive 2
> #
> #
> deadtime 30
> #
> #
> warntime 10
> #
> #
> initdead 120
> #
> #
> udpport 694
> #
> # beauregard
> ucast eth0 192.168.0.11
> # wiley-coyote
> #ucast eth0 192.168.0.31
> #
> #
> #auto_failback on
>
> auto_failback off
>
> #
>
> node wiley-coyote
> node beauregard
> #
> #apiauth client-name gid=gidlist uid=uidlist
> #apiauth ipfail gid=haclient uid=hacluster
> apiauth azClient uid=root,smann
>
> #
> #compression_threshold 2
> crm no
>
>
> <end>
>
> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
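A note on the strace result quoted in this message: ECONNREFUSED on a connect() to a Unix domain socket generally means the socket file is present but no process is accepting connections on it (for example, a file left behind by a process that has exited). That is a different failure from a permissions problem (EACCES) or a missing file (ENOENT), which fits the permissions being identical on both nodes. Below is a minimal Python sketch of that distinction. The function name probe_unix_socket and its return strings are made up for illustration and are not part of heartbeat, and the sketch assumes the register socket is a stream socket, which the strace line does not show.

```python
import errno
import os
import socket
import stat


def probe_unix_socket(path, timeout=2.0):
    """Classify why a connect() to a Unix domain socket succeeds or fails.

    Returns one of: "missing", "not-a-socket", "refused", "denied",
    "ok", or "error: <detail>". These labels are illustrative only.
    """
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    except OSError as exc:
        return "error: " + str(exc)

    if not stat.S_ISSOCK(mode):
        return "not-a-socket"

    # Assumption: the server side is a SOCK_STREAM socket.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(path)
        except OSError as exc:
            if exc.errno == errno.ECONNREFUSED:
                # The file exists, but nothing is listening behind it.
                return "refused"
            if exc.errno == errno.EACCES:
                # File or directory permissions block the connection.
                return "denied"
            return "error: " + errno.errorcode.get(exc.errno, str(exc))
    return "ok"


if __name__ == "__main__":
    # Path taken from the strace output in this thread.
    print(probe_unix_socket("/var/run/heartbeat/register"))
```

Running this as the same user on both nodes and comparing the results would show whether wiley-coyote has a socket file with no listener behind it ("refused") while beauregard answers "ok".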