Dejan,
 
>On Sun, Nov 25, 2007 at 02:50:34PM -0500, Scott Mann wrote:
>> Hi,
>> 
>> I started getting this message on one system in a two-node hb
>> cluster AFTER installing 2.1.2 via the FC8 rpms (yum install
>> heartbeat*, so both heartbeat and heartbeat-devel). I actually
>> installed the rpms on two freshly installed FC8 systems. I also
>> installed libnet and glib-devel. I basically did the same
>> thing a few weeks ago when these systems were FC7 (but got hb
>> 2.0.8 via the rpms).
>> 
>> I found an earlier email from Alan R regarding this and 2.0.5,
>> but could find no resolution. I'm certainly a newbie with this
>> product and it may be something I'm doing. I've written an app
>> to the API that seems to be working on 2.0.8. It uses
>> "azClient" as its "signon" name. The problem didn't appear on
>> wiley-coyote until after I'd started the app (although it
>> could be that I simply did not see the messages until after the
>> app started). The problem DID NOT and still does not appear on
>> the other node, beauregard. I ran the app on it also, and it
>> signed on properly, etc.
>> 
>> Having said all that, when starting heartbeat, here are the messages in the 
>> log file:
>> 
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26165]: info: Version 2 support: no
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26165]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26165]: info: **************************
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26165]: info: Configuration validated. Starting heartbeat 2.1.2
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: heartbeat: version 2.1.2
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: Heartbeat generation: 1196015782
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: G_main_add_TriggerHandler: Added signal manual handler
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: G_main_add_TriggerHandler: Added signal manual handler
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: glib: ucast: bound send socket to device: eth0
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: glib: ucast: bound receive socket to device: eth0
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: glib: ucast: started on port 694 interface eth0 to 192.168.0.11
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: G_main_add_SignalHandler: Added signal handler for signal 17
>> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: Local status now set to: 'up'
>> Nov 25 12:32:00 wiley-coyote heartbeat: [26166]: info: Link beauregard:eth0 up.
>> Nov 25 12:32:00 wiley-coyote heartbeat: [26166]: info: Status update for node beauregard: status active
>> Nov 25 12:32:00 wiley-coyote harc[26173]: info: Running /etc/ha.d/rc.d/status status
>> Nov 25 12:33:04 wiley-coyote heartbeat: [26166]: info: all clients are now paused
>> Nov 25 12:33:37 wiley-coyote heartbeat: [26166]: ERROR: Message hist queue is filling up (151 messages in queue)
>> <above ERROR message continues to repeat>
>> 
>> It is also worth noting that when I execute "cl_status nodestatus 
>> wiley-coyote" on wiley-coyote I get:
>> 
>> cl_status[26192]: 2007/11/25_12:33:22 ERROR: Cannot signon with heartbeat
>> cl_status[26192]: 2007/11/25_12:33:22 ERROR: REASON: hb_api_signon: Can't initiate connection  to heartbeat
>
>Strange case. Did you check permissions? HB clients typically
>connect through /var/run/heartbeat/register, but that's a
>Unix domain socket and it is created dynamically. Anyway, perhaps
>it would be worth comparing permissions on both systems.

They're the same on both systems:

srwxrwxrwx 1 root root 0 2007-11-26 13:28 /var/run/heartbeat/register

>> which seems to indicate a problem with the socket? Or pipe?
>> BTW, this command works correctly on beauregard, returning
>> "alive" for beauregard and "dead" for wiley-coyote.
>
>Can you try strace on cl_status too?

I ran strace on cl_status on both systems. Let me know if you'd like the entire 
output, but the difference is that on wiley-coyote (the system that gets the 
ERROR from cl_status), "connect" returns "Connection refused":

connect(3, {sa_family=AF_FILE, path="/var/run/heartbeat/register"}, 110) = -1 ECONNREFUSED (Connection refused)
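
For my own understanding, here is a minimal sketch of what that errno means. It is plain Python, not heartbeat's C API, and the temporary path is a hypothetical stand-in for /var/run/heartbeat/register. The point it illustrates: connect() on a Unix domain stream socket fails with ECONNREFUSED when the socket file exists but no process is listening on it (e.g. a file left behind by a server that has gone away), so identical rwxrwxrwx permissions on both nodes wouldn't by themselves rule out a problem on wiley-coyote.

```python
import errno
import os
import socket
import tempfile


def probe(path):
    """Try to connect to a Unix domain stream socket at `path`.

    Returns "ok" if the connect succeeded, otherwise the errno name
    (for example "ENOENT" or "ECONNREFUSED")."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(path)
        return "ok"
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    finally:
        s.close()


def demo():
    d = tempfile.mkdtemp()
    # Hypothetical stand-in for /var/run/heartbeat/register.
    path = os.path.join(d, "register")

    # 1. No socket file at all.
    missing = probe(path)

    # 2. A server binds and listens; the connect is queued and succeeds.
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen(1)
    live = probe(path)

    # 3. The server closes, but bind() left the socket file behind.
    srv.close()
    stale = probe(path)

    os.unlink(path)
    os.rmdir(d)
    return missing, live, stale


if __name__ == "__main__":
    print(demo())
```

demo() returns the errno name seen in each case; on Linux the third (stale file) case should report ECONNREFUSED, the same error strace showed for cl_status.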

At this point, I am going to replace wiley-coyote with another system and see 
if that resolves the problem.

>
>Thanks,
>
>Dejan

Thank you!

> 
> Anyway, please point me to whatever you think appropriate for
> me to look at (especially source, as I'd like to learn more). My
> config file is simple and is below (comments mostly removed).
> Also, the only resource I'm managing is an IP address. I'm not
> using CRM, so I've got an haresources file which contains
> exactly:
> 
> wiley-coyote    192.168.0.98/24/eth0
> 
> 
> Any help would be greatly appreciated!
> TIA
> 
> Scott Mann
> Sr Software Engineer
> Aztek Networks
> 
> ha.cf (identical on both systems except for the change in ucast)
> ----------------------------------------------------------------
> #       Facility to use for syslog()/logger 
> #
> logfacility     local0
> #
> #
> keepalive 2
> #
> #
> deadtime 30
> #
> #
> warntime 10
> #
> #
> initdead 120
> #
> #
> udpport 694
> #
> # beauregard
> ucast eth0 192.168.0.11
> # wiley-coyote
> #ucast eth0 192.168.0.31
> #
> #
> #auto_failback on
> 
> auto_failback off
> 
> #
> 
> node wiley-coyote
> node beauregard
> #
> #apiauth client-name gid=gidlist uid=uidlist
> #apiauth ipfail gid=haclient uid=hacluster
> apiauth azClient uid=root,smann
> 
> #
> #compression_threshold 2
> crm no
> 
> 
> <end>
> 
> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems