Hi,

 

Sorry for my bad English, I'm French.

I'm using heartbeat to run a 3-node server cluster with a VIP.

I have a big problem. In bcast mode, heartbeat brings the network down.

I switched to the ucast method to work around the problem, and it works.

But if the first server fails, the VIP moves to the second one, and the
network becomes slow for the client application.

authkeys:

auth 1
1 sha1 net-cluster
2 md5 "cluster"
3 crc

ha.cf:

use_logd        on
#debugfile        /var/log/ha-debug
#logfile        /var/log/ha-log

keepalive       1
warntime        2
deadtime        3
initdead        60

#bcast         eth1
ucast          eth1 192.168.1.1
ucast          eth1 192.168.1.2
ucast          eth1 192.168.1.3

udpport         694

node            SVR01
node            SVR02
node            SVR03

auto_failback   on
crm             on

I had a power failure yesterday, and since then this problem (slow
network) has appeared. Some logs:

d22312/HBWRITE]

heartbeat[16404]: 2008/07/23_15:11:16 info: cl_malloc stats: 446/5809982  42328/20020 [pid22312/HBWRITE]
heartbeat[16404]: 2008/07/23_15:11:16 info: RealMalloc stats: 50916 total malloc bytes. pid [22312/HBWRITE]
heartbeat[16404]: 2008/07/23_15:11:16 info: Current arena value: 0
heartbeat[16404]: 2008/07/23_15:11:16 info: MSG stats: 0/0 ms age 1314784650 [pid22313/HBREAD]
heartbeat[16404]: 2008/07/23_15:11:16 info: cl_malloc stats: 447/22018891  42412/20064 [pid22313/HBREAD]
heartbeat[16404]: 2008/07/23_15:11:16 info: RealMalloc stats: 50584 total malloc bytes. pid [22313/HBREAD]
heartbeat[16404]: 2008/07/23_15:11:16 info: Current arena value: 0
heartbeat[16404]: 2008/07/23_15:11:16 info: These are nothing to worry about.
heartbeat[16404]: 2008/07/23_15:13:23 WARN: 2 lost packet(s) for [svr03] [4233315:4233318]
heartbeat[16404]: 2008/07/23_15:13:23 WARN: Late heartbeat: Node svr03: interval 3000 ms
heartbeat[16404]: 2008/07/23_15:13:23 info: No pkts missing from svr03!
heartbeat[16404]: 2008/07/23_15:13:24 WARN: node svr02: is dead
heartbeat[16404]: 2008/07/23_15:13:24 info: Link svr02:eth1 dead.
heartbeat[16404]: 2008/07/23_15:13:24 CRIT: Cluster node svr02 returning after partition.
heartbeat[16404]: 2008/07/23_15:13:24 info: For information on cluster partitions, See URL: http://linux-ha.org/SplitBrain
heartbeat[16404]: 2008/07/23_15:13:24 WARN: Deadtime value may be too small.
heartbeat[16404]: 2008/07/23_15:13:24 info: See FAQ for information on tuning deadtime.
heartbeat[16404]: 2008/07/23_15:13:24 info: URL: http://linux-ha.org/FAQ#heavy_load
heartbeat[16404]: 2008/07/23_15:13:24 info: Link svr02:eth1 up.
heartbeat[16404]: 2008/07/23_15:13:24 WARN: Late heartbeat: Node svr02: interval 4000 ms
heartbeat[16404]: 2008/07/23_15:13:24 info: Status update for node svr02: status active
heartbeat[16404]: 2008/07/23_15:13:26 WARN: 1 lost packet(s) for [svr03] [4233322:4233324]
heartbeat[16404]: 2008/07/23_15:13:26 info: No pkts missing from svr03!
heartbeat[16404]: 2008/07/23_16:06:24 WARN: 2 lost packet(s) for [svr03] [4236562:4236565]
heartbeat[16404]: 2008/07/23_16:06:24 WARN: Late heartbeat: Node svr03: interval 3000 ms
heartbeat[16404]: 2008/07/23_16:06:24 info: No pkts missing from svr03!
heartbeat[16404]: 2008/07/23_16:10:49 WARN: 1 lost packet(s) for [svr02] [4236937:4236939]
heartbeat[16404]: 2008/07/23_16:10:49 WARN: 1 lost packet(s) for [svr03] [4236828:4236830]
heartbeat[16404]: 2008/07/23_16:10:49 info: No pkts missing from svr02!
heartbeat[16404]: 2008/07/23_16:10:50 info: No pkts missing from svr03!
heartbeat[16404]: 2008/07/23_16:10:57 WARN: 1 lost packet(s) for [svr03] [4236835:4236837]

h
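
In case it helps anyone reading these logs, here is a minimal sketch (plain Python, not part of heartbeat) that tallies the lost-packet and late-heartbeat warnings per node. The function name `tally_warnings` and both regular expressions are assumptions of mine, derived only from the log lines quoted above, and the sketch assumes each log entry sits on a single line rather than being wrapped by a mail client.

```python
import re
from collections import Counter

# Patterns are assumptions based only on the log lines quoted in this post.
LOST_RE = re.compile(r"WARN: (\d+) lost packet\(s\) for \[(\w+)\]")
LATE_RE = re.compile(r"WARN: Late heartbeat: Node (\w+): interval (\d+) ms")


def tally_warnings(lines):
    """Return (lost packets per node, late-heartbeat warnings per node)."""
    lost = Counter()
    late = Counter()
    for line in lines:
        m = LOST_RE.search(line)
        if m:
            # Add the reported number of lost packets to this node's total.
            lost[m.group(2)] += int(m.group(1))
            continue
        m = LATE_RE.search(line)
        if m:
            # Count one late-heartbeat warning for this node.
            late[m.group(1)] += 1
    return lost, late


# Three entries copied from the logs above, each on one line.
sample = [
    "heartbeat[16404]: 2008/07/23_15:13:23 WARN: 2 lost packet(s) for [svr03] [4233315:4233318]",
    "heartbeat[16404]: 2008/07/23_15:13:23 WARN: Late heartbeat: Node svr03: interval 3000 ms",
    "heartbeat[16404]: 2008/07/23_16:10:49 WARN: 1 lost packet(s) for [svr02] [4236937:4236939]",
]
lost, late = tally_warnings(sample)
print(dict(lost), dict(late))
```

To run it on a real log, something like `tally_warnings(open(path))` should work, where `path` is wherever your logging daemon writes the heartbeat messages (that location is not shown in this post).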

 

How can I resolve this problem?

Do I need specific equipment to use the multicast method?

---
Reza ISSANY

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
