Hi,

I got no answer on the ports@ list, so I hope someone here has an idea. I am having difficulty getting multicast communication to work with the heartbeat (http://www.linux-ha.org) port. When I configure it for multicast and start up the cluster node, I see the following in /var/log/messages:

Oct 25 09:02:51 defiant heartbeat: [7910]: ERROR: glib: Unable to send mcast packet [-1]: Host is down
Oct 25 09:02:51 defiant heartbeat: [7910]: ERROR: write failure on mcast fxp0.: Host is down
Oct 25 09:02:51 defiant heartbeat: [7910]: ERROR: glib: Unable to send mcast packet [-1]: Host is down
Oct 25 09:02:51 defiant heartbeat: [7910]: ERROR: write failure on mcast fxp0.: Host is down
Oct 25 09:02:51 defiant heartbeat: [7910]: ERROR: glib: Unable to send mcast packet [-1]: Host is down
Oct 25 09:02:51 defiant heartbeat: [7910]: ERROR: write failure on mcast fxp0.: Host is down
Oct 25 09:02:53 defiant heartbeat: [7910]: ERROR: glib: Unable to send mcast packet [-1]: Host is down
Oct 25 09:02:53 defiant heartbeat: [7910]: ERROR: write failure on mcast fxp0.: Host is down

and tcpdump sees this:

Oct 25 09:05:08.038580 00:0e:7b:fc:c0:a0 01:00:5e:00:00:01 0800 42: 10.0.0.5 > 239.0.0.1: igmp nreport 239.0.0.1 [ttl 1]
Oct 25 09:05:12.063762 00:0e:7b:fc:c0:a0 01:00:5e:00:00:01 0800 42: 10.0.0.5 > 239.0.0.1: igmp nreport 239.0.0.1 [ttl 1]

That is all the multicast traffic there is. The above was on an i386 machine; the same happens on another i386 machine. One has an rl0 card, the other an fxp0 card.

Then I started a second node on a sparc64, where tcpdump sees this:

# tcpdump -n -i hme0 multicast
tcpdump: listening on hme0, link-type EN10MB
10:40:09.218991 10.0.0.24 > 239.0.0.1: igmp nreport 239.0.0.1 [ttl 1]
Bus error

So although some multicast packets do go out, the cluster nodes do not see each other.

I found this part of the heartbeat code, which is where the error message comes from:

mcast_write(struct hb_media* hbm, void *pkt, int len)
{
	struct mcast_private *	mcp;
	int			rc;

	MCASTASSERT(hbm);
	mcp = (struct mcast_private *) hbm->pd;

	if ((rc=sendto(mcp->wsocket, pkt, len, 0
	,	(struct sockaddr *)&mcp->addr
	,	sizeof(struct sockaddr))) != len) {
		PILCallLog(LOG, PIL_CRIT, "Unable to send mcast packet [%d]: %s"
		,	rc, strerror(errno));
		return(HA_FAIL);
	}

Does anybody have an idea what the problem could be? The same code compiled on Linux works well. Has anyone else porting a multicast-based application run into similar problems?

Any idea is greatly appreciated.

Sebastian
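
P.S. To rule out heartbeat itself, a small standalone sender that performs the same kind of sendto() call could be run on each node. The following is only a sketch under stated assumptions, not heartbeat's code: the function names make_mcast_addr and try_mcast_send are made up for illustration, the group 239.0.0.1 and the local address 10.0.0.5 are taken from the tcpdump output above, and the port (694 in the usage note) is an assumption that should be replaced with whatever the cluster configuration uses. It passes the TTL as an unsigned char, which is what the BSD ip(4) manual pages document for IP_MULTICAST_TTL; whether that has anything to do with the "Host is down" error is not established here.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/*
 * Fill *sa with an IPv4 multicast destination.
 * Returns 0 on success, -1 if group is not a valid address
 * inside 224.0.0.0/4. (Hypothetical helper, not from heartbeat.)
 */
int
make_mcast_addr(struct sockaddr_in *sa, const char *group, unsigned short port)
{
	assert(sa != NULL && group != NULL);

	memset(sa, 0, sizeof(*sa));
	sa->sin_family = AF_INET;
	sa->sin_port = htons(port);
	if (inet_pton(AF_INET, group, &sa->sin_addr) != 1)
		return -1;
	if (!IN_MULTICAST(ntohl(sa->sin_addr.s_addr)))
		return -1;
	return 0;
}

/*
 * Send one test datagram to group:port through the interface that
 * owns local_ip. Returns 0 on success, otherwise the errno value of
 * the call that failed. (Hypothetical helper, not from heartbeat.)
 */
int
try_mcast_send(const char *group, unsigned short port, const char *local_ip)
{
	struct sockaddr_in dst;
	struct in_addr ifaddr;
	unsigned char ttl = 1;	/* u_char, as documented in BSD ip(4) */
	const char msg[] = "mcast test";
	int s, err = 0;

	if (make_mcast_addr(&dst, group, port) != 0 ||
	    inet_pton(AF_INET, local_ip, &ifaddr) != 1)
		return EINVAL;

	if ((s = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
		return errno;

	/* Pick the outgoing interface, set the TTL, then send once. */
	if (setsockopt(s, IPPROTO_IP, IP_MULTICAST_IF,
	    &ifaddr, sizeof(ifaddr)) == -1 ||
	    setsockopt(s, IPPROTO_IP, IP_MULTICAST_TTL,
	    &ttl, sizeof(ttl)) == -1 ||
	    sendto(s, msg, sizeof(msg), 0,
	    (struct sockaddr *)&dst, sizeof(dst)) == -1)
		err = errno;

	if (err != 0)
		fprintf(stderr, "mcast test failed: %s\n", strerror(err));
	close(s);
	return err;
}
```

Calling try_mcast_send("239.0.0.1", 694, "10.0.0.5") from a small main() on the failing node, and the same with 10.0.0.24 on the sparc64, would show whether the error is reproducible outside heartbeat; if it is, the problem is more likely in the socket setup or the routing table than in the port itself.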